diff --git a/modules/image/classification/esnet_x0_25_imagenet/README.md b/modules/image/classification/esnet_x0_25_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a87a9ee530d6a211fe91c7504faafc8e4f68168e
--- /dev/null
+++ b/modules/image/classification/esnet_x0_25_imagenet/README.md
@@ -0,0 +1,133 @@
+# esnet_x0_25_imagenet
+
+|Module Name|esnet_x0_25_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|ESNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning Supported|No|
+|Module Size|10 MB|
+|Latest Update|2022-04-02|
+|Data Indicators|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. Building on ShuffleNetV2, it combines the strengths of MobileNetV3, GhostNet, and PPLCNet into a network that is both faster and more accurate on ARM devices. Owing to this strong performance, it is used as the backbone of [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet) released by PaddleDetection, where, paired with a stronger detection algorithm, it set new SOTA results for object detection on ARM devices. This module is the ESNet model with scale parameter x0.25.
+
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install esnet_x0_25_imagenet
+    ```
+  - If you have problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command Line Prediction
+
+  - ```shell
+    $ hub run esnet_x0_25_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - This invokes the classification model from the command line. For more information, please refer to [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="esnet_x0_25_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data, the shape of each image is \[H, W, C\], in BGR color space;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if you use GPU, please set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class index), 'scores' (confidence) and 'label_names' (class name), as shown in the sketch below.
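+
+  - For example, a minimal sketch of consuming the result, reusing `classifier` from the example above (the fields follow the documented keys; actual values depend on your image):
+
+    - ```python
+      res = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')], top_k=5)
+      for r in res:
+          for cid, score, name in zip(r['class_ids'], r['scores'], r['label_names']):
+              print(cid, name, score)
+      ```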
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start the PaddleHub Serving
+
+  - Run the start-up command:
+  - ```shell
+    $ hub serving start -m esnet_x0_25_imagenet
+    ```
+
+  - This deploys an online image classification service; the default port is 8866.
+
+  - **NOTE:** If you predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it need not be set. For example, as shown below (a sketch; adjust the device index to your machine):
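+
+  - ```shell
+    $ CUDA_VISIBLE_DEVICES=0 hub serving start -m esnet_x0_25_imagenet
+    ```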
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following lines of code send a prediction request and print the result:
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+      def cv2_to_base64(image):
+          data = cv2.imencode('.jpg', image)[1]
+          return base64.b64encode(data.tobytes()).decode('utf8')
+
+      # Send the HTTP request
+      data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+      headers = {"Content-type": "application/json"}
+      url = "http://127.0.0.1:8866/predict/esnet_x0_25_imagenet"
+      r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+      # Print the prediction result
+      print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install esnet_x0_25_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/esnet_x0_25_imagenet/model.py b/modules/image/classification/esnet_x0_25_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..a2384403f29d18f1602c24233a1c4d6dc9df713d
--- /dev/null
+++ b/modules/image/classification/esnet_x0_25_imagenet/model.py
@@ -0,0 +1,506 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import math
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import concat
+from paddle import ParamAttr
+from paddle import reshape
+from paddle import split
+from paddle import transpose
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn import MaxPool2D
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+MODEL_STAGES_PATTERN = {"ESNet": ["blocks[2]", "blocks[9]", "blocks[12]"]}
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+                msg = f"The 'return_stages' is set incorrectly. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
+
+        Returns:
+            List[str]: The pattern(s) that matched and whose layer(s) were successfully replaced.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+            stop_layer_name (str): The name of the layer after which forward and backward computation stops.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+                msg = f"Failed to set the layers after stop_layer_name('{stop_layer_name}') to Identity. The failing layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+    def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+        Returns:
+            List[str]: The pattern(s) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+    """set the layer specified by layer_name and layer_index to Identity.
+
+    Args:
+        parent_layer (nn.Layer): The parent layer of the target layer specified by layer_name and layer_index.
+        layer_name (str): The name of the target layer to be set to Identity.
+        layer_index (str, optional): The index of the target layer to be set to Identity in parent_layer. Defaults to None.
+
+    Returns:
+        bool: True if successful, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern describing the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+        Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if parsing failed. If successful, the members are the layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+            msg = f"Layer named('{target_layer_name}') specified in pattern('{pattern}') was not found."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+                msg = f"Layer with index('{target_layer_index}') specified in pattern('{pattern}') was not found. The index should be >= 0 and < {len(target_layer)}."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
+
+
+def channel_shuffle(x, groups):
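+    # Reshape the channels into (groups, channels_per_group), swap the two axes,
+    # then flatten back: this interleaves channels across groups, as in ShuffleNetV2.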
+ batch_size, num_channels, height, width = x.shape[0:4]
+ channels_per_group = num_channels // groups
+ x = reshape(x=x, shape=[batch_size, groups, channels_per_group, height, width])
+ x = transpose(x=x, perm=[0, 2, 1, 3, 4])
+ x = reshape(x=x, shape=[batch_size, num_channels, height, width])
+ return x
+
+
+def make_divisible(v, divisor=8, min_value=None):
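+    # Round channel counts to the nearest multiple of `divisor` (default 8) without
+    # dropping below 90% of the original value, e.g. make_divisible(116 * 0.25) == 32.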
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, in_channels, out_channels, kernel_size, stride=1, groups=1, if_act=True):
+ super().__init__()
+ self.conv = Conv2D(in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2,
+ groups=groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(out_channels,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.if_act = if_act
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ if self.if_act:
+ x = self.hardswish(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class ESBlock1(TheseusLayer):
+
+ def __init__(self, in_channels, out_channels):
+ super().__init__()
+ self.pw_1_1 = ConvBNLayer(in_channels=in_channels // 2, out_channels=out_channels // 2, kernel_size=1, stride=1)
+ self.dw_1 = ConvBNLayer(in_channels=out_channels // 2,
+ out_channels=out_channels // 2,
+ kernel_size=3,
+ stride=1,
+ groups=out_channels // 2,
+ if_act=False)
+ self.se = SEModule(out_channels)
+
+ self.pw_1_2 = ConvBNLayer(in_channels=out_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
+
+ def forward(self, x):
+ x1, x2 = split(x, num_or_sections=[x.shape[1] // 2, x.shape[1] // 2], axis=1)
+ x2 = self.pw_1_1(x2)
+ x3 = self.dw_1(x2)
+ x3 = concat([x2, x3], axis=1)
+ x3 = self.se(x3)
+ x3 = self.pw_1_2(x3)
+ x = concat([x1, x3], axis=1)
+ return channel_shuffle(x, 2)
+
+
+class ESBlock2(TheseusLayer):
+
+ def __init__(self, in_channels, out_channels):
+ super().__init__()
+
+ # branch1
+ self.dw_1 = ConvBNLayer(in_channels=in_channels,
+ out_channels=in_channels,
+ kernel_size=3,
+ stride=2,
+ groups=in_channels,
+ if_act=False)
+ self.pw_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
+ # branch2
+ self.pw_2_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1)
+ self.dw_2 = ConvBNLayer(in_channels=out_channels // 2,
+ out_channels=out_channels // 2,
+ kernel_size=3,
+ stride=2,
+ groups=out_channels // 2,
+ if_act=False)
+ self.se = SEModule(out_channels // 2)
+ self.pw_2_2 = ConvBNLayer(in_channels=out_channels // 2, out_channels=out_channels // 2, kernel_size=1)
+ self.concat_dw = ConvBNLayer(in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ groups=out_channels)
+ self.concat_pw = ConvBNLayer(in_channels=out_channels, out_channels=out_channels, kernel_size=1)
+
+ def forward(self, x):
+ x1 = self.dw_1(x)
+ x1 = self.pw_1(x1)
+ x2 = self.pw_2_1(x)
+ x2 = self.dw_2(x2)
+ x2 = self.se(x2)
+ x2 = self.pw_2_2(x2)
+ x = concat([x1, x2], axis=1)
+ x = self.concat_dw(x)
+ x = self.concat_pw(x)
+ return x
+
+
+class ESNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ class_num=1000,
+ scale=1.0,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_num = class_num
+ self.class_expand = class_expand
+ stage_repeats = [3, 7, 3]
+ stage_out_channels = [
+ -1, 24, make_divisible(116 * scale),
+ make_divisible(232 * scale),
+ make_divisible(464 * scale), 1024
+ ]
+
+ self.conv1 = ConvBNLayer(in_channels=3, out_channels=stage_out_channels[1], kernel_size=3, stride=2)
+ self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1)
+
+ block_list = []
+ for stage_id, num_repeat in enumerate(stage_repeats):
+ for i in range(num_repeat):
+ if i == 0:
+ block = ESBlock2(in_channels=stage_out_channels[stage_id + 1],
+ out_channels=stage_out_channels[stage_id + 2])
+ else:
+ block = ESBlock1(in_channels=stage_out_channels[stage_id + 2],
+ out_channels=stage_out_channels[stage_id + 2])
+ block_list.append(block)
+ self.blocks = nn.Sequential(*block_list)
+
+ self.conv2 = ConvBNLayer(in_channels=stage_out_channels[-2], out_channels=stage_out_channels[-1], kernel_size=1)
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=stage_out_channels[-1],
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+ self.fc = Linear(self.class_expand, self.class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+ x = self.max_pool(x)
+ x = self.blocks(x)
+ x = self.conv2(x)
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def ESNet_x0_25(pretrained=False, use_ssld=False, **kwargs):
+ """
+ ESNet_x0_25
+ Args:
+ pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+ If str, means the path of the pretrained model.
+ use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+ Returns:
+ model: nn.Layer. Specific `ESNet_x0_25` model depends on args.
+ """
+ model = ESNet(scale=0.25, stages_pattern=MODEL_STAGES_PATTERN["ESNet"], **kwargs)
+ return model
diff --git a/modules/image/classification/esnet_x0_25_imagenet/module.py b/modules/image/classification/esnet_x0_25_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..2c2edaab788a0ac410978528303c45cd86c95f76
--- /dev/null
+++ b/modules/image/classification/esnet_x0_25_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+
+import cv2
+import paddle
+
+from .model import ESNet_x0_25
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="esnet_x0_25_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class Esnet_x0_25_Imagenet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'ESNet_x0_25.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'ESNet_x0_25_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = ESNet_x0_25()
+ param_state_dict = paddle.load(self.pretrain_path)
+        self.model.set_dict(param_state_dict)
+        self.model.eval()  # inference mode, so Dropout uses its inference behavior
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results, each result dict contains keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                # inputs are BGR; flip to RGB for the preprocessing pipeline
+                image = image[:, :, ::-1]
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]
+                inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/esnet_x0_25_imagenet/processor.py b/modules/image/classification/esnet_x0_25_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/esnet_x0_25_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
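+    # An illustrative `params` value (an assumption for illustration, mirroring the
+    # YAML "Infer.transforms" list this module loads; each op name must match one
+    # of the operator classes defined below):
+    #     [{'ResizeImage': {'resize_short': 256}},
+    #      {'CropImage': {'size': 224}},
+    #      {'NormalizeImage': None},
+    #      {'ToCHWImage': None}]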
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
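+        # Sample an aspect ratio and a target area fraction (bounded so the crop
+        # fits inside the image), then cut a random window and resize it to `size`,
+        # in the spirit of standard RandomResizedCrop.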
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+    """ normalize image, e.g. subtract the mean and divide by the std
+    """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+        assert isinstance(topk, int)
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/esnet_x0_25_imagenet/utils.py b/modules/image/classification/esnet_x0_25_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/esnet_x0_25_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+    Recursively replace a value in a dict or list
+    Args:
+        dl(dict or list): dict or list to be replaced
+        ks(list): list of keys
+        v(str): new value
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/esnet_x0_5_imagenet/README.md b/modules/image/classification/esnet_x0_5_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..f620be394670f0cfe102dd626f1033c22f7be466
--- /dev/null
+++ b/modules/image/classification/esnet_x0_5_imagenet/README.md
@@ -0,0 +1,133 @@
+# esnet_x0_5_imagenet
+
+|Module Name|esnet_x0_5_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|ESNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning Supported|No|
+|Module Size|12 MB|
+|Latest Update|2022-04-02|
+|Data Indicators|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. Building on ShuffleNetV2, it combines the strengths of MobileNetV3, GhostNet, and PPLCNet into a network that is both faster and more accurate on ARM devices. Owing to this strong performance, it is used as the backbone of [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet) released by PaddleDetection, where, paired with a stronger detection algorithm, it set new SOTA results for object detection on ARM devices. This module is the ESNet model with scale parameter x0.5.
+
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - paddlepaddle >= 2.0.0
+
+  - paddlehub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install esnet_x0_5_imagenet
+    ```
+  - If you have problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command Line Prediction
+
+  - ```shell
+    $ hub run esnet_x0_5_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - This invokes the classification model from the command line. For more information, please refer to [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="esnet_x0_5_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data, the shape of each image is \[H, W, C\], in BGR color space;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if you use GPU, please set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class index), 'scores' (confidence) and 'label_names' (class name), as shown in the sketch below.
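+
+  - For example, a minimal sketch of consuming the result, reusing `classifier` from the example above (the fields follow the documented keys; actual values depend on your image):
+
+    - ```python
+      res = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')], top_k=5)
+      for r in res:
+          for cid, score, name in zip(r['class_ids'], r['scores'], r['label_names']):
+              print(cid, name, score)
+      ```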
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start the PaddleHub Serving
+
+  - Run the start-up command:
+  - ```shell
+    $ hub serving start -m esnet_x0_5_imagenet
+    ```
+
+  - This deploys an online image classification service; the default port is 8866.
+
+  - **NOTE:** If you predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it need not be set. For example, as shown below (a sketch; adjust the device index to your machine):
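+
+  - ```shell
+    $ CUDA_VISIBLE_DEVICES=0 hub serving start -m esnet_x0_5_imagenet
+    ```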
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following lines of code send a prediction request and print the result:
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+      def cv2_to_base64(image):
+          data = cv2.imencode('.jpg', image)[1]
+          return base64.b64encode(data.tobytes()).decode('utf8')
+
+      # Send the HTTP request
+      data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+      headers = {"Content-type": "application/json"}
+      url = "http://127.0.0.1:8866/predict/esnet_x0_5_imagenet"
+      r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+      # Print the prediction result
+      print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install esnet_x0_5_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/esnet_x0_5_imagenet/model.py b/modules/image/classification/esnet_x0_5_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..4e6bd8c7b1dc4c3207ab2ad20113861b94d5af16
--- /dev/null
+++ b/modules/image/classification/esnet_x0_5_imagenet/model.py
@@ -0,0 +1,506 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import math
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import concat
+from paddle import ParamAttr
+from paddle import reshape
+from paddle import split
+from paddle import transpose
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn import MaxPool2D
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+MODEL_STAGES_PATTERN = {"ESNet": ["blocks[2]", "blocks[9]", "blocks[12]"]}
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+                msg = f"The 'return_stages' is set incorrectly. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
+
+        Returns:
+            List[str]: The pattern(s) that matched and whose layer(s) were successfully replaced.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+            stop_layer_name (str): The name of the layer after which forward and backward computation stops.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+                msg = f"Failed to set the layers after stop_layer_name('{stop_layer_name}') to Identity. The failing layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+    def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+        Returns:
+            List[str]: The pattern(s) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+    """set the layer specified by layer_name and layer_index to Identity.
+
+    Args:
+        parent_layer (nn.Layer): The parent layer of the target layer specified by layer_name and layer_index.
+        layer_name (str): The name of the target layer to be set to Identity.
+        layer_index (str, optional): The index of the target layer to be set to Identity in parent_layer. Defaults to None.
+
+    Returns:
+        bool: True if successful, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern describing the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+        Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if parsing failed. If successful, the members are the layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+            msg = f"Layer named('{target_layer_name}') specified in pattern('{pattern}') was not found."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+                msg = f"Layer with index('{target_layer_index}') specified in pattern('{pattern}') was not found. The index should be >= 0 and < {len(target_layer)}."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
+
+
+def channel_shuffle(x, groups):
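+    # Reshape the channels into (groups, channels_per_group), swap the two axes,
+    # then flatten back: this interleaves channels across groups, as in ShuffleNetV2.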
+ batch_size, num_channels, height, width = x.shape[0:4]
+ channels_per_group = num_channels // groups
+ x = reshape(x=x, shape=[batch_size, groups, channels_per_group, height, width])
+ x = transpose(x=x, perm=[0, 2, 1, 3, 4])
+ x = reshape(x=x, shape=[batch_size, num_channels, height, width])
+ return x
+
+
+def make_divisible(v, divisor=8, min_value=None):
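+    # Round channel counts to the nearest multiple of `divisor` (default 8) without
+    # dropping below 90% of the original value, e.g. make_divisible(116 * 0.5) == 56.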
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, in_channels, out_channels, kernel_size, stride=1, groups=1, if_act=True):
+ super().__init__()
+ self.conv = Conv2D(in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=kernel_size,
+ stride=stride,
+ padding=(kernel_size - 1) // 2,
+ groups=groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(out_channels,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.if_act = if_act
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ if self.if_act:
+ x = self.hardswish(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class ESBlock1(TheseusLayer):
+
+ def __init__(self, in_channels, out_channels):
+ super().__init__()
+ self.pw_1_1 = ConvBNLayer(in_channels=in_channels // 2, out_channels=out_channels // 2, kernel_size=1, stride=1)
+ self.dw_1 = ConvBNLayer(in_channels=out_channels // 2,
+ out_channels=out_channels // 2,
+ kernel_size=3,
+ stride=1,
+ groups=out_channels // 2,
+ if_act=False)
+ self.se = SEModule(out_channels)
+
+ self.pw_1_2 = ConvBNLayer(in_channels=out_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
+
+ def forward(self, x):
+ x1, x2 = split(x, num_or_sections=[x.shape[1] // 2, x.shape[1] // 2], axis=1)
+ x2 = self.pw_1_1(x2)
+ x3 = self.dw_1(x2)
+ x3 = concat([x2, x3], axis=1)
+ x3 = self.se(x3)
+ x3 = self.pw_1_2(x3)
+ x = concat([x1, x3], axis=1)
+ return channel_shuffle(x, 2)
+
+
+class ESBlock2(TheseusLayer):
+
+ def __init__(self, in_channels, out_channels):
+ super().__init__()
+
+ # branch1
+ self.dw_1 = ConvBNLayer(in_channels=in_channels,
+ out_channels=in_channels,
+ kernel_size=3,
+ stride=2,
+ groups=in_channels,
+ if_act=False)
+ self.pw_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
+ # branch2
+ self.pw_2_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1)
+ self.dw_2 = ConvBNLayer(in_channels=out_channels // 2,
+ out_channels=out_channels // 2,
+ kernel_size=3,
+ stride=2,
+ groups=out_channels // 2,
+ if_act=False)
+ self.se = SEModule(out_channels // 2)
+ self.pw_2_2 = ConvBNLayer(in_channels=out_channels // 2, out_channels=out_channels // 2, kernel_size=1)
+ self.concat_dw = ConvBNLayer(in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ groups=out_channels)
+ self.concat_pw = ConvBNLayer(in_channels=out_channels, out_channels=out_channels, kernel_size=1)
+
+ def forward(self, x):
+ x1 = self.dw_1(x)
+ x1 = self.pw_1(x1)
+ x2 = self.pw_2_1(x)
+ x2 = self.dw_2(x2)
+ x2 = self.se(x2)
+ x2 = self.pw_2_2(x2)
+ x = concat([x1, x2], axis=1)
+ x = self.concat_dw(x)
+ x = self.concat_pw(x)
+ return x
+
+
+class ESNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ class_num=1000,
+ scale=1.0,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_num = class_num
+ self.class_expand = class_expand
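+ # Three stages with [3, 7, 3] blocks each; the first block of every stage
+ # is a stride-2 ESBlock2, the remaining blocks are stride-1 ESBlock1 units.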
+ stage_repeats = [3, 7, 3]
+ stage_out_channels = [
+ -1, 24, make_divisible(116 * scale),
+ make_divisible(232 * scale),
+ make_divisible(464 * scale), 1024
+ ]
+
+ self.conv1 = ConvBNLayer(in_channels=3, out_channels=stage_out_channels[1], kernel_size=3, stride=2)
+ self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1)
+
+ block_list = []
+ for stage_id, num_repeat in enumerate(stage_repeats):
+ for i in range(num_repeat):
+ if i == 0:
+ block = ESBlock2(in_channels=stage_out_channels[stage_id + 1],
+ out_channels=stage_out_channels[stage_id + 2])
+ else:
+ block = ESBlock1(in_channels=stage_out_channels[stage_id + 2],
+ out_channels=stage_out_channels[stage_id + 2])
+ block_list.append(block)
+ self.blocks = nn.Sequential(*block_list)
+
+ self.conv2 = ConvBNLayer(in_channels=stage_out_channels[-2], out_channels=stage_out_channels[-1], kernel_size=1)
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=stage_out_channels[-1],
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+ self.fc = Linear(self.class_expand, self.class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+ x = self.max_pool(x)
+ x = self.blocks(x)
+ x = self.conv2(x)
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def ESNet_x0_5(pretrained=False, use_ssld=False, **kwargs):
+ """
+ ESNet_x0_5
+ Args:
+ pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
+ If str, means the path of the pretrained model.
+ use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
+ Returns:
+ model: nn.Layer. Specific `ESNet_x0_5` model depends on args.
+ """
+ model = ESNet(scale=0.5, stages_pattern=MODEL_STAGES_PATTERN["ESNet"], **kwargs)
+ return model
diff --git a/modules/image/classification/esnet_x0_5_imagenet/module.py b/modules/image/classification/esnet_x0_5_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..0abb6c0f5dbf0e83ee51ea730ed7cb16c4a6c6b7
--- /dev/null
+++ b/modules/image/classification/esnet_x0_5_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+
+import paddlehub as hub
+from .model import ESNet_x0_5
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="esnet_x0_5_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class Esnet_x0_5_Imagenet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'ESNet_x0_5.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'ESNet_x0_5_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = ESNet_x0_5()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+ res (list[dict]): The classification results; each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images is None and paths is None:
+ print('No image provided. Please input an image or an image path.')
+ return
+
+ if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ inputs.append(image)
+
+ if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ inputs.append(image)
+
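+ # Preprocess images one by one and flush a batch through the model whenever
+ # `batch_size` items are collected or the last image has been processed.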
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/esnet_x0_5_imagenet/processor.py b/modules/image/classification/esnet_x0_5_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/esnet_x0_5_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
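+
+ Example (illustrative config; the operator names are classes defined in this file):
+ params = [{'ResizeImage': {'resize_short': 256}},
+ {'CropImage': {'size': 224}},
+ {'NormalizeImage': None},
+ {'ToCHWImage': None}]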
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+ raise OperatorParamError("invalid params for ResizeImage: "
+ "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+ print(
+ "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+ )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
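+ # Softmax (or sigmoid for multilabel) over the logits, then keep the top-k
+ # class ids, their scores and, when available, the mapped label names.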
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None and len(label_name_list) > 0:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/esnet_x0_5_imagenet/utils.py b/modules/image/classification/esnet_x0_5_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/esnet_x0_5_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+ Recursively replace dict of list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
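+ Example (illustrative key path; actual keys depend on the loaded config):
+ override(config, ['Infer', 'PostProcess', 'topk'], '5')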
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+ assert isinstance(dl, (list, dict)), ('{} should be a list or a dict'.format(dl))
+ assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+ if ks[0] not in dl:
+ print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+ assert len(pair) == 2, ("there can be only one '=' in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+ assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/levit_128_imagenet/README.md b/modules/image/classification/levit_128_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..5a1bbedfcee657fba445bd53ad12e41489b79afb
--- /dev/null
+++ b/modules/image/classification/levit_128_imagenet/README.md
@@ -0,0 +1,132 @@
+# levit_128_imagenet
+
+|Module Name|levit_128_imagenet|
+| :--- | :---: |
+|Category|image classification|
+|Network|LeViT|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported or not|No|
+|Module Size|54 MB|
+|Latest update date|2022-04-02|
+|Data indicators|Acc|
+
+
+## I. Basic Information
+
+
+- ### Module Introduction
+
+ - LeViT is a hybrid neural network for fast-inference image classification. Its design takes the model's performance on different hardware platforms into account, so it better reflects real application scenarios. Through extensive experiments, the authors found a better way to combine convolutional networks with the Transformer architecture, and proposed an attention-based method to integrate positional-information encoding into the Transformer. This module uses the LeViT-128 configuration; for details, please refer to the [paper](https://arxiv.org/abs/2104.01136).
+
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 1.6.2
+
+ - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install levit_128_imagenet
+ ```
+ - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command line Prediction
+
+ - ```shell
+ $ hub run levit_128_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+ - Invoke the classification model via the command line; for more information, please refer to [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="levit_128_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+ - classification API.
+ - **Parameters**
+
+ - images (list\[numpy.ndarray\]): image data, the shape of each image is \[H, W, C\], in BGR color space;
+ - paths (list\[str\]): image paths;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use GPU; **if GPU is used, please set the CUDA_VISIBLE_DEVICES environment variable first**
+ - top\_k (int): return the top k prediction results.
+
+ - **Return**
+
+ - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names).
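+
+ - A sample return value is sketched below (illustrative only; the actual ids, scores and names depend on the input image):
+
+ - ```python
+ [{'class_ids': [8], 'scores': [0.9973], 'label_names': ['hen']}]
+ ```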
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the startup command:
+ - ```shell
+ $ hub serving start -m levit_128_imagenet
+ ```
+
+ - This completes the deployment of an online image classification service; the default port number is 8866.
+
+ - **NOTE:** If GPU is used for prediction, please set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise there is no need to set it.
+
+- ### Step 2: Send a prediction request
+
+ - With the server configured, the following few lines of code send a prediction request and obtain the result:
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/levit_128_imagenet"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # Print the prediction results
+ print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
+
+ - ```shell
+ $ hub install levit_128_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/levit_128_imagenet/model.py b/modules/image/classification/levit_128_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..2cf87d515201935a705b77737137ed2d4567fd40
--- /dev/null
+++ b/modules/image/classification/levit_128_imagenet/model.py
@@ -0,0 +1,450 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# Code was based on https://github.com/facebookresearch/LeViT
+import itertools
+import math
+import warnings
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.initializer import Constant
+from paddle.nn.initializer import TruncatedNormal
+from paddle.regularizer import L2Decay
+
+from .vision_transformer import Identity
+from .vision_transformer import ones_
+from .vision_transformer import trunc_normal_
+from .vision_transformer import zeros_
+
+
+def cal_attention_biases(attention_biases, attention_bias_idxs):
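+ # Gather the learned relative-position biases into a
+ # (num_heads, num_queries, num_keys) tensor via the precomputed index map.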
+ gather_list = []
+ attention_bias_t = paddle.transpose(attention_biases, (1, 0))
+ nums = attention_bias_idxs.shape[0]
+ for idx in range(nums):
+ gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
+ gather_list.append(gather)
+ shape0, shape1 = attention_bias_idxs.shape
+ gather = paddle.concat(gather_list)
+ return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))
+
+
+class Conv2d_BN(nn.Sequential):
+
+ def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000):
+ super().__init__()
+ self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False))
+ bn = nn.BatchNorm2D(b)
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+
+class Linear_BN(nn.Sequential):
+
+ def __init__(self, a, b, bn_weight_init=1):
+ super().__init__()
+ self.add_sublayer('c', nn.Linear(a, b, bias_attr=False))
+ bn = nn.BatchNorm1D(b)
+ if bn_weight_init == 0:
+ zeros_(bn.weight)
+ else:
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+ def forward(self, x):
+ l, bn = self._sub_layers.values()
+ x = l(x)
+ return paddle.reshape(bn(x.flatten(0, 1)), x.shape)
+
+
+class BN_Linear(nn.Sequential):
+
+ def __init__(self, a, b, bias=True, std=0.02):
+ super().__init__()
+ self.add_sublayer('bn', nn.BatchNorm1D(a))
+ l = nn.Linear(a, b, bias_attr=bias)
+ trunc_normal_(l.weight)
+ if bias:
+ zeros_(l.bias)
+ self.add_sublayer('l', l)
+
+
+def b16(n, activation, resolution=224):
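+ # Convolutional stem ("b16"): four stride-2 Conv-BN stages give an overall
+ # 16x downsample, replacing the usual ViT patchify projection.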
+ return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(),
+ Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(),
+ Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(),
+ Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8))
+
+
+class Residual(nn.Layer):
+
+ def __init__(self, m, drop):
+ super().__init__()
+ self.m = m
+ self.drop = drop
+
+ def forward(self, x):
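+ # Stochastic depth (drop path): during training, randomly drop the residual
+ # branch per sample and rescale the surviving branches by 1/(1 - drop).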
+ if self.training and self.drop > 0:
+ y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
+ y = y.divide(paddle.full_like(y, 1 - self.drop))
+ return paddle.add(x, self.m(x) * y)
+ else:
+ return paddle.add(x, self.m(x))
+
+
+class Attention(nn.Layer):
+
+ def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * num_heads
+ self.attn_ratio = attn_ratio
+ self.h = self.dh + nh_kd * 2
+ self.qkv = Linear_BN(dim, self.h)
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0))
+ points = list(itertools.product(range(resolution), range(resolution)))
+ N = len(points)
+ attention_offsets = {}
+ idxs = []
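+ # Enumerate every query-key position pair; pairs that share the same
+ # absolute offset share a single learnable bias per head.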
+ for p1 in points:
+ for p2 in points:
+ offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+ tensor_idxs = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N]))
+
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
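+ # Forcing the training branch below means the attention biases are always
+ # recomputed from the parameters; the eval-time cache `ab` set by
+ # train(False) is not relied upon here.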
+ self.training = True
+ B, N, C = x.shape
+ qkv = self.qkv(x)
+ qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads])
+ q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3)
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+ k = paddle.transpose(k, perm=[0, 2, 1, 3])
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+ attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
+ attn = F.softmax(attn)
+ x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
+ x = paddle.reshape(x, [B, N, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class Subsample(nn.Layer):
+
+ def __init__(self, stride, resolution):
+ super().__init__()
+ self.stride = stride
+ self.resolution = resolution
+
+ def forward(self, x):
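+ # Keep every `stride`-th token along both spatial axes, shrinking the
+ # sequence from resolution^2 to ceil(resolution / stride)^2 tokens.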
+ B, N, C = x.shape
+ x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
+ end1, end2 = x.shape[1], x.shape[2]
+ x = x[:, 0:end1:self.stride, 0:end2:self.stride]
+ x = paddle.reshape(x, [B, -1, C])
+ return x
+
+
+class AttentionSubsample(nn.Layer):
+
+ def __init__(self,
+ in_dim,
+ out_dim,
+ key_dim,
+ num_heads=8,
+ attn_ratio=2,
+ activation=None,
+ stride=2,
+ resolution=14,
+ resolution_=7):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * self.num_heads
+ self.attn_ratio = attn_ratio
+ self.resolution_ = resolution_
+ self.resolution_2 = resolution_**2
+ self.training = True
+ h = self.dh + nh_kd
+ self.kv = Linear_BN(in_dim, h)
+
+ self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd))
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim))
+
+ self.stride = stride
+ self.resolution = resolution
+ points = list(itertools.product(range(resolution), range(resolution)))
+ points_ = list(itertools.product(range(resolution_), range(resolution_)))
+
+ N = len(points)
+ N_ = len(points_)
+ attention_offsets = {}
+ idxs = []
+ for p1 in points_:
+ for p2 in points:
+ size = 1
+ offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+
+ tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N]))
+
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
+ self.training = True
+ B, N, C = x.shape
+ kv = self.kv(x)
+ kv = paddle.reshape(kv, [B, N, self.num_heads, -1])
+ k, v = paddle.split(kv, [self.key_dim, self.d], axis=3)
+ k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim])
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+
+ attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
+ attn = F.softmax(attn)
+
+ x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class LeViT(nn.Layer):
+ """ Vision Transformer with support for patch or hybrid CNN input stage
+ """
+
+ def __init__(self,
+ img_size=224,
+ patch_size=16,
+ in_chans=3,
+ class_num=1000,
+ embed_dim=[192],
+ key_dim=[64],
+ depth=[12],
+ num_heads=[3],
+ attn_ratio=[2],
+ mlp_ratio=[2],
+ hybrid_backbone=None,
+ down_ops=[],
+ attention_activation=nn.Hardswish,
+ mlp_activation=nn.Hardswish,
+ distillation=True,
+ drop_path=0):
+ super().__init__()
+
+ self.class_num = class_num
+ self.num_features = embed_dim[-1]
+ self.embed_dim = embed_dim
+ self.distillation = distillation
+
+ self.patch_embed = hybrid_backbone
+
+ self.blocks = []
+ down_ops.append([''])
+ resolution = img_size // patch_size
+ for i, (ed, kd, dpth, nh, ar, mr,
+ do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)):
+ for _ in range(dpth):
+ self.blocks.append(
+ Residual(
+ Attention(
+ ed,
+ kd,
+ nh,
+ attn_ratio=ar,
+ activation=attention_activation,
+ resolution=resolution,
+ ), drop_path))
+ if mr > 0:
+ h = int(ed * mr)
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(ed, h),
+ mlp_activation(),
+ Linear_BN(h, ed, bn_weight_init=0),
+ ), drop_path))
+ if do[0] == 'Subsample':
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ resolution_ = (resolution - 1) // do[5] + 1
+ self.blocks.append(
+ AttentionSubsample(*embed_dim[i:i + 2],
+ key_dim=do[1],
+ num_heads=do[2],
+ attn_ratio=do[3],
+ activation=attention_activation,
+ stride=do[5],
+ resolution=resolution,
+ resolution_=resolution_))
+ resolution = resolution_
+ if do[4] > 0: # mlp_ratio
+ h = int(embed_dim[i + 1] * do[4])
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(embed_dim[i + 1], h),
+ mlp_activation(),
+ Linear_BN(h, embed_dim[i + 1], bn_weight_init=0),
+ ), drop_path))
+ self.blocks = nn.Sequential(*self.blocks)
+
+ # Classifier head
+ self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+ if distillation:
+ self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+
+ def forward(self, x):
+ x = self.patch_embed(x)
+ x = x.flatten(2)
+ x = paddle.transpose(x, perm=[0, 2, 1])
+ x = self.blocks(x)
+ x = x.mean(1)
+
+ x = paddle.reshape(x, [-1, self.embed_dim[-1]])
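+ # With distillation enabled, a second head is trained; at inference time the
+ # classifier and distillation head outputs are averaged, as in DeiT/LeViT.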
+ if self.distillation:
+ x = self.head(x), self.head_dist(x)
+ if not self.training:
+ x = (x[0] + x[1]) / 2
+ else:
+ x = self.head(x)
+ return x
+
+
+def model_factory(C, D, X, N, drop_path, class_num, distillation):
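+ # C/N/X encode the per-stage embedding dims, head counts and depths as
+ # underscore-separated strings (e.g. '128_256_384'); D is the key dim.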
+ embed_dim = [int(x) for x in C.split('_')]
+ num_heads = [int(x) for x in N.split('_')]
+ depth = [int(x) for x in X.split('_')]
+ act = nn.Hardswish
+ model = LeViT(
+ patch_size=16,
+ embed_dim=embed_dim,
+ num_heads=num_heads,
+ key_dim=[D] * 3,
+ depth=depth,
+ attn_ratio=[2, 2, 2],
+ mlp_ratio=[2, 2, 2],
+ down_ops=[
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ ['Subsample', D, embed_dim[0] // D, 4, 2, 2],
+ ['Subsample', D, embed_dim[1] // D, 4, 2, 2],
+ ],
+ attention_activation=act,
+ mlp_activation=act,
+ hybrid_backbone=b16(embed_dim[0], activation=act),
+ class_num=class_num,
+ drop_path=drop_path,
+ distillation=distillation)
+
+ return model
+
+
+specification = {
+ 'LeViT_128S': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_6_8',
+ 'X': '2_3_4',
+ 'drop_path': 0
+ },
+ 'LeViT_128': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_8_12',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_192': {
+ 'C': '192_288_384',
+ 'D': 32,
+ 'N': '3_5_6',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_256': {
+ 'C': '256_384_512',
+ 'D': 32,
+ 'N': '4_6_8',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_384': {
+ 'C': '384_512_768',
+ 'D': 32,
+ 'N': '6_9_12',
+ 'X': '4_4_4',
+ 'drop_path': 0.1
+ },
+}
+
+
+def LeViT_128(**kwargs):
+ model = model_factory(**specification['LeViT_128'], class_num=1000, distillation=False)
+ return model
diff --git a/modules/image/classification/levit_128_imagenet/module.py b/modules/image/classification/levit_128_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..1ed4aba85900e8051e912730892aee68f4e21bf0
--- /dev/null
+++ b/modules/image/classification/levit_128_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+
+import paddlehub as hub
+from .model import LeViT_128
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="levit_128_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class LeViT_128_ImageNet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'LeViT_128.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'LeViT_128_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = LeViT_128()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+ res (list[dict]): The classification results; each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images is None and paths is None:
+ print('No image provided. Please input an image or an image path.')
+ return
+
+ if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ inputs.append(image)
+
+ if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/levit_128_imagenet/processor.py b/modules/image/classification/levit_128_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/levit_128_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+ raise OperatorParamError("invalid params for ResizeImage: "
+ "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+ print(
+ "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+ )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None and len(label_name_list) > 0:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/levit_128_imagenet/utils.py b/modules/image/classification/levit_128_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/levit_128_imagenet/utils.py
@@ -0,0 +1,129 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+ Recursively replace dict of list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+ if not ks[0] in dl:
+            print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
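+
+# Illustrative usage (the YAML file name and key path below are hypothetical):
+#   cfg = parse_config('LeViT_128.yaml')
+#   override_config(cfg, ['Infer.PostProcess.topk=2'])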
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/levit_128s_imagenet/README.md b/modules/image/classification/levit_128s_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..34a1b82fa2df9fc6f641d467838e6d58727a9bfe
--- /dev/null
+++ b/modules/image/classification/levit_128s_imagenet/README.md
@@ -0,0 +1,132 @@
+# levit_128s_imagenet
+
+|Module Name|levit_128s_imagenet|
+| :--- | :---: |
+|Category|image classification|
+|Network|LeViT|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported|No|
+|Module Size|45 MB|
+|Latest update date|2022-04-02|
+|Data indicators|Acc|
+
+
+## I. Basic Information
+
+
+- ### Module Introduction
+
+  - LeViT is a hybrid neural network for fast-inference image classification. It was designed with the model's performance on different hardware platforms in mind, so it better reflects the real scenarios of common applications. Through extensive experiments, the authors found a better way to combine convolutional networks with the Transformer architecture and proposed an attention-based method to integrate positional encoding into the Transformer. The model configuration of this module is LeViT128s; for details, see the [paper](https://arxiv.org/abs/2104.01136).
+
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0  | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install levit_128s_imagenet
+ ```
+  - If you have trouble installing, please refer to: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command-line Prediction
+
+ - ```shell
+ $ hub run levit_128s_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This invokes the classification module from the command line; for more usage, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="levit_128s_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data; each image's shape is \[H, W, C\], in BGR color space;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
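+
+  - **Example of the return value** (illustrative; the class id, score and label below are hypothetical):
+
+    - ```python
+      result = [{'class_ids': [282], 'scores': [0.92], 'label_names': ['tiger cat']}]
+      print(result[0]['label_names'][0])  # tiger cat
+      ```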
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service for image classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the start command:
+  - ```shell
+    $ hub serving start -m levit_128s_imagenet
+    ```
+
+  - This deploys the online image classification service; the default port is 8866.
+
+  - **NOTE:** If you predict with GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the few lines of code below send a prediction request and get the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+      def cv2_to_base64(image):
+          data = cv2.imencode('.jpg', image)[1]
+          # np.ndarray.tostring() is deprecated; tobytes() is the supported spelling
+          return base64.b64encode(data.tobytes()).decode('utf8')
+
+      # Send the HTTP request
+      data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+      headers = {"Content-type": "application/json"}
+      url = "http://127.0.0.1:8866/predict/levit_128s_imagenet"
+      r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+      # Print the prediction results
+      print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install levit_128s_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/levit_128s_imagenet/model.py b/modules/image/classification/levit_128s_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a1b8467183fc7146ea739331b6e58456812a867
--- /dev/null
+++ b/modules/image/classification/levit_128s_imagenet/model.py
@@ -0,0 +1,450 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# Code was based on https://github.com/facebookresearch/LeViT
+import itertools
+import math
+import warnings
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.initializer import Constant
+from paddle.nn.initializer import TruncatedNormal
+from paddle.regularizer import L2Decay
+
+from .vision_transformer import Identity
+from .vision_transformer import ones_
+from .vision_transformer import trunc_normal_
+from .vision_transformer import zeros_
+
+
+def cal_attention_biases(attention_biases, attention_bias_idxs):
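+    """Gather per-head relative-position biases; returns a tensor of shape (num_heads,) + attention_bias_idxs.shape."""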
+ gather_list = []
+ attention_bias_t = paddle.transpose(attention_biases, (1, 0))
+ nums = attention_bias_idxs.shape[0]
+ for idx in range(nums):
+ gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
+ gather_list.append(gather)
+ shape0, shape1 = attention_bias_idxs.shape
+ gather = paddle.concat(gather_list)
+ return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))
+
+
+class Conv2d_BN(nn.Sequential):
+
+ def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000):
+ super().__init__()
+ self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False))
+ bn = nn.BatchNorm2D(b)
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+
+class Linear_BN(nn.Sequential):
+
+ def __init__(self, a, b, bn_weight_init=1):
+ super().__init__()
+ self.add_sublayer('c', nn.Linear(a, b, bias_attr=False))
+ bn = nn.BatchNorm1D(b)
+ if bn_weight_init == 0:
+ zeros_(bn.weight)
+ else:
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+ def forward(self, x):
+ l, bn = self._sub_layers.values()
+ x = l(x)
+ return paddle.reshape(bn(x.flatten(0, 1)), x.shape)
+
+
+class BN_Linear(nn.Sequential):
+
+ def __init__(self, a, b, bias=True, std=0.02):
+ super().__init__()
+ self.add_sublayer('bn', nn.BatchNorm1D(a))
+ l = nn.Linear(a, b, bias_attr=bias)
+ trunc_normal_(l.weight)
+ if bias:
+ zeros_(l.bias)
+ self.add_sublayer('l', l)
+
+
+def b16(n, activation, resolution=224):
+ return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(),
+ Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(),
+ Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(),
+ Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8))
+
+
+class Residual(nn.Layer):
+
+ def __init__(self, m, drop):
+ super().__init__()
+ self.m = m
+ self.drop = drop
+
+    def forward(self, x):
+        if self.training and self.drop > 0:
+            # Stochastic depth: per-sample 0/1 mask on the residual branch, rescaled by 1 / (1 - drop).
+            y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
+            y = y.divide(paddle.full_like(y, 1 - self.drop))
+            return paddle.add(x, self.m(x) * y)
+        else:
+            return paddle.add(x, self.m(x))
+
+
+class Attention(nn.Layer):
+
+ def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * num_heads
+ self.attn_ratio = attn_ratio
+ self.h = self.dh + nh_kd * 2
+ self.qkv = Linear_BN(dim, self.h)
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0))
+ points = list(itertools.product(range(resolution), range(resolution)))
+ N = len(points)
+ attention_offsets = {}
+ idxs = []
+ for p1 in points:
+ for p2 in points:
+ offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+ tensor_idxs = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N]))
+
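+    # eval()/train(False) precomputes and caches the biases in self.ab; train(True) drops the cache.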
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
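+        # Forces the on-the-fly bias computation below, since the cached self.ab only exists after train(False).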
+ self.training = True
+ B, N, C = x.shape
+ qkv = self.qkv(x)
+ qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads])
+ q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3)
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+ k = paddle.transpose(k, perm=[0, 2, 1, 3])
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+ attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
+ attn = F.softmax(attn)
+ x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
+ x = paddle.reshape(x, [B, N, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class Subsample(nn.Layer):
+
+ def __init__(self, stride, resolution):
+ super().__init__()
+ self.stride = stride
+ self.resolution = resolution
+
+ def forward(self, x):
+ B, N, C = x.shape
+ x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
+ end1, end2 = x.shape[1], x.shape[2]
+ x = x[:, 0:end1:self.stride, 0:end2:self.stride]
+ x = paddle.reshape(x, [B, -1, C])
+ return x
+
+
+class AttentionSubsample(nn.Layer):
+
+ def __init__(self,
+ in_dim,
+ out_dim,
+ key_dim,
+ num_heads=8,
+ attn_ratio=2,
+ activation=None,
+ stride=2,
+ resolution=14,
+ resolution_=7):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * self.num_heads
+ self.attn_ratio = attn_ratio
+ self.resolution_ = resolution_
+ self.resolution_2 = resolution_**2
+ self.training = True
+ h = self.dh + nh_kd
+ self.kv = Linear_BN(in_dim, h)
+
+ self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd))
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim))
+
+ self.stride = stride
+ self.resolution = resolution
+ points = list(itertools.product(range(resolution), range(resolution)))
+ points_ = list(itertools.product(range(resolution_), range(resolution_)))
+
+ N = len(points)
+ N_ = len(points_)
+ attention_offsets = {}
+ idxs = []
+ i = 0
+ j = 0
+ for p1 in points_:
+ i += 1
+ for p2 in points:
+ j += 1
+ size = 1
+ offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+
+ tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N]))
+
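+    # eval()/train(False) precomputes and caches the biases in self.ab; train(True) drops the cache.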
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
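+        # Forces the on-the-fly bias computation below, since the cached self.ab only exists after train(False).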
+ self.training = True
+ B, N, C = x.shape
+ kv = self.kv(x)
+ kv = paddle.reshape(kv, [B, N, self.num_heads, -1])
+ k, v = paddle.split(kv, [self.key_dim, self.d], axis=3)
+ k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim])
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+
+ attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
+ attn = F.softmax(attn)
+
+ x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class LeViT(nn.Layer):
+ """ Vision Transformer with support for patch or hybrid CNN input stage
+ """
+
+ def __init__(self,
+ img_size=224,
+ patch_size=16,
+ in_chans=3,
+ class_num=1000,
+ embed_dim=[192],
+ key_dim=[64],
+ depth=[12],
+ num_heads=[3],
+ attn_ratio=[2],
+ mlp_ratio=[2],
+ hybrid_backbone=None,
+ down_ops=[],
+ attention_activation=nn.Hardswish,
+ mlp_activation=nn.Hardswish,
+ distillation=True,
+ drop_path=0):
+ super().__init__()
+
+ self.class_num = class_num
+ self.num_features = embed_dim[-1]
+ self.embed_dim = embed_dim
+ self.distillation = distillation
+
+ self.patch_embed = hybrid_backbone
+
+ self.blocks = []
+ down_ops.append([''])
+ resolution = img_size // patch_size
+ for i, (ed, kd, dpth, nh, ar, mr,
+ do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)):
+ for _ in range(dpth):
+ self.blocks.append(
+ Residual(
+ Attention(
+ ed,
+ kd,
+ nh,
+ attn_ratio=ar,
+ activation=attention_activation,
+ resolution=resolution,
+ ), drop_path))
+ if mr > 0:
+ h = int(ed * mr)
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(ed, h),
+ mlp_activation(),
+ Linear_BN(h, ed, bn_weight_init=0),
+ ), drop_path))
+ if do[0] == 'Subsample':
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ resolution_ = (resolution - 1) // do[5] + 1
+ self.blocks.append(
+ AttentionSubsample(*embed_dim[i:i + 2],
+ key_dim=do[1],
+ num_heads=do[2],
+ attn_ratio=do[3],
+ activation=attention_activation,
+ stride=do[5],
+ resolution=resolution,
+ resolution_=resolution_))
+ resolution = resolution_
+ if do[4] > 0: # mlp_ratio
+ h = int(embed_dim[i + 1] * do[4])
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(embed_dim[i + 1], h),
+ mlp_activation(),
+ Linear_BN(h, embed_dim[i + 1], bn_weight_init=0),
+ ), drop_path))
+ self.blocks = nn.Sequential(*self.blocks)
+
+ # Classifier head
+ self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+ if distillation:
+ self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+
+ def forward(self, x):
+ x = self.patch_embed(x)
+ x = x.flatten(2)
+ x = paddle.transpose(x, perm=[0, 2, 1])
+ x = self.blocks(x)
+ x = x.mean(1)
+
+ x = paddle.reshape(x, [-1, self.embed_dim[-1]])
+ if self.distillation:
+ x = self.head(x), self.head_dist(x)
+ if not self.training:
+ x = (x[0] + x[1]) / 2
+ else:
+ x = self.head(x)
+ return x
+
+
+def model_factory(C, D, X, N, drop_path, class_num, distillation):
+ embed_dim = [int(x) for x in C.split('_')]
+ num_heads = [int(x) for x in N.split('_')]
+ depth = [int(x) for x in X.split('_')]
+ act = nn.Hardswish
+ model = LeViT(
+ patch_size=16,
+ embed_dim=embed_dim,
+ num_heads=num_heads,
+ key_dim=[D] * 3,
+ depth=depth,
+ attn_ratio=[2, 2, 2],
+ mlp_ratio=[2, 2, 2],
+ down_ops=[
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ ['Subsample', D, embed_dim[0] // D, 4, 2, 2],
+ ['Subsample', D, embed_dim[1] // D, 4, 2, 2],
+ ],
+ attention_activation=act,
+ mlp_activation=act,
+ hybrid_backbone=b16(embed_dim[0], activation=act),
+ class_num=class_num,
+ drop_path=drop_path,
+ distillation=distillation)
+
+ return model
+
+
+specification = {
+ 'LeViT_128S': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_6_8',
+ 'X': '2_3_4',
+ 'drop_path': 0
+ },
+ 'LeViT_128': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_8_12',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_192': {
+ 'C': '192_288_384',
+ 'D': 32,
+ 'N': '3_5_6',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_256': {
+ 'C': '256_384_512',
+ 'D': 32,
+ 'N': '4_6_8',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_384': {
+ 'C': '384_512_768',
+ 'D': 32,
+ 'N': '6_9_12',
+ 'X': '4_4_4',
+ 'drop_path': 0.1
+ },
+}
+
+
+def LeViT_128S(**kwargs):
+ model = model_factory(**specification['LeViT_128S'], class_num=1000, distillation=False)
+ return model
diff --git a/modules/image/classification/levit_128s_imagenet/module.py b/modules/image/classification/levit_128s_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..9476fecfabe92d16da7098f4ca873b3fdbfff5f2
--- /dev/null
+++ b/modules/image/classification/levit_128s_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import LeViT_128S
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="levit_128s_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class LeViT_128S_ImageNet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'LeViT_128S.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'LeViT_128S_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = LeViT_128S()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results; each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+ return
+
+        if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ inputs.append(image)
+
+        if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/levit_128s_imagenet/processor.py b/modules/image/classification/levit_128s_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/levit_128s_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
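+    """Top-k post-processing: turn logits into per-image dicts with 'class_ids', 'scores' and 'label_names'."""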
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/levit_128s_imagenet/utils.py b/modules/image/classification/levit_128s_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/levit_128s_imagenet/utils.py
@@ -0,0 +1,129 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+ Recursively replace dict of list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+ if not ks[0] in dl:
+            print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
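+
+# Illustrative usage (the YAML file name and key path below are hypothetical):
+#   cfg = parse_config('LeViT_128S.yaml')
+#   override_config(cfg, ['Infer.PostProcess.topk=2'])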
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/levit_192_imagenet/README.md b/modules/image/classification/levit_192_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..c3e86eea1909e7497e3356b41397416d77af044a
--- /dev/null
+++ b/modules/image/classification/levit_192_imagenet/README.md
@@ -0,0 +1,132 @@
+# levit_192_imagenet
+
+|Module Name|levit_192_imagenet|
+| :--- | :---: |
+|Category|image classification|
+|Network|LeViT|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported|No|
+|Module Size|64 MB|
+|Latest update date|2022-04-02|
+|Data indicators|Acc|
+
+
+## I. Basic Information
+
+
+- ### Module Introduction
+
+  - LeViT is a hybrid neural network for fast-inference image classification. It was designed with the model's performance on different hardware platforms in mind, so it better reflects the real scenarios of common applications. Through extensive experiments, the authors found a better way to combine convolutional networks with the Transformer architecture and proposed an attention-based method to integrate positional encoding into the Transformer. The model configuration of this module is LeViT192; for details, see the [paper](https://arxiv.org/abs/2104.01136).
+
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0  | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install levit_192_imagenet
+ ```
+  - If you have trouble installing, please refer to: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command-line Prediction
+
+ - ```shell
+ $ hub run levit_192_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This invokes the classification module from the command line; for more usage, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="levit_192_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data; each image's shape is \[H, W, C\], in BGR color space;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
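+
+  - **Example of the return value** (illustrative; the class id, score and label below are hypothetical):
+
+    - ```python
+      result = [{'class_ids': [282], 'scores': [0.92], 'label_names': ['tiger cat']}]
+      print(result[0]['label_names'][0])  # tiger cat
+      ```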
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service for image classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the start command:
+  - ```shell
+    $ hub serving start -m levit_192_imagenet
+    ```
+
+  - This deploys the online image classification service; the default port is 8866.
+
+  - **NOTE:** If you predict with GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the few lines of code below send a prediction request and get the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+      def cv2_to_base64(image):
+          data = cv2.imencode('.jpg', image)[1]
+          # np.ndarray.tostring() is deprecated; tobytes() is the supported spelling
+          return base64.b64encode(data.tobytes()).decode('utf8')
+
+      # Send the HTTP request
+      data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+      headers = {"Content-type": "application/json"}
+      url = "http://127.0.0.1:8866/predict/levit_192_imagenet"
+      r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+      # Print the prediction results
+      print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install levit_192_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/levit_192_imagenet/model.py b/modules/image/classification/levit_192_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..104d5f0669dba227d8574e9a25daeed62e4f23fa
--- /dev/null
+++ b/modules/image/classification/levit_192_imagenet/model.py
@@ -0,0 +1,450 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# Code was based on https://github.com/facebookresearch/LeViT
+import itertools
+import math
+import warnings
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.initializer import Constant
+from paddle.nn.initializer import TruncatedNormal
+from paddle.regularizer import L2Decay
+
+from .vision_transformer import Identity
+from .vision_transformer import ones_
+from .vision_transformer import trunc_normal_
+from .vision_transformer import zeros_
+
+
+def cal_attention_biases(attention_biases, attention_bias_idxs):
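+    """Gather per-head relative-position biases; returns a tensor of shape (num_heads,) + attention_bias_idxs.shape."""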
+ gather_list = []
+ attention_bias_t = paddle.transpose(attention_biases, (1, 0))
+ nums = attention_bias_idxs.shape[0]
+ for idx in range(nums):
+ gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
+ gather_list.append(gather)
+ shape0, shape1 = attention_bias_idxs.shape
+ gather = paddle.concat(gather_list)
+ return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))
+
+
+class Conv2d_BN(nn.Sequential):
+
+ def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000):
+ super().__init__()
+ self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False))
+ bn = nn.BatchNorm2D(b)
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+
+class Linear_BN(nn.Sequential):
+
+ def __init__(self, a, b, bn_weight_init=1):
+ super().__init__()
+ self.add_sublayer('c', nn.Linear(a, b, bias_attr=False))
+ bn = nn.BatchNorm1D(b)
+ if bn_weight_init == 0:
+ zeros_(bn.weight)
+ else:
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+ def forward(self, x):
+ l, bn = self._sub_layers.values()
+ x = l(x)
+ return paddle.reshape(bn(x.flatten(0, 1)), x.shape)
+
+
+class BN_Linear(nn.Sequential):
+
+ def __init__(self, a, b, bias=True, std=0.02):
+ super().__init__()
+ self.add_sublayer('bn', nn.BatchNorm1D(a))
+ l = nn.Linear(a, b, bias_attr=bias)
+ trunc_normal_(l.weight)
+ if bias:
+ zeros_(l.bias)
+ self.add_sublayer('l', l)
+
+
+def b16(n, activation, resolution=224):
+ return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(),
+ Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(),
+ Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(),
+ Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8))
+
+
+class Residual(nn.Layer):
+
+ def __init__(self, m, drop):
+ super().__init__()
+ self.m = m
+ self.drop = drop
+
+    def forward(self, x):
+        if self.training and self.drop > 0:
+            # Stochastic depth: per-sample 0/1 mask on the residual branch, rescaled by 1 / (1 - drop).
+            y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
+            y = y.divide(paddle.full_like(y, 1 - self.drop))
+            return paddle.add(x, self.m(x) * y)
+        else:
+            return paddle.add(x, self.m(x))
+
+
+class Attention(nn.Layer):
+
+ def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * num_heads
+ self.attn_ratio = attn_ratio
+ self.h = self.dh + nh_kd * 2
+ self.qkv = Linear_BN(dim, self.h)
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0))
+ points = list(itertools.product(range(resolution), range(resolution)))
+ N = len(points)
+ attention_offsets = {}
+ idxs = []
+ for p1 in points:
+ for p2 in points:
+ offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+ tensor_idxs = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N]))
+
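+    # eval()/train(False) precomputes and caches the biases in self.ab; train(True) drops the cache.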
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
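+        # Forces the on-the-fly bias computation below, since the cached self.ab only exists after train(False).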
+ self.training = True
+ B, N, C = x.shape
+ qkv = self.qkv(x)
+ qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads])
+ q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3)
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+ k = paddle.transpose(k, perm=[0, 2, 1, 3])
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+ attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
+ attn = F.softmax(attn)
+ x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
+ x = paddle.reshape(x, [B, N, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class Subsample(nn.Layer):
+
+ def __init__(self, stride, resolution):
+ super().__init__()
+ self.stride = stride
+ self.resolution = resolution
+
+ def forward(self, x):
+ B, N, C = x.shape
+ x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
+ end1, end2 = x.shape[1], x.shape[2]
+ x = x[:, 0:end1:self.stride, 0:end2:self.stride]
+ x = paddle.reshape(x, [B, -1, C])
+ return x
+
+
+class AttentionSubsample(nn.Layer):
+
+ def __init__(self,
+ in_dim,
+ out_dim,
+ key_dim,
+ num_heads=8,
+ attn_ratio=2,
+ activation=None,
+ stride=2,
+ resolution=14,
+ resolution_=7):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * self.num_heads
+ self.attn_ratio = attn_ratio
+ self.resolution_ = resolution_
+ self.resolution_2 = resolution_**2
+ self.training = True
+ h = self.dh + nh_kd
+ self.kv = Linear_BN(in_dim, h)
+
+ self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd))
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim))
+
+ self.stride = stride
+ self.resolution = resolution
+ points = list(itertools.product(range(resolution), range(resolution)))
+ points_ = list(itertools.product(range(resolution_), range(resolution_)))
+
+ N = len(points)
+ N_ = len(points_)
+ attention_offsets = {}
+ idxs = []
+ i = 0
+ j = 0
+ for p1 in points_:
+ i += 1
+ for p2 in points:
+ j += 1
+ size = 1
+ offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+
+ tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N]))
+
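+    # eval()/train(False) precomputes and caches the biases in self.ab; train(True) drops the cache.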
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
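+        # Forces the on-the-fly bias computation below, since the cached self.ab only exists after train(False).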
+ self.training = True
+ B, N, C = x.shape
+ kv = self.kv(x)
+ kv = paddle.reshape(kv, [B, N, self.num_heads, -1])
+ k, v = paddle.split(kv, [self.key_dim, self.d], axis=3)
+ k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim])
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+
+ attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
+ attn = F.softmax(attn)
+
+ x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class LeViT(nn.Layer):
+ """ Vision Transformer with support for patch or hybrid CNN input stage
+ """
+
+ def __init__(self,
+ img_size=224,
+ patch_size=16,
+ in_chans=3,
+ class_num=1000,
+ embed_dim=[192],
+ key_dim=[64],
+ depth=[12],
+ num_heads=[3],
+ attn_ratio=[2],
+ mlp_ratio=[2],
+ hybrid_backbone=None,
+ down_ops=[],
+ attention_activation=nn.Hardswish,
+ mlp_activation=nn.Hardswish,
+ distillation=True,
+ drop_path=0):
+ super().__init__()
+
+ self.class_num = class_num
+ self.num_features = embed_dim[-1]
+ self.embed_dim = embed_dim
+ self.distillation = distillation
+
+ self.patch_embed = hybrid_backbone
+
+ self.blocks = []
+ down_ops.append([''])
+ resolution = img_size // patch_size
+ for i, (ed, kd, dpth, nh, ar, mr,
+ do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)):
+ for _ in range(dpth):
+ self.blocks.append(
+ Residual(
+ Attention(
+ ed,
+ kd,
+ nh,
+ attn_ratio=ar,
+ activation=attention_activation,
+ resolution=resolution,
+ ), drop_path))
+ if mr > 0:
+ h = int(ed * mr)
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(ed, h),
+ mlp_activation(),
+ Linear_BN(h, ed, bn_weight_init=0),
+ ), drop_path))
+ if do[0] == 'Subsample':
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ resolution_ = (resolution - 1) // do[5] + 1
+ self.blocks.append(
+ AttentionSubsample(*embed_dim[i:i + 2],
+ key_dim=do[1],
+ num_heads=do[2],
+ attn_ratio=do[3],
+ activation=attention_activation,
+ stride=do[5],
+ resolution=resolution,
+ resolution_=resolution_))
+ resolution = resolution_
+ if do[4] > 0: # mlp_ratio
+ h = int(embed_dim[i + 1] * do[4])
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(embed_dim[i + 1], h),
+ mlp_activation(),
+ Linear_BN(h, embed_dim[i + 1], bn_weight_init=0),
+ ), drop_path))
+ self.blocks = nn.Sequential(*self.blocks)
+
+ # Classifier head
+ self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+ if distillation:
+ self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+
+ def forward(self, x):
+ x = self.patch_embed(x)
+ x = x.flatten(2)
+ x = paddle.transpose(x, perm=[0, 2, 1])
+ x = self.blocks(x)
+ x = x.mean(1)
+
+ x = paddle.reshape(x, [-1, self.embed_dim[-1]])
+ if self.distillation:
+ x = self.head(x), self.head_dist(x)
+ if not self.training:
+ x = (x[0] + x[1]) / 2
+ else:
+ x = self.head(x)
+ return x
+
+
+def model_factory(C, D, X, N, drop_path, class_num, distillation):
+ embed_dim = [int(x) for x in C.split('_')]
+ num_heads = [int(x) for x in N.split('_')]
+ depth = [int(x) for x in X.split('_')]
+ act = nn.Hardswish
+ model = LeViT(
+ patch_size=16,
+ embed_dim=embed_dim,
+ num_heads=num_heads,
+ key_dim=[D] * 3,
+ depth=depth,
+ attn_ratio=[2, 2, 2],
+ mlp_ratio=[2, 2, 2],
+ down_ops=[
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ ['Subsample', D, embed_dim[0] // D, 4, 2, 2],
+ ['Subsample', D, embed_dim[1] // D, 4, 2, 2],
+ ],
+ attention_activation=act,
+ mlp_activation=act,
+ hybrid_backbone=b16(embed_dim[0], activation=act),
+ class_num=class_num,
+ drop_path=drop_path,
+ distillation=distillation)
+
+ return model
+
+
+specification = {
+ 'LeViT_128S': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_6_8',
+ 'X': '2_3_4',
+ 'drop_path': 0
+ },
+ 'LeViT_128': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_8_12',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_192': {
+ 'C': '192_288_384',
+ 'D': 32,
+ 'N': '3_5_6',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_256': {
+ 'C': '256_384_512',
+ 'D': 32,
+ 'N': '4_6_8',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_384': {
+ 'C': '384_512_768',
+ 'D': 32,
+ 'N': '6_9_12',
+ 'X': '4_4_4',
+ 'drop_path': 0.1
+ },
+}
+
+
+def LeViT_192(**kwargs):
+ model = model_factory(**specification['LeViT_192'], class_num=1000, distillation=False)
+ return model
diff --git a/modules/image/classification/levit_192_imagenet/module.py b/modules/image/classification/levit_192_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..1e982e824dc7c0e99310ae73f63950e4e3bf0c7c
--- /dev/null
+++ b/modules/image/classification/levit_192_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import LeViT_192
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="levit_192_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class LeViT_192_ImageNet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'LeViT_192.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'LeViT_192_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = LeViT_192()
+ param_state_dict = paddle.load(self.pretrain_path)
+        self.model.set_dict(param_state_dict)
+        # switch to inference mode so BatchNorm layers use running statistics
+        self.model.eval()
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results, each result dict contains keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                # BGR -> RGB for the preprocessing pipeline
+                image = image[:, :, ::-1]
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]
+                inputs.append(image)
+
+ batch_data = []
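+        # Run every preprocessing operator on each image and flush a forward
+        # pass whenever a full batch is collected (or the last image is reached).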
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/levit_192_imagenet/processor.py b/modules/image/classification/levit_192_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/levit_192_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
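+    """Resize helper that dispatches to either the cv2 or the PIL backend with
+    a consistent interpolation naming scheme."""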
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+ raise OperatorParamError("invalid params for ReisizeImage for '\
+ 'both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
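+    """Top-k post-processor: apply softmax (or sigmoid for multilabel) to the
+    logits and keep the top-k class ids, scores and optional label names."""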
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/levit_192_imagenet/utils.py b/modules/image/classification/levit_192_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/levit_192_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+    Recursively replace a value in a nested dict or list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+ assert isinstance(dl, (list, dict)), ("{} should be a list or a dict")
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) is detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+ assert len(pair) == 2, ("there can be only a = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/levit_256_imagenet/README.md b/modules/image/classification/levit_256_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fefc5bebd522cd9199c1626764be47122e3fc761
--- /dev/null
+++ b/modules/image/classification/levit_256_imagenet/README.md
@@ -0,0 +1,132 @@
+# levit_256_imagenet
+
+|Module Name|levit_256_imagenet|
+| :--- | :---: |
+|Category|Image Classification|
+|Network|LeViT|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported or not|No|
+|Module Size|109 MB|
+|Latest update date|2022-04-02|
+|Data indicators|Acc|
+
+
+## I. Basic Information
+
+
+- ### Module Introduction
+
+  - LeViT is a hybrid neural network for fast-inference image classification. It was designed with the performance of the model on different hardware platforms in mind, so it reflects common real-world deployment scenarios more faithfully. Through extensive experiments, the authors found an effective way to combine convolutional networks with the Transformer architecture, and proposed an attention-based method for integrating positional information into the Transformer. This module uses the LeViT256 configuration; see the [paper](https://arxiv.org/abs/2104.01136) for details.
+
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install levit_256_imagenet
+    ```
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command line Prediction
+
+  - ```shell
+    $ hub run levit_256_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - Invoke the classification module from the command line; for more options, see the [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    classifier = hub.Module(name="levit_256_imagenet")
+    result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+    # or
+    # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+    ```
+
+- ### 3. API
+
+
+  - ```python
+    def classification(images=None,
+                       paths=None,
+                       batch_size=1,
+                       use_gpu=False,
+                       top_k=1):
+    ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data; the shape of each image is \[H, W, C\], and the color space is BGR;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if so, set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class index), 'scores' (confidence) and 'label_names' (class name)
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image recognition.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m levit_256_imagenet
+    ```
+
+  - This deploys the image recognition online service; the default port is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+  - With the server configured, the few lines of code below send a prediction request and obtain the result
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send the HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/levit_256_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print the prediction results
+    print(r.json()["results"])
+    ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+  - ```shell
+    $ hub install levit_256_imagenet==1.0.0
+    ```
diff --git a/modules/image/classification/levit_256_imagenet/model.py b/modules/image/classification/levit_256_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..66b5cd8d040627c419d7df47f20eddd52760ff5a
--- /dev/null
+++ b/modules/image/classification/levit_256_imagenet/model.py
@@ -0,0 +1,450 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# Code was based on https://github.com/facebookresearch/LeViT
+import itertools
+import math
+import warnings
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.initializer import Constant
+from paddle.nn.initializer import TruncatedNormal
+from paddle.regularizer import L2Decay
+
+from .vision_transformer import Identity
+from .vision_transformer import ones_
+from .vision_transformer import trunc_normal_
+from .vision_transformer import zeros_
+
+
+def cal_attention_biases(attention_biases, attention_bias_idxs):
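+    # Build a [num_heads, Nq, Nk] bias tensor by looking up, for every
+    # (query, key) position pair, the learnable bias of its relative offset:
+    # attention_biases is [num_heads, num_offsets] and attention_bias_idxs
+    # maps each pair to its offset slot.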
+ gather_list = []
+ attention_bias_t = paddle.transpose(attention_biases, (1, 0))
+ nums = attention_bias_idxs.shape[0]
+ for idx in range(nums):
+ gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
+ gather_list.append(gather)
+ shape0, shape1 = attention_bias_idxs.shape
+ gather = paddle.concat(gather_list)
+ return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))
+
+
+class Conv2d_BN(nn.Sequential):
+
+ def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000):
+ super().__init__()
+ self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False))
+ bn = nn.BatchNorm2D(b)
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+
+class Linear_BN(nn.Sequential):
+
+ def __init__(self, a, b, bn_weight_init=1):
+ super().__init__()
+ self.add_sublayer('c', nn.Linear(a, b, bias_attr=False))
+ bn = nn.BatchNorm1D(b)
+ if bn_weight_init == 0:
+ zeros_(bn.weight)
+ else:
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+ def forward(self, x):
+ l, bn = self._sub_layers.values()
+ x = l(x)
+ return paddle.reshape(bn(x.flatten(0, 1)), x.shape)
+
+
+class BN_Linear(nn.Sequential):
+
+ def __init__(self, a, b, bias=True, std=0.02):
+ super().__init__()
+ self.add_sublayer('bn', nn.BatchNorm1D(a))
+ l = nn.Linear(a, b, bias_attr=bias)
+ trunc_normal_(l.weight)
+ if bias:
+ zeros_(l.bias)
+ self.add_sublayer('l', l)
+
+
+def b16(n, activation, resolution=224):
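+    # Convolutional stem: four stride-2 Conv-BN blocks shrink the input 16x,
+    # acting as the patch_size=16 patch embedding of the hybrid backbone.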
+ return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(),
+ Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(),
+ Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(),
+ Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8))
+
+
+class Residual(nn.Layer):
+
+ def __init__(self, m, drop):
+ super().__init__()
+ self.m = m
+ self.drop = drop
+
+ def forward(self, x):
+        if self.training and self.drop > 0:
+            # Stochastic depth: randomly drop the whole residual branch per
+            # sample and rescale the kept samples by 1 / (1 - drop) so the
+            # expected output is unchanged.
+            y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
+            y = y.divide(paddle.full_like(y, 1 - self.drop))
+            return paddle.add(x, self.m(x) * y)
+ else:
+ return paddle.add(x, self.m(x))
+
+
+class Attention(nn.Layer):
+
+ def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * num_heads
+ self.attn_ratio = attn_ratio
+ self.h = self.dh + nh_kd * 2
+ self.qkv = Linear_BN(dim, self.h)
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0))
+ points = list(itertools.product(range(resolution), range(resolution)))
+ N = len(points)
+ attention_offsets = {}
+ idxs = []
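+        # Enumerate all pairwise relative offsets on the token grid; each unique
+        # (|dx|, |dy|) offset owns one learnable bias slot per attention head.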
+ for p1 in points:
+ for p2 in points:
+ offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+ tensor_idxs = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N]))
+
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
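+        # `self.training` is forced on so the biases below are always gathered
+        # on the fly; the `self.ab` cache from train(mode=False) is bypassed.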
+ self.training = True
+ B, N, C = x.shape
+ qkv = self.qkv(x)
+ qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads])
+ q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3)
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+ k = paddle.transpose(k, perm=[0, 2, 1, 3])
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+ attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
+ attn = F.softmax(attn)
+ x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
+ x = paddle.reshape(x, [B, N, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class Subsample(nn.Layer):
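+    """Token subsampling: stride over the 2-D token grid to shrink the sequence."""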
+
+ def __init__(self, stride, resolution):
+ super().__init__()
+ self.stride = stride
+ self.resolution = resolution
+
+ def forward(self, x):
+ B, N, C = x.shape
+ x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
+ end1, end2 = x.shape[1], x.shape[2]
+ x = x[:, 0:end1:self.stride, 0:end2:self.stride]
+ x = paddle.reshape(x, [B, -1, C])
+ return x
+
+
+class AttentionSubsample(nn.Layer):
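+    """Stage-transition attention block: queries come from a strided Subsample
+    of the tokens, so attention both downsamples the grid to
+    resolution_ x resolution_ and maps in_dim to out_dim."""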
+
+ def __init__(self,
+ in_dim,
+ out_dim,
+ key_dim,
+ num_heads=8,
+ attn_ratio=2,
+ activation=None,
+ stride=2,
+ resolution=14,
+ resolution_=7):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * self.num_heads
+ self.attn_ratio = attn_ratio
+ self.resolution_ = resolution_
+ self.resolution_2 = resolution_**2
+ self.training = True
+ h = self.dh + nh_kd
+ self.kv = Linear_BN(in_dim, h)
+
+ self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd))
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim))
+
+ self.stride = stride
+ self.resolution = resolution
+ points = list(itertools.product(range(resolution), range(resolution)))
+ points_ = list(itertools.product(range(resolution_), range(resolution_)))
+
+ N = len(points)
+ N_ = len(points_)
+ attention_offsets = {}
+ idxs = []
+ i = 0
+ j = 0
+ for p1 in points_:
+ i += 1
+ for p2 in points:
+ j += 1
+ size = 1
+ offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+
+ tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N]))
+
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
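+        # As in Attention.forward, bias caching is bypassed here.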
+ self.training = True
+ B, N, C = x.shape
+ kv = self.kv(x)
+ kv = paddle.reshape(kv, [B, N, self.num_heads, -1])
+ k, v = paddle.split(kv, [self.key_dim, self.d], axis=3)
+ k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim])
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+
+ attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
+ attn = F.softmax(attn)
+
+ x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class LeViT(nn.Layer):
+ """ Vision Transformer with support for patch or hybrid CNN input stage
+ """
+
+ def __init__(self,
+ img_size=224,
+ patch_size=16,
+ in_chans=3,
+ class_num=1000,
+ embed_dim=[192],
+ key_dim=[64],
+ depth=[12],
+ num_heads=[3],
+ attn_ratio=[2],
+ mlp_ratio=[2],
+ hybrid_backbone=None,
+ down_ops=[],
+ attention_activation=nn.Hardswish,
+ mlp_activation=nn.Hardswish,
+ distillation=True,
+ drop_path=0):
+ super().__init__()
+
+ self.class_num = class_num
+ self.num_features = embed_dim[-1]
+ self.embed_dim = embed_dim
+ self.distillation = distillation
+
+ self.patch_embed = hybrid_backbone
+
+ self.blocks = []
+        down_ops = list(down_ops)  # copy so the (mutable default) argument is not mutated
+        down_ops.append([''])
+ resolution = img_size // patch_size
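+        # Build the three stages: each stacks `depth[i]` Attention + MLP residual
+        # blocks; an AttentionSubsample transition shrinks the grid between
+        # stages, and the [''] sentinel means "no transition" after the last one.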
+ for i, (ed, kd, dpth, nh, ar, mr,
+ do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)):
+ for _ in range(dpth):
+ self.blocks.append(
+ Residual(
+ Attention(
+ ed,
+ kd,
+ nh,
+ attn_ratio=ar,
+ activation=attention_activation,
+ resolution=resolution,
+ ), drop_path))
+ if mr > 0:
+ h = int(ed * mr)
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(ed, h),
+ mlp_activation(),
+ Linear_BN(h, ed, bn_weight_init=0),
+ ), drop_path))
+ if do[0] == 'Subsample':
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ resolution_ = (resolution - 1) // do[5] + 1
+ self.blocks.append(
+ AttentionSubsample(*embed_dim[i:i + 2],
+ key_dim=do[1],
+ num_heads=do[2],
+ attn_ratio=do[3],
+ activation=attention_activation,
+ stride=do[5],
+ resolution=resolution,
+ resolution_=resolution_))
+ resolution = resolution_
+ if do[4] > 0: # mlp_ratio
+ h = int(embed_dim[i + 1] * do[4])
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(embed_dim[i + 1], h),
+ mlp_activation(),
+ Linear_BN(h, embed_dim[i + 1], bn_weight_init=0),
+ ), drop_path))
+ self.blocks = nn.Sequential(*self.blocks)
+
+ # Classifier head
+ self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+ if distillation:
+ self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+
+ def forward(self, x):
+ x = self.patch_embed(x)
+ x = x.flatten(2)
+ x = paddle.transpose(x, perm=[0, 2, 1])
+ x = self.blocks(x)
+ x = x.mean(1)
+
+ x = paddle.reshape(x, [-1, self.embed_dim[-1]])
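+        # With distillation enabled, training returns both the classifier and
+        # the distillation logits; at inference the two heads are averaged.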
+ if self.distillation:
+ x = self.head(x), self.head_dist(x)
+ if not self.training:
+ x = (x[0] + x[1]) / 2
+ else:
+ x = self.head(x)
+ return x
+
+
+def model_factory(C, D, X, N, drop_path, class_num, distillation):
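+    # C (embed dims), N (num heads) and X (depths) are '_'-separated per-stage
+    # specs, e.g. '256_384_512'; D is the key dim shared by all three stages.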
+ embed_dim = [int(x) for x in C.split('_')]
+ num_heads = [int(x) for x in N.split('_')]
+ depth = [int(x) for x in X.split('_')]
+ act = nn.Hardswish
+ model = LeViT(
+ patch_size=16,
+ embed_dim=embed_dim,
+ num_heads=num_heads,
+ key_dim=[D] * 3,
+ depth=depth,
+ attn_ratio=[2, 2, 2],
+ mlp_ratio=[2, 2, 2],
+ down_ops=[
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ ['Subsample', D, embed_dim[0] // D, 4, 2, 2],
+ ['Subsample', D, embed_dim[1] // D, 4, 2, 2],
+ ],
+ attention_activation=act,
+ mlp_activation=act,
+ hybrid_backbone=b16(embed_dim[0], activation=act),
+ class_num=class_num,
+ drop_path=drop_path,
+ distillation=distillation)
+
+ return model
+
+
+specification = {
+ 'LeViT_128S': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_6_8',
+ 'X': '2_3_4',
+ 'drop_path': 0
+ },
+ 'LeViT_128': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_8_12',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_192': {
+ 'C': '192_288_384',
+ 'D': 32,
+ 'N': '3_5_6',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_256': {
+ 'C': '256_384_512',
+ 'D': 32,
+ 'N': '4_6_8',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_384': {
+ 'C': '384_512_768',
+ 'D': 32,
+ 'N': '6_9_12',
+ 'X': '4_4_4',
+ 'drop_path': 0.1
+ },
+}
+
+
+def LeViT_256(**kwargs):
+ model = model_factory(**specification['LeViT_256'], class_num=1000, distillation=False)
+ return model
diff --git a/modules/image/classification/levit_256_imagenet/module.py b/modules/image/classification/levit_256_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..161cc02c0c69a2a7ebd901f1f783d35d87e0668d
--- /dev/null
+++ b/modules/image/classification/levit_256_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import LeViT_256
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="levit_256_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class LeViT_256_ImageNet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'LeViT_256.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'LeViT_256_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = LeViT_256()
+ param_state_dict = paddle.load(self.pretrain_path)
+        self.model.set_dict(param_state_dict)
+        # switch to inference mode so BatchNorm layers use running statistics
+        self.model.eval()
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results, each result dict contains keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                # BGR -> RGB for the preprocessing pipeline
+                image = image[:, :, ::-1]
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]
+                inputs.append(image)
+
+ batch_data = []
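+        # Run every preprocessing operator on each image and flush a forward
+        # pass whenever a full batch is collected (or the last image is reached).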
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/levit_256_imagenet/processor.py b/modules/image/classification/levit_256_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/levit_256_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
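+    """Resize helper that dispatches to either the cv2 or the PIL backend with
+    a consistent interpolation naming scheme."""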
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+ raise OperatorParamError("invalid params for ReisizeImage for '\
+ 'both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
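+    """Top-k post-processor: apply softmax (or sigmoid for multilabel) to the
+    logits and keep the top-k class ids, scores and optional label names."""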
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/levit_256_imagenet/utils.py b/modules/image/classification/levit_256_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/levit_256_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+    Recursively replace a value in a nested dict or list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+ assert isinstance(dl, (list, dict)), ("{} should be a list or a dict")
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) is detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+ assert len(pair) == 2, ("there can be only a = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/levit_384_imagenet/README.md b/modules/image/classification/levit_384_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..45149034bf566a80d67c855ca3a0fb69435ab47e
--- /dev/null
+++ b/modules/image/classification/levit_384_imagenet/README.md
@@ -0,0 +1,132 @@
+# levit_384_imagenet
+
+|模型名称|levit_384_imagenet|
+| :--- | :---: |
+|类别|图像-图像分类|
+|网络|LeViT|
+|数据集|ImageNet-2012|
+|是否支持Fine-tuning|否|
+|模型大小|225 MB|
+|最新更新日期|2022-04-02|
+|数据指标|Acc|
+
+
+## 一、模型基本信息
+
+
+- ### 模型介绍
+
+ - LeViT 是一种快速推理的、用于图像分类任务的混合神经网络。其设计之初考虑了网络模型在不同的硬件平台上的性能,因此能够更好地反映普遍应用的真实场景。通过大量实验,作者找到了卷积神经网络与 Transformer 体系更好的结合方式,并且提出了 attention-based 方法,用于整合 Transformer 中的位置信息编码, 该模块的模型结构配置为LeViT384, 详情可参考[论文地址](https://arxiv.org/abs/2104.01136)。
+
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+ - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install levit_384_imagenet
+ ```
+  - If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command-Line Prediction
+
+ - ```shell
+ $ hub run levit_384_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This invokes the classification module from the command line; for more, see [PaddleHub command-line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="levit_384_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
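+
+  - Each element of the returned list is a dict. For one image with top_k=1, the result has the following shape (the values here are purely illustrative):
+
+  - ```python
+    [{'class_ids': [281], 'scores': [0.95], 'label_names': ['tabby, tabby cat']}]
+    ```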
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+    - Image classification API.
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data, each with shape \[H, W, C\], in BGR color space;
+      - paths (list\[str\]): image paths;
+      - batch\_size (int): batch size;
+      - use\_gpu (bool): whether to use GPU; **if so, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - top\_k (int): return the top k predictions.
+
+    - **Return**
+
+      - res (list\[dict\]): classification results; each element is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start the PaddleHub Serving service
+
+  - Run the startup command:
+ - ```shell
+ $ hub serving start -m levit_384_imagenet
+ ```
+
+  - This deploys the online classification service; the default port is 8866.
+
+  - **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it is not needed.
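+
+  - For example, to expose GPU card 0 to the service before starting it (adjust the card id to your machine):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m levit_384_imagenet
+    ```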
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following few lines send a prediction request and fetch the result:
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # send an HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/levit_384_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # print the prediction results
+    print(r.json()["results"])
+    ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install levit_384_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/levit_384_imagenet/model.py b/modules/image/classification/levit_384_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..c1b3bf68f486dd3209860fe71614a1319e4f6bdb
--- /dev/null
+++ b/modules/image/classification/levit_384_imagenet/model.py
@@ -0,0 +1,450 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# Code was based on https://github.com/facebookresearch/LeViT
+import itertools
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.regularizer import L2Decay
+
+from .vision_transformer import Identity
+from .vision_transformer import ones_
+from .vision_transformer import trunc_normal_
+from .vision_transformer import zeros_
+
+
+def cal_attention_biases(attention_biases, attention_bias_idxs):
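+    """Gather the learnable per-head biases into a tensor of shape
+    (num_heads, *attention_bias_idxs.shape): `attention_bias_idxs` maps each
+    (query, key) pair to the bucket of its relative spatial offset."""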
+ gather_list = []
+ attention_bias_t = paddle.transpose(attention_biases, (1, 0))
+ nums = attention_bias_idxs.shape[0]
+ for idx in range(nums):
+ gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
+ gather_list.append(gather)
+ shape0, shape1 = attention_bias_idxs.shape
+ gather = paddle.concat(gather_list)
+ return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))
+
+
+class Conv2d_BN(nn.Sequential):
+
+ def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000):
+ super().__init__()
+ self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False))
+ bn = nn.BatchNorm2D(b)
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+
+class Linear_BN(nn.Sequential):
+
+ def __init__(self, a, b, bn_weight_init=1):
+ super().__init__()
+ self.add_sublayer('c', nn.Linear(a, b, bias_attr=False))
+ bn = nn.BatchNorm1D(b)
+ if bn_weight_init == 0:
+ zeros_(bn.weight)
+ else:
+ ones_(bn.weight)
+ zeros_(bn.bias)
+ self.add_sublayer('bn', bn)
+
+ def forward(self, x):
+ l, bn = self._sub_layers.values()
+ x = l(x)
+ return paddle.reshape(bn(x.flatten(0, 1)), x.shape)
+
+
+class BN_Linear(nn.Sequential):
+
+ def __init__(self, a, b, bias=True, std=0.02):
+ super().__init__()
+ self.add_sublayer('bn', nn.BatchNorm1D(a))
+ l = nn.Linear(a, b, bias_attr=bias)
+ trunc_normal_(l.weight)
+ if bias:
+ zeros_(l.bias)
+ self.add_sublayer('l', l)
+
+
+def b16(n, activation, resolution=224):
+ return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(),
+ Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(),
+ Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(),
+ Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8))
+
+
+class Residual(nn.Layer):
+
+ def __init__(self, m, drop):
+ super().__init__()
+ self.m = m
+ self.drop = drop
+
+ def forward(self, x):
+        if self.training and self.drop > 0:
+            # stochastic depth (drop path): keep the residual branch with
+            # probability (1 - drop) and rescale it accordingly
+            y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
+            y = y.divide(paddle.full_like(y, 1 - self.drop))
+            return paddle.add(x, self.m(x) * y)
+        else:
+            return paddle.add(x, self.m(x))
+
+
+class Attention(nn.Layer):
+
+ def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * num_heads
+ self.attn_ratio = attn_ratio
+ self.h = self.dh + nh_kd * 2
+ self.qkv = Linear_BN(dim, self.h)
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0))
+ points = list(itertools.product(range(resolution), range(resolution)))
+ N = len(points)
+ attention_offsets = {}
+ idxs = []
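+        # bucket every pairwise (|dx|, |dy|) offset on the feature grid; all
+        # query/key pairs sharing an offset share one learnable bias per head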
+ for p1 in points:
+ for p2 in points:
+ offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+ tensor_idxs = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N]))
+
+ @paddle.no_grad()
+ def train(self, mode=True):
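+        """Switch between train and eval mode; in eval mode the gathered
+        attention biases are cached as `self.ab` to avoid recomputation."""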
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
+ self.training = True
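+        # NOTE: forcing the bias-recompute path keeps inference working even if
+        # the custom train(mode=False) above was never called, since the cached
+        # `self.ab` would not exist in that case.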
+ B, N, C = x.shape
+ qkv = self.qkv(x)
+ qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads])
+ q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3)
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+ k = paddle.transpose(k, perm=[0, 2, 1, 3])
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+ attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
+ attn = F.softmax(attn)
+ x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
+ x = paddle.reshape(x, [B, N, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class Subsample(nn.Layer):
+
+ def __init__(self, stride, resolution):
+ super().__init__()
+ self.stride = stride
+ self.resolution = resolution
+
+ def forward(self, x):
+ B, N, C = x.shape
+ x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
+ end1, end2 = x.shape[1], x.shape[2]
+ x = x[:, 0:end1:self.stride, 0:end2:self.stride]
+ x = paddle.reshape(x, [B, -1, C])
+ return x
+
+
+class AttentionSubsample(nn.Layer):
+
+ def __init__(self,
+ in_dim,
+ out_dim,
+ key_dim,
+ num_heads=8,
+ attn_ratio=2,
+ activation=None,
+ stride=2,
+ resolution=14,
+ resolution_=7):
+ super().__init__()
+ self.num_heads = num_heads
+ self.scale = key_dim**-0.5
+ self.key_dim = key_dim
+ self.nh_kd = nh_kd = key_dim * num_heads
+ self.d = int(attn_ratio * key_dim)
+ self.dh = int(attn_ratio * key_dim) * self.num_heads
+ self.attn_ratio = attn_ratio
+ self.resolution_ = resolution_
+ self.resolution_2 = resolution_**2
+ self.training = True
+ h = self.dh + nh_kd
+ self.kv = Linear_BN(in_dim, h)
+
+ self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd))
+ self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim))
+
+ self.stride = stride
+ self.resolution = resolution
+ points = list(itertools.product(range(resolution), range(resolution)))
+ points_ = list(itertools.product(range(resolution_), range(resolution_)))
+
+ N = len(points)
+ N_ = len(points_)
+ attention_offsets = {}
+ idxs = []
+        size = 1
+        for p1 in points_:
+            for p2 in points:
+                offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2))
+ if offset not in attention_offsets:
+ attention_offsets[offset] = len(attention_offsets)
+ idxs.append(attention_offsets[offset])
+ self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
+ default_initializer=zeros_,
+ attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
+
+ tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64')
+ self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N]))
+
+ @paddle.no_grad()
+ def train(self, mode=True):
+ if mode:
+ super().train()
+ else:
+ super().eval()
+ if mode and hasattr(self, 'ab'):
+ del self.ab
+ else:
+ self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+
+ def forward(self, x):
+ self.training = True
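+        # same workaround as in Attention.forward: always recompute the biases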
+ B, N, C = x.shape
+ kv = self.kv(x)
+ kv = paddle.reshape(kv, [B, N, self.num_heads, -1])
+ k, v = paddle.split(kv, [self.key_dim, self.d], axis=3)
+ k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC
+ v = paddle.transpose(v, perm=[0, 2, 1, 3])
+ q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim])
+ q = paddle.transpose(q, perm=[0, 2, 1, 3])
+
+ if self.training:
+ attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
+ else:
+ attention_biases = self.ab
+
+ attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
+ attn = F.softmax(attn)
+
+ x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
+ x = self.proj(x)
+ return x
+
+
+class LeViT(nn.Layer):
+ """ Vision Transformer with support for patch or hybrid CNN input stage
+ """
+
+ def __init__(self,
+ img_size=224,
+ patch_size=16,
+ in_chans=3,
+ class_num=1000,
+ embed_dim=[192],
+ key_dim=[64],
+ depth=[12],
+ num_heads=[3],
+ attn_ratio=[2],
+ mlp_ratio=[2],
+ hybrid_backbone=None,
+ down_ops=[],
+ attention_activation=nn.Hardswish,
+ mlp_activation=nn.Hardswish,
+ distillation=True,
+ drop_path=0):
+ super().__init__()
+
+ self.class_num = class_num
+ self.num_features = embed_dim[-1]
+ self.embed_dim = embed_dim
+ self.distillation = distillation
+
+ self.patch_embed = hybrid_backbone
+
+ self.blocks = []
+        # copy instead of append: mutating the default `down_ops=[]` argument
+        # would leak the sentinel across LeViT instantiations
+        down_ops = down_ops + [['']]
+ resolution = img_size // patch_size
+ for i, (ed, kd, dpth, nh, ar, mr,
+ do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)):
+ for _ in range(dpth):
+ self.blocks.append(
+ Residual(
+ Attention(
+ ed,
+ kd,
+ nh,
+ attn_ratio=ar,
+ activation=attention_activation,
+ resolution=resolution,
+ ), drop_path))
+ if mr > 0:
+ h = int(ed * mr)
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(ed, h),
+ mlp_activation(),
+ Linear_BN(h, ed, bn_weight_init=0),
+ ), drop_path))
+ if do[0] == 'Subsample':
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ resolution_ = (resolution - 1) // do[5] + 1
+ self.blocks.append(
+ AttentionSubsample(*embed_dim[i:i + 2],
+ key_dim=do[1],
+ num_heads=do[2],
+ attn_ratio=do[3],
+ activation=attention_activation,
+ stride=do[5],
+ resolution=resolution,
+ resolution_=resolution_))
+ resolution = resolution_
+ if do[4] > 0: # mlp_ratio
+ h = int(embed_dim[i + 1] * do[4])
+ self.blocks.append(
+ Residual(
+ nn.Sequential(
+ Linear_BN(embed_dim[i + 1], h),
+ mlp_activation(),
+ Linear_BN(h, embed_dim[i + 1], bn_weight_init=0),
+ ), drop_path))
+ self.blocks = nn.Sequential(*self.blocks)
+
+ # Classifier head
+ self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+ if distillation:
+ self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
+
+ def forward(self, x):
+ x = self.patch_embed(x)
+ x = x.flatten(2)
+ x = paddle.transpose(x, perm=[0, 2, 1])
+ x = self.blocks(x)
+ x = x.mean(1)
+
+ x = paddle.reshape(x, [-1, self.embed_dim[-1]])
+ if self.distillation:
+ x = self.head(x), self.head_dist(x)
+ if not self.training:
+ x = (x[0] + x[1]) / 2
+ else:
+ x = self.head(x)
+ return x
+
+
+def model_factory(C, D, X, N, drop_path, class_num, distillation):
+ embed_dim = [int(x) for x in C.split('_')]
+ num_heads = [int(x) for x in N.split('_')]
+ depth = [int(x) for x in X.split('_')]
+ act = nn.Hardswish
+ model = LeViT(
+ patch_size=16,
+ embed_dim=embed_dim,
+ num_heads=num_heads,
+ key_dim=[D] * 3,
+ depth=depth,
+ attn_ratio=[2, 2, 2],
+ mlp_ratio=[2, 2, 2],
+ down_ops=[
+ #('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
+ ['Subsample', D, embed_dim[0] // D, 4, 2, 2],
+ ['Subsample', D, embed_dim[1] // D, 4, 2, 2],
+ ],
+ attention_activation=act,
+ mlp_activation=act,
+ hybrid_backbone=b16(embed_dim[0], activation=act),
+ class_num=class_num,
+ drop_path=drop_path,
+ distillation=distillation)
+
+ return model
+
+
+specification = {
+ 'LeViT_128S': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_6_8',
+ 'X': '2_3_4',
+ 'drop_path': 0
+ },
+ 'LeViT_128': {
+ 'C': '128_256_384',
+ 'D': 16,
+ 'N': '4_8_12',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_192': {
+ 'C': '192_288_384',
+ 'D': 32,
+ 'N': '3_5_6',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_256': {
+ 'C': '256_384_512',
+ 'D': 32,
+ 'N': '4_6_8',
+ 'X': '4_4_4',
+ 'drop_path': 0
+ },
+ 'LeViT_384': {
+ 'C': '384_512_768',
+ 'D': 32,
+ 'N': '6_9_12',
+ 'X': '4_4_4',
+ 'drop_path': 0.1
+ },
+}
+
+
+def LeViT_384(**kwargs):
+ model = model_factory(**specification['LeViT_384'], class_num=1000, distillation=False)
+ return model
diff --git a/modules/image/classification/levit_384_imagenet/module.py b/modules/image/classification/levit_384_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..790a66d5fe19ef0554e0dfddc4b143f89af22085
--- /dev/null
+++ b/modules/image/classification/levit_384_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+
+import cv2
+import paddle
+
+from .model import LeViT_384
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="levit_384_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class LeViT_384_ImageNet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'LeViT_384.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'LeViT_384_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = LeViT_384()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results; each dict contains keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+ return
+
+        if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ inputs.append(image)
+
+        if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
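+            # run the model once a full batch is collected, or on the final (possibly partial) batch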
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/levit_384_imagenet/processor.py b/modules/image/classification/levit_384_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/levit_384_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
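+            e.g. (illustrative, mirroring an 'Infer.transforms' list from the
+            module's YAML config):
+                [{'ResizeImage': {'resize_short': 256}},
+                 {'CropImage': {'size': 224}},
+                 {'ToCHWImage': None}]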
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+    """ normalize image: subtract the mean and divide by the std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+        assert isinstance(topk, int)
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label dict, please provide a valid path! Otherwise, 'label_names' will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/levit_384_imagenet/utils.py b/modules/image/classification/levit_384_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/levit_384_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+    Recursively replace a value inside a nested dict/list
+    Args:
+        dl(dict or list): dict or list to be updated
+        ks(list): list of keys addressing the target entry
+        v(str): value to set
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) is detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain a '=' "
+                                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one '=' in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/README.md b/modules/image/classification/pplcnet_x0_25_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1444f3e4776bd2518afee67726bd484eb37c9ab1
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_25_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x0_25_imagenet
+
+|Model Name|pplcnet_x0_25_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning Supported|No|
+|Model Size|5 MB|
+|Latest Update Date|2022-04-02|
+|Data Metrics|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a backbone network designed by Baidu specifically for Intel CPUs and their MKLDNN acceleration library. Compared with other lightweight SOTA models, it further improves accuracy without increasing inference time, substantially surpassing the existing SOTA models. This module is the PP-LCNet model at scale x0.25; for more on the architecture, see the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+ - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install pplcnet_x0_25_imagenet
+ ```
+  - If you encounter problems during installation, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command-Line Prediction
+
+ - ```shell
+ $ hub run pplcnet_x0_25_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This invokes the classification module from the command line; for more, see [PaddleHub command-line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x0_25_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
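+
+  - Each element of the returned list is a dict. For one image with top_k=1, the result has the following shape (the values here are purely illustrative):
+
+  - ```python
+    [{'class_ids': [281], 'scores': [0.95], 'label_names': ['tabby, tabby cat']}]
+    ```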
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+    - Image classification API.
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data, each with shape \[H, W, C\], in BGR color space;
+      - paths (list\[str\]): image paths;
+      - batch\_size (int): batch size;
+      - use\_gpu (bool): whether to use GPU; **if so, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - top\_k (int): return the top k predictions.
+
+    - **Return**
+
+      - res (list\[dict\]): classification results; each element is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start the PaddleHub Serving service
+
+  - Run the startup command:
+ - ```shell
+ $ hub serving start -m pplcnet_x0_25_imagenet
+ ```
+
+  - This deploys the online classification service; the default port is 8866.
+
+  - **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it is not needed.
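+
+  - For example, to expose GPU card 0 to the service before starting it (adjust the card id to your machine):
+
+  - ```shell
+    $ export CUDA_VISIBLE_DEVICES=0
+    $ hub serving start -m pplcnet_x0_25_imagenet
+    ```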
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the following few lines send a prediction request and fetch the result:
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # send an HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pplcnet_x0_25_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # print the prediction results
+    print(r.json()["results"])
+    ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install pplcnet_x0_25_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/model.py b/modules/image/classification/pplcnet_x0_25_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..071131b1b54563ae04a37a9a47df6cf678f8f7e3
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_25_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+            if max(return_stages) >= len(stages_pattern) or min(return_stages) < 0:
+                msg = f"Invalid 'return_stages'; illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
+
+        Returns:
+            List[str]: The pattern(s) in 'layer_name_pattern' that were matched and handled successfully.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+            stop_layer_name (str): The name of the layer after which forward and backward stop.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+ msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+    def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+        Returns:
+            List[str]: The pattern(s) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+    """set the layer specified by layer_name and layer_index to Identity.
+
+    Args:
+        parent_layer (nn.Layer): The parent layer of the target layer specified by layer_name and layer_index.
+        layer_name (str): The name of the target layer to be set to Identity.
+        layer_index (str, optional): The index of the target layer within parent_layer. Defaults to None.
+
+    Returns:
+        bool: True if successful, False otherwise.
+    """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern describing the target layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+        Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None on failure; on success, the layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+            msg = f"Layer named '{target_layer_name}' specified in pattern('{pattern}') was not found."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+                msg = f"Layer index('{target_layer_index}') specified in pattern('{pattern}') is out of range. The index should be in [0, {len(target_layer)})."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
+def make_divisible(v, divisor=8, min_value=None):
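+    """Round the channel count `v` to the nearest multiple of `divisor`
+    (no lower than `min_value`), never dropping below 90% of the original
+    value; this is the usual mobile-network rule for channel numbers."""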
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x0_25(**kwargs):
+ model = PPLCNet(scale=0.25, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/module.py b/modules/image/classification/pplcnet_x0_25_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..a4f9878636cce57503a6aa0db115122d958155f7
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_25_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+
+import cv2
+import paddle
+
+from .model import PPLCNet_x0_25
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x0_25_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLCNet_x0_25_ImageNet:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_25.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_25_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x0_25()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results, each result dict contains keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                # convert BGR (the cv2 default) to RGB for the preprocess pipeline
+                image = image[:, :, ::-1]
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]
+                inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/processor.py b/modules/image/classification/pplcnet_x0_25_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_25_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
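+
+
+# Example (illustrative): UnifiedResize(interpolation="bilinear", backend="cv2")(img, (224, 224))
+# resizes an HWC image array to 224x224 using cv2.INTER_LINEAR.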
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: if you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
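+            # indices of the top-k probabilities in descending order; in the multilabel
+            # case, keep every class whose score reaches 0.5 instead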
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/pplcnet_x0_25_imagenet/utils.py b/modules/image/classification/pplcnet_x0_25_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_25_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
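+
+
+# AttrDict supports both mapping and attribute access (illustrative):
+#   cfg = AttrDict({'use_gpu': False})
+#   assert cfg.use_gpu == cfg['use_gpu']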
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+ Recursively replace dict of list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ('{} should be a list or a dict'.format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
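+
+
+# Example (illustrative): override({'Infer': {'batch_size': 1}}, ['Infer', 'batch_size'], '4')
+# sets the nested value to 4; str2num() converts the string '4' to the int 4.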
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+            assert len(pair) == 2, ("there can only be one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/README.md b/modules/image/classification/pplcnet_x0_35_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..021c52b8eb650b4001a55a2cf393ce061ad011a4
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_35_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x0_35_imagenet
+
+|Module Name|pplcnet_x0_35_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported|No|
+|Module Size|6 MB|
+|Latest update date|2022-04-02|
+|Data metric|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a lightweight backbone network designed by Baidu for Intel CPU devices and their MKLDNN acceleration library. Compared with other lightweight SOTA models, this backbone further improves performance without increasing inference time, substantially outperforming the existing SOTA models. This module is the PP-LCNet model at scale x0.35. For more information about the network architecture, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install pplcnet_x0_35_imagenet
+    ```
+  - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command line Prediction
+
+  - ```shell
+    $ hub run pplcnet_x0_35_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - This command invokes the classification module from the command line. For more information, please refer to [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x0_35_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+  - ```python
+    def classification(images=None,
+                       paths=None,
+                       batch_size=1,
+                       use_gpu=False,
+                       top_k=1):
+    ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data, the shape of each image is \[H, W, C\], color space is BGR;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
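+
+  - A minimal sketch of consuming the returned structure (assumes the `classifier` object created in the prediction code example above):
+
+  - ```python
+    result = classifier.classification(paths=['/PATH/TO/IMAGE'], top_k=3)
+    for res in result:
+        # each res is a dict: {'class_ids': [...], 'scores': [...], 'label_names': [...]}
+        for cid, score, name in zip(res['class_ids'], res['scores'], res['label_names']):
+            print(cid, name, score)
+    ```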
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m pplcnet_x0_35_imagenet
+    ```
+
+  - The servitization API is now deployed and the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result
+
+  - ```python
+    import requests
+    import json
+    import cv2
+    import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send an HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pplcnet_x0_35_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print prediction results
+    print(r.json()["results"])
+    ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+  - ```shell
+    $ hub install pplcnet_x0_35_imagenet==1.0.0
+    ```
diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/model.py b/modules/image/classification/pplcnet_x0_35_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..85580ae9f6c73ab79b0e371398e343384cc459ab
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_35_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+ msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+            handle_func (Callable[[nn.Layer, str], nn.Layer]): The function used to modify the target layer specified by 'layer_name_pattern'. Its formal params are the layer (nn.Layer) and the pattern (str) that matched it (a member of 'layer_name_pattern' when that is a list), and its return value is the processed layer.
+
+        Returns:
+            List[str]: The patterns in 'layer_name_pattern' that matched a sub-layer and were modified successfully.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+            stop_layer_name (str): The name of the layer after which forward and backward computation stops.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+ msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+    def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]:
+ """update the result(s) to be returned.
+
+ Args:
+            return_patterns (Union[str, List[str]]): The name(s) of the layer(s) whose output(s) should be returned.
+
+        Returns:
+            List[str]: The pattern(s) that were hooked successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
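+        # Example (illustrative): after net.update_res(["blocks3", "blocks4"]), calling
+        # net(x) returns {"output": logits, "blocks3": ..., "blocks4": ...}, assembled
+        # by the forward post-hooks registered above.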
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+ """set the layer specified by layer_name and layer_index to Indentity.
+
+ Args:
+ parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index.
+        layer_name (str): The name of the target layer to be set to Identity.
+        layer_index (str, optional): The index of the target layer within parent_layer. Defaults to None.
+
+ Returns:
+        bool: True if successful, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern that describes the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+        Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successful, the members are the layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+ msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+ msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
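+
+
+# Illustrative trace (hypothetical PPLCNet instance `net`):
+#   parse_pattern_str("blocks6[0].dw_conv", net)
+# walks net.blocks6 -> net.blocks6[0] -> net.blocks6[0].dw_conv and returns
+#   [{"layer": net.blocks6[0], "name": "blocks6", "index": "0"},
+#    {"layer": net.blocks6[0].dw_conv, "name": "dw_conv", "index": None}]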
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
+def make_divisible(v, divisor=8, min_value=None):
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
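+
+
+# Worked example (values follow from the function above; scale=0.35 is just an illustration):
+#   make_divisible(256 * 0.35) -> 88   (89.6 rounded to the nearest multiple of 8)
+#   make_divisible(512 * 0.35) -> 176
+# so NET_CONFIG["blocks6"][0] = [5, 256, 512, 2, True] becomes
+#   DepthwiseSeparable(num_channels=88, num_filters=176, dw_size=5, stride=2, use_se=True)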
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x0_35(**kwargs):
+ model = PPLCNet(scale=0.35, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/module.py b/modules/image/classification/pplcnet_x0_35_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..acd31f0261c2ae7af7014c8fdc15a061b5d44128
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_35_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+
+import cv2
+import paddle
+
+from .model import PPLCNet_x0_35
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x0_35_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLcNet_x0_35:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_35.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_35_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x0_35()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results, each result dict contains keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                # convert BGR (the cv2 default) to RGB for the preprocess pipeline
+                image = image[:, :, ::-1]
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]
+                inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/processor.py b/modules/image/classification/pplcnet_x0_35_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_35_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: if you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/pplcnet_x0_35_imagenet/utils.py b/modules/image/classification/pplcnet_x0_35_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_35_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+ Recursively replace dict of list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ('{} should be a list or a dict'.format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
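+
+# Illustrative usage (a hedged example; the actual key paths depend on the YAML in use):
+#   cfg = {'topk': 1, 'VALID': {'transforms': [{'DecodeImage': {}}, {'ResizeImage': {'resize_short': 224}}]}}
+#   override_config(cfg, ['topk=2', 'VALID.transforms.1.ResizeImage.resize_short=300'])
+#   # cfg['topk'] -> 2, resize_short -> 300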
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/README.md b/modules/image/classification/pplcnet_x0_5_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..3efd7cd06b4d3177e509c8e61c2ddd05bec080bf
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_5_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x0_5_imagenet
+
+|Module Name|pplcnet_x0_5_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning Supported|No|
+|Module Size|7 MB|
+|Latest Update|2022-04-02|
+|Data Metric|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a lightweight backbone network designed by Baidu for Intel CPUs and the MKLDNN acceleration library. Compared with other lightweight SOTA models, it further improves performance without increasing inference time, ultimately outperforming existing SOTA models by a clear margin. This module is the PP-LCNet model at scale x0.5. For more details on the architecture, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install pplcnet_x0_5_imagenet
+ ```
+  - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+  | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command Line Prediction
+
+ - ```shell
+ $ hub run pplcnet_x0_5_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This invokes the classification module from the command line; for more usage, see [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x0_5_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data, with shape \[H, W, C\] and BGR color space;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if so, set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each list element is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
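+
+  - **Example of the return value** for a single image with `top_k=1` (all values are illustrative only):
+
+  - ```python
+    [{'class_ids': [283], 'scores': [0.56], 'label_names': ['Persian cat']}]
+    ```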
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start the PaddleHub Serving
+
+  - Run the start-up command:
+  - ```shell
+    $ hub serving start -m pplcnet_x0_5_imagenet
+    ```
+
+  - This deploys an online image classification service; the default port is 8866.
+
+  - **NOTE:** To predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the few lines of code below send a prediction request and fetch the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"\}
+ url = "http://127.0.0.1:8866/predict/pplcnet_x0_5_imagenet"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print the prediction results
+ print(r.json()["results"])
+ ```
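+
+  - Equivalently, the same request can be sent with curl; this sketch assumes the JSON payload was saved beforehand to a file (named `data.json` here purely for illustration) in the `{"images": [<base64 string>]}` format shown above:
+
+  - ```shell
+    $ curl -X POST -H "Content-type: application/json" -d @data.json http://127.0.0.1:8866/predict/pplcnet_x0_5_imagenet
+    ```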
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install pplcnet_x0_5_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/model.py b/modules/image/classification/pplcnet_x0_5_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..8c6a399bc83fc29a7a33df79112bcf66e07146b5
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_5_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+ msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
+
+        Returns:
+            List[str]: The patterns in 'layer_name_pattern' that were matched and handled successfully.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+ stop_layer_name (str): The name of layer that stop forward and backward after this layer.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+ msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+ def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+ Returns:
+ Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+ """set the layer specified by layer_name and layer_index to Indentity.
+
+ Args:
+ parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index.
+ layer_name (str): The name of target layer to be set to Indentity.
+ layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None.
+
+ Returns:
+ bool: True if successfully, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern that describes the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+ Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+ msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+ msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
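+
+# Illustrative only (hypothetical layer names), e.g. for a PPLCNet instance 'net':
+#   parse_pattern_str("blocks3[0].dw_conv", net) ->
+#       [{"layer": net.blocks3[0], "name": "blocks3", "index": "0"},
+#        {"layer": net.blocks3[0].dw_conv, "name": "dw_conv", "index": None}]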
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
+def make_divisible(v, divisor=8, min_value=None):
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
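+
+# e.g. make_divisible(16 * 0.5) -> 8 and make_divisible(256 * 0.75) -> 192:
+# channel counts are rounded to a multiple of 8, never dropping below 90% of v.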
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x0_5(pretrained=False, use_ssld=False, **kwargs):
+ model = PPLCNet(scale=0.5, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
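+
+
+# Minimal usage sketch (a hedged example, not part of the module API; assumes an
+# ImageNet-style input of shape [N, 3, 224, 224]):
+#   net = PPLCNet_x0_5()
+#   logits = net(paddle.rand([1, 3, 224, 224]))  # shape: [1, 1000]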
diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/module.py b/modules/image/classification/pplcnet_x0_5_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..05ac64722efd510096c6c88a63fb56b65e3055a3
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_5_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import PPLCNet_x0_5
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x0_5_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLcNet_x0_5:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_5.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_5_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x0_5()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results; each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+ return
+
+        if images is not None:
+            for image in images:
+                image = image[:, :, ::-1]  # BGR -> RGB, as expected by the preprocess ops
+                inputs.append(image)
+
+        if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/processor.py b/modules/image/classification/pplcnet_x0_5_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_5_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
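+
+# Illustrative config (mirrors a YAML 'transforms' list; the exact entries come
+# from the module's config file, so these values are examples only):
+#   ops = create_operators([{'ResizeImage': {'resize_short': 256}},
+#                           {'CropImage': {'size': 224}},
+#                           {'NormalizeImage': None},
+#                           {'ToCHWImage': None}])
+#   # each op is then applied in order to a decoded HWC image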
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+ raise OperatorParamError("invalid params for ReisizeImage for '\
+ 'both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
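+
+# Round-trips with the client-side encoder, e.g. (illustrative):
+#   b64 = base64.b64encode(cv2.imencode('.jpg', img)[1].tobytes()).decode('utf8')
+#   img2 = base64_to_cv2(b64)  # BGR ndarray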
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: if you want to use your own label file, please provide a valid path! Otherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
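+
+
+# Usage sketch (hedged; 'logits' has shape [N, num_classes] and the label file is optional):
+#   topk = Topk(topk=5, class_id_map_file='imagenet1k_label_list.txt')
+#   results = topk(paddle.to_tensor(logits))
+#   # -> [{'class_ids': [...], 'scores': [...], 'label_names': [...]}, ...]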
diff --git a/modules/image/classification/pplcnet_x0_5_imagenet/utils.py b/modules/image/classification/pplcnet_x0_5_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_5_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+    Recursively replace a value in a nested dict/list
+    Args:
+        dl(dict or list): the dict or list containing the value to be replaced
+        ks(list): list of keys leading to the value
+        v(str): the new value
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ('{} should be a list or a dict'.format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
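+
+# Illustrative usage (a hedged example; the actual key paths depend on the YAML in use):
+#   cfg = {'topk': 1, 'VALID': {'transforms': [{'DecodeImage': {}}, {'ResizeImage': {'resize_short': 224}}]}}
+#   override_config(cfg, ['topk=2', 'VALID.transforms.1.ResizeImage.resize_short=300'])
+#   # cfg['topk'] -> 2, resize_short -> 300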
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/README.md b/modules/image/classification/pplcnet_x0_75_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..72c8c072d86af617eda317d9a03cd83978153763
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_75_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x0_75_imagenet
+
+|Module Name|pplcnet_x0_75_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning Supported|No|
+|Module Size|9 MB|
+|Latest Update|2022-04-02|
+|Data Metric|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a lightweight backbone network designed by Baidu for Intel CPUs and the MKLDNN acceleration library. Compared with other lightweight SOTA models, it further improves performance without increasing inference time, ultimately outperforming existing SOTA models by a clear margin. This module is the PP-LCNet model at scale x0.75. For more details on the architecture, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+ - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install pplcnet_x0_75_imagenet
+ ```
+  - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+  | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command Line Prediction
+
+ - ```shell
+ $ hub run pplcnet_x0_75_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This invokes the classification module from the command line; for more usage, see [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x0_75_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data, with shape \[H, W, C\] and BGR color space;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): whether to use GPU; **if so, set the CUDA_VISIBLE_DEVICES environment variable first**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each list element is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
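+
+  - **Example of the return value** for a single image with `top_k=1` (all values are illustrative only):
+
+  - ```python
+    [{'class_ids': [283], 'scores': [0.56], 'label_names': ['Persian cat']}]
+    ```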
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start the PaddleHub Serving
+
+  - Run the start-up command:
+  - ```shell
+    $ hub serving start -m pplcnet_x0_75_imagenet
+    ```
+
+  - This deploys an online image classification service; the default port is 8866.
+
+  - **NOTE:** To predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
+
+- ### Step 2: Send a prediction request
+
+  - With the server configured, the few lines of code below send a prediction request and fetch the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"\}
+ url = "http://127.0.0.1:8866/predict/pplcnet_x0_75_imagenet"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print the prediction results
+ print(r.json()["results"])
+ ```
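+
+  - Equivalently, the same request can be sent with curl; this sketch assumes the JSON payload was saved beforehand to a file (named `data.json` here purely for illustration) in the `{"images": [<base64 string>]}` format shown above:
+
+  - ```shell
+    $ curl -X POST -H "Content-type: application/json" -d @data.json http://127.0.0.1:8866/predict/pplcnet_x0_75_imagenet
+    ```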
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install pplcnet_x0_75_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/model.py b/modules/image/classification/pplcnet_x0_75_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..df546e13b47c0a9a3c64dc44b46ffbcdf326e7fd
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_75_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+ msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
+
+        Returns:
+            List[str]: The patterns in 'layer_name_pattern' that were matched and handled successfully.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+ stop_layer_name (str): The name of layer that stop forward and backward after this layer.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+ msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+ def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+ Returns:
+ Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+ """set the layer specified by layer_name and layer_index to Indentity.
+
+ Args:
+ parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index.
+ layer_name (str): The name of target layer to be set to Indentity.
+ layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None.
+
+ Returns:
+ bool: True if successfully, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern that describes the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+ Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+ msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+ msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
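+
+# Illustrative only (hypothetical layer names), e.g. for a PPLCNet instance 'net':
+#   parse_pattern_str("blocks3[0].dw_conv", net) ->
+#       [{"layer": net.blocks3[0], "name": "blocks3", "index": "0"},
+#        {"layer": net.blocks3[0].dw_conv, "name": "dw_conv", "index": None}]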
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
+def make_divisible(v, divisor=8, min_value=None):
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
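+
+# e.g. make_divisible(16 * 0.5) -> 8 and make_divisible(256 * 0.75) -> 192:
+# channel counts are rounded to a multiple of 8, never dropping below 90% of v.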
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x0_75(pretrained=False, use_ssld=False, **kwargs):
+ model = PPLCNet(scale=0.75, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
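+
+
+# A minimal usage sketch (comments only, not executed; assumes a working paddle install):
+# model = PPLCNet_x0_75()  # 'pretrained' and 'use_ssld' are accepted for API compatibility and unused here
+# x = paddle.rand([1, 3, 224, 224])
+# y = model(x)  # logits of shape [1, 1000]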
diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/module.py b/modules/image/classification/pplcnet_x0_75_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..7ce6c2eaca491c21266d87307110f032d92e007a
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_75_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+
+import cv2
+import paddle
+
+from .model import PPLCNet_x0_75
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x0_75_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLcNet_x0_75:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x0_75.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x0_75_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x0_75()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+ res (list[dict]): The classification results; each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images is None and paths is None:
+ print('No image provided. Please input an image or an image path.')
+ return
+
+ if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ inputs.append(image)
+
+ if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
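+ # flush the batch once it is full, or after the last input image has been collected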
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/processor.py b/modules/image/classification/pplcnet_x0_75_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_75_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
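+
+
+# A hypothetical transforms config illustrating the expected structure: each list
+# item is a single-key dict mapping an operator class name to its kwargs (or None):
+# ops = create_operators([{'ResizeImage': {'resize_short': 256}}, {'CropImage': {'size': 224}},
+# {'NormalizeImage': None}, {'ToCHWImage': None}])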
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+ raise OperatorParamError("invalid params for ResizeImage: "
+ "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+ print(
+ "Warning: class_id_map_file does not exist; pass a valid path to use your own label dict. "
+ "Otherwise 'label_names' will be empty!"
+ )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ # label_name_list is always a list here (possibly empty), so include it unconditionally
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
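+
+
+# A sketch of the output for topk=1 (class id, score and label are purely illustrative):
+# [{'class_ids': [282], 'scores': [0.87432], 'label_names': ['tiger cat']}]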
diff --git a/modules/image/classification/pplcnet_x0_75_imagenet/utils.py b/modules/image/classification/pplcnet_x0_75_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x0_75_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+ Recursively replace dict of list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+ assert isinstance(dl, (list, dict)), ('{} should be a list or a dict'.format(dl))
+ assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+ if not ks[0] in dl:
+ print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
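+
+
+# e.g. override({'Infer': {'batch_size': 1}}, ['Infer', 'batch_size'], '4') sets the
+# nested 'batch_size' to the int 4 (string values are parsed via eval when possible).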
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+ assert "=" in opt, ("option({}) should contain a ="
+ "to distinguish between key and value".format(opt))
+ pair = opt.split('=')
+ assert len(pair) == 2, ("there can be only one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+ assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/README.md b/modules/image/classification/pplcnet_x1_0_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..22dc1b235ddca2121d42042861ada9e618fcb0ab
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_0_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x1_0_imagenet
+
+|Module Name|pplcnet_x1_0_imagenet|
+| :--- | :---: |
+|Category|image classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported or not|No|
+|Module Size|11 MB|
+|Latest update date|2022-04-02|
+|Data indicators|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+ - PP-LCNet is a lightweight backbone designed by Baidu for Intel CPU devices and the MKLDNN acceleration library. Compared with other lightweight SOTA models, it further improves model performance without increasing inference time, eventually outperforming existing SOTA models by a large margin. This module is the PP-LCNet model at scale x1.0. For more details about the architecture, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1、Environmental Dependence
+
+ - paddlepaddle >= 1.6.2
+
+ - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install pplcnet_x1_0_imagenet
+ ```
+ - In case of any problems during installation, please refer to: [Windows Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+ | [Linux Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ $ hub run pplcnet_x1_0_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+ - Run a classification prediction from the command line. For more information, please refer to [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x1_0_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3、API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+ - Classification API.
+ - **Parameters**
+
+ - images (list\[numpy.ndarray\]): image data, the shape of each is \[H, W, C\], color space is BGR;
+ - paths (list\[str\]): image paths;
+ - batch\_size (int): batch size;
+ - use\_gpu (bool): whether to use GPU; **set the CUDA_VISIBLE_DEVICES environment variable first if GPU is used**
+ - top\_k (int): return the top k prediction results.
+
+ - **Return**
+
+ - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class index), 'scores' (confidence) and 'label_names' (class name)
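+
+ - A minimal sketch of reading the returned fields (top_k=5 is only an illustration):
+
+ - ```python
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')], top_k=5)
+ for res in result:
+ for cid, score, name in zip(res['class_ids'], res['scores'], res['label_names']):
+ print(cid, name, score)
+ ```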
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+ - Run the start-up command:
+ - ```shell
+ $ hub serving start -m pplcnet_x1_0_imagenet
+ ```
+
+ - This completes the deployment of an online image classification service; the default port is 8866.
+
+ - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a prediction request
+
+ - With a configured server, the following lines of code send a prediction request and obtain the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+ return base64.b64encode(data.tobytes()).decode('utf8')
+
+ # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/pplcnet_x1_0_imagenet"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+ # Print the prediction results
+ print(r.json()["results"])
+ ```
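+
+ - The `results` field mirrors the return value of the `classification` API; a sketch of one element (values purely illustrative): `{'class_ids': [282], 'scores': [0.87432], 'label_names': ['tiger cat']}`.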
+
+
+## V. Release Note
+
+* 1.0.0
+
+ First release
+
+ - ```shell
+ $ hub install pplcnet_x1_0_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/model.py b/modules/image/classification/pplcnet_x1_0_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..a69f326d8d58263bcf02c2857db3d85bf738cf7b
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_0_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) >= len(stages_pattern) or min(return_stages) < 0:
+ print(f"Warning: illegal value(s) in 'return_stages' have been ignored. The stages pattern list is {stages_pattern}.")
+ return_stages = [val for val in return_stages if 0 <= val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+ def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+ handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function used to modify the target layer specified by 'layer_name_pattern'. Its arguments are the layer (nn.Layer) and the pattern (str) that matched it (a member of layer_name_pattern when that is a list); it returns the processed layer.
+
+ Returns:
+ List[str]: The patterns that were successfully matched and handled.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+ res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+ print(res)
+ # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+ stop_layer_name (str): The name of the layer after which forward and backward computation stop.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+ msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+ def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+ Returns:
+ Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+ """set the layer specified by layer_name and layer_index to Indentity.
+
+ Args:
+ parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index.
+ layer_name (str): The name of target layer to be set to Indentity.
+ layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None.
+
+ Returns:
+ bool: True if successfully, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+ pattern (str): The pattern describing the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+ Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None on failure. On success, the members are the layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+ msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+ msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
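+
+
+# e.g. parse_pattern_str("blocks6[0].dw_conv", net) walks net.blocks6[0].dw_conv and
+# returns one {"layer", "name", "index"} dict per dotted segment ('net' is illustrative).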
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
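+
+# e.g. at scale=1.0 the first row of "blocks6" ([5, 256, 512, 2, True]) expands into a
+# DepthwiseSeparable block with a 5x5 depthwise conv, 256 -> 512 channels, stride 2 and SE enabled.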
+
+
+def make_divisible(v, divisor=8, min_value=None):
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x1_0(pretrained=False, use_ssld=False, **kwargs):
+ model = PPLCNet(scale=1.0, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/module.py b/modules/image/classification/pplcnet_x1_0_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..3119f49bb466d5b53a76c374e6cdc8b8cbde03db
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_0_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import os
+
+import cv2
+import paddle
+
+from .model import PPLCNet_x1_0
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x1_0_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLcNet_x1_0:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x1_0.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x1_0_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x1_0()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+ res (list[dict]): The classification results; each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+ if images is None and paths is None:
+ print('No image provided. Please input an image or an image path.')
+ return
+
+ if images is not None:
+ for image in images:
+ image = image[:, :, ::-1]
+ inputs.append(image)
+
+ if paths is not None:
+ for path in paths:
+ image = cv2.imread(path)[:, :, ::-1]
+ inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/processor.py b/modules/image/classification/pplcnet_x1_0_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_0_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+ raise OperatorParamError("invalid params for ResizeImage: "
+ "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+ data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+ print(
+ "Warning: class_id_map_file does not exist; pass a valid path to use your own label dict. "
+ "Otherwise 'label_names' will be empty!"
+ )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+            if label_name_list:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
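+
+
+# A minimal smoke test for Topk (illustrative only: random logits stand in
+# for real model output, and without a class_id_map_file the label_names
+# lists stay empty). It runs only when this file is executed directly.
+if __name__ == "__main__":
+    fake_logits = paddle.randn([2, 1000])
+    print(Topk(topk=3)(fake_logits))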
diff --git a/modules/image/classification/pplcnet_x1_0_imagenet/utils.py b/modules/image/classification/pplcnet_x1_0_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_0_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+    Recursively replace a value in a nested dict/list
+    Args:
+        dl(dict or list): dict or list to be updated
+        ks(list): list of keys forming the path to the target value
+        v(str): new value to set
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+ if not ks[0] in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
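+
+
+# Illustration of how 'override' walks a dotted key path (hypothetical config,
+# for documentation only):
+#
+#   cfg = {'VALID': {'transforms': [{'ResizeImage': {'resize_short': 256}}]}}
+#   override(cfg, ['VALID', 'transforms', '0', 'ResizeImage', 'resize_short'], '300')
+#   # cfg['VALID']['transforms'][0]['ResizeImage']['resize_short'] is now 300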
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain an '=' "
+                                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can only be one '=' in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/README.md b/modules/image/classification/pplcnet_x1_5_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..eb8342effacea9ca4b3296002c1aa2577cf87ad3
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_5_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x1_5_imagenet
+
+|Module Name|pplcnet_x1_5_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning Supported|No|
+|Module Size|17 MB|
+|Latest Update Date|2022-04-02|
+|Metrics|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a lightweight backbone network designed by Baidu for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, ultimately outperforming the existing SOTA models by a large margin. This module is the PP-LCNet model with scale x1.5; for more details on the architecture, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+ - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0  | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install pplcnet_x1_5_imagenet
+ ```
+  - If you encounter problems during installation, please refer to: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API and Prediction
+
+- ### 1. Command-Line Prediction
+
+ - ```shell
+ $ hub run pplcnet_x1_5_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This runs the classification model from the command line; for more information, see [PaddleHub Command-Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x1_5_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+    - Classification API.
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data, the shape of each image is \[H, W, C\], and the color space is BGR;
+      - paths (list\[str\]): image paths;
+      - batch\_size (int): batch size;
+      - use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - top\_k (int): return the top k prediction results.
+
+    - **Returns**
+
+      - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
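+
+    - A minimal sketch of inspecting the returned structure (illustrative; the image path is a placeholder):
+
+    - ```python
+      import paddlehub as hub
+      import cv2
+
+      classifier = hub.Module(name="pplcnet_x1_5_imagenet")
+      res = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')], top_k=5)
+      for item in res:
+          print(item['class_ids'], item['scores'], item['label_names'])
+      ```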
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m pplcnet_x1_5_imagenet
+    ```
+
+  - This deploys the online image classification service; the default port is 8866.
+
+  - **NOTE:** To predict with GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a Prediction Request
+
+  - With the server configured, the following lines of code send a prediction request and fetch the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # send the HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pplcnet_x1_5_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # print the prediction results
+    print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install pplcnet_x1_5_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/model.py b/modules/image/classification/pplcnet_x1_5_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..085bb5668a15d4783c4e8a7b412dd4c0a0b1610c
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_5_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+ msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
+
+        Returns:
+            List[str]: The pattern(s) in 'layer_name_pattern' that were matched and handled successfully.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+                res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+                print(res)
+                # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+            stop_layer_name (str): The name of the layer after which forward and backward computation stops.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+ msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+    def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+ Returns:
+            List[str]: The pattern(s) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+    """set the layer specified by layer_name and layer_index to Identity.
+
+    Args:
+        parent_layer (nn.Layer): The parent layer of the target layer specified by layer_name and layer_index.
+        layer_name (str): The name of the target layer to be set to Identity.
+        layer_index (str, optional): The index of the target layer to be set to Identity in parent_layer. Defaults to None.
+
+    Returns:
+        bool: True if successful, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern that describes the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+        Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if parsing failed. On success, the members are the layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+            msg = f"No layer named('{target_layer_name}') specified in pattern('{pattern}') was found."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+                msg = f"No layer found at index('{target_layer_index}') specified in pattern('{pattern}'). The index should be < {len(target_layer)} and >= 0."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
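+
+
+# Pattern strings are dot-separated attribute names with optional indices,
+# e.g. "blocks5[2].dw_conv.conv" selects parent.blocks5[2].dw_conv.conv
+# (an illustrative pattern; valid names depend on the concrete model).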
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
+def make_divisible(v, divisor=8, min_value=None):
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
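+
+
+# e.g. make_divisible(16 * 0.25) == 8: channel counts are rounded to the
+# nearest multiple of 'divisor' and never end up below 90% of the request.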
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x1_5(pretrained=False, use_ssld=False, **kwargs):
+ model = PPLCNet(scale=1.5, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
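+
+
+# Illustrative direct use of the backbone (random input standing in for a
+# preprocessed image batch; not part of the module's public API):
+#
+#   model = PPLCNet_x1_5()
+#   logits = model(paddle.randn([1, 3, 224, 224]))  # shape: [1, 1000]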
diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/module.py b/modules/image/classification/pplcnet_x1_5_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..25f258db9b6b8cad10a63497431e28dcd67ddd2c
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_5_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+
+import paddlehub as hub
+from .model import PPLCNet_x1_5
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x1_5_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLcNet_x1_5:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x1_5.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x1_5_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x1_5()
+ param_state_dict = paddle.load(self.pretrain_path)
+        self.model.set_dict(param_state_dict)
+        # switch to inference mode so Dropout behaves deterministically
+        self.model.eval()
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results; each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                image = image[:, :, ::-1]
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]
+                inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/processor.py b/modules/image/classification/pplcnet_x1_5_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_5_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
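+
+
+# Example of the expected config format (illustrative values, mirroring the
+# 'Infer.transforms' section of a PaddleClas-style yaml):
+#
+#   ops = create_operators([
+#       {'ResizeImage': {'resize_short': 256}},
+#       {'CropImage': {'size': 224}},
+#       {'NormalizeImage': {'scale': 1.0 / 255.0, 'order': ''}},
+#       {'ToCHWImage': None},
+#   ])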
+
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+    """ normalize image, e.g. subtract mean, divide by std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
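+
+    # With the default mean/std and scale=1/255 this is the standard ImageNet
+    # normalization, out = (img * scale - mean) / std; 'order' controls
+    # whether mean/std broadcast against a CHW or an HWC image.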
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+        assert isinstance(topk, int)
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+            if label_name_list:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
diff --git a/modules/image/classification/pplcnet_x1_5_imagenet/utils.py b/modules/image/classification/pplcnet_x1_5_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x1_5_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+    Recursively replace a value in a nested dict/list
+    Args:
+        dl(dict or list): dict or list to be updated
+        ks(list): list of keys forming the path to the target value
+        v(str): new value to set
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+ if not ks[0] in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain an '=' "
+                                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can only be one '=' in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
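+
+
+# Illustrative call (keys are hypothetical): each option is split at '=' and
+# the dotted key path is handed to 'override':
+#
+#   cfg = override_config(cfg, ['topk=2',
+#                               'VALID.transforms.1.ResizeImage.resize_short=300'])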
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/README.md b/modules/image/classification/pplcnet_x2_0_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..61c681d008383dfa19a26d36145a0d6c890ecaa8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_0_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x2_0_imagenet
+
+|Module Name|pplcnet_x2_0_imagenet|
+| :--- | :---: |
+|Category|Image - Image Classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning Supported|No|
+|Module Size|24 MB|
+|Latest Update Date|2022-04-02|
+|Metrics|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a lightweight backbone network designed by Baidu for Intel CPU devices and their acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone further improves model performance without increasing inference time, ultimately outperforming the existing SOTA models by a large margin. This module is the PP-LCNet model with scale x2.0; for more details on the architecture, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environment Dependencies
+
+ - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0  | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+ - ```shell
+ $ hub install pplcnet_x2_0_imagenet
+ ```
+  - If you encounter problems during installation, please refer to: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API and Prediction
+
+- ### 1. Command-Line Prediction
+
+ - ```shell
+ $ hub run pplcnet_x2_0_imagenet --input_path "/PATH/TO/IMAGE"
+ ```
+  - This runs the classification model from the command line; for more information, see [PaddleHub Command-Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x2_0_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+    - Classification API.
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data, the shape of each image is \[H, W, C\], and the color space is BGR;
+      - paths (list\[str\]): image paths;
+      - batch\_size (int): batch size;
+      - use\_gpu (bool): whether to use GPU; **if GPU is used, set the CUDA_VISIBLE_DEVICES environment variable first**
+      - top\_k (int): return the top k prediction results.
+
+    - **Returns**
+
+      - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidences) and 'label_names' (class names)
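+
+    - A minimal sketch of inspecting the returned structure (illustrative; the image path is a placeholder):
+
+    - ```python
+      import paddlehub as hub
+      import cv2
+
+      classifier = hub.Module(name="pplcnet_x2_0_imagenet")
+      res = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')], top_k=5)
+      for item in res:
+          print(item['class_ids'], item['scores'], item['label_names'])
+      ```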
+
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online image classification service.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m pplcnet_x2_0_imagenet
+    ```
+
+  - This deploys the online image classification service; the default port is 8866.
+
+  - **NOTE:** To predict with GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a Prediction Request
+
+  - With the server configured, the following lines of code send a prediction request and fetch the result
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # send the HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pplcnet_x2_0_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # print the prediction results
+    print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install pplcnet_x2_0_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/model.py b/modules/image/classification/pplcnet_x2_0_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..a3fd8364ae59dbf085d50c597d5c650d6a7a7d73
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_0_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+ msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+ msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+ """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+ Args:
+ layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
+ handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
+
+        Returns:
+            List[str]: The pattern(s) in 'layer_name_pattern' that were matched and handled successfully.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+                res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+                print(res)
+                # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+            stop_layer_name (str): The name of the layer after which forward and backward computation stops.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+ msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
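+
+    # e.g. net.stop_after('blocks4') replaces every later sublayer with
+    # Identity, so forward() then returns the blocks4 feature map
+    # (an illustrative stop point; valid names depend on the concrete model).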
+
+    def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]:
+ """update the result(s) to be returned.
+
+ Args:
+ return_patterns (Union[str, List[str]]): The name of layer to return output.
+
+ Returns:
+            List[str]: The pattern(s) that have been set successfully.
+ """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+    """set the layer specified by layer_name and layer_index to Identity.
+
+    Args:
+        parent_layer (nn.Layer): The parent layer of the target layer specified by layer_name and layer_index.
+        layer_name (str): The name of the target layer to be set to Identity.
+        layer_index (str, optional): The index of the target layer to be set to Identity in parent_layer. Defaults to None.
+
+    Returns:
+        bool: True if successful, False otherwise.
+ """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern that describes the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+        Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if parsing failed. On success, the members are the layers parsed in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+            msg = f"No layer named('{target_layer_name}') specified in pattern('{pattern}') was found."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+                msg = f"No layer found at index('{target_layer_index}') specified in pattern('{pattern}'). The index should be < {len(target_layer)} and >= 0."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
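+# Round a scaled channel count to the nearest multiple of `divisor` (8 by default),
+# bumping the result up one step if rounding would drop below 90% of the input.
+# Illustrative values: make_divisible(16 * 2.0) -> 32, make_divisible(100) -> 104.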
+def make_divisible(v, divisor=8, min_value=None):
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x2_0(pretrained=False, use_ssld=False, **kwargs):
+ model = PPLCNet(scale=2.0, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
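+
+# Minimal usage sketch (hypothetical input; `pretrained`/`use_ssld` are accepted for API
+# compatibility, while the actual weights are loaded separately in module.py):
+#   model = PPLCNet_x2_0()
+#   logits = model(paddle.rand([1, 3, 224, 224]))  # -> tensor of shape [1, 1000]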
diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/module.py b/modules/image/classification/pplcnet_x2_0_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..d67d80800fa1953288c09ffc67a69cd85dd85b83
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_0_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import PPLCNet_x2_0
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x2_0_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLcNet_x2_0:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x2_0.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x2_0_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x2_0()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results, each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                image = image[:, :, ::-1]  # convert BGR (cv2 default) to RGB
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]  # convert BGR to RGB
+                inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
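+                # TheseusLayer may wrap predictions in a list, or in a dict such as
+                # {"output": tensor} when intermediate results are hooked, so unwrap first.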
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/processor.py b/modules/image/classification/pplcnet_x2_0_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_0_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
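+# Usage sketch (hypothetical config): the Infer transforms are typically a YAML list of
+# single-key dicts, e.g. create_operators([{'ResizeImage': {'resize_short': 256}},
+# {'CropImage': {'size': 224}}]) -> [ResizeImage(resize_short=256), CropImage(size=224)].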
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+    """ normalize image, e.g. subtract the mean and divide by the std
+    """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+ assert isinstance(topk, (int, ))
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: the given class_id_map_file does not exist. Please provide a valid path to use your own label dict.\nOtherwise 'label_names' will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+ if label_name_list is not None:
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
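+
+# Usage sketch (hypothetical logits tensor): Topk(topk=5, class_id_map_file='labels.txt')(logits)
+# returns one dict per sample with 'class_ids', 'scores' and 'label_names' filled from the map.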
diff --git a/modules/image/classification/pplcnet_x2_0_imagenet/utils.py b/modules/image/classification/pplcnet_x2_0_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_0_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+ """
+ Recursively replace dict of list
+ Args:
+ dl(dict or list): dict or list to be replaced
+ ks(list): list of keys
+ v(str): value to be replaced
+ """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), "dl should be a list or a dict"
+    assert len(ks) > 0, 'length of keys should be larger than 0'
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+ # assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
+            if not ks[0] in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain a '=' "
+                                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one '=' in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
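+# Usage sketch (hypothetical option string):
+#   override_config(config, ['Infer.transforms.1.ResizeImage.resize_short=300'])
+# splits the option at '=', walks the dotted keys ('1' indexes into the transforms list)
+# and replaces the leaf value in place.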
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/README.md b/modules/image/classification/pplcnet_x2_5_imagenet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a7099ebce1a3c914bdbc612a9761bfe0e3965b64
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_5_imagenet/README.md
@@ -0,0 +1,132 @@
+# pplcnet_x2_5_imagenet
+
+|Module Name|pplcnet_x2_5_imagenet|
+| :--- | :---: |
+|Category|image classification|
+|Network|PPLCNet|
+|Dataset|ImageNet-2012|
+|Fine-tuning supported or not|No|
+|Module Size|34 MB|
+|Latest update date|2022-04-02|
+|Data indicators|Acc|
+
+
+## I. Basic Information
+
+
+
+- ### Module Introduction
+
+  - PP-LCNet is a backbone network designed by Baidu for Intel CPU devices and the MKLDNN acceleration library. Compared with other lightweight SOTA models, this backbone further improves accuracy without increasing inference time, eventually outperforming the existing SOTA models by a large margin. This module is the PP-LCNet model at scale x2.5. For more details about the architecture, please refer to the [paper](https://arxiv.org/pdf/2109.15099.pdf).
+
+## II. Installation
+
+- ### 1. Environmental Dependence
+
+  - paddlepaddle >= 1.6.2
+
+  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
+
+
+- ### 2. Installation
+
+  - ```shell
+    $ hub install pplcnet_x2_5_imagenet
+    ```
+  - In case of any problems during installation, please refer to: [Windows_Quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Linux_Quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Mac_Quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III. Module API Prediction
+
+- ### 1. Command line Prediction
+
+  - ```shell
+    $ hub run pplcnet_x2_5_imagenet --input_path "/PATH/TO/IMAGE"
+    ```
+  - Classification models can be called from the command line; for more usage, please refer to [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2. Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+ import cv2
+
+ classifier = hub.Module(name="pplcnet_x2_5_imagenet")
+ result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
+ # or
+ # result = classifier.classification(paths=['/PATH/TO/IMAGE'])
+ ```
+
+- ### 3. API
+
+
+ - ```python
+ def classification(images=None,
+ paths=None,
+ batch_size=1,
+ use_gpu=False,
+ top_k=1):
+ ```
+  - Classification API.
+  - **Parameters**
+
+    - images (list\[numpy.ndarray\]): image data, the shape of each image is \[H, W, C\], and the color space is BGR;
+    - paths (list\[str\]): image paths;
+    - batch\_size (int): batch size;
+    - use\_gpu (bool): use GPU or not; **set the CUDA_VISIBLE_DEVICES environment variable first if you are using GPU**
+    - top\_k (int): return the top k prediction results.
+
+  - **Return**
+
+    - res (list\[dict\]): classification results; each element of the list is a dict whose keys include 'class_ids' (class indices), 'scores' (confidence) and 'label_names' (class names), as illustrated below
+
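+  - **Example result** (hypothetical values for illustration):
+  - ```python
+    [{'class_ids': [8], 'scores': [0.91606], 'label_names': ['hen']}]
+    ```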
+
+## IV. Server Deployment
+
+- PaddleHub Serving can deploy an online service of image classification.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+  - ```shell
+    $ hub serving start -m pplcnet_x2_5_imagenet
+    ```
+
+  - This completes the deployment of an online image classification service; the default port number is 8866.
+
+  - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, there is no need to set it.
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+    def cv2_to_base64(image):
+        data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send an HTTP request
+    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+    headers = {"Content-type": "application/json"}
+    url = "http://127.0.0.1:8866/predict/pplcnet_x2_5_imagenet"
+    r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # Print the prediction results
+    print(r.json()["results"])
+ ```
+
+
+## V. Release Note
+
+* 1.0.0
+
+  First release
+
+ - ```shell
+ $ hub install pplcnet_x2_5_imagenet==1.0.0
+ ```
diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/model.py b/modules/image/classification/pplcnet_x2_5_imagenet/model.py
new file mode 100644
index 0000000000000000000000000000000000000000..b1395770144db6cadd6f1cb121d07b966e30e02a
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_5_imagenet/model.py
@@ -0,0 +1,478 @@
+# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import Any
+from typing import Callable
+from typing import Dict
+from typing import List
+from typing import Tuple
+from typing import Union
+
+import paddle
+import paddle.nn as nn
+from paddle import ParamAttr
+from paddle.nn import AdaptiveAvgPool2D
+from paddle.nn import BatchNorm
+from paddle.nn import Conv2D
+from paddle.nn import Dropout
+from paddle.nn import Linear
+from paddle.nn.initializer import KaimingNormal
+from paddle.regularizer import L2Decay
+
+
+class Identity(nn.Layer):
+
+ def __init__(self):
+ super(Identity, self).__init__()
+
+ def forward(self, inputs):
+ return inputs
+
+
+class TheseusLayer(nn.Layer):
+
+ def __init__(self, *args, **kwargs):
+ super(TheseusLayer, self).__init__()
+ self.res_dict = {}
+ self.res_name = self.full_name()
+ self.pruner = None
+ self.quanter = None
+
+ def _return_dict_hook(self, layer, input, output):
+ res_dict = {"output": output}
+ # 'list' is needed to avoid error raised by popping self.res_dict
+ for res_key in list(self.res_dict):
+ # clear the res_dict because the forward process may change according to input
+ res_dict[res_key] = self.res_dict.pop(res_key)
+ return res_dict
+
+ def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
+ if return_patterns and return_stages:
+            msg = "The 'return_stages' would be ignored when 'return_patterns' is set."
+ return_stages = None
+
+ if return_stages is True:
+ return_patterns = stages_pattern
+ # return_stages is int or bool
+ if type(return_stages) is int:
+ return_stages = [return_stages]
+ if isinstance(return_stages, list):
+ if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
+                msg = f"'return_stages' is set incorrectly; illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
+ return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
+ return_patterns = [stages_pattern[i] for i in return_stages]
+
+ if return_patterns:
+ self.update_res(return_patterns)
+
+ def replace_sub(self, *args, **kwargs) -> None:
+ msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
+ raise DeprecationWarning(msg)
+
+    def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
+                         handle_func: Callable[[nn.Layer, str], nn.Layer]) -> List[str]:
+        """use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
+
+        Args:
+            layer_name_pattern (Union[str, List[str]]): The name pattern(s) of the layer(s) to be modified by 'handle_func'.
+            handle_func (Callable[[nn.Layer, str], nn.Layer]): The function that modifies the target layer specified by 'layer_name_pattern'. Its arguments are the layer (nn.Layer) and the pattern (str) that matched it (a member of 'layer_name_pattern' when that is a list), and it returns the processed layer.
+
+        Returns:
+            List[str]: The patterns that successfully hit a target layer.
+
+ Examples:
+
+ from paddle import nn
+ import paddleclas
+
+ def rep_func(layer: nn.Layer, pattern: str):
+ new_layer = nn.Conv2D(
+ in_channels=layer._in_channels,
+ out_channels=layer._out_channels,
+ kernel_size=5,
+ padding=2
+ )
+ return new_layer
+
+ net = paddleclas.MobileNetV1()
+            res = net.upgrade_sublayer(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
+            print(res)
+            # ['blocks[11].depthwise_conv.conv', 'blocks[12].depthwise_conv.conv']
+ """
+
+ if not isinstance(layer_name_pattern, list):
+ layer_name_pattern = [layer_name_pattern]
+
+ hit_layer_pattern_list = []
+ for pattern in layer_name_pattern:
+ # parse pattern to find target layer and its parent
+ layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
+ if not layer_list:
+ continue
+ sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
+
+ sub_layer = layer_list[-1]["layer"]
+ sub_layer_name = layer_list[-1]["name"]
+ sub_layer_index = layer_list[-1]["index"]
+
+ new_sub_layer = handle_func(sub_layer, pattern)
+
+ if sub_layer_index:
+ getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
+ else:
+ setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
+
+ hit_layer_pattern_list.append(pattern)
+ return hit_layer_pattern_list
+
+ def stop_after(self, stop_layer_name: str) -> bool:
+ """stop forward and backward after 'stop_layer_name'.
+
+ Args:
+            stop_layer_name (str): The name of the layer after which forward and backward computation are stopped.
+
+ Returns:
+ bool: 'True' if successful, 'False' otherwise.
+ """
+
+ layer_list = parse_pattern_str(stop_layer_name, self)
+ if not layer_list:
+ return False
+
+ parent_layer = self
+ for layer_dict in layer_list:
+ name, index = layer_dict["name"], layer_dict["index"]
+ if not set_identity(parent_layer, name, index):
+                msg = f"Failed to set the layers after stop_layer_name('{stop_layer_name}') to Identity. The failing layer's name is '{name}'."
+ return False
+ parent_layer = layer_dict["layer"]
+
+ return True
+
+    def update_res(self, return_patterns: Union[str, List[str]]) -> List[str]:
+        """update the result(s) to be returned.
+
+        Args:
+            return_patterns (Union[str, List[str]]): The pattern(s) of the layer(s) whose output should be returned.
+
+        Returns:
+            List[str]: The pattern(s) that have been registered successfully.
+        """
+
+ # clear res_dict that could have been set
+ self.res_dict = {}
+
+ class Handler(object):
+
+ def __init__(self, res_dict):
+ # res_dict is a reference
+ self.res_dict = res_dict
+
+ def __call__(self, layer, pattern):
+ layer.res_dict = self.res_dict
+ layer.res_name = pattern
+ if hasattr(layer, "hook_remove_helper"):
+ layer.hook_remove_helper.remove()
+ layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
+ return layer
+
+ handle_func = Handler(self.res_dict)
+
+ hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
+
+ if hasattr(self, "hook_remove_helper"):
+ self.hook_remove_helper.remove()
+ self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
+
+ return hit_layer_pattern_list
+
+
+def save_sub_res_hook(layer, input, output):
+ layer.res_dict[layer.res_name] = output
+
+
+def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
+    """set the layer specified by layer_name and layer_index to Identity.
+
+    Args:
+        parent_layer (nn.Layer): The parent layer of the target layer specified by layer_name and layer_index.
+        layer_name (str): The name of the target layer to be set to Identity.
+        layer_index (str, optional): The index of the target layer to be set to Identity in parent_layer. Defaults to None.
+
+    Returns:
+        bool: True if successful, False otherwise.
+    """
+
+ stop_after = False
+ for sub_layer_name in parent_layer._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[sub_layer_name] = Identity()
+ continue
+ if sub_layer_name == layer_name:
+ stop_after = True
+
+ if layer_index and stop_after:
+ stop_after = False
+ for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
+ if stop_after:
+ parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
+ continue
+ if layer_index == sub_layer_index:
+ stop_after = True
+
+ return stop_after
+
+
+def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
+ """parse the string type pattern.
+
+ Args:
+        pattern (str): The pattern that describes the layer.
+ parent_layer (nn.Layer): The root layer relative to the pattern.
+
+ Returns:
+        Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if parsing failed. If successful, the members are the layers parsed, in order:
+ [
+ {"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
+ {"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
+ ...
+ ]
+ """
+
+ pattern_list = pattern.split(".")
+ if not pattern_list:
+ msg = f"The pattern('{pattern}') is illegal. Please check and retry."
+ return None
+
+ layer_list = []
+ while len(pattern_list) > 0:
+ if '[' in pattern_list[0]:
+ target_layer_name = pattern_list[0].split('[')[0]
+ target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
+ else:
+ target_layer_name = pattern_list[0]
+ target_layer_index = None
+
+ target_layer = getattr(parent_layer, target_layer_name, None)
+
+ if target_layer is None:
+            msg = f"No layer named('{target_layer_name}') specified in pattern('{pattern}') was found."
+ return None
+
+ if target_layer_index and target_layer:
+ if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
+                msg = f"No layer with index('{target_layer_index}') specified in pattern('{pattern}') was found. The index should be >= 0 and < {len(target_layer)}."
+ return None
+
+ target_layer = target_layer[target_layer_index]
+
+ layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
+
+ pattern_list = pattern_list[1:]
+ parent_layer = target_layer
+ return layer_list
+
+
+MODEL_STAGES_PATTERN = {"PPLCNet": ["blocks2", "blocks3", "blocks4", "blocks5", "blocks6"]}
+
+# Each element(list) represents a depthwise block, which is composed of k, in_c, out_c, s, use_se.
+# k: kernel_size
+# in_c: input channel number in depthwise block
+# out_c: output channel number in depthwise block
+# s: stride in depthwise block
+# use_se: whether to use SE block
+
+NET_CONFIG = {
+ "blocks2":
+ #k, in_c, out_c, s, use_se
+ [[3, 16, 32, 1, False]],
+ "blocks3": [[3, 32, 64, 2, False], [3, 64, 64, 1, False]],
+ "blocks4": [[3, 64, 128, 2, False], [3, 128, 128, 1, False]],
+ "blocks5": [[3, 128, 256, 2, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False], [5, 256, 256, 1, False],
+ [5, 256, 256, 1, False], [5, 256, 256, 1, False]],
+ "blocks6": [[5, 256, 512, 2, True], [5, 512, 512, 1, True]]
+}
+
+
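+# Round a scaled channel count to the nearest multiple of `divisor` (8 by default),
+# bumping the result up one step if rounding would drop below 90% of the input.
+# Illustrative values: make_divisible(16 * 2.5) -> 40, make_divisible(100) -> 104.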
+def make_divisible(v, divisor=8, min_value=None):
+ if min_value is None:
+ min_value = divisor
+ new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
+ if new_v < 0.9 * v:
+ new_v += divisor
+ return new_v
+
+
+class ConvBNLayer(TheseusLayer):
+
+ def __init__(self, num_channels, filter_size, num_filters, stride, num_groups=1):
+ super().__init__()
+
+ self.conv = Conv2D(in_channels=num_channels,
+ out_channels=num_filters,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=num_groups,
+ weight_attr=ParamAttr(initializer=KaimingNormal()),
+ bias_attr=False)
+
+ self.bn = BatchNorm(num_filters,
+ param_attr=ParamAttr(regularizer=L2Decay(0.0)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
+ self.hardswish = nn.Hardswish()
+
+ def forward(self, x):
+ x = self.conv(x)
+ x = self.bn(x)
+ x = self.hardswish(x)
+ return x
+
+
+class DepthwiseSeparable(TheseusLayer):
+
+ def __init__(self, num_channels, num_filters, stride, dw_size=3, use_se=False):
+ super().__init__()
+ self.use_se = use_se
+ self.dw_conv = ConvBNLayer(num_channels=num_channels,
+ num_filters=num_channels,
+ filter_size=dw_size,
+ stride=stride,
+ num_groups=num_channels)
+ if use_se:
+ self.se = SEModule(num_channels)
+ self.pw_conv = ConvBNLayer(num_channels=num_channels, filter_size=1, num_filters=num_filters, stride=1)
+
+ def forward(self, x):
+ x = self.dw_conv(x)
+ if self.use_se:
+ x = self.se(x)
+ x = self.pw_conv(x)
+ return x
+
+
+class SEModule(TheseusLayer):
+
+ def __init__(self, channel, reduction=4):
+ super().__init__()
+ self.avg_pool = AdaptiveAvgPool2D(1)
+ self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
+ self.relu = nn.ReLU()
+ self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
+ self.hardsigmoid = nn.Hardsigmoid()
+
+ def forward(self, x):
+ identity = x
+ x = self.avg_pool(x)
+ x = self.conv1(x)
+ x = self.relu(x)
+ x = self.conv2(x)
+ x = self.hardsigmoid(x)
+ x = paddle.multiply(x=identity, y=x)
+ return x
+
+
+class PPLCNet(TheseusLayer):
+
+ def __init__(self,
+ stages_pattern,
+ scale=1.0,
+ class_num=1000,
+ dropout_prob=0.2,
+ class_expand=1280,
+ return_patterns=None,
+ return_stages=None):
+ super().__init__()
+ self.scale = scale
+ self.class_expand = class_expand
+
+ self.conv1 = ConvBNLayer(num_channels=3, filter_size=3, num_filters=make_divisible(16 * scale), stride=2)
+
+ self.blocks2 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks2"])
+ ])
+
+ self.blocks3 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks3"])
+ ])
+
+ self.blocks4 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks4"])
+ ])
+
+ self.blocks5 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks5"])
+ ])
+
+ self.blocks6 = nn.Sequential(*[
+ DepthwiseSeparable(num_channels=make_divisible(in_c * scale),
+ num_filters=make_divisible(out_c * scale),
+ dw_size=k,
+ stride=s,
+ use_se=se) for i, (k, in_c, out_c, s, se) in enumerate(NET_CONFIG["blocks6"])
+ ])
+
+ self.avg_pool = AdaptiveAvgPool2D(1)
+
+ self.last_conv = Conv2D(in_channels=make_divisible(NET_CONFIG["blocks6"][-1][2] * scale),
+ out_channels=self.class_expand,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ bias_attr=False)
+
+ self.hardswish = nn.Hardswish()
+ self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
+ self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
+
+ self.fc = Linear(self.class_expand, class_num)
+
+ super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
+
+ def forward(self, x):
+ x = self.conv1(x)
+
+ x = self.blocks2(x)
+ x = self.blocks3(x)
+ x = self.blocks4(x)
+ x = self.blocks5(x)
+ x = self.blocks6(x)
+
+ x = self.avg_pool(x)
+ x = self.last_conv(x)
+ x = self.hardswish(x)
+ x = self.dropout(x)
+ x = self.flatten(x)
+ x = self.fc(x)
+ return x
+
+
+def PPLCNet_x2_5(pretrained=False, use_ssld=False, **kwargs):
+ model = PPLCNet(scale=2.5, stages_pattern=MODEL_STAGES_PATTERN["PPLCNet"], **kwargs)
+ return model
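+
+# Minimal usage sketch (hypothetical input; `pretrained`/`use_ssld` are accepted for API
+# compatibility, while the actual weights are loaded separately in module.py):
+#   model = PPLCNet_x2_5()
+#   logits = model(paddle.rand([1, 3, 224, 224]))  # -> tensor of shape [1, 1000]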
diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/module.py b/modules/image/classification/pplcnet_x2_5_imagenet/module.py
new file mode 100644
index 0000000000000000000000000000000000000000..479cf4a61b46dd14b64e5915df6e8144f40cdf1c
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_5_imagenet/module.py
@@ -0,0 +1,154 @@
+# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import cv2
+import numpy as np
+import paddle
+from skimage.io import imread
+from skimage.transform import rescale
+from skimage.transform import resize
+
+import paddlehub as hub
+from .model import PPLCNet_x2_5
+from .processor import base64_to_cv2
+from .processor import create_operators
+from .processor import Topk
+from .utils import get_config
+from paddlehub.module.module import moduleinfo
+from paddlehub.module.module import runnable
+from paddlehub.module.module import serving
+
+
+@moduleinfo(name="pplcnet_x2_5_imagenet",
+ type="cv/classification",
+ author="paddlepaddle",
+ author_email="",
+ summary="",
+ version="1.0.0")
+class PPLcNet_x2_5:
+
+ def __init__(self):
+ self.config = get_config(os.path.join(self.directory, 'PPLCNet_x2_5.yaml'), show=False)
+ self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
+ self.pretrain_path = os.path.join(self.directory, 'PPLCNet_x2_5_pretrained.pdparams')
+ self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
+ self.model = PPLCNet_x2_5()
+ param_state_dict = paddle.load(self.pretrain_path)
+ self.model.set_dict(param_state_dict)
+ self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
+
+ def classification(self,
+ images: list = None,
+ paths: list = None,
+ batch_size: int = 1,
+ use_gpu: bool = False,
+ top_k: int = 1):
+ '''
+ Args:
+ images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
+ paths (list[str]): The paths of images.
+ batch_size (int): batch size.
+ use_gpu (bool): Whether to use gpu.
+ top_k (int): Return top k results.
+
+ Returns:
+            res (list[dict]): The classification results, each result dict contains the keys 'class_ids', 'scores' and 'label_names'.
+ '''
+ postprocess_func = Topk(top_k, self.label_path)
+ inputs = []
+ results = []
+ paddle.disable_static()
+ place = 'gpu:0' if use_gpu else 'cpu'
+ place = paddle.set_device(place)
+        if images is None and paths is None:
+            print('No image provided. Please input an image or an image path.')
+            return
+
+        if images is not None:
+            for image in images:
+                image = image[:, :, ::-1]  # convert BGR (cv2 default) to RGB
+                inputs.append(image)
+
+        if paths is not None:
+            for path in paths:
+                image = cv2.imread(path)[:, :, ::-1]  # convert BGR to RGB
+                inputs.append(image)
+
+ batch_data = []
+ for idx, imagedata in enumerate(inputs):
+ for process in self.preprocess_funcs:
+ imagedata = process(imagedata)
+ batch_data.append(imagedata)
+ if len(batch_data) >= batch_size or idx == len(inputs) - 1:
+ batch_tensor = paddle.to_tensor(batch_data)
+ out = self.model(batch_tensor)
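+                # TheseusLayer may wrap predictions in a list, or in a dict such as
+                # {"output": tensor} when intermediate results are hooked, so unwrap first.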
+ if isinstance(out, list):
+ out = out[0]
+ if isinstance(out, dict) and "logits" in out:
+ out = out["logits"]
+ if isinstance(out, dict) and "output" in out:
+ out = out["output"]
+ result = postprocess_func(out)
+ results.extend(result)
+ batch_data.clear()
+ return results
+
+ @runnable
+ def run_cmd(self, argvs: list):
+ """
+ Run as a command.
+ """
+ self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
+ prog='hub run {}'.format(self.name),
+ usage='%(prog)s',
+ add_help=True)
+
+ self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+ self.arg_config_group = self.parser.add_argument_group(
+ title="Config options", description="Run configuration for controlling module behavior, not required.")
+ self.add_module_config_arg()
+ self.add_module_input_arg()
+ self.args = self.parser.parse_args(argvs)
+ results = self.classification(paths=[self.args.input_path],
+ use_gpu=self.args.use_gpu,
+ batch_size=self.args.batch_size,
+ top_k=self.args.top_k)
+ return results
+
+ @serving
+ def serving_method(self, images, **kwargs):
+ """
+ Run as a service.
+ """
+ images_decode = [base64_to_cv2(image) for image in images]
+ results = self.classification(images=images_decode, **kwargs)
+ return results
+
+ def add_module_config_arg(self):
+ """
+ Add the command config options.
+ """
+ self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
+
+ self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
+ self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
+
+ def add_module_input_arg(self):
+ """
+ Add the command input options.
+ """
+ self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/processor.py b/modules/image/classification/pplcnet_x2_5_imagenet/processor.py
new file mode 100644
index 0000000000000000000000000000000000000000..40cab3917ecaef50cd47d0abb76bbd5d49062bf8
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_5_imagenet/processor.py
@@ -0,0 +1,374 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+
+import base64
+import inspect
+import math
+import os
+import random
+import sys
+from functools import partial
+
+import cv2
+import numpy as np
+import paddle
+import paddle.nn.functional as F
+import six
+from paddle.vision.transforms import ColorJitter as RawColorJitter
+from PIL import Image
+
+
+def create_operators(params, class_num=None):
+ """
+ create operators based on the config
+
+ Args:
+ params(list): a dict list, used to create some operators
+ """
+ assert isinstance(params, list), ('operator config should be a list')
+ ops = []
+ current_module = sys.modules[__name__]
+ for operator in params:
+ assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
+ op_name = list(operator)[0]
+ param = {} if operator[op_name] is None else operator[op_name]
+ op_func = getattr(current_module, op_name)
+ if "class_num" in inspect.getfullargspec(op_func).args:
+ param.update({"class_num": class_num})
+ op = op_func(**param)
+ ops.append(op)
+
+ return ops
+
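+# Usage sketch (hypothetical config): the Infer transforms are typically a YAML list of
+# single-key dicts, e.g. create_operators([{'ResizeImage': {'resize_short': 256}},
+# {'CropImage': {'size': 224}}]) -> [ResizeImage(resize_short=256), CropImage(size=224)].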
+
+class UnifiedResize(object):
+
+ def __init__(self, interpolation=None, backend="cv2"):
+ _cv2_interp_from_str = {
+ 'nearest': cv2.INTER_NEAREST,
+ 'bilinear': cv2.INTER_LINEAR,
+ 'area': cv2.INTER_AREA,
+ 'bicubic': cv2.INTER_CUBIC,
+ 'lanczos': cv2.INTER_LANCZOS4
+ }
+ _pil_interp_from_str = {
+ 'nearest': Image.NEAREST,
+ 'bilinear': Image.BILINEAR,
+ 'bicubic': Image.BICUBIC,
+ 'box': Image.BOX,
+ 'lanczos': Image.LANCZOS,
+ 'hamming': Image.HAMMING
+ }
+
+ def _pil_resize(src, size, resample):
+ pil_img = Image.fromarray(src)
+ pil_img = pil_img.resize(size, resample)
+ return np.asarray(pil_img)
+
+ if backend.lower() == "cv2":
+ if isinstance(interpolation, str):
+ interpolation = _cv2_interp_from_str[interpolation.lower()]
+ # compatible with opencv < version 4.4.0
+ elif interpolation is None:
+ interpolation = cv2.INTER_LINEAR
+ self.resize_func = partial(cv2.resize, interpolation=interpolation)
+ elif backend.lower() == "pil":
+ if isinstance(interpolation, str):
+ interpolation = _pil_interp_from_str[interpolation.lower()]
+ self.resize_func = partial(_pil_resize, resample=interpolation)
+ else:
+ self.resize_func = cv2.resize
+
+ def __call__(self, src, size):
+ return self.resize_func(src, size)
+
+
+class OperatorParamError(ValueError):
+ """ OperatorParamError
+ """
+ pass
+
+
+class DecodeImage(object):
+ """ decode image """
+
+ def __init__(self, to_rgb=True, to_np=False, channel_first=False):
+ self.to_rgb = to_rgb
+ self.to_np = to_np # to numpy
+ self.channel_first = channel_first # only enabled when to_np is True
+
+ def __call__(self, img):
+ if six.PY2:
+ assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
+ else:
+ assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
+ img = img[:, :, ::-1]
+
+ if self.channel_first:
+ img = img.transpose((2, 0, 1))
+
+ return img
+
+
+class ResizeImage(object):
+ """ resize image """
+
+ def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
+ if resize_short is not None and resize_short > 0:
+ self.resize_short = resize_short
+ self.w = None
+ self.h = None
+ elif size is not None:
+ self.resize_short = None
+ self.w = size if type(size) is int else size[0]
+ self.h = size if type(size) is int else size[1]
+ else:
+            raise OperatorParamError("invalid params for ResizeImage: "
+                                     "both 'size' and 'resize_short' are None")
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ if self.resize_short is not None:
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ else:
+ w = self.w
+ h = self.h
+ return self._resize_func(img, (w, h))
+
+
+class CropImage(object):
+ """ crop image """
+
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size # (h, w)
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class RandCropImage(object):
+ """ random crop image """
+
+ def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
+ if type(size) is int:
+ self.size = (size, size) # (h, w)
+ else:
+ self.size = size
+
+ self.scale = [0.08, 1.0] if scale is None else scale
+ self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
+
+ self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
+
+ def __call__(self, img):
+ size = self.size
+ scale = self.scale
+ ratio = self.ratio
+
+ aspect_ratio = math.sqrt(random.uniform(*ratio))
+ w = 1. * aspect_ratio
+ h = 1. / aspect_ratio
+
+ img_h, img_w = img.shape[:2]
+
+ bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
+ scale_max = min(scale[1], bound)
+ scale_min = min(scale[0], bound)
+
+ target_area = img_w * img_h * random.uniform(scale_min, scale_max)
+ target_size = math.sqrt(target_area)
+ w = int(target_size * w)
+ h = int(target_size * h)
+
+ i = random.randint(0, img_w - w)
+ j = random.randint(0, img_h - h)
+
+ img = img[j:j + h, i:i + w, :]
+
+ return self._resize_func(img, size)
+
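+# Illustrative usage: a random crop whose area is a `scale` fraction of the
+# image and whose aspect ratio is drawn from `ratio`, resized to the target.
+#
+#   aug = RandCropImage(size=224)   # defaults: scale [0.08, 1.0], ratio [3/4, 4/3]
+#   patch = aug(img)                # -> shape (224, 224, 3)
+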
+
+class RandFlipImage(object):
+ """ random flip image
+ flip_code:
+ 1: Flipped Horizontally
+ 0: Flipped Vertically
+ -1: Flipped Horizontally & Vertically
+ """
+
+ def __init__(self, flip_code=1):
+ assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
+ self.flip_code = flip_code
+
+ def __call__(self, img):
+ if random.randint(0, 1) == 1:
+ return cv2.flip(img, self.flip_code)
+ else:
+ return img
+
+
+class NormalizeImage(object):
+ """ normalize image such as substract mean, divide std
+ """
+
+ def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
+ if isinstance(scale, str):
+ scale = eval(scale)
+ assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
+ self.channel_num = channel_num
+ self.output_dtype = 'float16' if output_fp16 else 'float32'
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ self.order = order
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
+
+ img = (img.astype('float32') * self.scale - self.mean) / self.std
+
+ if self.channel_num == 4:
+ img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
+ img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
+ pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
+ img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
+ (img, pad_zeros), axis=2))
+ return img.astype(self.output_dtype)
+
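+# Illustrative usage on an HWC uint8 image with the ImageNet defaults above
+# (scale 1/255, then per-channel mean/std):
+#
+#   norm = NormalizeImage(order='hwc')
+#   out = norm(img)   # float32, roughly standardized per channel
+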
+
+class ToCHWImage(object):
+ """ convert hwc image to chw image
+ """
+
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ from PIL import Image
+ if isinstance(img, Image.Image):
+ img = np.array(img)
+
+ return img.transpose((2, 0, 1))
+
+
+class ColorJitter(RawColorJitter):
+ """ColorJitter.
+ """
+
+ def __init__(self, *args, **kwargs):
+ super().__init__(*args, **kwargs)
+
+ def __call__(self, img):
+ if not isinstance(img, Image.Image):
+ img = np.ascontiguousarray(img)
+ img = Image.fromarray(img)
+ img = super()._apply_image(img)
+ if isinstance(img, Image.Image):
+ img = np.asarray(img)
+ return img
+
+
+def base64_to_cv2(b64str):
+ data = base64.b64decode(b64str.encode('utf8'))
+    data = np.frombuffer(data, np.uint8)
+ data = cv2.imdecode(data, cv2.IMREAD_COLOR)
+ return data
+
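+# Illustrative round trip (hypothetical `img` BGR ndarray), matching the
+# cv2_to_base64 helper used in the serving examples of the READMEs:
+#
+#   buf = cv2.imencode('.jpg', img)[1]
+#   b64 = base64.b64encode(buf.tobytes()).decode('utf8')
+#   img2 = base64_to_cv2(b64)   # BGR ndarray again
+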
+
+class Topk(object):
+
+ def __init__(self, topk=1, class_id_map_file=None):
+        assert isinstance(topk, int)
+ self.class_id_map = self.parse_class_id_map(class_id_map_file)
+ self.topk = topk
+
+ def parse_class_id_map(self, class_id_map_file):
+ if class_id_map_file is None:
+ return None
+ if not os.path.exists(class_id_map_file):
+            print(
+                "Warning: If you want to use your own label_dict, please provide a valid path!\nOtherwise label_names will be empty!"
+            )
+ return None
+
+ try:
+ class_id_map = {}
+ with open(class_id_map_file, "r") as fin:
+ lines = fin.readlines()
+ for line in lines:
+ partition = line.split("\n")[0].partition(" ")
+ class_id_map[int(partition[0])] = str(partition[-1])
+ except Exception as ex:
+ print(ex)
+ class_id_map = None
+ return class_id_map
+
+ def __call__(self, x, file_names=None, multilabel=False):
+ assert isinstance(x, paddle.Tensor)
+ if file_names is not None:
+ assert x.shape[0] == len(file_names)
+ x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
+ x = x.numpy()
+ y = []
+ for idx, probs in enumerate(x):
+ index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
+ probs >= 0.5)[0].astype("int32")
+ clas_id_list = []
+ score_list = []
+ label_name_list = []
+ for i in index:
+ clas_id_list.append(i.item())
+ score_list.append(probs[i].item())
+ if self.class_id_map is not None:
+ label_name_list.append(self.class_id_map[i.item()])
+ result = {
+ "class_ids": clas_id_list,
+ "scores": np.around(score_list, decimals=5).tolist(),
+ }
+ if file_names is not None:
+ result["file_name"] = file_names[idx]
+            if label_name_list:  # only attach names when a class_id_map was loaded
+ result["label_names"] = label_name_list
+ y.append(result)
+ return y
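+
+
+# Illustrative end-to-end preprocessing chain (a sketch assuming the usual
+# ImageNet evaluation settings; not executed at import time):
+#
+#   ops = [
+#       DecodeImage(to_rgb=True),
+#       ResizeImage(resize_short=256),
+#       CropImage(size=224),
+#       NormalizeImage(order='hwc'),
+#       ToCHWImage(),
+#   ]
+#   with open('/PATH/TO/IMAGE', 'rb') as f:
+#       data = f.read()
+#   for op in ops:
+#       data = op(data)
+#   # data is now a float32 CHW array ready to be batched into a paddle.Tensor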
diff --git a/modules/image/classification/pplcnet_x2_5_imagenet/utils.py b/modules/image/classification/pplcnet_x2_5_imagenet/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..df2bc36b3050beb0256bf2266dd6b33b4590e537
--- /dev/null
+++ b/modules/image/classification/pplcnet_x2_5_imagenet/utils.py
@@ -0,0 +1,129 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import argparse
+import copy
+import os
+
+import yaml
+
+__all__ = ['get_config']
+
+
+class AttrDict(dict):
+
+ def __getattr__(self, key):
+ return self[key]
+
+ def __setattr__(self, key, value):
+ if key in self.__dict__:
+ self.__dict__[key] = value
+ else:
+ self[key] = value
+
+ def __deepcopy__(self, content):
+ return copy.deepcopy(dict(self))
+
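+# Illustrative behaviour: keys double as attributes; create_attr_dict below
+# converts nested plain dicts so that attribute access chains.
+#
+#   cfg = AttrDict({'Global': AttrDict({'epochs': 10})})
+#   assert cfg.Global.epochs == 10
+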
+
+def create_attr_dict(yaml_config):
+ from ast import literal_eval
+ for key, value in yaml_config.items():
+ if type(value) is dict:
+ yaml_config[key] = value = AttrDict(value)
+ if isinstance(value, str):
+ try:
+ value = literal_eval(value)
+ except BaseException:
+ pass
+ if isinstance(value, AttrDict):
+ create_attr_dict(yaml_config[key])
+ else:
+ yaml_config[key] = value
+
+
+def parse_config(cfg_file):
+ """Load a config file into AttrDict"""
+ with open(cfg_file, 'r') as fopen:
+ yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
+ create_attr_dict(yaml_config)
+ return yaml_config
+
+
+def override(dl, ks, v):
+    """
+    Recursively replace a value inside a nested dict/list structure
+    Args:
+        dl(dict or list): dict or list to be updated
+        ks(list): list of keys addressing the target value
+        v(str): new value
+    """
+
+ def str2num(v):
+ try:
+ return eval(v)
+ except Exception:
+ return v
+
+    assert isinstance(dl, (list, dict)), ("{} should be a list or a dict".format(dl))
+    assert len(ks) > 0, ('length of keys should be larger than 0')
+ if isinstance(dl, list):
+ k = str2num(ks[0])
+ if len(ks) == 1:
+ assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
+ dl[k] = str2num(v)
+ else:
+ override(dl[k], ks[1:], v)
+ else:
+ if len(ks) == 1:
+            # assert ks[0] in dl, ('{} does not exist in {}'.format(ks[0], dl))
+            if ks[0] not in dl:
+                print('A new field ({}) detected!'.format(ks[0]))
+ dl[ks[0]] = str2num(v)
+ else:
+ override(dl[ks[0]], ks[1:], v)
+
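+# Illustrative call (hypothetical config layout): dotted keys walk nested
+# dicts and numeric keys index into lists.
+#
+#   cfg = {'VALID': {'transforms': [{'DecodeImage': {}},
+#                                   {'ResizeImage': {'resize_short': 256}}]}}
+#   override(cfg, ['VALID', 'transforms', '1', 'ResizeImage', 'resize_short'], '300')
+#   assert cfg['VALID']['transforms'][1]['ResizeImage']['resize_short'] == 300
+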
+
+def override_config(config, options=None):
+ """
+ Recursively override the config
+ Args:
+ config(dict): dict to be replaced
+ options(list): list of pairs(key0.key1.idx.key2=value)
+ such as: [
+ 'topk=2',
+ 'VALID.transforms.1.ResizeImage.resize_short=300'
+ ]
+ Returns:
+ config(dict): replaced config
+ """
+ if options is not None:
+ for opt in options:
+ assert isinstance(opt, str), ("option({}) should be a str".format(opt))
+            assert "=" in opt, ("option({}) should contain a = "
+                                "to distinguish between key and value".format(opt))
+            pair = opt.split('=')
+            assert len(pair) == 2, ("there can be only one = in the option")
+ key, value = pair
+ keys = key.split('.')
+ override(config, keys, value)
+ return config
+
+
+def get_config(fname, overrides=None, show=False):
+ """
+ Read config from file
+ """
+    assert os.path.exists(fname), ('config file ({}) does not exist'.format(fname))
+ config = parse_config(fname)
+ override_config(config, overrides)
+ return config
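+
+
+# Illustrative usage (hypothetical file name and keys), mirroring the override
+# syntax documented in override_config above:
+#
+#   cfg = get_config('inference_cls.yaml', overrides=['PostProcess.Topk.topk=5'])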
diff --git a/modules/text/text_generation/ernie_tiny/README.md b/modules/text/text_generation/ernie_tiny/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..15c6543286655543a7e1345d3b1fdf7394c6b8ef
--- /dev/null
+++ b/modules/text/text_generation/ernie_tiny/README.md
@@ -0,0 +1,126 @@
+# ernie_tiny
+
+|Module Name|ernie_tiny|
+| :--- | :---: |
+|Category|image - image generation|
+|Network|SPADEGenerator|
+|Dataset|coco_stuff|
+|Fine-tuning supported or not|No|
+|Module Size|74MB|
+|Latest update date|2021-12-14|
+|Data indicators|-|
+
+
+## I.Basic Information
+
+- ### Application Effect Display
+  - Sample results:
+
+
+
+
+- ### Module Introduction
+
+  - This module uses Pix2PixHD, a pixel-level style transfer network that generates photo-realistic images from input semantic segmentation label maps. To address the loss of label semantics caused by the generator's normalization layers, a SPADE (Spatially-Adaptive Normalization) module is added to the Pix2PixHD generator: two convolution layers preserve the spatial dimensions of the learned scale and bias parameters, which improves the quality of the generated images. Semantic label images can be obtained from the [coco_stuff dataset](https://github.com/nightrome/cocostuff), or you can build custom label images for testing by following [this project in the PaddleGAN repo](https://github.com/PaddlePaddle/PaddleGAN/blob/87537ad9d4eeda17eaa5916c6a585534ab989ea8/docs/zh_CN/tutorials/photopen.md).
+
+
+
+## II.Installation
+
+- ### 1、Environmental Dependence
+  - ppgan
+
+- ### 2、Installation
+
+ - ```shell
+ $ hub install photopen
+ ```
+  - In case of any problems during installation, please refer to: [Zero-based Windows Installation](../../../../docs/docs_ch/get_start/windows_quickstart.md)
+    | [Zero-based Linux Installation](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Zero-based MacOS Installation](../../../../docs/docs_ch/get_start/mac_quickstart.md)
+
+## III.Module API Prediction
+
+- ### 1、Command line Prediction
+
+ - ```shell
+ # Read from a file
+ $ hub run photopen --input_path "/PATH/TO/IMAGE"
+ ```
+  - Invoke the image generation model via the command line. For more information, please refer to: [PaddleHub Command Line Instruction](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
+
+- ### 2、Prediction Code Example
+
+ - ```python
+ import paddlehub as hub
+
+ module = hub.Module(name="photopen")
+ input_path = ["/PATH/TO/IMAGE"]
+ # Read from a file
+ module.photo_transfer(paths=input_path, output_dir='./transfer_result/', use_gpu=True)
+ ```
+
+- ### 3、API
+
+ - ```python
+    def photo_transfer(images=None, paths=None, output_dir='./transfer_result/', use_gpu=False, visualization=True):
+ ```
+    - Image transfer generation API. A minimal `images=` based call is sketched after the parameter list below.
+
+    - **Parameters**
+
+      - images (list\[numpy.ndarray\]): image data, with ndarray.shape \[H, W, C\];
+      - paths (list\[str\]): image paths;
+      - output\_dir (str): directory in which the results are saved;
+      - use\_gpu (bool): whether to use GPU;
+      - visualization (bool): whether to save the results to a local folder.
+
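+  - A minimal sketch (assuming a readable image at the placeholder path; parameter names follow the signature above) of calling the API with in-memory arrays instead of file paths:
+
+  - ```python
+    import paddlehub as hub
+    import cv2
+
+    module = hub.Module(name="photopen")
+    # pass a decoded BGR ndarray directly via `images`; with visualization=True
+    # the generated image is also written to output_dir
+    img = cv2.imread('/PATH/TO/IMAGE')
+    result = module.photo_transfer(images=[img], output_dir='./transfer_result/', use_gpu=False)
+    ```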
+
+## IV.Server Deployment
+
+- PaddleHub Serving can deploy an online service of image transfer generation.
+
+- ### Step 1: Start PaddleHub Serving
+
+  - Run the startup command:
+ - ```shell
+ $ hub serving start -m photopen
+ ```
+
+    - The image transfer generation online service API is now deployed and the default port number is 8866.
+
+    - **NOTE:** If GPU is used for prediction, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
+
+- ### Step 2: Send a predictive request
+
+  - With a configured server, use the following lines of code to send the prediction request and obtain the result:
+
+ - ```python
+ import requests
+ import json
+ import cv2
+ import base64
+
+
+ def cv2_to_base64(image):
+ data = cv2.imencode('.jpg', image)[1]
+        return base64.b64encode(data.tobytes()).decode('utf8')
+
+    # Send an HTTP request
+ data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
+ headers = {"Content-type": "application/json"}
+ url = "http://127.0.0.1:8866/predict/photopen"
+ r = requests.post(url=url, headers=headers, data=json.dumps(data))
+
+    # print the prediction results
+ print(r.json()["results"])
+    ```
+
+## V.Release Note
+
+* 1.0.0
+
+  First release
+
+  - ```shell
+    $ hub install ernie_tiny==1.0.0
+    ```
diff --git a/modules/text/text_generation/ernie_tiny/README_en.md b/modules/text/text_generation/ernie_tiny/README_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..373348799089cc370f90335e6290c0ce38a8a11c
--- /dev/null
+++ b/modules/text/text_generation/ernie_tiny/README_en.md
@@ -0,0 +1,171 @@
+# ernie_tiny
+
+|Module Name|ernie_tiny|
+| :--- | :---: |
+|Category|object detection|
+|Network|faster_rcnn|
+|Dataset|COCO2017|
+|Fine-tuning supported or not|No|
+|Module Size|161MB|
+|Latest update date|2021-03-15|
+|Data indicators|-|
+
+
+## I.Basic Information
+
+- ### Application Effect Display
+ - Sample results:
+
+
+
+