提交 33c9d1a8 编写于 作者: C chenjian

add clas modules

上级 bac08ec0
# esnet_x0_25_imagenet
|模型名称|esnet_x0_25_imagenet|
| :--- | :---: |
|类别|图像-图像分类|
|网络|ESNet|
|数据集|ImageNet-2012|
|是否支持Fine-tuning|否|
|模型大小|10 MB|
|最新更新日期|2022-04-02|
|数据指标|Acc|
## 一、模型基本信息
- ### 模型介绍
- ESNet(Enhanced ShuffleNet)是百度自研的一个轻量级网络,该网络在 ShuffleNetV2 的基础上融合了 MobileNetV3、GhostNet、PPLCNet 的优点,组合成了一个在 ARM 设备上速度更快、精度更高的网络,由于其出色的表现,所以在 PaddleDetection 推出的 [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet) 使用了该模型做 backbone,配合更强的目标检测算法,最终的指标一举刷新了目标检测模型在 ARM 设备上的 SOTA 指标。该模型为模型规模参数scale为x0.25下的ESNet模型。
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 1.6.2
- paddlehub >= 1.6.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install esnet_x0_25_imagenet
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run esnet_x0_25_imagenet --input_path "/PATH/TO/IMAGE"
```
- 通过命令行方式实现分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
classifier = hub.Module(name="esnet_x0_25_imagenet")
result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = classifier.classification(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def classification(images=None,
paths=None,
batch_size=1,
use_gpu=False,
top_k=1):
```
- 分类接口API。
- **参数**
- images (list\[numpy.ndarray\]): 图片数据,每一个图片数据的shape 均为 \[H, W, C\],颜色空间为 BGR; <br/>
- paths (list\[str\]): 图片的路径; <br/>
- batch\_size (int): batch 的大小;<br/>
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** <br/>
- top\_k (int): 返回预测结果的前 k 个。
- **返回**
- res (list\[dict\]): 分类结果,列表的每一个元素均为字典,其中 key 包括'class_ids'(种类索引), 'scores'(置信度) 和 'label_names'(种类名称)
## 四、服务部署
- PaddleHub Serving可以部署一个图像识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m esnet_x0_25_imagenet
```
- 这样就完成了一个图像识别的在线服务的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"\}
url = "http://127.0.0.1:8866/predict/esnet_x0_25_imagenet"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install esnet_x0_25_imagenet==1.0.0
```
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
from typing import Any
from typing import Callable
from typing import Dict
from typing import List
from typing import Tuple
from typing import Union
import paddle
import paddle.nn as nn
from paddle import concat
from paddle import ParamAttr
from paddle import reshape
from paddle import split
from paddle import transpose
from paddle.nn import AdaptiveAvgPool2D
from paddle.nn import BatchNorm
from paddle.nn import Conv2D
from paddle.nn import Dropout
from paddle.nn import Linear
from paddle.nn import MaxPool2D
from paddle.nn.initializer import KaimingNormal
from paddle.regularizer import L2Decay
MODEL_STAGES_PATTERN = {"ESNet": ["blocks[2]", "blocks[9]", "blocks[12]"]}
class Identity(nn.Layer):
def __init__(self):
super(Identity, self).__init__()
def forward(self, inputs):
return inputs
class TheseusLayer(nn.Layer):
def __init__(self, *args, **kwargs):
super(TheseusLayer, self).__init__()
self.res_dict = {}
self.res_name = self.full_name()
self.pruner = None
self.quanter = None
def _return_dict_hook(self, layer, input, output):
res_dict = {"output": output}
# 'list' is needed to avoid error raised by popping self.res_dict
for res_key in list(self.res_dict):
# clear the res_dict because the forward process may change according to input
res_dict[res_key] = self.res_dict.pop(res_key)
return res_dict
def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
if return_patterns and return_stages:
msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
return_stages = None
if return_stages is True:
return_patterns = stages_pattern
# return_stages is int or bool
if type(return_stages) is int:
return_stages = [return_stages]
if isinstance(return_stages, list):
if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
return_patterns = [stages_pattern[i] for i in return_stages]
if return_patterns:
self.update_res(return_patterns)
def replace_sub(self, *args, **kwargs) -> None:
msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
raise DeprecationWarning(msg)
def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]:
"""use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
Args:
layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
Returns:
Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'.
Examples:
from paddle import nn
import paddleclas
def rep_func(layer: nn.Layer, pattern: str):
new_layer = nn.Conv2D(
in_channels=layer._in_channels,
out_channels=layer._out_channels,
kernel_size=5,
padding=2
)
return new_layer
net = paddleclas.MobileNetV1()
res = net.replace_sub(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
print(res)
# {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer}
"""
if not isinstance(layer_name_pattern, list):
layer_name_pattern = [layer_name_pattern]
hit_layer_pattern_list = []
for pattern in layer_name_pattern:
# parse pattern to find target layer and its parent
layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
if not layer_list:
continue
sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
sub_layer = layer_list[-1]["layer"]
sub_layer_name = layer_list[-1]["name"]
sub_layer_index = layer_list[-1]["index"]
new_sub_layer = handle_func(sub_layer, pattern)
if sub_layer_index:
getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
else:
setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
hit_layer_pattern_list.append(pattern)
return hit_layer_pattern_list
def stop_after(self, stop_layer_name: str) -> bool:
"""stop forward and backward after 'stop_layer_name'.
Args:
stop_layer_name (str): The name of layer that stop forward and backward after this layer.
Returns:
bool: 'True' if successful, 'False' otherwise.
"""
layer_list = parse_pattern_str(stop_layer_name, self)
if not layer_list:
return False
parent_layer = self
for layer_dict in layer_list:
name, index = layer_dict["name"], layer_dict["index"]
if not set_identity(parent_layer, name, index):
msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
return False
parent_layer = layer_dict["layer"]
return True
def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]:
"""update the result(s) to be returned.
Args:
return_patterns (Union[str, List[str]]): The name of layer to return output.
Returns:
Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully.
"""
# clear res_dict that could have been set
self.res_dict = {}
class Handler(object):
def __init__(self, res_dict):
# res_dict is a reference
self.res_dict = res_dict
def __call__(self, layer, pattern):
layer.res_dict = self.res_dict
layer.res_name = pattern
if hasattr(layer, "hook_remove_helper"):
layer.hook_remove_helper.remove()
layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
return layer
handle_func = Handler(self.res_dict)
hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
if hasattr(self, "hook_remove_helper"):
self.hook_remove_helper.remove()
self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
return hit_layer_pattern_list
def save_sub_res_hook(layer, input, output):
layer.res_dict[layer.res_name] = output
def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
"""set the layer specified by layer_name and layer_index to Indentity.
Args:
parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index.
layer_name (str): The name of target layer to be set to Indentity.
layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None.
Returns:
bool: True if successfully, False otherwise.
"""
stop_after = False
for sub_layer_name in parent_layer._sub_layers:
if stop_after:
parent_layer._sub_layers[sub_layer_name] = Identity()
continue
if sub_layer_name == layer_name:
stop_after = True
if layer_index and stop_after:
stop_after = False
for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
if stop_after:
parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
continue
if layer_index == sub_layer_index:
stop_after = True
return stop_after
def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
"""parse the string type pattern.
Args:
pattern (str): The pattern to discribe layer.
parent_layer (nn.Layer): The root layer relative to the pattern.
Returns:
Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order:
[
{"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
{"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
...
]
"""
pattern_list = pattern.split(".")
if not pattern_list:
msg = f"The pattern('{pattern}') is illegal. Please check and retry."
return None
layer_list = []
while len(pattern_list) > 0:
if '[' in pattern_list[0]:
target_layer_name = pattern_list[0].split('[')[0]
target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
else:
target_layer_name = pattern_list[0]
target_layer_index = None
target_layer = getattr(parent_layer, target_layer_name, None)
if target_layer is None:
msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')."
return None
if target_layer_index and target_layer:
if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0."
return None
target_layer = target_layer[target_layer_index]
layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
pattern_list = pattern_list[1:]
parent_layer = target_layer
return layer_list
def channel_shuffle(x, groups):
batch_size, num_channels, height, width = x.shape[0:4]
channels_per_group = num_channels // groups
x = reshape(x=x, shape=[batch_size, groups, channels_per_group, height, width])
x = transpose(x=x, perm=[0, 2, 1, 3, 4])
x = reshape(x=x, shape=[batch_size, num_channels, height, width])
return x
def make_divisible(v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNLayer(TheseusLayer):
def __init__(self, in_channels, out_channels, kernel_size, stride=1, groups=1, if_act=True):
super().__init__()
self.conv = Conv2D(in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2,
groups=groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm(out_channels,
param_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
self.if_act = if_act
self.hardswish = nn.Hardswish()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
if self.if_act:
x = self.hardswish(x)
return x
class SEModule(TheseusLayer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
self.hardsigmoid = nn.Hardsigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
class ESBlock1(TheseusLayer):
def __init__(self, in_channels, out_channels):
super().__init__()
self.pw_1_1 = ConvBNLayer(in_channels=in_channels // 2, out_channels=out_channels // 2, kernel_size=1, stride=1)
self.dw_1 = ConvBNLayer(in_channels=out_channels // 2,
out_channels=out_channels // 2,
kernel_size=3,
stride=1,
groups=out_channels // 2,
if_act=False)
self.se = SEModule(out_channels)
self.pw_1_2 = ConvBNLayer(in_channels=out_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
def forward(self, x):
x1, x2 = split(x, num_or_sections=[x.shape[1] // 2, x.shape[1] // 2], axis=1)
x2 = self.pw_1_1(x2)
x3 = self.dw_1(x2)
x3 = concat([x2, x3], axis=1)
x3 = self.se(x3)
x3 = self.pw_1_2(x3)
x = concat([x1, x3], axis=1)
return channel_shuffle(x, 2)
class ESBlock2(TheseusLayer):
def __init__(self, in_channels, out_channels):
super().__init__()
# branch1
self.dw_1 = ConvBNLayer(in_channels=in_channels,
out_channels=in_channels,
kernel_size=3,
stride=2,
groups=in_channels,
if_act=False)
self.pw_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
# branch2
self.pw_2_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1)
self.dw_2 = ConvBNLayer(in_channels=out_channels // 2,
out_channels=out_channels // 2,
kernel_size=3,
stride=2,
groups=out_channels // 2,
if_act=False)
self.se = SEModule(out_channels // 2)
self.pw_2_2 = ConvBNLayer(in_channels=out_channels // 2, out_channels=out_channels // 2, kernel_size=1)
self.concat_dw = ConvBNLayer(in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
groups=out_channels)
self.concat_pw = ConvBNLayer(in_channels=out_channels, out_channels=out_channels, kernel_size=1)
def forward(self, x):
x1 = self.dw_1(x)
x1 = self.pw_1(x1)
x2 = self.pw_2_1(x)
x2 = self.dw_2(x2)
x2 = self.se(x2)
x2 = self.pw_2_2(x2)
x = concat([x1, x2], axis=1)
x = self.concat_dw(x)
x = self.concat_pw(x)
return x
class ESNet(TheseusLayer):
def __init__(self,
stages_pattern,
class_num=1000,
scale=1.0,
dropout_prob=0.2,
class_expand=1280,
return_patterns=None,
return_stages=None):
super().__init__()
self.scale = scale
self.class_num = class_num
self.class_expand = class_expand
stage_repeats = [3, 7, 3]
stage_out_channels = [
-1, 24, make_divisible(116 * scale),
make_divisible(232 * scale),
make_divisible(464 * scale), 1024
]
self.conv1 = ConvBNLayer(in_channels=3, out_channels=stage_out_channels[1], kernel_size=3, stride=2)
self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1)
block_list = []
for stage_id, num_repeat in enumerate(stage_repeats):
for i in range(num_repeat):
if i == 0:
block = ESBlock2(in_channels=stage_out_channels[stage_id + 1],
out_channels=stage_out_channels[stage_id + 2])
else:
block = ESBlock1(in_channels=stage_out_channels[stage_id + 2],
out_channels=stage_out_channels[stage_id + 2])
block_list.append(block)
self.blocks = nn.Sequential(*block_list)
self.conv2 = ConvBNLayer(in_channels=stage_out_channels[-2], out_channels=stage_out_channels[-1], kernel_size=1)
self.avg_pool = AdaptiveAvgPool2D(1)
self.last_conv = Conv2D(in_channels=stage_out_channels[-1],
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
self.fc = Linear(self.class_expand, self.class_num)
super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
def forward(self, x):
x = self.conv1(x)
x = self.max_pool(x)
x = self.blocks(x)
x = self.conv2(x)
x = self.avg_pool(x)
x = self.last_conv(x)
x = self.hardswish(x)
x = self.dropout(x)
x = self.flatten(x)
x = self.fc(x)
return x
def ESNet_x0_25(pretrained=False, use_ssld=False, **kwargs):
"""
ESNet_x0_25
Args:
pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
If str, means the path of the pretrained model.
use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `ESNet_x0_25` model depends on args.
"""
model = ESNet(scale=0.25, stages_pattern=MODEL_STAGES_PATTERN["ESNet"], **kwargs)
return model
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import ESNet_x0_25
from .processor import base64_to_cv2
from .processor import create_operators
from .processor import Topk
from .utils import get_config
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(name="esnet_x0_25_imagenet",
type="cv/classification",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class Esnet_x0_25_Imagenet:
def __init__(self):
self.config = get_config(os.path.join(self.directory, 'ESNet_x0_25.yaml'), show=False)
self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
self.pretrain_path = os.path.join(self.directory, 'ESNet_x0_25_pretrained.pdparams')
self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
self.model = ESNet_x0_25()
param_state_dict = paddle.load(self.pretrain_path)
self.model.set_dict(param_state_dict)
self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
def classification(self,
images: list = None,
paths: list = None,
batch_size: int = 1,
use_gpu: bool = False,
top_k: int = 1):
'''
Args:
images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
paths (list[str]): The paths of images.
batch_size (int): batch size.
use_gpu (bool): Whether to use gpu.
top_k (int): Return top k results.
Returns:
res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'.
'''
postprocess_func = Topk(top_k, self.label_path)
inputs = []
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
print('No image provided. Please input an image or a image path.')
return
if images != None:
for image in images:
image = image[:, :, ::-1]
inputs.append(image)
if paths != None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
inputs.append(image)
batch_data = []
for idx, imagedata in enumerate(inputs):
for process in self.preprocess_funcs:
imagedata = process(imagedata)
batch_data.append(imagedata)
if len(batch_data) >= batch_size or idx == len(inputs) - 1:
batch_tensor = paddle.to_tensor(batch_data)
out = self.model(batch_tensor)
if isinstance(out, list):
out = out[0]
if isinstance(out, dict) and "logits" in out:
out = out["logits"]
if isinstance(out, dict) and "output" in out:
out = out["output"]
result = postprocess_func(out)
results.extend(result)
batch_data.clear()
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.classification(paths=[self.args.input_path],
use_gpu=self.args.use_gpu,
batch_size=self.args.batch_size,
top_k=self.args.top_k)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.classification(images=images_decode, **kwargs)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import base64
import inspect
import math
import os
import random
import sys
from functools import partial
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
import six
from paddle.vision.transforms import ColorJitter as RawColorJitter
from PIL import Image
def create_operators(params, class_num=None):
"""
create operators based on the config
Args:
params(list): a dict list, used to create some operators
"""
assert isinstance(params, list), ('operator config should be a list')
ops = []
current_module = sys.modules[__name__]
for operator in params:
assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
op_name = list(operator)[0]
param = {} if operator[op_name] is None else operator[op_name]
op_func = getattr(current_module, op_name)
if "class_num" in inspect.getfullargspec(op_func).args:
param.update({"class_num": class_num})
op = op_func(**param)
ops.append(op)
return ops
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
class OperatorParamError(ValueError):
""" OperatorParamError
"""
pass
class DecodeImage(object):
""" decode image """
def __init__(self, to_rgb=True, to_np=False, channel_first=False):
self.to_rgb = to_rgb
self.to_np = to_np # to numpy
self.channel_first = channel_first # only enabled when to_np is True
def __call__(self, img):
if six.PY2:
assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
else:
assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
data = np.frombuffer(img, dtype='uint8')
img = cv2.imdecode(data, 1)
if self.to_rgb:
assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
img = img[:, :, ::-1]
if self.channel_first:
img = img.transpose((2, 0, 1))
return img
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
self.h = None
elif size is not None:
self.resize_short = None
self.w = size if type(size) is int else size[0]
self.h = size if type(size) is int else size[1]
else:
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
percent = float(self.resize_short) / min(img_w, img_h)
w = int(round(img_w * percent))
h = int(round(img_h * percent))
else:
w = self.w
h = self.h
return self._resize_func(img, (w, h))
class CropImage(object):
""" crop image """
def __init__(self, size):
if type(size) is int:
self.size = (size, size)
else:
self.size = size # (h, w)
def __call__(self, img):
w, h = self.size
img_h, img_w = img.shape[:2]
w_start = (img_w - w) // 2
h_start = (img_h - h) // 2
w_end = w_start + w
h_end = h_start + h
return img[h_start:h_end, w_start:w_end, :]
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
self.size = size
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
ratio = self.ratio
aspect_ratio = math.sqrt(random.uniform(*ratio))
w = 1. * aspect_ratio
h = 1. / aspect_ratio
img_h, img_w = img.shape[:2]
bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
target_area = img_w * img_h * random.uniform(scale_min, scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
i = random.randint(0, img_w - w)
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
return self._resize_func(img, size)
class RandFlipImage(object):
""" random flip image
flip_code:
1: Flipped Horizontally
0: Flipped Vertically
-1: Flipped Horizontally & Vertically
"""
def __init__(self, flip_code=1):
assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
self.flip_code = flip_code
def __call__(self, img):
if random.randint(0, 1) == 1:
return cv2.flip(img, self.flip_code)
else:
return img
class NormalizeImage(object):
""" normalize image such as substract mean, divide std
"""
def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
if isinstance(scale, str):
scale = eval(scale)
assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
self.channel_num = channel_num
self.output_dtype = 'float16' if output_fp16 else 'float32'
self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
self.order = order
mean = mean if mean is not None else [0.485, 0.456, 0.406]
std = std if std is not None else [0.229, 0.224, 0.225]
shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
self.mean = np.array(mean).reshape(shape).astype('float32')
self.std = np.array(std).reshape(shape).astype('float32')
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
img = (img.astype('float32') * self.scale - self.mean) / self.std
if self.channel_num == 4:
img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
(img, pad_zeros), axis=2))
return img.astype(self.output_dtype)
class ToCHWImage(object):
""" convert hwc image to chw image
"""
def __init__(self):
pass
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
return img.transpose((2, 0, 1))
class ColorJitter(RawColorJitter):
"""ColorJitter.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super()._apply_image(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
class Topk(object):
def __init__(self, topk=1, class_id_map_file=None):
assert isinstance(topk, (int, ))
self.class_id_map = self.parse_class_id_map(class_id_map_file)
self.topk = topk
def parse_class_id_map(self, class_id_map_file):
if class_id_map_file is None:
return None
if not os.path.exists(class_id_map_file):
print(
"Warning: If want to use your own label_dict, please input legal path!\nOtherwise label_names will be empty!"
)
return None
try:
class_id_map = {}
with open(class_id_map_file, "r") as fin:
lines = fin.readlines()
for line in lines:
partition = line.split("\n")[0].partition(" ")
class_id_map[int(partition[0])] = str(partition[-1])
except Exception as ex:
print(ex)
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None, multilabel=False):
assert isinstance(x, paddle.Tensor)
if file_names is not None:
assert x.shape[0] == len(file_names)
x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
x = x.numpy()
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
for i in index:
clas_id_list.append(i.item())
score_list.append(probs[i].item())
if self.class_id_map is not None:
label_name_list.append(self.class_id_map[i.item()])
result = {
"class_ids": clas_id_list,
"scores": np.around(score_list, decimals=5).tolist(),
}
if file_names is not None:
result["file_name"] = file_names[idx]
if label_name_list is not None:
result["label_names"] = label_name_list
y.append(result)
return y
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import yaml
__all__ = ['get_config']
class AttrDict(dict):
def __getattr__(self, key):
return self[key]
def __setattr__(self, key, value):
if key in self.__dict__:
self.__dict__[key] = value
else:
self[key] = value
def __deepcopy__(self, content):
return copy.deepcopy(dict(self))
def create_attr_dict(yaml_config):
from ast import literal_eval
for key, value in yaml_config.items():
if type(value) is dict:
yaml_config[key] = value = AttrDict(value)
if isinstance(value, str):
try:
value = literal_eval(value)
except BaseException:
pass
if isinstance(value, AttrDict):
create_attr_dict(yaml_config[key])
else:
yaml_config[key] = value
def parse_config(cfg_file):
"""Load a config file into AttrDict"""
with open(cfg_file, 'r') as fopen:
yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
create_attr_dict(yaml_config)
return yaml_config
def override(dl, ks, v):
"""
Recursively replace dict of list
Args:
dl(dict or list): dict or list to be replaced
ks(list): list of keys
v(str): value to be replaced
"""
def str2num(v):
try:
return eval(v)
except Exception:
return v
assert isinstance(dl, (list, dict)), ("{} should be a list or a dict")
assert len(ks) > 0, ('lenght of keys should larger than 0')
if isinstance(dl, list):
k = str2num(ks[0])
if len(ks) == 1:
assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
dl[k] = str2num(v)
else:
override(dl[k], ks[1:], v)
else:
if len(ks) == 1:
# assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
if not ks[0] in dl:
print('A new filed ({}) detected!'.format(ks[0], dl))
dl[ks[0]] = str2num(v)
else:
override(dl[ks[0]], ks[1:], v)
def override_config(config, options=None):
"""
Recursively override the config
Args:
config(dict): dict to be replaced
options(list): list of pairs(key0.key1.idx.key2=value)
such as: [
'topk=2',
'VALID.transforms.1.ResizeImage.resize_short=300'
]
Returns:
config(dict): replaced config
"""
if options is not None:
for opt in options:
assert isinstance(opt, str), ("option({}) should be a str".format(opt))
assert "=" in opt, ("option({}) should contain a ="
"to distinguish between key and value".format(opt))
pair = opt.split('=')
assert len(pair) == 2, ("there can be only a = in the option")
key, value = pair
keys = key.split('.')
override(config, keys, value)
return config
def get_config(fname, overrides=None, show=False):
"""
Read config from file
"""
assert os.path.exists(fname), ('config file({}) is not exist'.format(fname))
config = parse_config(fname)
override_config(config, overrides)
return config
# esnet_x0_5_imagenet
|模型名称|esnet_x0_5_imagenet|
| :--- | :---: |
|类别|图像-图像分类|
|网络|ESNet|
|数据集|ImageNet-2012|
|是否支持Fine-tuning|否|
|模型大小|12 MB|
|最新更新日期|2022-04-02|
|数据指标|Acc|
## 一、模型基本信息
- ### 模型介绍
- ESNet(Enhanced ShuffleNet)是百度自研的一个轻量级网络,该网络在 ShuffleNetV2 的基础上融合了 MobileNetV3、GhostNet、PPLCNet 的优点,组合成了一个在 ARM 设备上速度更快、精度更高的网络,由于其出色的表现,所以在 PaddleDetection 推出的 [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet) 使用了该模型做 backbone,配合更强的目标检测算法,最终的指标一举刷新了目标检测模型在 ARM 设备上的 SOTA 指标。该模型为模型规模参数scale为x0.5下的ESNet模型。
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 1.6.2
- paddlehub >= 1.6.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install esnet_x0_5_imagenet
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run esnet_x0_5_imagenet --input_path "/PATH/TO/IMAGE"
```
- 通过命令行方式实现分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
classifier = hub.Module(name="esnet_x0_5_imagenet")
result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = classifier.classification(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def classification(images=None,
paths=None,
batch_size=1,
use_gpu=False,
top_k=1):
```
- 分类接口API。
- **参数**
- images (list\[numpy.ndarray\]): 图片数据,每一个图片数据的shape 均为 \[H, W, C\],颜色空间为 BGR; <br/>
- paths (list\[str\]): 图片的路径; <br/>
- batch\_size (int): batch 的大小;<br/>
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** <br/>
- top\_k (int): 返回预测结果的前 k 个。
- **返回**
- res (list\[dict\]): 分类结果,列表的每一个元素均为字典,其中 key 包括'class_ids'(种类索引), 'scores'(置信度) 和 'label_names'(种类名称)
## 四、服务部署
- PaddleHub Serving可以部署一个图像识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m esnet_x0_5_imagenet
```
- 这样就完成了一个图像识别的在线服务的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"\}
url = "http://127.0.0.1:8866/predict/esnet_x0_5_imagenet"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install esnet_x0_5_imagenet==1.0.0
```
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
from typing import Any
from typing import Callable
from typing import Dict
from typing import List
from typing import Tuple
from typing import Union
import paddle
import paddle.nn as nn
from paddle import concat
from paddle import ParamAttr
from paddle import reshape
from paddle import split
from paddle import transpose
from paddle.nn import AdaptiveAvgPool2D
from paddle.nn import BatchNorm
from paddle.nn import Conv2D
from paddle.nn import Dropout
from paddle.nn import Linear
from paddle.nn import MaxPool2D
from paddle.nn.initializer import KaimingNormal
from paddle.regularizer import L2Decay
MODEL_STAGES_PATTERN = {"ESNet": ["blocks[2]", "blocks[9]", "blocks[12]"]}
class Identity(nn.Layer):
def __init__(self):
super(Identity, self).__init__()
def forward(self, inputs):
return inputs
class TheseusLayer(nn.Layer):
def __init__(self, *args, **kwargs):
super(TheseusLayer, self).__init__()
self.res_dict = {}
self.res_name = self.full_name()
self.pruner = None
self.quanter = None
def _return_dict_hook(self, layer, input, output):
res_dict = {"output": output}
# 'list' is needed to avoid error raised by popping self.res_dict
for res_key in list(self.res_dict):
# clear the res_dict because the forward process may change according to input
res_dict[res_key] = self.res_dict.pop(res_key)
return res_dict
def init_res(self, stages_pattern, return_patterns=None, return_stages=None):
if return_patterns and return_stages:
msg = f"The 'return_patterns' would be ignored when 'return_stages' is set."
return_stages = None
if return_stages is True:
return_patterns = stages_pattern
# return_stages is int or bool
if type(return_stages) is int:
return_stages = [return_stages]
if isinstance(return_stages, list):
if max(return_stages) > len(stages_pattern) or min(return_stages) < 0:
msg = f"The 'return_stages' set error. Illegal value(s) have been ignored. The stages' pattern list is {stages_pattern}."
return_stages = [val for val in return_stages if val >= 0 and val < len(stages_pattern)]
return_patterns = [stages_pattern[i] for i in return_stages]
if return_patterns:
self.update_res(return_patterns)
def replace_sub(self, *args, **kwargs) -> None:
msg = "The function 'replace_sub()' is deprecated, please use 'upgrade_sublayer()' instead."
raise DeprecationWarning(msg)
def upgrade_sublayer(self, layer_name_pattern: Union[str, List[str]],
handle_func: Callable[[nn.Layer, str], nn.Layer]) -> Dict[str, nn.Layer]:
"""use 'handle_func' to modify the sub-layer(s) specified by 'layer_name_pattern'.
Args:
layer_name_pattern (Union[str, List[str]]): The name of layer to be modified by 'handle_func'.
handle_func (Callable[[nn.Layer, str], nn.Layer]): The function to modify target layer specified by 'layer_name_pattern'. The formal params are the layer(nn.Layer) and pattern(str) that is (a member of) layer_name_pattern (when layer_name_pattern is List type). And the return is the layer processed.
Returns:
Dict[str, nn.Layer]: The key is the pattern and corresponding value is the result returned by 'handle_func()'.
Examples:
from paddle import nn
import paddleclas
def rep_func(layer: nn.Layer, pattern: str):
new_layer = nn.Conv2D(
in_channels=layer._in_channels,
out_channels=layer._out_channels,
kernel_size=5,
padding=2
)
return new_layer
net = paddleclas.MobileNetV1()
res = net.replace_sub(layer_name_pattern=["blocks[11].depthwise_conv.conv", "blocks[12].depthwise_conv.conv"], handle_func=rep_func)
print(res)
# {'blocks[11].depthwise_conv.conv': the corresponding new_layer, 'blocks[12].depthwise_conv.conv': the corresponding new_layer}
"""
if not isinstance(layer_name_pattern, list):
layer_name_pattern = [layer_name_pattern]
hit_layer_pattern_list = []
for pattern in layer_name_pattern:
# parse pattern to find target layer and its parent
layer_list = parse_pattern_str(pattern=pattern, parent_layer=self)
if not layer_list:
continue
sub_layer_parent = layer_list[-2]["layer"] if len(layer_list) > 1 else self
sub_layer = layer_list[-1]["layer"]
sub_layer_name = layer_list[-1]["name"]
sub_layer_index = layer_list[-1]["index"]
new_sub_layer = handle_func(sub_layer, pattern)
if sub_layer_index:
getattr(sub_layer_parent, sub_layer_name)[sub_layer_index] = new_sub_layer
else:
setattr(sub_layer_parent, sub_layer_name, new_sub_layer)
hit_layer_pattern_list.append(pattern)
return hit_layer_pattern_list
def stop_after(self, stop_layer_name: str) -> bool:
"""stop forward and backward after 'stop_layer_name'.
Args:
stop_layer_name (str): The name of layer that stop forward and backward after this layer.
Returns:
bool: 'True' if successful, 'False' otherwise.
"""
layer_list = parse_pattern_str(stop_layer_name, self)
if not layer_list:
return False
parent_layer = self
for layer_dict in layer_list:
name, index = layer_dict["name"], layer_dict["index"]
if not set_identity(parent_layer, name, index):
msg = f"Failed to set the layers that after stop_layer_name('{stop_layer_name}') to IdentityLayer. The error layer's name is '{name}'."
return False
parent_layer = layer_dict["layer"]
return True
def update_res(self, return_patterns: Union[str, List[str]]) -> Dict[str, nn.Layer]:
"""update the result(s) to be returned.
Args:
return_patterns (Union[str, List[str]]): The name of layer to return output.
Returns:
Dict[str, nn.Layer]: The pattern(str) and corresponding layer(nn.Layer) that have been set successfully.
"""
# clear res_dict that could have been set
self.res_dict = {}
class Handler(object):
def __init__(self, res_dict):
# res_dict is a reference
self.res_dict = res_dict
def __call__(self, layer, pattern):
layer.res_dict = self.res_dict
layer.res_name = pattern
if hasattr(layer, "hook_remove_helper"):
layer.hook_remove_helper.remove()
layer.hook_remove_helper = layer.register_forward_post_hook(save_sub_res_hook)
return layer
handle_func = Handler(self.res_dict)
hit_layer_pattern_list = self.upgrade_sublayer(return_patterns, handle_func=handle_func)
if hasattr(self, "hook_remove_helper"):
self.hook_remove_helper.remove()
self.hook_remove_helper = self.register_forward_post_hook(self._return_dict_hook)
return hit_layer_pattern_list
def save_sub_res_hook(layer, input, output):
layer.res_dict[layer.res_name] = output
def set_identity(parent_layer: nn.Layer, layer_name: str, layer_index: str = None) -> bool:
"""set the layer specified by layer_name and layer_index to Indentity.
Args:
parent_layer (nn.Layer): The parent layer of target layer specified by layer_name and layer_index.
layer_name (str): The name of target layer to be set to Indentity.
layer_index (str, optional): The index of target layer to be set to Indentity in parent_layer. Defaults to None.
Returns:
bool: True if successfully, False otherwise.
"""
stop_after = False
for sub_layer_name in parent_layer._sub_layers:
if stop_after:
parent_layer._sub_layers[sub_layer_name] = Identity()
continue
if sub_layer_name == layer_name:
stop_after = True
if layer_index and stop_after:
stop_after = False
for sub_layer_index in parent_layer._sub_layers[layer_name]._sub_layers:
if stop_after:
parent_layer._sub_layers[layer_name][sub_layer_index] = Identity()
continue
if layer_index == sub_layer_index:
stop_after = True
return stop_after
def parse_pattern_str(pattern: str, parent_layer: nn.Layer) -> Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]:
"""parse the string type pattern.
Args:
pattern (str): The pattern to discribe layer.
parent_layer (nn.Layer): The root layer relative to the pattern.
Returns:
Union[None, List[Dict[str, Union[nn.Layer, str, None]]]]: None if failed. If successfully, the members are layers parsed in order:
[
{"layer": first layer, "name": first layer's name parsed, "index": first layer's index parsed if exist},
{"layer": second layer, "name": second layer's name parsed, "index": second layer's index parsed if exist},
...
]
"""
pattern_list = pattern.split(".")
if not pattern_list:
msg = f"The pattern('{pattern}') is illegal. Please check and retry."
return None
layer_list = []
while len(pattern_list) > 0:
if '[' in pattern_list[0]:
target_layer_name = pattern_list[0].split('[')[0]
target_layer_index = pattern_list[0].split('[')[1].split(']')[0]
else:
target_layer_name = pattern_list[0]
target_layer_index = None
target_layer = getattr(parent_layer, target_layer_name, None)
if target_layer is None:
msg = f"Not found layer named('{target_layer_name}') specifed in pattern('{pattern}')."
return None
if target_layer_index and target_layer:
if int(target_layer_index) < 0 or int(target_layer_index) >= len(target_layer):
msg = f"Not found layer by index('{target_layer_index}') specifed in pattern('{pattern}'). The index should < {len(target_layer)} and > 0."
return None
target_layer = target_layer[target_layer_index]
layer_list.append({"layer": target_layer, "name": target_layer_name, "index": target_layer_index})
pattern_list = pattern_list[1:]
parent_layer = target_layer
return layer_list
def channel_shuffle(x, groups):
batch_size, num_channels, height, width = x.shape[0:4]
channels_per_group = num_channels // groups
x = reshape(x=x, shape=[batch_size, groups, channels_per_group, height, width])
x = transpose(x=x, perm=[0, 2, 1, 3, 4])
x = reshape(x=x, shape=[batch_size, num_channels, height, width])
return x
def make_divisible(v, divisor=8, min_value=None):
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class ConvBNLayer(TheseusLayer):
def __init__(self, in_channels, out_channels, kernel_size, stride=1, groups=1, if_act=True):
super().__init__()
self.conv = Conv2D(in_channels=in_channels,
out_channels=out_channels,
kernel_size=kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2,
groups=groups,
weight_attr=ParamAttr(initializer=KaimingNormal()),
bias_attr=False)
self.bn = BatchNorm(out_channels,
param_attr=ParamAttr(regularizer=L2Decay(0.0)),
bias_attr=ParamAttr(regularizer=L2Decay(0.0)))
self.if_act = if_act
self.hardswish = nn.Hardswish()
def forward(self, x):
x = self.conv(x)
x = self.bn(x)
if self.if_act:
x = self.hardswish(x)
return x
class SEModule(TheseusLayer):
def __init__(self, channel, reduction=4):
super().__init__()
self.avg_pool = AdaptiveAvgPool2D(1)
self.conv1 = Conv2D(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1, padding=0)
self.relu = nn.ReLU()
self.conv2 = Conv2D(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1, padding=0)
self.hardsigmoid = nn.Hardsigmoid()
def forward(self, x):
identity = x
x = self.avg_pool(x)
x = self.conv1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.hardsigmoid(x)
x = paddle.multiply(x=identity, y=x)
return x
class ESBlock1(TheseusLayer):
def __init__(self, in_channels, out_channels):
super().__init__()
self.pw_1_1 = ConvBNLayer(in_channels=in_channels // 2, out_channels=out_channels // 2, kernel_size=1, stride=1)
self.dw_1 = ConvBNLayer(in_channels=out_channels // 2,
out_channels=out_channels // 2,
kernel_size=3,
stride=1,
groups=out_channels // 2,
if_act=False)
self.se = SEModule(out_channels)
self.pw_1_2 = ConvBNLayer(in_channels=out_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
def forward(self, x):
x1, x2 = split(x, num_or_sections=[x.shape[1] // 2, x.shape[1] // 2], axis=1)
x2 = self.pw_1_1(x2)
x3 = self.dw_1(x2)
x3 = concat([x2, x3], axis=1)
x3 = self.se(x3)
x3 = self.pw_1_2(x3)
x = concat([x1, x3], axis=1)
return channel_shuffle(x, 2)
class ESBlock2(TheseusLayer):
def __init__(self, in_channels, out_channels):
super().__init__()
# branch1
self.dw_1 = ConvBNLayer(in_channels=in_channels,
out_channels=in_channels,
kernel_size=3,
stride=2,
groups=in_channels,
if_act=False)
self.pw_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1, stride=1)
# branch2
self.pw_2_1 = ConvBNLayer(in_channels=in_channels, out_channels=out_channels // 2, kernel_size=1)
self.dw_2 = ConvBNLayer(in_channels=out_channels // 2,
out_channels=out_channels // 2,
kernel_size=3,
stride=2,
groups=out_channels // 2,
if_act=False)
self.se = SEModule(out_channels // 2)
self.pw_2_2 = ConvBNLayer(in_channels=out_channels // 2, out_channels=out_channels // 2, kernel_size=1)
self.concat_dw = ConvBNLayer(in_channels=out_channels,
out_channels=out_channels,
kernel_size=3,
groups=out_channels)
self.concat_pw = ConvBNLayer(in_channels=out_channels, out_channels=out_channels, kernel_size=1)
def forward(self, x):
x1 = self.dw_1(x)
x1 = self.pw_1(x1)
x2 = self.pw_2_1(x)
x2 = self.dw_2(x2)
x2 = self.se(x2)
x2 = self.pw_2_2(x2)
x = concat([x1, x2], axis=1)
x = self.concat_dw(x)
x = self.concat_pw(x)
return x
class ESNet(TheseusLayer):
def __init__(self,
stages_pattern,
class_num=1000,
scale=1.0,
dropout_prob=0.2,
class_expand=1280,
return_patterns=None,
return_stages=None):
super().__init__()
self.scale = scale
self.class_num = class_num
self.class_expand = class_expand
stage_repeats = [3, 7, 3]
stage_out_channels = [
-1, 24, make_divisible(116 * scale),
make_divisible(232 * scale),
make_divisible(464 * scale), 1024
]
self.conv1 = ConvBNLayer(in_channels=3, out_channels=stage_out_channels[1], kernel_size=3, stride=2)
self.max_pool = MaxPool2D(kernel_size=3, stride=2, padding=1)
block_list = []
for stage_id, num_repeat in enumerate(stage_repeats):
for i in range(num_repeat):
if i == 0:
block = ESBlock2(in_channels=stage_out_channels[stage_id + 1],
out_channels=stage_out_channels[stage_id + 2])
else:
block = ESBlock1(in_channels=stage_out_channels[stage_id + 2],
out_channels=stage_out_channels[stage_id + 2])
block_list.append(block)
self.blocks = nn.Sequential(*block_list)
self.conv2 = ConvBNLayer(in_channels=stage_out_channels[-2], out_channels=stage_out_channels[-1], kernel_size=1)
self.avg_pool = AdaptiveAvgPool2D(1)
self.last_conv = Conv2D(in_channels=stage_out_channels[-1],
out_channels=self.class_expand,
kernel_size=1,
stride=1,
padding=0,
bias_attr=False)
self.hardswish = nn.Hardswish()
self.dropout = Dropout(p=dropout_prob, mode="downscale_in_infer")
self.flatten = nn.Flatten(start_axis=1, stop_axis=-1)
self.fc = Linear(self.class_expand, self.class_num)
super().init_res(stages_pattern, return_patterns=return_patterns, return_stages=return_stages)
def forward(self, x):
x = self.conv1(x)
x = self.max_pool(x)
x = self.blocks(x)
x = self.conv2(x)
x = self.avg_pool(x)
x = self.last_conv(x)
x = self.hardswish(x)
x = self.dropout(x)
x = self.flatten(x)
x = self.fc(x)
return x
def ESNet_x0_5(pretrained=False, use_ssld=False, **kwargs):
"""
ESNet_x0_5
Args:
pretrained: bool=False or str. If `True` load pretrained parameters, `False` otherwise.
If str, means the path of the pretrained model.
use_ssld: bool=False. Whether using distillation pretrained model when pretrained=True.
Returns:
model: nn.Layer. Specific `ESNet_x0_5` model depends on args.
"""
model = ESNet(scale=0.5, stages_pattern=MODEL_STAGES_PATTERN["ESNet"], **kwargs)
return model
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import ESNet_x0_5
from .processor import base64_to_cv2
from .processor import create_operators
from .processor import Topk
from .utils import get_config
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(name="esnet_x0_5_imagenet",
type="cv/classification",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class Esnet_x0_5_Imagenet:
def __init__(self):
self.config = get_config(os.path.join(self.directory, 'ESNet_x0_5.yaml'), show=False)
self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
self.pretrain_path = os.path.join(self.directory, 'ESNet_x0_5_pretrained.pdparams')
self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
self.model = ESNet_x0_5()
param_state_dict = paddle.load(self.pretrain_path)
self.model.set_dict(param_state_dict)
self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
def classification(self,
images: list = None,
paths: list = None,
batch_size: int = 1,
use_gpu: bool = False,
top_k: int = 1):
'''
Args:
images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
paths (list[str]): The paths of images.
batch_size (int): batch size.
use_gpu (bool): Whether to use gpu.
top_k (int): Return top k results.
Returns:
res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'.
'''
postprocess_func = Topk(top_k, self.label_path)
inputs = []
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
print('No image provided. Please input an image or a image path.')
return
if images != None:
for image in images:
image = image[:, :, ::-1]
inputs.append(image)
if paths != None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
inputs.append(image)
batch_data = []
for idx, imagedata in enumerate(inputs):
for process in self.preprocess_funcs:
imagedata = process(imagedata)
batch_data.append(imagedata)
if len(batch_data) >= batch_size or idx == len(inputs) - 1:
batch_tensor = paddle.to_tensor(batch_data)
out = self.model(batch_tensor)
if isinstance(out, list):
out = out[0]
if isinstance(out, dict) and "logits" in out:
out = out["logits"]
if isinstance(out, dict) and "output" in out:
out = out["output"]
result = postprocess_func(out)
results.extend(result)
batch_data.clear()
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.classification(paths=[self.args.input_path],
use_gpu=self.args.use_gpu,
batch_size=self.args.batch_size,
top_k=self.args.top_k)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.classification(images=images_decode, **kwargs)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import base64
import inspect
import math
import os
import random
import sys
from functools import partial
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
import six
from paddle.vision.transforms import ColorJitter as RawColorJitter
from PIL import Image
def create_operators(params, class_num=None):
"""
create operators based on the config
Args:
params(list): a dict list, used to create some operators
"""
assert isinstance(params, list), ('operator config should be a list')
ops = []
current_module = sys.modules[__name__]
for operator in params:
assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
op_name = list(operator)[0]
param = {} if operator[op_name] is None else operator[op_name]
op_func = getattr(current_module, op_name)
if "class_num" in inspect.getfullargspec(op_func).args:
param.update({"class_num": class_num})
op = op_func(**param)
ops.append(op)
return ops
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
class OperatorParamError(ValueError):
""" OperatorParamError
"""
pass
class DecodeImage(object):
""" decode image """
def __init__(self, to_rgb=True, to_np=False, channel_first=False):
self.to_rgb = to_rgb
self.to_np = to_np # to numpy
self.channel_first = channel_first # only enabled when to_np is True
def __call__(self, img):
if six.PY2:
assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
else:
assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
data = np.frombuffer(img, dtype='uint8')
img = cv2.imdecode(data, 1)
if self.to_rgb:
assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
img = img[:, :, ::-1]
if self.channel_first:
img = img.transpose((2, 0, 1))
return img
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
self.h = None
elif size is not None:
self.resize_short = None
self.w = size if type(size) is int else size[0]
self.h = size if type(size) is int else size[1]
else:
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
percent = float(self.resize_short) / min(img_w, img_h)
w = int(round(img_w * percent))
h = int(round(img_h * percent))
else:
w = self.w
h = self.h
return self._resize_func(img, (w, h))
class CropImage(object):
""" crop image """
def __init__(self, size):
if type(size) is int:
self.size = (size, size)
else:
self.size = size # (h, w)
def __call__(self, img):
w, h = self.size
img_h, img_w = img.shape[:2]
w_start = (img_w - w) // 2
h_start = (img_h - h) // 2
w_end = w_start + w
h_end = h_start + h
return img[h_start:h_end, w_start:w_end, :]
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
self.size = size
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
ratio = self.ratio
aspect_ratio = math.sqrt(random.uniform(*ratio))
w = 1. * aspect_ratio
h = 1. / aspect_ratio
img_h, img_w = img.shape[:2]
bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
target_area = img_w * img_h * random.uniform(scale_min, scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
i = random.randint(0, img_w - w)
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
return self._resize_func(img, size)
class RandFlipImage(object):
""" random flip image
flip_code:
1: Flipped Horizontally
0: Flipped Vertically
-1: Flipped Horizontally & Vertically
"""
def __init__(self, flip_code=1):
assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
self.flip_code = flip_code
def __call__(self, img):
if random.randint(0, 1) == 1:
return cv2.flip(img, self.flip_code)
else:
return img
class NormalizeImage(object):
""" normalize image such as substract mean, divide std
"""
def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
if isinstance(scale, str):
scale = eval(scale)
assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
self.channel_num = channel_num
self.output_dtype = 'float16' if output_fp16 else 'float32'
self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
self.order = order
mean = mean if mean is not None else [0.485, 0.456, 0.406]
std = std if std is not None else [0.229, 0.224, 0.225]
shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
self.mean = np.array(mean).reshape(shape).astype('float32')
self.std = np.array(std).reshape(shape).astype('float32')
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
img = (img.astype('float32') * self.scale - self.mean) / self.std
if self.channel_num == 4:
img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
(img, pad_zeros), axis=2))
return img.astype(self.output_dtype)
class ToCHWImage(object):
""" convert hwc image to chw image
"""
def __init__(self):
pass
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
return img.transpose((2, 0, 1))
class ColorJitter(RawColorJitter):
"""ColorJitter.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super()._apply_image(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
class Topk(object):
def __init__(self, topk=1, class_id_map_file=None):
assert isinstance(topk, (int, ))
self.class_id_map = self.parse_class_id_map(class_id_map_file)
self.topk = topk
def parse_class_id_map(self, class_id_map_file):
if class_id_map_file is None:
return None
if not os.path.exists(class_id_map_file):
print(
"Warning: If want to use your own label_dict, please input legal path!\nOtherwise label_names will be empty!"
)
return None
try:
class_id_map = {}
with open(class_id_map_file, "r") as fin:
lines = fin.readlines()
for line in lines:
partition = line.split("\n")[0].partition(" ")
class_id_map[int(partition[0])] = str(partition[-1])
except Exception as ex:
print(ex)
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None, multilabel=False):
assert isinstance(x, paddle.Tensor)
if file_names is not None:
assert x.shape[0] == len(file_names)
x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
x = x.numpy()
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
for i in index:
clas_id_list.append(i.item())
score_list.append(probs[i].item())
if self.class_id_map is not None:
label_name_list.append(self.class_id_map[i.item()])
result = {
"class_ids": clas_id_list,
"scores": np.around(score_list, decimals=5).tolist(),
}
if file_names is not None:
result["file_name"] = file_names[idx]
if label_name_list is not None:
result["label_names"] = label_name_list
y.append(result)
return y
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import yaml
__all__ = ['get_config']
class AttrDict(dict):
def __getattr__(self, key):
return self[key]
def __setattr__(self, key, value):
if key in self.__dict__:
self.__dict__[key] = value
else:
self[key] = value
def __deepcopy__(self, content):
return copy.deepcopy(dict(self))
def create_attr_dict(yaml_config):
from ast import literal_eval
for key, value in yaml_config.items():
if type(value) is dict:
yaml_config[key] = value = AttrDict(value)
if isinstance(value, str):
try:
value = literal_eval(value)
except BaseException:
pass
if isinstance(value, AttrDict):
create_attr_dict(yaml_config[key])
else:
yaml_config[key] = value
def parse_config(cfg_file):
"""Load a config file into AttrDict"""
with open(cfg_file, 'r') as fopen:
yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
create_attr_dict(yaml_config)
return yaml_config
def override(dl, ks, v):
"""
Recursively replace dict of list
Args:
dl(dict or list): dict or list to be replaced
ks(list): list of keys
v(str): value to be replaced
"""
def str2num(v):
try:
return eval(v)
except Exception:
return v
assert isinstance(dl, (list, dict)), ("{} should be a list or a dict")
assert len(ks) > 0, ('lenght of keys should larger than 0')
if isinstance(dl, list):
k = str2num(ks[0])
if len(ks) == 1:
assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
dl[k] = str2num(v)
else:
override(dl[k], ks[1:], v)
else:
if len(ks) == 1:
# assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
if not ks[0] in dl:
print('A new filed ({}) detected!'.format(ks[0], dl))
dl[ks[0]] = str2num(v)
else:
override(dl[ks[0]], ks[1:], v)
def override_config(config, options=None):
"""
Recursively override the config
Args:
config(dict): dict to be replaced
options(list): list of pairs(key0.key1.idx.key2=value)
such as: [
'topk=2',
'VALID.transforms.1.ResizeImage.resize_short=300'
]
Returns:
config(dict): replaced config
"""
if options is not None:
for opt in options:
assert isinstance(opt, str), ("option({}) should be a str".format(opt))
assert "=" in opt, ("option({}) should contain a ="
"to distinguish between key and value".format(opt))
pair = opt.split('=')
assert len(pair) == 2, ("there can be only a = in the option")
key, value = pair
keys = key.split('.')
override(config, keys, value)
return config
def get_config(fname, overrides=None, show=False):
"""
Read config from file
"""
assert os.path.exists(fname), ('config file({}) is not exist'.format(fname))
config = parse_config(fname)
override_config(config, overrides)
return config
# levit_128_imagenet
|模型名称|levit_128_imagenet|
| :--- | :---: |
|类别|图像-图像分类|
|网络|LeViT|
|数据集|ImageNet-2012|
|是否支持Fine-tuning|否|
|模型大小|54 MB|
|最新更新日期|2022-04-02|
|数据指标|Acc|
## 一、模型基本信息
- ### 模型介绍
- LeViT 是一种快速推理的、用于图像分类任务的混合神经网络。其设计之初考虑了网络模型在不同的硬件平台上的性能,因此能够更好地反映普遍应用的真实场景。通过大量实验,作者找到了卷积神经网络与 Transformer 体系更好的结合方式,并且提出了 attention-based 方法,用于整合 Transformer 中的位置信息编码, 该模块的模型结构配置为LeViT128, 详情可参考[论文地址](https://arxiv.org/abs/2104.01136)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 1.6.2
- paddlehub >= 1.6.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install levit_128_imagenet
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run levit_128_imagenet --input_path "/PATH/TO/IMAGE"
```
- 通过命令行方式实现分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
classifier = hub.Module(name="levit_128_imagenet")
result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = classifier.classification(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def classification(images=None,
paths=None,
batch_size=1,
use_gpu=False,
top_k=1):
```
- 分类接口API。
- **参数**
- images (list\[numpy.ndarray\]): 图片数据,每一个图片数据的shape 均为 \[H, W, C\],颜色空间为 BGR; <br/>
- paths (list\[str\]): 图片的路径; <br/>
- batch\_size (int): batch 的大小;<br/>
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** <br/>
- top\_k (int): 返回预测结果的前 k 个。
- **返回**
- res (list\[dict\]): 分类结果,列表的每一个元素均为字典,其中 key 包括'class_ids'(种类索引), 'scores'(置信度) 和 'label_names'(种类名称)
## 四、服务部署
- PaddleHub Serving可以部署一个图像识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m levit_128_imagenet
```
- 这样就完成了一个图像识别的在线服务的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"\}
url = "http://127.0.0.1:8866/predict/levit_128_imagenet"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install levit_128_imagenet==1.0.0
```
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Code was based on https://github.com/facebookresearch/LeViT
import itertools
import math
import warnings
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import Constant
from paddle.nn.initializer import TruncatedNormal
from paddle.regularizer import L2Decay
from .vision_transformer import Identity
from .vision_transformer import ones_
from .vision_transformer import trunc_normal_
from .vision_transformer import zeros_
def cal_attention_biases(attention_biases, attention_bias_idxs):
gather_list = []
attention_bias_t = paddle.transpose(attention_biases, (1, 0))
nums = attention_bias_idxs.shape[0]
for idx in range(nums):
gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
gather_list.append(gather)
shape0, shape1 = attention_bias_idxs.shape
gather = paddle.concat(gather_list)
return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))
class Conv2d_BN(nn.Sequential):
def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000):
super().__init__()
self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False))
bn = nn.BatchNorm2D(b)
ones_(bn.weight)
zeros_(bn.bias)
self.add_sublayer('bn', bn)
class Linear_BN(nn.Sequential):
def __init__(self, a, b, bn_weight_init=1):
super().__init__()
self.add_sublayer('c', nn.Linear(a, b, bias_attr=False))
bn = nn.BatchNorm1D(b)
if bn_weight_init == 0:
zeros_(bn.weight)
else:
ones_(bn.weight)
zeros_(bn.bias)
self.add_sublayer('bn', bn)
def forward(self, x):
l, bn = self._sub_layers.values()
x = l(x)
return paddle.reshape(bn(x.flatten(0, 1)), x.shape)
class BN_Linear(nn.Sequential):
def __init__(self, a, b, bias=True, std=0.02):
super().__init__()
self.add_sublayer('bn', nn.BatchNorm1D(a))
l = nn.Linear(a, b, bias_attr=bias)
trunc_normal_(l.weight)
if bias:
zeros_(l.bias)
self.add_sublayer('l', l)
def b16(n, activation, resolution=224):
return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(),
Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(),
Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(),
Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8))
class Residual(nn.Layer):
def __init__(self, m, drop):
super().__init__()
self.m = m
self.drop = drop
def forward(self, x):
if self.training and self.drop > 0:
y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
y = y.divide(paddle.full_like(y, 1 - self.drop))
return paddle.add(x, y)
else:
return paddle.add(x, self.m(x))
class Attention(nn.Layer):
def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14):
super().__init__()
self.num_heads = num_heads
self.scale = key_dim**-0.5
self.key_dim = key_dim
self.nh_kd = nh_kd = key_dim * num_heads
self.d = int(attn_ratio * key_dim)
self.dh = int(attn_ratio * key_dim) * num_heads
self.attn_ratio = attn_ratio
self.h = self.dh + nh_kd * 2
self.qkv = Linear_BN(dim, self.h)
self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0))
points = list(itertools.product(range(resolution), range(resolution)))
N = len(points)
attention_offsets = {}
idxs = []
for p1 in points:
for p2 in points:
offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
if offset not in attention_offsets:
attention_offsets[offset] = len(attention_offsets)
idxs.append(attention_offsets[offset])
self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
default_initializer=zeros_,
attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
tensor_idxs = paddle.to_tensor(idxs, dtype='int64')
self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N]))
@paddle.no_grad()
def train(self, mode=True):
if mode:
super().train()
else:
super().eval()
if mode and hasattr(self, 'ab'):
del self.ab
else:
self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
def forward(self, x):
self.training = True
B, N, C = x.shape
qkv = self.qkv(x)
qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads])
q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3)
q = paddle.transpose(q, perm=[0, 2, 1, 3])
k = paddle.transpose(k, perm=[0, 2, 1, 3])
v = paddle.transpose(v, perm=[0, 2, 1, 3])
k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2])
if self.training:
attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
else:
attention_biases = self.ab
attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
attn = F.softmax(attn)
x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
x = paddle.reshape(x, [B, N, self.dh])
x = self.proj(x)
return x
class Subsample(nn.Layer):
def __init__(self, stride, resolution):
super().__init__()
self.stride = stride
self.resolution = resolution
def forward(self, x):
B, N, C = x.shape
x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
end1, end2 = x.shape[1], x.shape[2]
x = x[:, 0:end1:self.stride, 0:end2:self.stride]
x = paddle.reshape(x, [B, -1, C])
return x
class AttentionSubsample(nn.Layer):
def __init__(self,
in_dim,
out_dim,
key_dim,
num_heads=8,
attn_ratio=2,
activation=None,
stride=2,
resolution=14,
resolution_=7):
super().__init__()
self.num_heads = num_heads
self.scale = key_dim**-0.5
self.key_dim = key_dim
self.nh_kd = nh_kd = key_dim * num_heads
self.d = int(attn_ratio * key_dim)
self.dh = int(attn_ratio * key_dim) * self.num_heads
self.attn_ratio = attn_ratio
self.resolution_ = resolution_
self.resolution_2 = resolution_**2
self.training = True
h = self.dh + nh_kd
self.kv = Linear_BN(in_dim, h)
self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd))
self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim))
self.stride = stride
self.resolution = resolution
points = list(itertools.product(range(resolution), range(resolution)))
points_ = list(itertools.product(range(resolution_), range(resolution_)))
N = len(points)
N_ = len(points_)
attention_offsets = {}
idxs = []
i = 0
j = 0
for p1 in points_:
i += 1
for p2 in points:
j += 1
size = 1
offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2))
if offset not in attention_offsets:
attention_offsets[offset] = len(attention_offsets)
idxs.append(attention_offsets[offset])
self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
default_initializer=zeros_,
attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64')
self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N]))
@paddle.no_grad()
def train(self, mode=True):
if mode:
super().train()
else:
super().eval()
if mode and hasattr(self, 'ab'):
del self.ab
else:
self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
def forward(self, x):
self.training = True
B, N, C = x.shape
kv = self.kv(x)
kv = paddle.reshape(kv, [B, N, self.num_heads, -1])
k, v = paddle.split(kv, [self.key_dim, self.d], axis=3)
k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC
v = paddle.transpose(v, perm=[0, 2, 1, 3])
q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim])
q = paddle.transpose(q, perm=[0, 2, 1, 3])
if self.training:
attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
else:
attention_biases = self.ab
attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
attn = F.softmax(attn)
x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
x = self.proj(x)
return x
class LeViT(nn.Layer):
""" Vision Transformer with support for patch or hybrid CNN input stage
"""
def __init__(self,
img_size=224,
patch_size=16,
in_chans=3,
class_num=1000,
embed_dim=[192],
key_dim=[64],
depth=[12],
num_heads=[3],
attn_ratio=[2],
mlp_ratio=[2],
hybrid_backbone=None,
down_ops=[],
attention_activation=nn.Hardswish,
mlp_activation=nn.Hardswish,
distillation=True,
drop_path=0):
super().__init__()
self.class_num = class_num
self.num_features = embed_dim[-1]
self.embed_dim = embed_dim
self.distillation = distillation
self.patch_embed = hybrid_backbone
self.blocks = []
down_ops.append([''])
resolution = img_size // patch_size
for i, (ed, kd, dpth, nh, ar, mr,
do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)):
for _ in range(dpth):
self.blocks.append(
Residual(
Attention(
ed,
kd,
nh,
attn_ratio=ar,
activation=attention_activation,
resolution=resolution,
), drop_path))
if mr > 0:
h = int(ed * mr)
self.blocks.append(
Residual(
nn.Sequential(
Linear_BN(ed, h),
mlp_activation(),
Linear_BN(h, ed, bn_weight_init=0),
), drop_path))
if do[0] == 'Subsample':
#('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
resolution_ = (resolution - 1) // do[5] + 1
self.blocks.append(
AttentionSubsample(*embed_dim[i:i + 2],
key_dim=do[1],
num_heads=do[2],
attn_ratio=do[3],
activation=attention_activation,
stride=do[5],
resolution=resolution,
resolution_=resolution_))
resolution = resolution_
if do[4] > 0: # mlp_ratio
h = int(embed_dim[i + 1] * do[4])
self.blocks.append(
Residual(
nn.Sequential(
Linear_BN(embed_dim[i + 1], h),
mlp_activation(),
Linear_BN(h, embed_dim[i + 1], bn_weight_init=0),
), drop_path))
self.blocks = nn.Sequential(*self.blocks)
# Classifier head
self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
if distillation:
self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
def forward(self, x):
x = self.patch_embed(x)
x = x.flatten(2)
x = paddle.transpose(x, perm=[0, 2, 1])
x = self.blocks(x)
x = x.mean(1)
x = paddle.reshape(x, [-1, self.embed_dim[-1]])
if self.distillation:
x = self.head(x), self.head_dist(x)
if not self.training:
x = (x[0] + x[1]) / 2
else:
x = self.head(x)
return x
def model_factory(C, D, X, N, drop_path, class_num, distillation):
embed_dim = [int(x) for x in C.split('_')]
num_heads = [int(x) for x in N.split('_')]
depth = [int(x) for x in X.split('_')]
act = nn.Hardswish
model = LeViT(
patch_size=16,
embed_dim=embed_dim,
num_heads=num_heads,
key_dim=[D] * 3,
depth=depth,
attn_ratio=[2, 2, 2],
mlp_ratio=[2, 2, 2],
down_ops=[
#('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
['Subsample', D, embed_dim[0] // D, 4, 2, 2],
['Subsample', D, embed_dim[1] // D, 4, 2, 2],
],
attention_activation=act,
mlp_activation=act,
hybrid_backbone=b16(embed_dim[0], activation=act),
class_num=class_num,
drop_path=drop_path,
distillation=distillation)
return model
specification = {
'LeViT_128S': {
'C': '128_256_384',
'D': 16,
'N': '4_6_8',
'X': '2_3_4',
'drop_path': 0
},
'LeViT_128': {
'C': '128_256_384',
'D': 16,
'N': '4_8_12',
'X': '4_4_4',
'drop_path': 0
},
'LeViT_192': {
'C': '192_288_384',
'D': 32,
'N': '3_5_6',
'X': '4_4_4',
'drop_path': 0
},
'LeViT_256': {
'C': '256_384_512',
'D': 32,
'N': '4_6_8',
'X': '4_4_4',
'drop_path': 0
},
'LeViT_384': {
'C': '384_512_768',
'D': 32,
'N': '6_9_12',
'X': '4_4_4',
'drop_path': 0.1
},
}
def LeViT_128(**kwargs):
model = model_factory(**specification['LeViT_128'], class_num=1000, distillation=False)
return model
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import LeViT_128
from .processor import base64_to_cv2
from .processor import create_operators
from .processor import Topk
from .utils import get_config
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(name="levit_128_imagenet",
type="cv/classification",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class LeViT_128_ImageNet:
def __init__(self):
self.config = get_config(os.path.join(self.directory, 'LeViT_128.yaml'), show=False)
self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
self.pretrain_path = os.path.join(self.directory, 'LeViT_128_pretrained.pdparams')
self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
self.model = LeViT_128()
param_state_dict = paddle.load(self.pretrain_path)
self.model.set_dict(param_state_dict)
self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
def classification(self,
images: list = None,
paths: list = None,
batch_size: int = 1,
use_gpu: bool = False,
top_k: int = 1):
'''
Args:
images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
paths (list[str]): The paths of images.
batch_size (int): batch size.
use_gpu (bool): Whether to use gpu.
top_k (int): Return top k results.
Returns:
res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'.
'''
postprocess_func = Topk(top_k, self.label_path)
inputs = []
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
print('No image provided. Please input an image or a image path.')
return
if images != None:
for image in images:
image = image[:, :, ::-1]
inputs.append(image)
if paths != None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
inputs.append(image)
batch_data = []
for idx, imagedata in enumerate(inputs):
for process in self.preprocess_funcs:
imagedata = process(imagedata)
batch_data.append(imagedata)
if len(batch_data) >= batch_size or idx == len(inputs) - 1:
batch_tensor = paddle.to_tensor(batch_data)
out = self.model(batch_tensor)
if isinstance(out, list):
out = out[0]
if isinstance(out, dict) and "logits" in out:
out = out["logits"]
if isinstance(out, dict) and "output" in out:
out = out["output"]
result = postprocess_func(out)
results.extend(result)
batch_data.clear()
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.classification(paths=[self.args.input_path],
use_gpu=self.args.use_gpu,
batch_size=self.args.batch_size,
top_k=self.args.top_k)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.classification(images=images_decode, **kwargs)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import base64
import inspect
import math
import os
import random
import sys
from functools import partial
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
import six
from paddle.vision.transforms import ColorJitter as RawColorJitter
from PIL import Image
def create_operators(params, class_num=None):
"""
create operators based on the config
Args:
params(list): a dict list, used to create some operators
"""
assert isinstance(params, list), ('operator config should be a list')
ops = []
current_module = sys.modules[__name__]
for operator in params:
assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
op_name = list(operator)[0]
param = {} if operator[op_name] is None else operator[op_name]
op_func = getattr(current_module, op_name)
if "class_num" in inspect.getfullargspec(op_func).args:
param.update({"class_num": class_num})
op = op_func(**param)
ops.append(op)
return ops
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
class OperatorParamError(ValueError):
""" OperatorParamError
"""
pass
class DecodeImage(object):
""" decode image """
def __init__(self, to_rgb=True, to_np=False, channel_first=False):
self.to_rgb = to_rgb
self.to_np = to_np # to numpy
self.channel_first = channel_first # only enabled when to_np is True
def __call__(self, img):
if six.PY2:
assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
else:
assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
data = np.frombuffer(img, dtype='uint8')
img = cv2.imdecode(data, 1)
if self.to_rgb:
assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
img = img[:, :, ::-1]
if self.channel_first:
img = img.transpose((2, 0, 1))
return img
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
self.h = None
elif size is not None:
self.resize_short = None
self.w = size if type(size) is int else size[0]
self.h = size if type(size) is int else size[1]
else:
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
percent = float(self.resize_short) / min(img_w, img_h)
w = int(round(img_w * percent))
h = int(round(img_h * percent))
else:
w = self.w
h = self.h
return self._resize_func(img, (w, h))
class CropImage(object):
""" crop image """
def __init__(self, size):
if type(size) is int:
self.size = (size, size)
else:
self.size = size # (h, w)
def __call__(self, img):
w, h = self.size
img_h, img_w = img.shape[:2]
w_start = (img_w - w) // 2
h_start = (img_h - h) // 2
w_end = w_start + w
h_end = h_start + h
return img[h_start:h_end, w_start:w_end, :]
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
self.size = size
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
ratio = self.ratio
aspect_ratio = math.sqrt(random.uniform(*ratio))
w = 1. * aspect_ratio
h = 1. / aspect_ratio
img_h, img_w = img.shape[:2]
bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
target_area = img_w * img_h * random.uniform(scale_min, scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
i = random.randint(0, img_w - w)
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
return self._resize_func(img, size)
class RandFlipImage(object):
""" random flip image
flip_code:
1: Flipped Horizontally
0: Flipped Vertically
-1: Flipped Horizontally & Vertically
"""
def __init__(self, flip_code=1):
assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
self.flip_code = flip_code
def __call__(self, img):
if random.randint(0, 1) == 1:
return cv2.flip(img, self.flip_code)
else:
return img
class NormalizeImage(object):
""" normalize image such as substract mean, divide std
"""
def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
if isinstance(scale, str):
scale = eval(scale)
assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
self.channel_num = channel_num
self.output_dtype = 'float16' if output_fp16 else 'float32'
self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
self.order = order
mean = mean if mean is not None else [0.485, 0.456, 0.406]
std = std if std is not None else [0.229, 0.224, 0.225]
shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
self.mean = np.array(mean).reshape(shape).astype('float32')
self.std = np.array(std).reshape(shape).astype('float32')
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
img = (img.astype('float32') * self.scale - self.mean) / self.std
if self.channel_num == 4:
img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
(img, pad_zeros), axis=2))
return img.astype(self.output_dtype)
class ToCHWImage(object):
""" convert hwc image to chw image
"""
def __init__(self):
pass
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
return img.transpose((2, 0, 1))
class ColorJitter(RawColorJitter):
"""ColorJitter.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super()._apply_image(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
class Topk(object):
def __init__(self, topk=1, class_id_map_file=None):
assert isinstance(topk, (int, ))
self.class_id_map = self.parse_class_id_map(class_id_map_file)
self.topk = topk
def parse_class_id_map(self, class_id_map_file):
if class_id_map_file is None:
return None
if not os.path.exists(class_id_map_file):
print(
"Warning: If want to use your own label_dict, please input legal path!\nOtherwise label_names will be empty!"
)
return None
try:
class_id_map = {}
with open(class_id_map_file, "r") as fin:
lines = fin.readlines()
for line in lines:
partition = line.split("\n")[0].partition(" ")
class_id_map[int(partition[0])] = str(partition[-1])
except Exception as ex:
print(ex)
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None, multilabel=False):
assert isinstance(x, paddle.Tensor)
if file_names is not None:
assert x.shape[0] == len(file_names)
x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
x = x.numpy()
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
for i in index:
clas_id_list.append(i.item())
score_list.append(probs[i].item())
if self.class_id_map is not None:
label_name_list.append(self.class_id_map[i.item()])
result = {
"class_ids": clas_id_list,
"scores": np.around(score_list, decimals=5).tolist(),
}
if file_names is not None:
result["file_name"] = file_names[idx]
if label_name_list is not None:
result["label_names"] = label_name_list
y.append(result)
return y
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import yaml
__all__ = ['get_config']
class AttrDict(dict):
def __getattr__(self, key):
return self[key]
def __setattr__(self, key, value):
if key in self.__dict__:
self.__dict__[key] = value
else:
self[key] = value
def __deepcopy__(self, content):
return copy.deepcopy(dict(self))
def create_attr_dict(yaml_config):
from ast import literal_eval
for key, value in yaml_config.items():
if type(value) is dict:
yaml_config[key] = value = AttrDict(value)
if isinstance(value, str):
try:
value = literal_eval(value)
except BaseException:
pass
if isinstance(value, AttrDict):
create_attr_dict(yaml_config[key])
else:
yaml_config[key] = value
def parse_config(cfg_file):
"""Load a config file into AttrDict"""
with open(cfg_file, 'r') as fopen:
yaml_config = AttrDict(yaml.load(fopen, Loader=yaml.SafeLoader))
create_attr_dict(yaml_config)
return yaml_config
def override(dl, ks, v):
"""
Recursively replace dict of list
Args:
dl(dict or list): dict or list to be replaced
ks(list): list of keys
v(str): value to be replaced
"""
def str2num(v):
try:
return eval(v)
except Exception:
return v
assert isinstance(dl, (list, dict)), ("{} should be a list or a dict")
assert len(ks) > 0, ('lenght of keys should larger than 0')
if isinstance(dl, list):
k = str2num(ks[0])
if len(ks) == 1:
assert k < len(dl), ('index({}) out of range({})'.format(k, dl))
dl[k] = str2num(v)
else:
override(dl[k], ks[1:], v)
else:
if len(ks) == 1:
# assert ks[0] in dl, ('{} is not exist in {}'.format(ks[0], dl))
if not ks[0] in dl:
print('A new filed ({}) detected!'.format(ks[0], dl))
dl[ks[0]] = str2num(v)
else:
override(dl[ks[0]], ks[1:], v)
def override_config(config, options=None):
"""
Recursively override the config
Args:
config(dict): dict to be replaced
options(list): list of pairs(key0.key1.idx.key2=value)
such as: [
'topk=2',
'VALID.transforms.1.ResizeImage.resize_short=300'
]
Returns:
config(dict): replaced config
"""
if options is not None:
for opt in options:
assert isinstance(opt, str), ("option({}) should be a str".format(opt))
assert "=" in opt, ("option({}) should contain a ="
"to distinguish between key and value".format(opt))
pair = opt.split('=')
assert len(pair) == 2, ("there can be only a = in the option")
key, value = pair
keys = key.split('.')
override(config, keys, value)
return config
def get_config(fname, overrides=None, show=False):
"""
Read config from file
"""
assert os.path.exists(fname), ('config file({}) is not exist'.format(fname))
config = parse_config(fname)
override_config(config, overrides)
return config
# levit_128s_imagenet
|模型名称|levit_128s_imagenet|
| :--- | :---: |
|类别|图像-图像分类|
|网络|LeViT|
|数据集|ImageNet-2012|
|是否支持Fine-tuning|否|
|模型大小|45 MB|
|最新更新日期|2022-04-02|
|数据指标|Acc|
## 一、模型基本信息
- ### 模型介绍
- LeViT 是一种快速推理的、用于图像分类任务的混合神经网络。其设计之初考虑了网络模型在不同的硬件平台上的性能,因此能够更好地反映普遍应用的真实场景。通过大量实验,作者找到了卷积神经网络与 Transformer 体系更好的结合方式,并且提出了 attention-based 方法,用于整合 Transformer 中的位置信息编码, 该模块的模型结构配置为LeViT128s, 详情可参考[论文地址](https://arxiv.org/abs/2104.01136)
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 1.6.2
- paddlehub >= 1.6.0 | [如何安装paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install levit_128s_imagenet
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、命令行预测
- ```shell
$ hub run levit_128s_imagenet --input_path "/PATH/TO/IMAGE"
```
- 通过命令行方式实现分类模型的调用,更多请见 [PaddleHub命令行指令](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2、预测代码示例
- ```python
import paddlehub as hub
import cv2
classifier = hub.Module(name="levit_128s_imagenet")
result = classifier.classification(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = classifier.classification(paths=['/PATH/TO/IMAGE'])
```
- ### 3、API
- ```python
def classification(images=None,
paths=None,
batch_size=1,
use_gpu=False,
top_k=1):
```
- 分类接口API。
- **参数**
- images (list\[numpy.ndarray\]): 图片数据,每一个图片数据的shape 均为 \[H, W, C\],颜色空间为 BGR; <br/>
- paths (list\[str\]): 图片的路径; <br/>
- batch\_size (int): batch 的大小;<br/>
- use\_gpu (bool): 是否使用 GPU;**若使用GPU,请先设置CUDA_VISIBLE_DEVICES环境变量** <br/>
- top\_k (int): 返回预测结果的前 k 个。
- **返回**
- res (list\[dict\]): 分类结果,列表的每一个元素均为字典,其中 key 包括'class_ids'(种类索引), 'scores'(置信度) 和 'label_names'(种类名称)
## 四、服务部署
- PaddleHub Serving可以部署一个图像识别的在线服务。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m levit_128s_imagenet
```
- 这样就完成了一个图像识别的在线服务的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
- ### 第二步:发送预测请求
- 配置好服务端,以下数行代码即可实现发送预测请求,获取预测结果
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tostring()).decode('utf8')
# 发送HTTP请求
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"\}
url = "http://127.0.0.1:8866/predict/levit_128s_imagenet"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# 打印预测结果
print(r.json()["results"])
```
## 五、更新历史
* 1.0.0
初始发布
- ```shell
$ hub install levit_128s_imagenet==1.0.0
```
# copyright (c) 2021 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Code was based on https://github.com/facebookresearch/LeViT
import itertools
import math
import warnings
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import Constant
from paddle.nn.initializer import TruncatedNormal
from paddle.regularizer import L2Decay
from .vision_transformer import Identity
from .vision_transformer import ones_
from .vision_transformer import trunc_normal_
from .vision_transformer import zeros_
def cal_attention_biases(attention_biases, attention_bias_idxs):
gather_list = []
attention_bias_t = paddle.transpose(attention_biases, (1, 0))
nums = attention_bias_idxs.shape[0]
for idx in range(nums):
gather = paddle.gather(attention_bias_t, attention_bias_idxs[idx])
gather_list.append(gather)
shape0, shape1 = attention_bias_idxs.shape
gather = paddle.concat(gather_list)
return paddle.transpose(gather, (1, 0)).reshape((0, shape0, shape1))
class Conv2d_BN(nn.Sequential):
def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1, groups=1, bn_weight_init=1, resolution=-10000):
super().__init__()
self.add_sublayer('c', nn.Conv2D(a, b, ks, stride, pad, dilation, groups, bias_attr=False))
bn = nn.BatchNorm2D(b)
ones_(bn.weight)
zeros_(bn.bias)
self.add_sublayer('bn', bn)
class Linear_BN(nn.Sequential):
def __init__(self, a, b, bn_weight_init=1):
super().__init__()
self.add_sublayer('c', nn.Linear(a, b, bias_attr=False))
bn = nn.BatchNorm1D(b)
if bn_weight_init == 0:
zeros_(bn.weight)
else:
ones_(bn.weight)
zeros_(bn.bias)
self.add_sublayer('bn', bn)
def forward(self, x):
l, bn = self._sub_layers.values()
x = l(x)
return paddle.reshape(bn(x.flatten(0, 1)), x.shape)
class BN_Linear(nn.Sequential):
def __init__(self, a, b, bias=True, std=0.02):
super().__init__()
self.add_sublayer('bn', nn.BatchNorm1D(a))
l = nn.Linear(a, b, bias_attr=bias)
trunc_normal_(l.weight)
if bias:
zeros_(l.bias)
self.add_sublayer('l', l)
def b16(n, activation, resolution=224):
return nn.Sequential(Conv2d_BN(3, n // 8, 3, 2, 1, resolution=resolution), activation(),
Conv2d_BN(n // 8, n // 4, 3, 2, 1, resolution=resolution // 2), activation(),
Conv2d_BN(n // 4, n // 2, 3, 2, 1, resolution=resolution // 4), activation(),
Conv2d_BN(n // 2, n, 3, 2, 1, resolution=resolution // 8))
class Residual(nn.Layer):
def __init__(self, m, drop):
super().__init__()
self.m = m
self.drop = drop
def forward(self, x):
if self.training and self.drop > 0:
y = paddle.rand(shape=[x.shape[0], 1, 1]).__ge__(self.drop).astype("float32")
y = y.divide(paddle.full_like(y, 1 - self.drop))
return paddle.add(x, y)
else:
return paddle.add(x, self.m(x))
class Attention(nn.Layer):
def __init__(self, dim, key_dim, num_heads=8, attn_ratio=4, activation=None, resolution=14):
super().__init__()
self.num_heads = num_heads
self.scale = key_dim**-0.5
self.key_dim = key_dim
self.nh_kd = nh_kd = key_dim * num_heads
self.d = int(attn_ratio * key_dim)
self.dh = int(attn_ratio * key_dim) * num_heads
self.attn_ratio = attn_ratio
self.h = self.dh + nh_kd * 2
self.qkv = Linear_BN(dim, self.h)
self.proj = nn.Sequential(activation(), Linear_BN(self.dh, dim, bn_weight_init=0))
points = list(itertools.product(range(resolution), range(resolution)))
N = len(points)
attention_offsets = {}
idxs = []
for p1 in points:
for p2 in points:
offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))
if offset not in attention_offsets:
attention_offsets[offset] = len(attention_offsets)
idxs.append(attention_offsets[offset])
self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
default_initializer=zeros_,
attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
tensor_idxs = paddle.to_tensor(idxs, dtype='int64')
self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs, [N, N]))
@paddle.no_grad()
def train(self, mode=True):
if mode:
super().train()
else:
super().eval()
if mode and hasattr(self, 'ab'):
del self.ab
else:
self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
def forward(self, x):
self.training = True
B, N, C = x.shape
qkv = self.qkv(x)
qkv = paddle.reshape(qkv, [B, N, self.num_heads, self.h // self.num_heads])
q, k, v = paddle.split(qkv, [self.key_dim, self.key_dim, self.d], axis=3)
q = paddle.transpose(q, perm=[0, 2, 1, 3])
k = paddle.transpose(k, perm=[0, 2, 1, 3])
v = paddle.transpose(v, perm=[0, 2, 1, 3])
k_transpose = paddle.transpose(k, perm=[0, 1, 3, 2])
if self.training:
attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
else:
attention_biases = self.ab
attn = (paddle.matmul(q, k_transpose) * self.scale + attention_biases)
attn = F.softmax(attn)
x = paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3])
x = paddle.reshape(x, [B, N, self.dh])
x = self.proj(x)
return x
class Subsample(nn.Layer):
def __init__(self, stride, resolution):
super().__init__()
self.stride = stride
self.resolution = resolution
def forward(self, x):
B, N, C = x.shape
x = paddle.reshape(x, [B, self.resolution, self.resolution, C])
end1, end2 = x.shape[1], x.shape[2]
x = x[:, 0:end1:self.stride, 0:end2:self.stride]
x = paddle.reshape(x, [B, -1, C])
return x
class AttentionSubsample(nn.Layer):
def __init__(self,
in_dim,
out_dim,
key_dim,
num_heads=8,
attn_ratio=2,
activation=None,
stride=2,
resolution=14,
resolution_=7):
super().__init__()
self.num_heads = num_heads
self.scale = key_dim**-0.5
self.key_dim = key_dim
self.nh_kd = nh_kd = key_dim * num_heads
self.d = int(attn_ratio * key_dim)
self.dh = int(attn_ratio * key_dim) * self.num_heads
self.attn_ratio = attn_ratio
self.resolution_ = resolution_
self.resolution_2 = resolution_**2
self.training = True
h = self.dh + nh_kd
self.kv = Linear_BN(in_dim, h)
self.q = nn.Sequential(Subsample(stride, resolution), Linear_BN(in_dim, nh_kd))
self.proj = nn.Sequential(activation(), Linear_BN(self.dh, out_dim))
self.stride = stride
self.resolution = resolution
points = list(itertools.product(range(resolution), range(resolution)))
points_ = list(itertools.product(range(resolution_), range(resolution_)))
N = len(points)
N_ = len(points_)
attention_offsets = {}
idxs = []
i = 0
j = 0
for p1 in points_:
i += 1
for p2 in points:
j += 1
size = 1
offset = (abs(p1[0] * stride - p2[0] + (size - 1) / 2), abs(p1[1] * stride - p2[1] + (size - 1) / 2))
if offset not in attention_offsets:
attention_offsets[offset] = len(attention_offsets)
idxs.append(attention_offsets[offset])
self.attention_biases = self.create_parameter(shape=(num_heads, len(attention_offsets)),
default_initializer=zeros_,
attr=paddle.ParamAttr(regularizer=L2Decay(0.0)))
tensor_idxs_ = paddle.to_tensor(idxs, dtype='int64')
self.register_buffer('attention_bias_idxs', paddle.reshape(tensor_idxs_, [N_, N]))
@paddle.no_grad()
def train(self, mode=True):
if mode:
super().train()
else:
super().eval()
if mode and hasattr(self, 'ab'):
del self.ab
else:
self.ab = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
def forward(self, x):
self.training = True
B, N, C = x.shape
kv = self.kv(x)
kv = paddle.reshape(kv, [B, N, self.num_heads, -1])
k, v = paddle.split(kv, [self.key_dim, self.d], axis=3)
k = paddle.transpose(k, perm=[0, 2, 1, 3]) # BHNC
v = paddle.transpose(v, perm=[0, 2, 1, 3])
q = paddle.reshape(self.q(x), [B, self.resolution_2, self.num_heads, self.key_dim])
q = paddle.transpose(q, perm=[0, 2, 1, 3])
if self.training:
attention_biases = cal_attention_biases(self.attention_biases, self.attention_bias_idxs)
else:
attention_biases = self.ab
attn = (paddle.matmul(q, paddle.transpose(k, perm=[0, 1, 3, 2]))) * self.scale + attention_biases
attn = F.softmax(attn)
x = paddle.reshape(paddle.transpose(paddle.matmul(attn, v), perm=[0, 2, 1, 3]), [B, -1, self.dh])
x = self.proj(x)
return x
class LeViT(nn.Layer):
""" Vision Transformer with support for patch or hybrid CNN input stage
"""
def __init__(self,
img_size=224,
patch_size=16,
in_chans=3,
class_num=1000,
embed_dim=[192],
key_dim=[64],
depth=[12],
num_heads=[3],
attn_ratio=[2],
mlp_ratio=[2],
hybrid_backbone=None,
down_ops=[],
attention_activation=nn.Hardswish,
mlp_activation=nn.Hardswish,
distillation=True,
drop_path=0):
super().__init__()
self.class_num = class_num
self.num_features = embed_dim[-1]
self.embed_dim = embed_dim
self.distillation = distillation
self.patch_embed = hybrid_backbone
self.blocks = []
down_ops.append([''])
resolution = img_size // patch_size
for i, (ed, kd, dpth, nh, ar, mr,
do) in enumerate(zip(embed_dim, key_dim, depth, num_heads, attn_ratio, mlp_ratio, down_ops)):
for _ in range(dpth):
self.blocks.append(
Residual(
Attention(
ed,
kd,
nh,
attn_ratio=ar,
activation=attention_activation,
resolution=resolution,
), drop_path))
if mr > 0:
h = int(ed * mr)
self.blocks.append(
Residual(
nn.Sequential(
Linear_BN(ed, h),
mlp_activation(),
Linear_BN(h, ed, bn_weight_init=0),
), drop_path))
if do[0] == 'Subsample':
#('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
resolution_ = (resolution - 1) // do[5] + 1
self.blocks.append(
AttentionSubsample(*embed_dim[i:i + 2],
key_dim=do[1],
num_heads=do[2],
attn_ratio=do[3],
activation=attention_activation,
stride=do[5],
resolution=resolution,
resolution_=resolution_))
resolution = resolution_
if do[4] > 0: # mlp_ratio
h = int(embed_dim[i + 1] * do[4])
self.blocks.append(
Residual(
nn.Sequential(
Linear_BN(embed_dim[i + 1], h),
mlp_activation(),
Linear_BN(h, embed_dim[i + 1], bn_weight_init=0),
), drop_path))
self.blocks = nn.Sequential(*self.blocks)
# Classifier head
self.head = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
if distillation:
self.head_dist = BN_Linear(embed_dim[-1], class_num) if class_num > 0 else Identity()
def forward(self, x):
x = self.patch_embed(x)
x = x.flatten(2)
x = paddle.transpose(x, perm=[0, 2, 1])
x = self.blocks(x)
x = x.mean(1)
x = paddle.reshape(x, [-1, self.embed_dim[-1]])
if self.distillation:
x = self.head(x), self.head_dist(x)
if not self.training:
x = (x[0] + x[1]) / 2
else:
x = self.head(x)
return x
def model_factory(C, D, X, N, drop_path, class_num, distillation):
embed_dim = [int(x) for x in C.split('_')]
num_heads = [int(x) for x in N.split('_')]
depth = [int(x) for x in X.split('_')]
act = nn.Hardswish
model = LeViT(
patch_size=16,
embed_dim=embed_dim,
num_heads=num_heads,
key_dim=[D] * 3,
depth=depth,
attn_ratio=[2, 2, 2],
mlp_ratio=[2, 2, 2],
down_ops=[
#('Subsample',key_dim, num_heads, attn_ratio, mlp_ratio, stride)
['Subsample', D, embed_dim[0] // D, 4, 2, 2],
['Subsample', D, embed_dim[1] // D, 4, 2, 2],
],
attention_activation=act,
mlp_activation=act,
hybrid_backbone=b16(embed_dim[0], activation=act),
class_num=class_num,
drop_path=drop_path,
distillation=distillation)
return model
specification = {
'LeViT_128S': {
'C': '128_256_384',
'D': 16,
'N': '4_6_8',
'X': '2_3_4',
'drop_path': 0
},
'LeViT_128': {
'C': '128_256_384',
'D': 16,
'N': '4_8_12',
'X': '4_4_4',
'drop_path': 0
},
'LeViT_192': {
'C': '192_288_384',
'D': 32,
'N': '3_5_6',
'X': '4_4_4',
'drop_path': 0
},
'LeViT_256': {
'C': '256_384_512',
'D': 32,
'N': '4_6_8',
'X': '4_4_4',
'drop_path': 0
},
'LeViT_384': {
'C': '384_512_768',
'D': 32,
'N': '6_9_12',
'X': '4_4_4',
'drop_path': 0.1
},
}
def LeViT_128S(**kwargs):
model = model_factory(**specification['LeViT_128S'], class_num=1000, distillation=False)
return model
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import copy
import os
import cv2
import numpy as np
import paddle
from skimage.io import imread
from skimage.transform import rescale
from skimage.transform import resize
import paddlehub as hub
from .model import LeViT_128S
from .processor import base64_to_cv2
from .processor import create_operators
from .processor import Topk
from .utils import get_config
from paddlehub.module.module import moduleinfo
from paddlehub.module.module import runnable
from paddlehub.module.module import serving
@moduleinfo(name="levit_128s_imagenet",
type="cv/classification",
author="paddlepaddle",
author_email="",
summary="",
version="1.0.0")
class LeViT_128S_ImageNet:
def __init__(self):
self.config = get_config(os.path.join(self.directory, 'LeViT_128S.yaml'), show=False)
self.label_path = os.path.join(self.directory, 'imagenet1k_label_list.txt')
self.pretrain_path = os.path.join(self.directory, 'LeViT_128S_pretrained.pdparams')
self.config['Infer']['PostProcess']['class_id_map_file'] = self.label_path
self.model = LeViT_128S()
param_state_dict = paddle.load(self.pretrain_path)
self.model.set_dict(param_state_dict)
self.preprocess_funcs = create_operators(self.config["Infer"]["transforms"])
def classification(self,
images: list = None,
paths: list = None,
batch_size: int = 1,
use_gpu: bool = False,
top_k: int = 1):
'''
Args:
images (list[numpy.ndarray]): data of images, shape of each is [H, W, C], color space must be BGR.
paths (list[str]): The paths of images.
batch_size (int): batch size.
use_gpu (bool): Whether to use gpu.
top_k (int): Return top k results.
Returns:
res (list[dict]): The classfication results, each result dict contains key 'class_ids', 'scores' and 'label_names'.
'''
postprocess_func = Topk(top_k, self.label_path)
inputs = []
results = []
paddle.disable_static()
place = 'gpu:0' if use_gpu else 'cpu'
place = paddle.set_device(place)
if images == None and paths == None:
print('No image provided. Please input an image or a image path.')
return
if images != None:
for image in images:
image = image[:, :, ::-1]
inputs.append(image)
if paths != None:
for path in paths:
image = cv2.imread(path)[:, :, ::-1]
inputs.append(image)
batch_data = []
for idx, imagedata in enumerate(inputs):
for process in self.preprocess_funcs:
imagedata = process(imagedata)
batch_data.append(imagedata)
if len(batch_data) >= batch_size or idx == len(inputs) - 1:
batch_tensor = paddle.to_tensor(batch_data)
out = self.model(batch_tensor)
if isinstance(out, list):
out = out[0]
if isinstance(out, dict) and "logits" in out:
out = out["logits"]
if isinstance(out, dict) and "output" in out:
out = out["output"]
result = postprocess_func(out)
results.extend(result)
batch_data.clear()
return results
@runnable
def run_cmd(self, argvs: list):
"""
Run as a command.
"""
self.parser = argparse.ArgumentParser(description="Run the {} module.".format(self.name),
prog='hub run {}'.format(self.name),
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
self.args = self.parser.parse_args(argvs)
results = self.classification(paths=[self.args.input_path],
use_gpu=self.args.use_gpu,
batch_size=self.args.batch_size,
top_k=self.args.top_k)
return results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.classification(images=images_decode, **kwargs)
return results
def add_module_config_arg(self):
"""
Add the command config options.
"""
self.arg_config_group.add_argument('--use_gpu', action='store_true', help="use GPU or not")
self.arg_config_group.add_argument('--batch_size', type=int, default=1, help='batch size')
self.arg_config_group.add_argument('--top_k', type=int, default=1, help='Return top k results.')
def add_module_input_arg(self):
"""
Add the command input options.
"""
self.arg_input_group.add_argument('--input_path', type=str, help="path to input image.")
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import base64
import inspect
import math
import os
import random
import sys
from functools import partial
import cv2
import numpy as np
import paddle
import paddle.nn.functional as F
import six
from paddle.vision.transforms import ColorJitter as RawColorJitter
from PIL import Image
def create_operators(params, class_num=None):
"""
create operators based on the config
Args:
params(list): a dict list, used to create some operators
"""
assert isinstance(params, list), ('operator config should be a list')
ops = []
current_module = sys.modules[__name__]
for operator in params:
assert isinstance(operator, dict) and len(operator) == 1, "yaml format error"
op_name = list(operator)[0]
param = {} if operator[op_name] is None else operator[op_name]
op_func = getattr(current_module, op_name)
if "class_num" in inspect.getfullargspec(op_func).args:
param.update({"class_num": class_num})
op = op_func(**param)
ops.append(op)
return ops
class UnifiedResize(object):
def __init__(self, interpolation=None, backend="cv2"):
_cv2_interp_from_str = {
'nearest': cv2.INTER_NEAREST,
'bilinear': cv2.INTER_LINEAR,
'area': cv2.INTER_AREA,
'bicubic': cv2.INTER_CUBIC,
'lanczos': cv2.INTER_LANCZOS4
}
_pil_interp_from_str = {
'nearest': Image.NEAREST,
'bilinear': Image.BILINEAR,
'bicubic': Image.BICUBIC,
'box': Image.BOX,
'lanczos': Image.LANCZOS,
'hamming': Image.HAMMING
}
def _pil_resize(src, size, resample):
pil_img = Image.fromarray(src)
pil_img = pil_img.resize(size, resample)
return np.asarray(pil_img)
if backend.lower() == "cv2":
if isinstance(interpolation, str):
interpolation = _cv2_interp_from_str[interpolation.lower()]
# compatible with opencv < version 4.4.0
elif interpolation is None:
interpolation = cv2.INTER_LINEAR
self.resize_func = partial(cv2.resize, interpolation=interpolation)
elif backend.lower() == "pil":
if isinstance(interpolation, str):
interpolation = _pil_interp_from_str[interpolation.lower()]
self.resize_func = partial(_pil_resize, resample=interpolation)
else:
self.resize_func = cv2.resize
def __call__(self, src, size):
return self.resize_func(src, size)
class OperatorParamError(ValueError):
""" OperatorParamError
"""
pass
class DecodeImage(object):
""" decode image """
def __init__(self, to_rgb=True, to_np=False, channel_first=False):
self.to_rgb = to_rgb
self.to_np = to_np # to numpy
self.channel_first = channel_first # only enabled when to_np is True
def __call__(self, img):
if six.PY2:
assert type(img) is str and len(img) > 0, "invalid input 'img' in DecodeImage"
else:
assert type(img) is bytes and len(img) > 0, "invalid input 'img' in DecodeImage"
data = np.frombuffer(img, dtype='uint8')
img = cv2.imdecode(data, 1)
if self.to_rgb:
assert img.shape[2] == 3, 'invalid shape of image[%s]' % (img.shape)
img = img[:, :, ::-1]
if self.channel_first:
img = img.transpose((2, 0, 1))
return img
class ResizeImage(object):
""" resize image """
def __init__(self, size=None, resize_short=None, interpolation=None, backend="cv2"):
if resize_short is not None and resize_short > 0:
self.resize_short = resize_short
self.w = None
self.h = None
elif size is not None:
self.resize_short = None
self.w = size if type(size) is int else size[0]
self.h = size if type(size) is int else size[1]
else:
raise OperatorParamError("invalid params for ReisizeImage for '\
'both 'size' and 'resize_short' are None")
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
img_h, img_w = img.shape[:2]
if self.resize_short is not None:
percent = float(self.resize_short) / min(img_w, img_h)
w = int(round(img_w * percent))
h = int(round(img_h * percent))
else:
w = self.w
h = self.h
return self._resize_func(img, (w, h))
class CropImage(object):
""" crop image """
def __init__(self, size):
if type(size) is int:
self.size = (size, size)
else:
self.size = size # (h, w)
def __call__(self, img):
w, h = self.size
img_h, img_w = img.shape[:2]
w_start = (img_w - w) // 2
h_start = (img_h - h) // 2
w_end = w_start + w
h_end = h_start + h
return img[h_start:h_end, w_start:w_end, :]
class RandCropImage(object):
""" random crop image """
def __init__(self, size, scale=None, ratio=None, interpolation=None, backend="cv2"):
if type(size) is int:
self.size = (size, size) # (h, w)
else:
self.size = size
self.scale = [0.08, 1.0] if scale is None else scale
self.ratio = [3. / 4., 4. / 3.] if ratio is None else ratio
self._resize_func = UnifiedResize(interpolation=interpolation, backend=backend)
def __call__(self, img):
size = self.size
scale = self.scale
ratio = self.ratio
aspect_ratio = math.sqrt(random.uniform(*ratio))
w = 1. * aspect_ratio
h = 1. / aspect_ratio
img_h, img_w = img.shape[:2]
bound = min((float(img_w) / img_h) / (w**2), (float(img_h) / img_w) / (h**2))
scale_max = min(scale[1], bound)
scale_min = min(scale[0], bound)
target_area = img_w * img_h * random.uniform(scale_min, scale_max)
target_size = math.sqrt(target_area)
w = int(target_size * w)
h = int(target_size * h)
i = random.randint(0, img_w - w)
j = random.randint(0, img_h - h)
img = img[j:j + h, i:i + w, :]
return self._resize_func(img, size)
class RandFlipImage(object):
""" random flip image
flip_code:
1: Flipped Horizontally
0: Flipped Vertically
-1: Flipped Horizontally & Vertically
"""
def __init__(self, flip_code=1):
assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"
self.flip_code = flip_code
def __call__(self, img):
if random.randint(0, 1) == 1:
return cv2.flip(img, self.flip_code)
else:
return img
class NormalizeImage(object):
""" normalize image such as substract mean, divide std
"""
def __init__(self, scale=None, mean=None, std=None, order='chw', output_fp16=False, channel_num=3):
if isinstance(scale, str):
scale = eval(scale)
assert channel_num in [3, 4], "channel number of input image should be set to 3 or 4."
self.channel_num = channel_num
self.output_dtype = 'float16' if output_fp16 else 'float32'
self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
self.order = order
mean = mean if mean is not None else [0.485, 0.456, 0.406]
std = std if std is not None else [0.229, 0.224, 0.225]
shape = (3, 1, 1) if self.order == 'chw' else (1, 1, 3)
self.mean = np.array(mean).reshape(shape).astype('float32')
self.std = np.array(std).reshape(shape).astype('float32')
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
assert isinstance(img, np.ndarray), "invalid input 'img' in NormalizeImage"
img = (img.astype('float32') * self.scale - self.mean) / self.std
if self.channel_num == 4:
img_h = img.shape[1] if self.order == 'chw' else img.shape[0]
img_w = img.shape[2] if self.order == 'chw' else img.shape[1]
pad_zeros = np.zeros((1, img_h, img_w)) if self.order == 'chw' else np.zeros((img_h, img_w, 1))
img = (np.concatenate((img, pad_zeros), axis=0) if self.order == 'chw' else np.concatenate(
(img, pad_zeros), axis=2))
return img.astype(self.output_dtype)
class ToCHWImage(object):
""" convert hwc image to chw image
"""
def __init__(self):
pass
def __call__(self, img):
from PIL import Image
if isinstance(img, Image.Image):
img = np.array(img)
return img.transpose((2, 0, 1))
class ColorJitter(RawColorJitter):
"""ColorJitter.
"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def __call__(self, img):
if not isinstance(img, Image.Image):
img = np.ascontiguousarray(img)
img = Image.fromarray(img)
img = super()._apply_image(img)
if isinstance(img, Image.Image):
img = np.asarray(img)
return img
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.fromstring(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
class Topk(object):
def __init__(self, topk=1, class_id_map_file=None):
assert isinstance(topk, (int, ))
self.class_id_map = self.parse_class_id_map(class_id_map_file)
self.topk = topk
def parse_class_id_map(self, class_id_map_file):
if class_id_map_file is None:
return None
if not os.path.exists(class_id_map_file):
print(
"Warning: If want to use your own label_dict, please input legal path!\nOtherwise label_names will be empty!"
)
return None
try:
class_id_map = {}
with open(class_id_map_file, "r") as fin:
lines = fin.readlines()
for line in lines:
partition = line.split("\n")[0].partition(" ")
class_id_map[int(partition[0])] = str(partition[-1])
except Exception as ex:
print(ex)
class_id_map = None
return class_id_map
def __call__(self, x, file_names=None, multilabel=False):
assert isinstance(x, paddle.Tensor)
if file_names is not None:
assert x.shape[0] == len(file_names)
x = F.softmax(x, axis=-1) if not multilabel else F.sigmoid(x)
x = x.numpy()
y = []
for idx, probs in enumerate(x):
index = probs.argsort(axis=0)[-self.topk:][::-1].astype("int32") if not multilabel else np.where(
probs >= 0.5)[0].astype("int32")
clas_id_list = []
score_list = []
label_name_list = []
for i in index:
clas_id_list.append(i.item())
score_list.append(probs[i].item())
if self.class_id_map is not None:
label_name_list.append(self.class_id_map[i.item()])
result = {
"class_ids": clas_id_list,
"scores": np.around(score_list, decimals=5).tolist(),
}
if file_names is not None:
result["file_name"] = file_names[idx]
if label_name_list is not None:
result["label_names"] = label_name_list
y.append(result)
return y
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册