Unverified · Commit c3f1e085 · authored by chenjian · committed by GitHub

Update object detection module README (#1589)

Parent commit: 13cc2d57
# faster_rcnn_resnet50_coco2017

## Command-line prediction
```shell
$ hub run faster_rcnn_resnet50_coco2017 --input_path "/PATH/TO/IMAGE"
```
## API
```python
def context(num_classes=81,
            trainable=True,
            pretrained=True,
            phase='train')
```
Extracts features for transfer learning.
**Parameters**
* num\_classes (int): number of classes;
* trainable (bool): whether the parameters are trainable;
* pretrained (bool): whether to load the pretrained model;
* phase (str): either 'train' or 'predict'; 'train' is used for training and 'predict' for prediction.
**Returns**
* inputs (dict): the model inputs, keyed as follows:
    When phase is 'train':
    * image (Variable): image variable
    * im\_size (Variable): image size
    * im\_info (Variable): image scaling information
    * gt\_class (Variable): class of each ground-truth box
    * gt\_box (Variable): coordinates of each ground-truth box
    * is\_crowd (Variable): whether a single box contains multiple objects
    When phase is 'predict':
    * image (Variable): image variable
    * im\_size (Variable): image size
    * im\_info (Variable): image scaling information
* outputs (dict): the model outputs, keyed as follows:
    When phase is 'train':
    * head_features (Variable): the extracted features
    * rpn\_cls\_loss (Variable): box classification loss
    * rpn\_reg\_loss (Variable): box regression loss
    * generate\_proposal\_labels (Variable): image information
    When phase is 'predict':
    * head_features (Variable): the extracted features
    * rois (Variable): the extracted RoIs
    * bbox\_out (Variable): prediction results
* context\_prog (Program): the Program used for transfer learning.
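```python
def object_detection(paths=None,
                     images=None,
                     batch_size=1,
                     use_gpu=False,
                     output_dir='detection_result',
                     score_thresh=0.5,
                     visualization=True)
```
Prediction API: detects the positions of all objects in the input images.
**Parameters**
* paths (list\[str\]): image paths;
* images (list\[numpy.ndarray\]): image data; each ndarray has shape \[H, W, C\] in BGR format;
* batch\_size (int): batch size;
* use\_gpu (bool): whether to use the GPU;
* score\_thresh (float): confidence threshold for detections;
* visualization (bool): whether to save the detection results as image files;
* output\_dir (str): directory in which images are saved, detection\_result by default.
**Returns**
* res (list\[dict\]): list of detection results; each element is a dict with the fields:
    * data (list): detections; each element is a dict with the fields:
        * confidence (float): detection confidence;
        * label (str): class label;
        * left (int): x coordinate of the top-left corner of the bounding box;
        * top (int): y coordinate of the top-left corner of the bounding box;
        * right (int): x coordinate of the bottom-right corner of the bounding box;
        * bottom (int): y coordinate of the bottom-right corner of the bounding box;
    * save\_path (str, optional): path where the result image is saved (present only when visualization=True).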
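```python
def save_inference_model(dirname,
                         model_filename=None,
                         params_filename=None,
                         combined=True)
```
Saves the model to the given path.
**Parameters**
* dirname: name of the directory in which the model is stored
* model\_filename: model file name, \_\_model\_\_ by default
* params\_filename: parameter file name, \_\_params\_\_ by default (takes effect only when `combined` is True)
* combined: whether to save all parameters into a single file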
## Code example
```python
import paddlehub as hub
import cv2

object_detector = hub.Module(name="faster_rcnn_resnet50_coco2017")
result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
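## Server deployment
PaddleHub Serving can deploy an online object detection service.
## Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m faster_rcnn_resnet50_coco2017
```
This deploys an object detection service API; the default port is 8866.
**NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
## Step 2: Send a prediction request
With the server configured, the following few lines of code send a prediction request and retrieve the result: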
```python
import requests
import json
import cv2
import base64

def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')

# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/faster_rcnn_resnet50_coco2017"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Print the prediction results
print(r.json()["results"])
```
### Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0

|Model name|faster_rcnn_resnet50_coco2017|
| :--- | :---: |
|Category|Image - Object detection|
|Network|faster_rcnn|
|Dataset|COCO2017|
|Fine-tuning supported|No|
|Model size|131MB|
|Last updated|2021-03-15|
|Metrics|-|

## I. Basic Model Information
- ### Application demo
  - Sample result:
  <p align="center">
  <img src="https://user-images.githubusercontent.com/22424850/131504887-d024c7e5-fc09-4d6b-92b8-4d0c965949d0.jpg" width='50%' hspace='10'/>
  <br />
  </p>
- ### Model introduction
  - Faster_RCNN is a two-stage object detector: it generates region proposals for an image, extracts features, classifies them, and refines the proposal boxes. The overall network has four parts: ResNet-50 as the base convolutional layers, a region proposal network, RoI Align, and the detection layer. This Faster_RCNN model is pretrained on the MS-COCO dataset. Currently only prediction is supported.
## II. Installation
- ### 1. Environment dependencies
  - paddlepaddle >= 1.6.2
  - paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
  - ```shell
    $ hub install faster_rcnn_resnet50_coco2017
    ```
  - If you run into problems during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md)
 | [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Model API Prediction
- ### 1. Command-line prediction
  - ```shell
    $ hub run faster_rcnn_resnet50_coco2017 --input_path "/PATH/TO/IMAGE"
    ```
  - This invokes the object detection model from the command line. For more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code example
  - ```python
    import paddlehub as hub
    import cv2

    object_detector = hub.Module(name="faster_rcnn_resnet50_coco2017")
    result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
    # or
    # result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
    ```
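- ### 3. API
  - ```python
    def object_detection(paths=None,
                         images=None,
                         batch_size=1,
                         use_gpu=False,
                         output_dir='detection_result',
                         score_thresh=0.5,
                         visualization=True)
    ```
    - Prediction API: detects the positions of all objects in the input images.
    - **Parameters**
      - paths (list\[str\]): image paths; <br/>
      - images (list\[numpy.ndarray\]): image data; each ndarray has shape \[H, W, C\] in BGR format; <br/>
      - batch\_size (int): batch size;<br/>
      - use\_gpu (bool): whether to use the GPU;<br/>
      - output\_dir (str): directory in which images are saved, detection\_result by default;<br/>
      - score\_thresh (float): confidence threshold for detections;<br/>
      - visualization (bool): whether to save the detection results as image files.
      **NOTE:** supply the input through one of the two parameters, paths or images.
    - **Returns**
      - res (list\[dict\]): list of detection results; each element is a dict with the fields:
        - data (list): detections; each element is a dict with the fields:
          - confidence (float): detection confidence
          - label (str): class label
          - left (int): x coordinate of the top-left corner of the bounding box
          - top (int): y coordinate of the top-left corner of the bounding box
          - right (int): x coordinate of the bottom-right corner of the bounding box
          - bottom (int): y coordinate of the bottom-right corner of the bounding box
        - save\_path (str, optional): path where the result image is saved (present only when visualization=True)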
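As a rough illustration of the return structure described above, the sketch below walks a result list and keeps only the boxes above a confidence threshold. The `sample_res` value is made-up data shaped like the documented return value, and `filter_detections` is a hypothetical helper, not part of PaddleHub.

```python
def filter_detections(res, min_confidence=0.5):
    """Return (label, confidence, width, height) tuples for confident boxes."""
    kept = []
    for item in res:                      # one dict per input image
        for det in item.get('data', []):  # one dict per detected object
            if det['confidence'] < min_confidence:
                continue
            width = det['right'] - det['left']
            height = det['bottom'] - det['top']
            kept.append((det['label'], det['confidence'], width, height))
    return kept


# Made-up sample shaped like the documented return value.
sample_res = [{
    'data': [
        {'label': 'cat', 'confidence': 0.97,
         'left': 10, 'top': 20, 'right': 110, 'bottom': 220},
        {'label': 'dog', 'confidence': 0.30,
         'left': 5, 'top': 5, 'right': 50, 'bottom': 60},
    ],
    'save_path': 'detection_result/example.jpg',
}]

print(filter_detections(sample_res))
```

With real output from `object_detection`, the same loop should apply as long as the fields match the list above.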
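  - ```python
    def save_inference_model(dirname,
                             model_filename=None,
                             params_filename=None,
                             combined=True)
    ```
    - Saves the model to the given path.
    - **Parameters**
      - dirname: name of the directory in which the model is stored; <br/>
      - model\_filename: model file name, \_\_model\_\_ by default; <br/>
      - params\_filename: parameter file name, \_\_params\_\_ by default (takes effect only when `combined` is True);<br/>
      - combined: whether to save all parameters into a single file.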
## IV. Server Deployment
- PaddleHub Serving can deploy an online object detection service.
- ### Step 1: Start PaddleHub Serving
  - Run the start command:
  - ```shell
    $ hub serving start -m faster_rcnn_resnet50_coco2017
    ```
  - This deploys an object detection service API; the default port is 8866.
  - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
  - With the server configured, the following few lines of code send a prediction request and retrieve the result:
  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        # tobytes() replaces the deprecated ndarray.tostring()
        return base64.b64encode(data.tobytes()).decode('utf8')

    # Send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/faster_rcnn_resnet50_coco2017"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```
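The request body above carries each image as a base64 string under the `images` key. The following is a minimal sketch of that encoding step alone, using only the standard library and placeholder bytes in place of a JPEG buffer produced by `cv2.imencode`; `build_payload` is a hypothetical helper name, not a PaddleHub function.

```python
import base64
import json


def build_payload(image_bytes_list):
    """Encode raw image bytes as base64 strings under the 'images' key."""
    images = [base64.b64encode(b).decode('utf8') for b in image_bytes_list]
    return json.dumps({'images': images})


# Placeholder bytes standing in for an encoded JPEG.
fake_jpeg = b'\xff\xd8\xff\xe0 not a real image \xff\xd9'
payload = build_payload([fake_jpeg])

# A receiver can recover the original bytes with b64decode.
decoded = base64.b64decode(json.loads(payload)['images'][0])
print(decoded == fake_jpeg)
```

Base64 is used here because JSON cannot carry raw binary data directly.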
## V. Release Notes
* 1.1.0
  Initial release
* 1.1.1
  Fixed a problem reading numpy data
  - ```shell
    $ hub install faster_rcnn_resnet50_coco2017==1.1.1
    ```
@@ -45,11 +45,18 @@ class SmoothL1Loss(object):
     def __call__(self, x, y, inside_weight=None, outside_weight=None):
         return fluid.layers.smooth_l1(
-            x, y, inside_weight=inside_weight, outside_weight=outside_weight, sigma=self.sigma)
+            x,
+            y,
+            inside_weight=inside_weight,
+            outside_weight=outside_weight,
+            sigma=self.sigma)
 class BoxCoder(object):
-    def __init__(self, prior_box_var=[0.1, 0.1, 0.2, 0.2], code_type='decode_center_size', box_normalized=False,
+    def __init__(self,
+                 prior_box_var=[0.1, 0.1, 0.2, 0.2],
+                 code_type='decode_center_size',
+                 box_normalized=False,
                  axis=1):
         super(BoxCoder, self).__init__()
         self.prior_box_var = prior_box_var
@@ -78,14 +85,16 @@ class TwoFCHead(object):
             act='relu',
             name='fc6',
             param_attr=ParamAttr(name='fc6_w', initializer=Xavier(fan_out=fan)),
-            bias_attr=ParamAttr(name='fc6_b', learning_rate=2., regularizer=L2Decay(0.)))
+            bias_attr=ParamAttr(
+                name='fc6_b', learning_rate=2., regularizer=L2Decay(0.)))
         head_feat = fluid.layers.fc(
             input=fc6,
             size=self.mlp_dim,
             act='relu',
             name='fc7',
             param_attr=ParamAttr(name='fc7_w', initializer=Xavier()),
-            bias_attr=ParamAttr(name='fc7_b', learning_rate=2., regularizer=L2Decay(0.)))
+            bias_attr=ParamAttr(
+                name='fc7_b', learning_rate=2., regularizer=L2Decay(0.)))
         return head_feat
@@ -103,7 +112,12 @@ class BBoxHead(object):
     __inject__ = ['head', 'box_coder', 'nms', 'bbox_loss']
     __shared__ = ['num_classes']
-    def __init__(self, head, box_coder=BoxCoder(), nms=MultiClassNMS(), bbox_loss=SmoothL1Loss(), num_classes=81):
+    def __init__(self,
+                 head,
+                 box_coder=BoxCoder(),
+                 nms=MultiClassNMS(),
+                 bbox_loss=SmoothL1Loss(),
+                 num_classes=81):
         super(BBoxHead, self).__init__()
         self.head = head
         self.num_classes = num_classes
@@ -140,24 +154,30 @@ class BBoxHead(object):
         head_feat = self.get_head_feat(roi_feat)
         # when ResNetC5 output a single feature map
         if not isinstance(self.head, TwoFCHead):
-            head_feat = fluid.layers.pool2d(head_feat, pool_type='avg', global_pooling=True)
+            head_feat = fluid.layers.pool2d(
+                head_feat, pool_type='avg', global_pooling=True)
         cls_score = fluid.layers.fc(
             input=head_feat,
             size=self.num_classes,
             act=None,
             name='cls_score',
-            param_attr=ParamAttr(name='cls_score_w', initializer=Normal(loc=0.0, scale=0.01)),
-            bias_attr=ParamAttr(name='cls_score_b', learning_rate=2., regularizer=L2Decay(0.)))
+            param_attr=ParamAttr(
+                name='cls_score_w', initializer=Normal(loc=0.0, scale=0.01)),
+            bias_attr=ParamAttr(
+                name='cls_score_b', learning_rate=2., regularizer=L2Decay(0.)))
         bbox_pred = fluid.layers.fc(
             input=head_feat,
             size=4 * self.num_classes,
             act=None,
             name='bbox_pred',
-            param_attr=ParamAttr(name='bbox_pred_w', initializer=Normal(loc=0.0, scale=0.001)),
-            bias_attr=ParamAttr(name='bbox_pred_b', learning_rate=2., regularizer=L2Decay(0.)))
+            param_attr=ParamAttr(
+                name='bbox_pred_w', initializer=Normal(loc=0.0, scale=0.001)),
+            bias_attr=ParamAttr(
+                name='bbox_pred_b', learning_rate=2., regularizer=L2Decay(0.)))
         return cls_score, bbox_pred
-    def get_loss(self, roi_feat, labels_int32, bbox_targets, bbox_inside_weights, bbox_outside_weights):
+    def get_loss(self, roi_feat, labels_int32, bbox_targets,
+                 bbox_inside_weights, bbox_outside_weights):
         """
         Get bbox_head loss.
@@ -186,11 +206,19 @@ class BBoxHead(object):
             logits=cls_score, label=labels_int64, numeric_stable_mode=True)
         loss_cls = fluid.layers.reduce_mean(loss_cls)
         loss_bbox = self.bbox_loss(
-            x=bbox_pred, y=bbox_targets, inside_weight=bbox_inside_weights, outside_weight=bbox_outside_weights)
+            x=bbox_pred,
+            y=bbox_targets,
+            inside_weight=bbox_inside_weights,
+            outside_weight=bbox_outside_weights)
         loss_bbox = fluid.layers.reduce_mean(loss_bbox)
         return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
-    def get_prediction(self, roi_feat, rois, im_info, im_shape, return_box_score=False):
+    def get_prediction(self,
+                       roi_feat,
+                       rois,
+                       im_info,
+                       im_shape,
+                       return_box_score=False):
         """
         Get prediction bounding box in test stage.
......
@@ -30,7 +30,8 @@ def test_reader(paths=None, images=None):
     img_list = list()
     if paths:
         for img_path in paths:
-            assert os.path.isfile(img_path), "The {} isn't a valid file path.".format(img_path)
+            assert os.path.isfile(
+                img_path), "The {} isn't a valid file path.".format(img_path)
             img = cv2.imread(img_path).astype('float32')
             img_list.append(img)
     if images is not None:
@@ -65,7 +66,13 @@ def test_reader(paths=None, images=None):
         # im_info holds the resize info of image.
         im_info = np.array([resize_h, resize_w, im_scale]).astype('float32')
-        im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_LINEAR)
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale,
+            fy=im_scale,
+            interpolation=cv2.INTER_LINEAR)
         # HWC --> CHW
         im = np.swapaxes(im, 1, 2)
@@ -74,11 +81,14 @@ def test_reader(paths=None, images=None):
 def padding_minibatch(batch_data, coarsest_stride=0, use_padded_im_info=True):
-    max_shape_org = np.array([data['image'].shape for data in batch_data]).max(axis=0)
+    max_shape_org = np.array(
+        [data['image'].shape for data in batch_data]).max(axis=0)
     if coarsest_stride > 0:
         max_shape = np.zeros((3)).astype('int32')
-        max_shape[1] = int(np.ceil(max_shape_org[1] / coarsest_stride) * coarsest_stride)
-        max_shape[2] = int(np.ceil(max_shape_org[2] / coarsest_stride) * coarsest_stride)
+        max_shape[1] = int(
+            np.ceil(max_shape_org[1] / coarsest_stride) * coarsest_stride)
+        max_shape[2] = int(
+            np.ceil(max_shape_org[2] / coarsest_stride) * coarsest_stride)
     else:
         max_shape = max_shape_org.astype('int32')
@@ -89,12 +99,15 @@ def padding_minibatch(batch_data, coarsest_stride=0, use_padded_im_info=True):
     for data in batch_data:
         im_c, im_h, im_w = data['image'].shape
         # image
-        padding_im = np.zeros((im_c, max_shape[1], max_shape[2]), dtype=np.float32)
+        padding_im = np.zeros((im_c, max_shape[1], max_shape[2]),
+                              dtype=np.float32)
         padding_im[:, 0:im_h, 0:im_w] = data['image']
         padding_image.append(padding_im)
         # im_info
-        data['im_info'][0] = max_shape[1] if use_padded_im_info else max_shape_org[1]
-        data['im_info'][1] = max_shape[2] if use_padded_im_info else max_shape_org[2]
+        data['im_info'][
+            0] = max_shape[1] if use_padded_im_info else max_shape_org[1]
+        data['im_info'][
+            1] = max_shape[2] if use_padded_im_info else max_shape_org[2]
         padding_info.append(data['im_info'])
         padding_shape.append(data['im_shape'])
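The shape arithmetic in `padding_minibatch` can be sketched in isolation: take the largest height and width in the batch and, when `coarsest_stride` is positive, round each up to the next multiple of the stride. The version below is a simplified illustration that uses `math.ceil` on plain tuples instead of NumPy arrays; `padded_hw` is a hypothetical name and the example shapes are made up.

```python
import math


def padded_hw(shapes, coarsest_stride=0):
    """Return the (height, width) that every image in a batch is padded to.

    shapes is a list of (channels, height, width) tuples.
    """
    max_h = max(s[1] for s in shapes)
    max_w = max(s[2] for s in shapes)
    if coarsest_stride > 0:
        # Round up to the next multiple of the stride.
        max_h = int(math.ceil(max_h / coarsest_stride) * coarsest_stride)
        max_w = int(math.ceil(max_w / coarsest_stride) * coarsest_stride)
    return max_h, max_w


batch_shapes = [(3, 800, 1205), (3, 750, 1333)]
print(padded_hw(batch_shapes))                      # no stride alignment
print(padded_hw(batch_shapes, coarsest_stride=32))  # aligned to 32
```

Each image is then copied into the top-left corner of a zero array of that size, as the hunk above does with `padding_im[:, 0:im_h, 0:im_w]`.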
......
@@ -29,16 +29,19 @@ from faster_rcnn_resnet50_coco2017.roi_extractor import RoIAlign
 @moduleinfo(
     name="faster_rcnn_resnet50_coco2017",
-    version="1.1.0",
+    version="1.1.1",
     type="cv/object_detection",
-    summary="Baidu's Faster R-CNN model for object detection with backbone ResNet50, trained with dataset COCO2017",
+    summary=
+    "Baidu's Faster R-CNN model for object detection with backbone ResNet50, trained with dataset COCO2017",
     author="paddlepaddle",
     author_email="paddle-dev@baidu.com")
 class FasterRCNNResNet50(hub.Module):
     def _initialize(self):
         # default pretrained model, Faster-RCNN with backbone ResNet50, shape of input tensor is [3, 800, 1333]
-        self.default_pretrained_model_path = os.path.join(self.directory, "faster_rcnn_resnet50_model")
-        self.label_names = load_label_info(os.path.join(self.directory, "label_file.txt"))
+        self.default_pretrained_model_path = os.path.join(
+            self.directory, "faster_rcnn_resnet50_model")
+        self.label_names = load_label_info(
+            os.path.join(self.directory, "label_file.txt"))
         self._set_config()
     def _set_config(self):
@@ -62,7 +65,11 @@ class FasterRCNNResNet50(hub.Module):
             gpu_config.enable_use_gpu(memory_pool_init_size_mb=500, device_id=0)
             self.gpu_predictor = create_paddle_predictor(gpu_config)
-    def context(self, num_classes=81, trainable=True, pretrained=True, phase='train'):
+    def context(self,
+                num_classes=81,
+                trainable=True,
+                pretrained=True,
+                phase='train'):
         """
         Distill the Head Features, so as to perform transfer learning.
@@ -81,24 +88,34 @@ class FasterRCNNResNet50(hub.Module):
         startup_program = fluid.Program()
         with fluid.program_guard(context_prog, startup_program):
             with fluid.unique_name.guard():
-                image = fluid.layers.data(name='image', shape=[-1, 3, -1, -1], dtype='float32')
+                image = fluid.layers.data(
+                    name='image', shape=[-1, 3, -1, -1], dtype='float32')
                 # backbone
-                backbone = ResNet(norm_type='affine_channel', depth=50, feature_maps=4, freeze_at=2)
+                backbone = ResNet(
+                    norm_type='affine_channel',
+                    depth=50,
+                    feature_maps=4,
+                    freeze_at=2)
                 body_feats = backbone(image)
                 # var_prefix
                 var_prefix = '@HUB_{}@'.format(self.name)
-                im_info = fluid.layers.data(name='im_info', shape=[3], dtype='float32', lod_level=0)
-                im_shape = fluid.layers.data(name='im_shape', shape=[3], dtype='float32', lod_level=0)
+                im_info = fluid.layers.data(
+                    name='im_info', shape=[3], dtype='float32', lod_level=0)
+                im_shape = fluid.layers.data(
+                    name='im_shape', shape=[3], dtype='float32', lod_level=0)
                 body_feat_names = list(body_feats.keys())
                 # rpn_head: RPNHead
                 rpn_head = self.rpn_head()
                 rois = rpn_head.get_proposals(body_feats, im_info, mode=phase)
                 # train
                 if phase == 'train':
-                    gt_bbox = fluid.layers.data(name='gt_bbox', shape=[4], dtype='float32', lod_level=1)
-                    is_crowd = fluid.layers.data(name='is_crowd', shape=[1], dtype='int32', lod_level=1)
-                    gt_class = fluid.layers.data(name='gt_class', shape=[1], dtype='int32', lod_level=1)
+                    gt_bbox = fluid.layers.data(
+                        name='gt_bbox', shape=[4], dtype='float32', lod_level=1)
+                    is_crowd = fluid.layers.data(
+                        name='is_crowd', shape=[1], dtype='int32', lod_level=1)
+                    gt_class = fluid.layers.data(
+                        name='gt_class', shape=[1], dtype='int32', lod_level=1)
                     rpn_loss = rpn_head.get_loss(im_info, gt_bbox, is_crowd)
                     # bbox_assigner: BBoxAssigner
                     bbox_assigner = self.bbox_assigner(num_classes)
@@ -143,13 +160,18 @@ class FasterRCNNResNet50(hub.Module):
                         'is_crowd': var_prefix + is_crowd.name
                     }
                     outputs = {
-                        'head_features': var_prefix + head_feat.name,
-                        'rpn_cls_loss': var_prefix + rpn_loss['rpn_cls_loss'].name,
-                        'rpn_reg_loss': var_prefix + rpn_loss['rpn_reg_loss'].name,
-                        'generate_proposal_labels': [var_prefix + var.name for var in outs]
+                        'head_features':
+                        var_prefix + head_feat.name,
+                        'rpn_cls_loss':
+                        var_prefix + rpn_loss['rpn_cls_loss'].name,
+                        'rpn_reg_loss':
+                        var_prefix + rpn_loss['rpn_reg_loss'].name,
+                        'generate_proposal_labels':
+                        [var_prefix + var.name for var in outs]
                     }
                 elif phase == 'predict':
-                    pred = bbox_head.get_prediction(roi_feat, rois, im_info, im_shape)
+                    pred = bbox_head.get_prediction(roi_feat, rois, im_info,
+                                                    im_shape)
                     inputs = {
                         'image': var_prefix + image.name,
                         'im_info': var_prefix + im_info.name,
@@ -164,9 +186,13 @@ class FasterRCNNResNet50(hub.Module):
                 add_vars_prefix(startup_program, var_prefix)
                 global_vars = context_prog.global_block().vars
-                inputs = {key: global_vars[value] for key, value in inputs.items()}
+                inputs = {
+                    key: global_vars[value]
+                    for key, value in inputs.items()
+                }
                 outputs = {
-                    key: global_vars[value] if not isinstance(value, list) else [global_vars[var] for var in value]
+                    key: global_vars[value] if not isinstance(value, list) else
+                    [global_vars[var] for var in value]
                     for key, value in outputs.items()
                 }
@@ -182,9 +208,14 @@ class FasterRCNNResNet50(hub.Module):
                         if num_classes != 81:
                             if 'bbox_pred' in var.name or 'cls_score' in var.name:
                                 return False
-                        return os.path.exists(os.path.join(self.default_pretrained_model_path, var.name))
+                        return os.path.exists(
+                            os.path.join(self.default_pretrained_model_path,
+                                         var.name))
-                    fluid.io.load_vars(exe, self.default_pretrained_model_path, predicate=_if_exist)
+                    fluid.io.load_vars(
+                        exe,
+                        self.default_pretrained_model_path,
+                        predicate=_if_exist)
                 return inputs, outputs, context_prog
     def rpn_head(self):
@@ -200,8 +231,16 @@ class FasterRCNNResNet50(hub.Module):
                 rpn_negative_overlap=0.3,
                 rpn_positive_overlap=0.7,
                 rpn_straddle_thresh=0.0),
-            train_proposal=GenerateProposals(min_size=0.0, nms_thresh=0.7, post_nms_top_n=12000, pre_nms_top_n=2000),
-            test_proposal=GenerateProposals(min_size=0.0, nms_thresh=0.7, post_nms_top_n=6000, pre_nms_top_n=1000))
+            train_proposal=GenerateProposals(
+                min_size=0.0,
+                nms_thresh=0.7,
+                post_nms_top_n=12000,
+                pre_nms_top_n=2000),
+            test_proposal=GenerateProposals(
+                min_size=0.0,
+                nms_thresh=0.7,
+                post_nms_top_n=6000,
+                pre_nms_top_n=1000))
     def roi_extractor(self):
         return RoIAlign(resolution=14, sampling_ratio=0, spatial_scale=0.0625)
@@ -209,7 +248,8 @@ class FasterRCNNResNet50(hub.Module):
     def bbox_head(self, num_classes):
         return BBoxHead(
             head=ResNetC5(depth=50, norm_type='affine_channel'),
-            nms=MultiClassNMS(keep_top_k=100, nms_threshold=0.5, score_threshold=0.05),
+            nms=MultiClassNMS(
+                keep_top_k=100, nms_threshold=0.5, score_threshold=0.05),
             bbox_loss=SmoothL1Loss(),
             num_classes=num_classes)
@@ -223,7 +263,11 @@ class FasterRCNNResNet50(hub.Module):
             fg_thresh=0.5,
             class_nums=num_classes)
-    def save_inference_model(self, dirname, model_filename=None, params_filename=None, combined=True):
+    def save_inference_model(self,
+                             dirname,
+                             model_filename=None,
+                             params_filename=None,
+                             combined=True):
         if combined:
             model_filename = "__model__" if not model_filename else model_filename
             params_filename = "__params__" if not params_filename else params_filename
@@ -279,7 +323,7 @@ class FasterRCNNResNet50(hub.Module):
                 int(_places[0])
             except:
                 raise RuntimeError(
-                    "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
+                    "Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly."
                 )
         paths = paths if paths else list()
         if data and 'image' in data:
@@ -301,11 +345,14 @@ class FasterRCNNResNet50(hub.Module):
             except:
                 pass
-            padding_image, padding_info, padding_shape = padding_minibatch(batch_data)
+            padding_image, padding_info, padding_shape = padding_minibatch(
+                batch_data)
             padding_image_tensor = PaddleTensor(padding_image.copy())
             padding_info_tensor = PaddleTensor(padding_info.copy())
             padding_shape_tensor = PaddleTensor(padding_shape.copy())
-            feed_list = [padding_image_tensor, padding_info_tensor, padding_shape_tensor]
+            feed_list = [
+                padding_image_tensor, padding_info_tensor, padding_shape_tensor
+            ]
             if use_gpu:
                 data_out = self.gpu_predictor.run(feed_list)
             else:
@@ -327,17 +374,29 @@ class FasterRCNNResNet50(hub.Module):
         Add the command config options
         """
         self.arg_config_group.add_argument(
-            '--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not")
+            '--use_gpu',
+            type=ast.literal_eval,
+            default=False,
+            help="whether use GPU or not")
-        self.arg_config_group.add_argument('--batch_size', type=int, default=1, help="batch size for prediction")
+        self.arg_config_group.add_argument(
+            '--batch_size',
+            type=int,
+            default=1,
+            help="batch size for prediction")
     def add_module_input_arg(self):
         """
         Add the command input options
         """
-        self.arg_input_group.add_argument('--input_path', type=str, default=None, help="input data")
+        self.arg_input_group.add_argument(
+            '--input_path', type=str, default=None, help="input data")
-        self.arg_input_group.add_argument('--input_file', type=str, default=None, help="file contain input data")
+        self.arg_input_group.add_argument(
+            '--input_file',
+            type=str,
+            default=None,
+            help="file contain input data")
     def check_input_data(self, args):
         input_data = []
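The `--use_gpu` option in the hunk above is parsed with `type=ast.literal_eval` rather than `type=bool`. A likely reason is that `bool('False')` is `True` for any non-empty string, whereas `ast.literal_eval('False')` yields the actual boolean. The standalone sketch below reproduces just the two config options with a hypothetical program name; it is not the module's own parser setup.

```python
import argparse
import ast

# Hypothetical stand-in for the module's parser; only the two options are copied.
parser = argparse.ArgumentParser(prog="hub run example_module")
config_group = parser.add_argument_group(title="Config options")
config_group.add_argument('--use_gpu', type=ast.literal_eval, default=False)
config_group.add_argument('--batch_size', type=int, default=1)

args = parser.parse_args(['--use_gpu', 'True', '--batch_size', '4'])
print(args.use_gpu, args.batch_size)

# type=bool would treat any non-empty string, including 'False', as True.
print(bool('False'), ast.literal_eval('False'))
```

One trade-off of this choice is that `literal_eval` accepts any Python literal, so a value such as `--use_gpu 1` is passed through as the integer 1 instead of being rejected.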
@@ -366,9 +425,12 @@ class FasterRCNNResNet50(hub.Module):
             prog="hub run {}".format(self.name),
             usage='%(prog)s',
             add_help=True)
-        self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+        self.arg_input_group = self.parser.add_argument_group(
+            title="Input options", description="Input data. Required")
         self.arg_config_group = self.parser.add_argument_group(
-            title="Config options", description="Run configuration for controlling module behavior, not required.")
+            title="Config options",
+            description=
+            "Run configuration for controlling module behavior, not required.")
         self.add_module_config_arg()
         self.add_module_input_arg()
@@ -380,5 +442,7 @@ class FasterRCNNResNet50(hub.Module):
         else:
             for image_path in input_data:
                 if not os.path.exists(image_path):
-                    raise RuntimeError("File %s or %s is not exist." % image_path)
-        return self.object_detection(paths=input_data, use_gpu=args.use_gpu, batch_size=args.batch_size)
+                    raise RuntimeError(
+                        "File %s or %s is not exist." % image_path)
+        return self.object_detection(
+            paths=input_data, use_gpu=args.use_gpu, batch_size=args.batch_size)
@@ -22,7 +22,8 @@ nonlocal_params = {
 }
-def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2):
+def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner,
+                   max_pool_stride=2):
     cur = input
     theta = fluid.layers.conv2d(input = cur, num_filters = dim_inner, \
                                 filter_size = [1, 1], stride = [1, 1], \
@@ -82,7 +83,8 @@ def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2)
             theta_phi_sc = fluid.layers.scale(theta_phi, scale=dim_inner**-.5)
         else:
             theta_phi_sc = theta_phi
-        p = fluid.layers.softmax(theta_phi_sc, name=prefix + '_affinity' + '_prob')
+        p = fluid.layers.softmax(
+            theta_phi_sc, name=prefix + '_affinity' + '_prob')
     else:
         # not clear about what is doing in xlw's code
         p = None  # not implemented
@@ -96,7 +98,8 @@ def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2)
     # reshape back
     # e.g. (8, 1024, 784) => (8, 1024, 4, 14, 14)
     t_shape = t.shape
-    t_re = fluid.layers.reshape(t, shape=list(theta_shape), actual_shape=theta_shape_op)
+    t_re = fluid.layers.reshape(
+        t, shape=list(theta_shape), actual_shape=theta_shape_op)
     blob_out = t_re
     blob_out = fluid.layers.conv2d(input = blob_out, num_filters = dim_out, \
                                   filter_size = [1, 1], stride = [1, 1], padding = [0, 0], \
......
@@ -19,6 +19,12 @@ def base64_to_cv2(b64str):
     data = cv2.imdecode(data, cv2.IMREAD_COLOR)
     return data
+def check_dir(dir_path):
+    if not os.path.exists(dir_path):
+        os.makedirs(dir_path)
+    elif os.path.isfile(dir_path):
+        os.remove(dir_path)
+        os.makedirs(dir_path)
 def get_save_image_name(img, output_dir, image_path):
     """Get save image name from source image path.
@@ -48,17 +54,23 @@ def draw_bounding_box_on_image(image_path, data_list, save_dir):
     image = Image.open(image_path)
     draw = ImageDraw.Draw(image)
     for data in data_list:
-        left, right, top, bottom = data['left'], data['right'], data['top'], data['bottom']
+        left, right, top, bottom = data['left'], data['right'], data[
+            'top'], data['bottom']
         # draw bbox
-        draw.line([(left, top), (left, bottom), (right, bottom), (right, top), (left, top)], width=2, fill='red')
+        draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
+                   (left, top)],
+                  width=2,
+                  fill='red')
         # draw label
         if image.mode == 'RGB':
             text = data['label'] + ": %.2f%%" % (100 * data['confidence'])
             textsize_width, textsize_height = draw.textsize(text=text)
             draw.rectangle(
-                xy=(left, top - (textsize_height + 5), left + textsize_width + 10, top), fill=(255, 255, 255))
+                xy=(left, top - (textsize_height + 5),
+                    left + textsize_width + 10, top),
+                fill=(255, 255, 255))
             draw.text(xy=(left, top - 15), text=text, fill=(0, 0, 0))
     save_name = get_save_image_name(image, save_dir, image_path)
...@@ -86,7 +98,14 @@ def load_label_info(file_path): ...@@ -86,7 +98,14 @@ def load_label_info(file_path):
return label_names return label_names
def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, handle_id, visualization=True): def postprocess(paths,
images,
data_out,
score_thresh,
label_names,
output_dir,
handle_id,
visualization=True):
""" """
postprocess the lod_tensor produced by fluid.Executor.run postprocess the lod_tensor produced by fluid.Executor.run
...@@ -115,16 +134,26 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -115,16 +134,26 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
lod = lod_tensor.lod[0] lod = lod_tensor.lod[0]
results = lod_tensor.as_ndarray() results = lod_tensor.as_ndarray()
if handle_id < len(paths): check_dir(output_dir)
unhandled_paths = paths[handle_id:]
unhandled_paths_num = len(unhandled_paths) if paths:
else: assert type(paths) is list, "type(paths) is not list."
unhandled_paths_num = 0 if handle_id < len(paths):
unhandled_paths = paths[handle_id:]
unhandled_paths_num = len(unhandled_paths)
else:
unhandled_paths_num = 0
if images is not None:
if handle_id < len(images):
unhandled_paths = None
unhandled_paths_num = len(images) - handle_id
else:
unhandled_paths_num = 0
output = [] output = []
for index in range(len(lod) - 1): for index in range(len(lod) - 1):
output_i = {'data': []} output_i = {'data': []}
if index < unhandled_paths_num: if unhandled_paths and index < unhandled_paths_num:
org_img_path = unhandled_paths[index] org_img_path = unhandled_paths[index]
org_img = Image.open(org_img_path) org_img = Image.open(org_img_path)
output_i['path'] = org_img_path output_i['path'] = org_img_path
...@@ -133,7 +162,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -133,7 +162,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
org_img = org_img.astype(np.uint8) org_img = org_img.astype(np.uint8)
org_img = Image.fromarray(org_img[:, :, ::-1]) org_img = Image.fromarray(org_img[:, :, ::-1])
if visualization: if visualization:
org_img_path = get_save_image_name(org_img, output_dir, 'image_numpy_{}'.format((handle_id + index))) org_img_path = get_save_image_name(
org_img, output_dir, 'image_numpy_{}'.format(
(handle_id + index)))
org_img.save(org_img_path) org_img.save(org_img_path)
org_img_height = org_img.height org_img_height = org_img.height
org_img_width = org_img.width org_img_width = org_img.width
...@@ -149,11 +180,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -149,11 +180,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
dt = {} dt = {}
dt['label'] = label_names[category_id] dt['label'] = label_names[category_id]
dt['confidence'] = float(confidence) dt['confidence'] = float(confidence)
dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(bbox, org_img_width, org_img_height) dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(
bbox, org_img_width, org_img_height)
output_i['data'].append(dt) output_i['data'].append(dt)
output.append(output_i) output.append(output_i)
if visualization: if visualization:
output_i['save_path'] = draw_bounding_box_on_image(org_img_path, output_i['data'], output_dir) output_i['save_path'] = draw_bounding_box_on_image(
org_img_path, output_i['data'], output_dir)
return output return output
...@@ -90,7 +90,13 @@ class ResNet(object): ...@@ -90,7 +90,13 @@ class ResNet(object):
self.get_prediction = get_prediction self.get_prediction = get_prediction
self.class_dim = class_dim self.class_dim = class_dim
def _conv_offset(self, input, filter_size, stride, padding, act=None, name=None): def _conv_offset(self,
input,
filter_size,
stride,
padding,
act=None,
name=None):
out_channel = filter_size * filter_size * 3 out_channel = filter_size * filter_size * 3
out = fluid.layers.conv2d( out = fluid.layers.conv2d(
input, input,
...@@ -104,7 +110,15 @@ class ResNet(object): ...@@ -104,7 +110,15 @@ class ResNet(object):
name=name) name=name)
return out return out
def _conv_norm(self, input, num_filters, filter_size, stride=1, groups=1, act=None, name=None, dcn_v2=False): def _conv_norm(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None,
dcn_v2=False):
_name = self.prefix_name + name if self.prefix_name != '' else name _name = self.prefix_name + name if self.prefix_name != '' else name
if not dcn_v2: if not dcn_v2:
conv = fluid.layers.conv2d( conv = fluid.layers.conv2d(
...@@ -129,7 +143,10 @@ class ResNet(object): ...@@ -129,7 +143,10 @@ class ResNet(object):
name=_name + "_conv_offset") name=_name + "_conv_offset")
offset_channel = filter_size**2 * 2 offset_channel = filter_size**2 * 2
mask_channel = filter_size**2 mask_channel = filter_size**2
offset, mask = fluid.layers.split(input=offset_mask, num_or_sections=[offset_channel, mask_channel], dim=1) offset, mask = fluid.layers.split(
input=offset_mask,
num_or_sections=[offset_channel, mask_channel],
dim=1)
mask = fluid.layers.sigmoid(mask) mask = fluid.layers.sigmoid(mask)
conv = fluid.layers.deformable_conv( conv = fluid.layers.deformable_conv(
input=input, input=input,
...@@ -151,8 +168,14 @@ class ResNet(object): ...@@ -151,8 +168,14 @@ class ResNet(object):
norm_lr = 0. if self.freeze_norm else 1. norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay norm_decay = self.norm_decay
pattr = ParamAttr(name=bn_name + '_scale', learning_rate=norm_lr, regularizer=L2Decay(norm_decay)) pattr = ParamAttr(
battr = ParamAttr(name=bn_name + '_offset', learning_rate=norm_lr, regularizer=L2Decay(norm_decay)) name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
if self.norm_type in ['bn', 'sync_bn']: if self.norm_type in ['bn', 'sync_bn']:
global_stats = True if self.freeze_norm else False global_stats = True if self.freeze_norm else False
...@@ -169,10 +192,17 @@ class ResNet(object): ...@@ -169,10 +192,17 @@ class ResNet(object):
bias = fluid.framework._get_var(battr.name) bias = fluid.framework._get_var(battr.name)
elif self.norm_type == 'affine_channel': elif self.norm_type == 'affine_channel':
scale = fluid.layers.create_parameter( scale = fluid.layers.create_parameter(
shape=[conv.shape[1]], dtype=conv.dtype, attr=pattr, default_initializer=fluid.initializer.Constant(1.)) shape=[conv.shape[1]],
dtype=conv.dtype,
attr=pattr,
default_initializer=fluid.initializer.Constant(1.))
bias = fluid.layers.create_parameter( bias = fluid.layers.create_parameter(
shape=[conv.shape[1]], dtype=conv.dtype, attr=battr, default_initializer=fluid.initializer.Constant(0.)) shape=[conv.shape[1]],
out = fluid.layers.affine_channel(x=conv, scale=scale, bias=bias, act=act) dtype=conv.dtype,
attr=battr,
default_initializer=fluid.initializer.Constant(0.))
out = fluid.layers.affine_channel(
x=conv, scale=scale, bias=bias, act=act)
if self.freeze_norm: if self.freeze_norm:
scale.stop_gradient = True scale.stop_gradient = True
bias.stop_gradient = True bias.stop_gradient = True
...@@ -192,13 +222,24 @@ class ResNet(object): ...@@ -192,13 +222,24 @@ class ResNet(object):
return self._conv_norm(input, ch_out, 3, stride, name=name) return self._conv_norm(input, ch_out, 3, stride, name=name)
if max_pooling_in_short_cut and not is_first: if max_pooling_in_short_cut and not is_first:
input = fluid.layers.pool2d( input = fluid.layers.pool2d(
input=input, pool_size=2, pool_stride=2, pool_padding=0, ceil_mode=True, pool_type='avg') input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
ceil_mode=True,
pool_type='avg')
return self._conv_norm(input, ch_out, 1, 1, name=name) return self._conv_norm(input, ch_out, 1, 1, name=name)
return self._conv_norm(input, ch_out, 1, stride, name=name) return self._conv_norm(input, ch_out, 1, stride, name=name)
else: else:
return input return input
def bottleneck(self, input, num_filters, stride, is_first, name, dcn_v2=False): def bottleneck(self,
input,
num_filters,
stride,
is_first,
name,
dcn_v2=False):
if self.variant == 'a': if self.variant == 'a':
stride1, stride2 = stride, 1 stride1, stride2 = stride, 1
else: else:
...@@ -219,8 +260,9 @@ class ResNet(object): ...@@ -219,8 +260,9 @@ class ResNet(object):
shortcut_name = self.na.fix_bottleneck_name(name) shortcut_name = self.na.fix_bottleneck_name(name)
std_senet = getattr(self, 'std_senet', False) std_senet = getattr(self, 'std_senet', False)
if std_senet: if std_senet:
conv_def = [[int(num_filters / 2), 1, stride1, 'relu', 1, conv_name1], conv_def = [[
[num_filters, 3, stride2, 'relu', groups, conv_name2], int(num_filters / 2), 1, stride1, 'relu', 1, conv_name1
], [num_filters, 3, stride2, 'relu', groups, conv_name2],
[num_filters * expand, 1, 1, None, 1, conv_name3]] [num_filters * expand, 1, 1, None, 1, conv_name3]]
else: else:
conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1], conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1],
...@@ -238,18 +280,42 @@ class ResNet(object): ...@@ -238,18 +280,42 @@ class ResNet(object):
groups=g, groups=g,
name=_name, name=_name,
dcn_v2=(i == 1 and dcn_v2)) dcn_v2=(i == 1 and dcn_v2))
short = self._shortcut(input, num_filters * expand, stride, is_first=is_first, name=shortcut_name) short = self._shortcut(
input,
num_filters * expand,
stride,
is_first=is_first,
name=shortcut_name)
# Squeeze-and-Excitation # Squeeze-and-Excitation
if callable(getattr(self, '_squeeze_excitation', None)): if callable(getattr(self, '_squeeze_excitation', None)):
residual = self._squeeze_excitation(input=residual, num_channels=num_filters, name='fc' + name) residual = self._squeeze_excitation(
return fluid.layers.elementwise_add(x=short, y=residual, act='relu', name=name + ".add.output.5") input=residual, num_channels=num_filters, name='fc' + name)
return fluid.layers.elementwise_add(
def basicblock(self, input, num_filters, stride, is_first, name, dcn_v2=False): x=short, y=residual, act='relu', name=name + ".add.output.5")
def basicblock(self,
input,
num_filters,
stride,
is_first,
name,
dcn_v2=False):
assert dcn_v2 is False, "Not implemented yet." assert dcn_v2 is False, "Not implemented yet."
conv0 = self._conv_norm( conv0 = self._conv_norm(
input=input, num_filters=num_filters, filter_size=3, act='relu', stride=stride, name=name + "_branch2a") input=input,
conv1 = self._conv_norm(input=conv0, num_filters=num_filters, filter_size=3, act=None, name=name + "_branch2b") num_filters=num_filters,
short = self._shortcut(input, num_filters, stride, is_first, name=name + "_branch1") filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self._conv_norm(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self._shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu') return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def layer_warp(self, input, stage_num): def layer_warp(self, input, stage_num):
...@@ -272,7 +338,8 @@ class ResNet(object): ...@@ -272,7 +338,8 @@ class ResNet(object):
nonlocal_mod = 1000 nonlocal_mod = 1000
if stage_num in self.nonlocal_stages: if stage_num in self.nonlocal_stages:
nonlocal_mod = self.nonlocal_mod_cfg[self.depth] if stage_num == 4 else 2 nonlocal_mod = self.nonlocal_mod_cfg[
self.depth] if stage_num == 4 else 2
# Make the layer name and parameter name consistent # Make the layer name and parameter name consistent
# with ImageNet pre-trained model # with ImageNet pre-trained model
...@@ -293,7 +360,9 @@ class ResNet(object): ...@@ -293,7 +360,9 @@ class ResNet(object):
dim_in = conv.shape[1] dim_in = conv.shape[1]
nonlocal_name = "nonlocal_conv{}".format(stage_num) nonlocal_name = "nonlocal_conv{}".format(stage_num)
if i % nonlocal_mod == nonlocal_mod - 1: if i % nonlocal_mod == nonlocal_mod - 1:
conv = add_space_nonlocal(conv, dim_in, dim_in, nonlocal_name + '_{}'.format(i), int(dim_in / 2)) conv = add_space_nonlocal(conv, dim_in, dim_in,
nonlocal_name + '_{}'.format(i),
int(dim_in / 2))
return conv return conv
def c1_stage(self, input): def c1_stage(self, input):
...@@ -311,9 +380,20 @@ class ResNet(object): ...@@ -311,9 +380,20 @@ class ResNet(object):
conv_def = [[out_chan, 7, 2, conv1_name]] conv_def = [[out_chan, 7, 2, conv1_name]]
for (c, k, s, _name) in conv_def: for (c, k, s, _name) in conv_def:
input = self._conv_norm(input=input, num_filters=c, filter_size=k, stride=s, act='relu', name=_name) input = self._conv_norm(
input=input,
output = fluid.layers.pool2d(input=input, pool_size=3, pool_stride=2, pool_padding=1, pool_type='max') num_filters=c,
filter_size=k,
stride=s,
act='relu',
name=_name)
output = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
return output return output
def __call__(self, input): def __call__(self, input):
...@@ -337,17 +417,19 @@ class ResNet(object): ...@@ -337,17 +417,19 @@ class ResNet(object):
if self.freeze_at >= i: if self.freeze_at >= i:
res.stop_gradient = True res.stop_gradient = True
if self.get_prediction: if self.get_prediction:
pool = fluid.layers.pool2d(input=res, pool_type='avg', global_pooling=True) pool = fluid.layers.pool2d(
input=res, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0) stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc( out = fluid.layers.fc(
input=pool, input=pool,
size=self.class_dim, size=self.class_dim,
param_attr=fluid.param_attr.ParamAttr(initializer=fluid.initializer.Uniform(-stdv, stdv))) param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
out = fluid.layers.softmax(out) out = fluid.layers.softmax(out)
return out return out
return OrderedDict( return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
[('res{}_sum'.format(self.feature_maps[idx]), feat) for idx, feat in enumerate(res_endpoints)]) for idx, feat in enumerate(res_endpoints)])
class ResNetC5(ResNet): class ResNetC5(ResNet):
...@@ -360,5 +442,6 @@ class ResNetC5(ResNet): ...@@ -360,5 +442,6 @@ class ResNetC5(ResNet):
variant='b', variant='b',
feature_maps=[5], feature_maps=[5],
weight_prefix_name=''): weight_prefix_name=''):
super(ResNetC5, self).__init__(depth, freeze_at, norm_type, freeze_norm, norm_decay, variant, feature_maps) super(ResNetC5, self).__init__(depth, freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.severed_head = True self.severed_head = True
...@@ -45,7 +45,12 @@ class RPNTargetAssign(object): ...@@ -45,7 +45,12 @@ class RPNTargetAssign(object):
class GenerateProposals(object): class GenerateProposals(object):
# __op__ = fluid.layers.generate_proposals # __op__ = fluid.layers.generate_proposals
def __init__(self, pre_nms_top_n=6000, post_nms_top_n=1000, nms_thresh=.5, min_size=.1, eta=1.): def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__() super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n self.post_nms_top_n = post_nms_top_n
...@@ -65,9 +70,17 @@ class RPNHead(object): ...@@ -65,9 +70,17 @@ class RPNHead(object):
test_proposal (object): `GenerateProposals` instance for testing test_proposal (object): `GenerateProposals` instance for testing
num_classes (int): number of classes in rpn output num_classes (int): number of classes in rpn output
""" """
__inject__ = ['anchor_generator', 'rpn_target_assign', 'train_proposal', 'test_proposal'] __inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self, anchor_generator, rpn_target_assign, train_proposal, test_proposal, num_classes=1): def __init__(self,
anchor_generator,
rpn_target_assign,
train_proposal,
test_proposal,
num_classes=1):
super(RPNHead, self).__init__() super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign self.rpn_target_assign = rpn_target_assign
...@@ -95,8 +108,10 @@ class RPNHead(object): ...@@ -95,8 +108,10 @@ class RPNHead(object):
padding=1, padding=1,
act='relu', act='relu',
name='conv_rpn', name='conv_rpn',
param_attr=ParamAttr(name="conv_rpn_w", initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.))) name="conv_rpn_w", initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
# Generate anchors self.anchor_generator # Generate anchors self.anchor_generator
self.anchor, self.anchor_var = fluid.layers.anchor_generator( self.anchor, self.anchor_var = fluid.layers.anchor_generator(
input=rpn_conv, input=rpn_conv,
...@@ -115,8 +130,13 @@ class RPNHead(object): ...@@ -115,8 +130,13 @@ class RPNHead(object):
padding=0, padding=0,
act=None, act=None,
name='rpn_cls_score', name='rpn_cls_score',
param_attr=ParamAttr(name="rpn_cls_logits_w", initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name="rpn_cls_logits_b", learning_rate=2., regularizer=L2Decay(0.))) name="rpn_cls_logits_w", initializer=Normal(loc=0.,
scale=0.01)),
bias_attr=ParamAttr(
name="rpn_cls_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
# Proposal bbox regression deltas # Proposal bbox regression deltas
self.rpn_bbox_pred = fluid.layers.conv2d( self.rpn_bbox_pred = fluid.layers.conv2d(
rpn_conv, rpn_conv,
...@@ -126,8 +146,12 @@ class RPNHead(object): ...@@ -126,8 +146,12 @@ class RPNHead(object):
padding=0, padding=0,
act=None, act=None,
name='rpn_bbox_pred', name='rpn_bbox_pred',
param_attr=ParamAttr(name="rpn_bbox_pred_w", initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name="rpn_bbox_pred_b", learning_rate=2., regularizer=L2Decay(0.))) name="rpn_bbox_pred_w", initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_bbox_pred_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred return self.rpn_cls_score, self.rpn_bbox_pred
def get_proposals(self, body_feats, im_info, mode='train'): def get_proposals(self, body_feats, im_info, mode='train'):
...@@ -150,15 +174,22 @@ class RPNHead(object): ...@@ -150,15 +174,22 @@ class RPNHead(object):
rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat) rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat)
if self.num_classes == 1: if self.num_classes == 1:
rpn_cls_prob = fluid.layers.sigmoid(rpn_cls_score, name='rpn_cls_prob') rpn_cls_prob = fluid.layers.sigmoid(
rpn_cls_score, name='rpn_cls_prob')
else: else:
rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1]) rpn_cls_score = fluid.layers.transpose(
rpn_cls_score = fluid.layers.reshape(rpn_cls_score, shape=(0, 0, 0, -1, self.num_classes)) rpn_cls_score, perm=[0, 2, 3, 1])
rpn_cls_prob_tmp = fluid.layers.softmax(rpn_cls_score, use_cudnn=False, name='rpn_cls_prob') rpn_cls_score = fluid.layers.reshape(
rpn_cls_prob_slice = fluid.layers.slice(rpn_cls_prob_tmp, axes=[4], starts=[1], ends=[self.num_classes]) rpn_cls_score, shape=(0, 0, 0, -1, self.num_classes))
rpn_cls_prob_tmp = fluid.layers.softmax(
rpn_cls_score, use_cudnn=False, name='rpn_cls_prob')
rpn_cls_prob_slice = fluid.layers.slice(
rpn_cls_prob_tmp, axes=[4], starts=[1], ends=[self.num_classes])
rpn_cls_prob, _ = fluid.layers.topk(rpn_cls_prob_slice, 1) rpn_cls_prob, _ = fluid.layers.topk(rpn_cls_prob_slice, 1)
rpn_cls_prob = fluid.layers.reshape(rpn_cls_prob, shape=(0, 0, 0, -1)) rpn_cls_prob = fluid.layers.reshape(
rpn_cls_prob = fluid.layers.transpose(rpn_cls_prob, perm=[0, 3, 1, 2]) rpn_cls_prob, shape=(0, 0, 0, -1))
rpn_cls_prob = fluid.layers.transpose(
rpn_cls_prob, perm=[0, 3, 1, 2])
prop_op = self.train_proposal if mode == 'train' else self.test_proposal prop_op = self.train_proposal if mode == 'train' else self.test_proposal
# prop_op # prop_op
rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals( rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
...@@ -174,20 +205,24 @@ class RPNHead(object): ...@@ -174,20 +205,24 @@ class RPNHead(object):
eta=prop_op.eta) eta=prop_op.eta)
return rpn_rois return rpn_rois
def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor, anchor_var): def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor,
anchor_var):
rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1]) rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1])
rpn_bbox_pred = fluid.layers.transpose(rpn_bbox_pred, perm=[0, 2, 3, 1]) rpn_bbox_pred = fluid.layers.transpose(rpn_bbox_pred, perm=[0, 2, 3, 1])
anchor = fluid.layers.reshape(anchor, shape=(-1, 4)) anchor = fluid.layers.reshape(anchor, shape=(-1, 4))
anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4)) anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4))
rpn_cls_score = fluid.layers.reshape(x=rpn_cls_score, shape=(0, -1, self.num_classes)) rpn_cls_score = fluid.layers.reshape(
x=rpn_cls_score, shape=(0, -1, self.num_classes))
rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4)) rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4))
return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var
def _get_loss_input(self): def _get_loss_input(self):
for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']: for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']:
if not getattr(self, attr, None): if not getattr(self, attr, None):
raise ValueError("self.{} should not be None,".format(attr), "call RPNHead.get_proposals first") raise ValueError("self.{} should not be None,".format(attr),
return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred, self.anchor, self.anchor_var) "call RPNHead.get_proposals first")
return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred,
self.anchor, self.anchor_var)
def get_loss(self, im_info, gt_box, is_crowd, gt_label=None): def get_loss(self, im_info, gt_box, is_crowd, gt_label=None):
""" """
...@@ -227,7 +262,8 @@ class RPNHead(object): ...@@ -227,7 +262,8 @@ class RPNHead(object):
use_random=self.rpn_target_assign.use_random) use_random=self.rpn_target_assign.use_random)
score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32') score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
score_tgt.stop_gradient = True score_tgt.stop_gradient = True
rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=score_pred, label=score_tgt) rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=score_pred, label=score_tgt)
else: else:
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \ score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
self.rpn_target_assign( self.rpn_target_assign(
...@@ -245,13 +281,19 @@ class RPNHead(object): ...@@ -245,13 +281,19 @@ class RPNHead(object):
rpn_cls_loss = fluid.layers.softmax_with_cross_entropy( rpn_cls_loss = fluid.layers.softmax_with_cross_entropy(
logits=score_pred, label=labels_int64, numeric_stable_mode=True) logits=score_pred, label=labels_int64, numeric_stable_mode=True)
rpn_cls_loss = fluid.layers.reduce_mean(rpn_cls_loss, name='loss_rpn_cls') rpn_cls_loss = fluid.layers.reduce_mean(
rpn_cls_loss, name='loss_rpn_cls')
loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32') loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32')
loc_tgt.stop_gradient = True loc_tgt.stop_gradient = True
rpn_reg_loss = fluid.layers.smooth_l1( rpn_reg_loss = fluid.layers.smooth_l1(
x=loc_pred, y=loc_tgt, sigma=3.0, inside_weight=bbox_weight, outside_weight=bbox_weight) x=loc_pred,
rpn_reg_loss = fluid.layers.reduce_sum(rpn_reg_loss, name='loss_rpn_bbox') y=loc_tgt,
sigma=3.0,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
rpn_reg_loss = fluid.layers.reduce_sum(
rpn_reg_loss, name='loss_rpn_bbox')
score_shape = fluid.layers.shape(score_tgt) score_shape = fluid.layers.shape(score_tgt)
score_shape = fluid.layers.cast(x=score_shape, dtype='float32') score_shape = fluid.layers.cast(x=score_shape, dtype='float32')
norm = fluid.layers.reduce_prod(score_shape) norm = fluid.layers.reduce_prod(score_shape)
......
# faster_rcnn_resnet50_fpn_coco2017
## Command-line prediction
```shell
$ hub run faster_rcnn_resnet50_fpn_coco2017 --input_path "/PATH/TO/IMAGE"
```
## API
```python
def context(num_classes=81,
trainable=True,
pretrained=True,
phase='train')
```
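Extracts features for transfer learning.
**Parameters**
* num\_classes (int): number of classes;
* trainable (bool): whether the parameters are trainable;
* pretrained (bool): whether to load the pretrained model;
* phase (str): either 'train' or 'predict'; 'train' is for training and 'predict' is for prediction.
**Returns**
* inputs (dict): the model inputs, with the following entries:
When phase is 'train':
* image (Variable): image variable
* im\_size (Variable): image size
* im\_info (Variable): image scaling information
* gt\_class (Variable): ground-truth box classes
* gt\_box (Variable): ground-truth box coordinates
* is\_crowd (Variable): whether a single box contains multiple objects
When phase is 'predict':
* image (Variable): image variable
* im\_size (Variable): image size
* im\_info (Variable): image scaling information
* outputs (dict): the model outputs, with the following entries:
When phase is 'train':
* head_features (Variable): the extracted features
* rpn\_cls\_loss (Variable): box classification loss
* rpn\_reg\_loss (Variable): box regression loss
* generate\_proposal\_labels (Variable): image information
When phase is 'predict':
* head_features (Variable): the extracted features
* rois (Variable): the extracted RoIs
* bbox\_out (Variable): prediction results
* context\_prog (Program): the Program used for transfer learning.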
```python
def object_detection(paths=None,
images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True)
```
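Prediction API: detects the locations of all objects in the input images.
**Parameters**
* paths (list\[str\]): image paths;
* images (list\[numpy.ndarray\]): image data; each ndarray has shape \[H, W, C\] in BGR format;
* batch\_size (int): batch size;
* use\_gpu (bool): whether to use the GPU;
* score\_thresh (float): confidence threshold for detections;
* visualization (bool): whether to save the detection results as image files;
* output\_dir (str): directory where the images are saved; defaults to detection\_result.
**Returns**
* res (list\[dict\]): list of detection results; each element is a dict with these fields:
* data (list): the detections; each element is a dict with these fields:
* confidence (float): detection confidence;
* label (str): label;
* left (int): x coordinate of the top-left corner of the bounding box;
* top (int): y coordinate of the top-left corner of the bounding box;
* right (int): x coordinate of the bottom-right corner of the bounding box;
* bottom (int): y coordinate of the bottom-right corner of the bounding box;
* save\_path (str, optional): path where the result is saved (present only when visualization=True).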
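The return structure described above can be walked with ordinary list and dict operations. The following is a minimal sketch, assuming only the documented fields; `count_labels` and the values in `sample_res` are hypothetical and are not part of the module.

```python
from collections import Counter


def count_labels(res, min_confidence=0.5):
    """Count detections per label across all images, keeping confident ones.

    `res` is assumed to follow the documented shape: a list of dicts, each
    with a "data" list of detection dicts.
    """
    counts = Counter()
    for image_result in res:
        for det in image_result["data"]:
            if det["confidence"] >= min_confidence:
                counts[det["label"]] += 1
    return counts


# Hypothetical result in the documented shape (all values are made up).
sample_res = [
    {
        "data": [
            {"label": "cat", "confidence": 0.98,
             "left": 10, "top": 20, "right": 110, "bottom": 220},
            {"label": "dog", "confidence": 0.42,
             "left": 5, "top": 8, "right": 60, "bottom": 90},
        ],
        "save_path": "detection_result/image_numpy_0.jpg",
    },
    {
        "data": [
            {"label": "cat", "confidence": 0.75,
             "left": 0, "top": 0, "right": 50, "bottom": 50},
        ],
    },
]

label_counts = count_labels(sample_res, min_confidence=0.5)
print(dict(label_counts))
```

Filtering again on the caller side is only useful when a stricter threshold than `score_thresh` is wanted after the fact.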
```python
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
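Saves the model to the specified path.
**Parameters**
* dirname: name of the directory in which the model is saved
* model\_filename: model file name; defaults to \_\_model\_\_
* params\_filename: parameter file name; defaults to \_\_params\_\_ (takes effect only when `combined` is True)
* combined: whether to save the parameters into a single file
## Code example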
```python
import paddlehub as hub
import cv2
object_detector = hub.Module(name="faster_rcnn_resnet50_fpn_coco2017")
result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
## Service deployment
PaddleHub Serving can deploy an online object detection service.
## Step 1: Start PaddleHub Serving
Run the start command:
```shell
$ hub serving start -m faster_rcnn_resnet50_fpn_coco2017
```
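This deploys an object detection service API; the default port is 8866.
**NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
## Step 2: Send a prediction request
With the server configured, the few lines of code below send a prediction request and retrieve the result.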
```python
import requests
import json
import cv2
import base64

def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    # tobytes() replaces the deprecated ndarray.tostring()
    return base64.b64encode(data.tobytes()).decode('utf8')

# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/faster_rcnn_resnet50_fpn_coco2017"
r = requests.post(url=url, headers=headers, data=json.dumps(data))

# Print the prediction results
print(r.json()["results"])
```
### Dependencies
paddlepaddle >= 1.6.2
paddlehub >= 1.6.0

|Model name|faster_rcnn_resnet50_fpn_coco2017|
| :--- | :---: |
|Category|Image - Object detection|
|Network|faster_rcnn|
|Dataset|COCO2017|
|Fine-tuning supported|No|
|Model size|161MB|
|Last updated|2021-03-15|
|Metrics|-|
## I. Basic model information
- ### Demo
- Sample result:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/131504887-d024c7e5-fc09-4d6b-92b8-4d0c965949d0.jpg" width='50%' hspace='10'/>
<br />
</p>
- ### Model introduction
- Faster_RCNN is a two-stage object detector: it generates region proposals for an image, extracts features, classifies them, and refines the proposal box positions. The overall network has four parts: ResNet-50 as the base convolutional layers, the region proposal network, RoI Align, and the detection layers. This Faster_RCNN model is pretrained on the MS-COCO dataset. Currently only prediction is supported.
## II. Installation
- ### 1. Environment dependencies
- paddlepaddle >= 1.6.2
- paddlehub >= 1.6.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install faster_rcnn_resnet50_fpn_coco2017
```
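- If you run into problems during installation, see: [Beginner's guide to installing on Windows](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Beginner's guide to installing on Linux](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [Beginner's guide to installing on MacOS](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Model API prediction
- ### 1. Command-line prediction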
- ```shell
$ hub run faster_rcnn_resnet50_fpn_coco2017 --input_path "/PATH/TO/IMAGE"
```
- This invokes the object detection model from the command line. For more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code example
- ```python
import paddlehub as hub
import cv2
object_detector = hub.Module(name="faster_rcnn_resnet50_fpn_coco2017")
result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def object_detection(paths=None,
images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True)
```
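- Prediction API: detects the locations of all objects in the input images.
- **Parameters**
- paths (list\[str\]): image paths; <br/>
- images (list\[numpy.ndarray\]): image data; each ndarray has shape \[H, W, C\] in BGR format; <br/>
- batch\_size (int): batch size;<br/>
- use\_gpu (bool): whether to use the GPU;<br/>
- output\_dir (str): directory where the images are saved; defaults to detection\_result;<br/>
- score\_thresh (float): confidence threshold for detections;<br/>
- visualization (bool): whether to save the detection results as image files.
**NOTE:** Supply the input data through one of the two parameters, paths or images.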
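The note about choosing between `paths` and `images` can be enforced on the caller side before the module is invoked. This is a minimal sketch under that assumption; `pick_input_source` is a hypothetical helper, and the module itself may handle the two arguments differently.

```python
def pick_input_source(paths=None, images=None):
    """Return ("paths", paths) or ("images", images).

    Raises ValueError when neither or both inputs are supplied, mirroring
    the note that only one of the two parameters should carry the data.
    """
    has_paths = paths is not None and len(paths) > 0
    has_images = images is not None and len(images) > 0
    if has_paths == has_images:
        raise ValueError("provide exactly one of paths or images")
    return ("paths", paths) if has_paths else ("images", images)


# Only paths is given, so the helper selects it.
source, payload = pick_input_source(paths=["/PATH/TO/IMAGE"])
print(source, payload)
```

The selected pair can then be passed on as a keyword argument, for example `object_detection(**{source: payload})`.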
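- **Returns**
- res (list\[dict\]): list of detection results; each element is a dict with these fields:
- data (list): the detections; each element is a dict with these fields:
- confidence (float): detection confidence
- label (str): label
- left (int): x coordinate of the top-left corner of the bounding box
- top (int): y coordinate of the top-left corner of the bounding box
- right (int): x coordinate of the bottom-right corner of the bounding box
- bottom (int): y coordinate of the bottom-right corner of the bounding box
- save\_path (str, optional): path where the result is saved (present only when visualization=True)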
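Because each detection stores its box as two corners, downstream code often needs a width and height instead. A minimal sketch of that conversion follows; `box_to_xywh` and the sample detection are hypothetical, and the coordinates are treated as plain corner positions with no inclusive-pixel adjustment.

```python
def box_to_xywh(det):
    """Convert left/top/right/bottom corner fields into (x, y, width, height)."""
    width = det["right"] - det["left"]
    height = det["bottom"] - det["top"]
    return det["left"], det["top"], width, height


# Hypothetical detection in the documented shape (values are made up).
det = {"label": "person", "confidence": 0.91,
       "left": 30, "top": 40, "right": 130, "bottom": 240}

x, y, w, h = box_to_xywh(det)
area = w * h
print(x, y, w, h, area)
```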
- ```python
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
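- Saves the model to the specified path.
- **Parameters**
- dirname: name of the directory in which the model is saved; <br/>
- model\_filename: model file name; defaults to \_\_model\_\_; <br/>
- params\_filename: parameter file name; defaults to \_\_params\_\_ (takes effect only when `combined` is True);<br/>
- combined: whether to save the parameters into a single file.
## IV. Service deployment
- PaddleHub Serving can deploy an online object detection service.
- ### Step 1: Start PaddleHub Serving
- Run the start command: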
- ```shell
$ hub serving start -m faster_rcnn_resnet50_fpn_coco2017
```
- 这样就完成了一个目标检测的服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前,请设置CUDA\_VISIBLE\_DEVICES环境变量,否则不用设置。
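A pre-flight check of the environment variable could look like the sketch below. It is loosely modeled on the `int(_places[0])` check visible in this commit's `object_detection`; the helper name `first_visible_gpu` and the comma-splitting are assumptions, and the sketch handles only integer device ids.

```python
import os


def first_visible_gpu(environ=None):
    """Return the first device id listed in CUDA_VISIBLE_DEVICES as an int."""
    environ = os.environ if environ is None else environ
    try:
        devices = environ["CUDA_VISIBLE_DEVICES"]
        # The variable holds a comma-separated list such as "0,1".
        return int(devices.split(",")[0])
    except (KeyError, ValueError):
        raise RuntimeError(
            "Attempt to use GPU for prediction, but environment variable "
            "CUDA_VISIBLE_DEVICES was not set correctly.")


print(first_visible_gpu({"CUDA_VISIBLE_DEVICES": "0,1"}))
```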
- ### Step 2: Send a prediction request
  - With the server configured, the few lines of code below send a prediction request and fetch the result:
  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        # tobytes() replaces the deprecated ndarray.tostring()
        return base64.b64encode(data.tobytes()).decode('utf8')

    # Send the HTTP request
    data = {'images': [cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/faster_rcnn_resnet50_fpn_coco2017"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction result
    print(r.json()["results"])
    ```
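The request body built above is plain JSON with base64 text inside. The sketch below reproduces that encoding with the standard library only, so it runs without OpenCV or a live server. `build_payload` is a hypothetical helper, and `fake_jpeg` is placeholder bytes, not a real image.

```python
import base64
import json


def build_payload(encoded_images):
    """Wrap already-encoded image bytes (e.g. JPEG) in the JSON request body."""
    images = [base64.b64encode(raw).decode("utf8") for raw in encoded_images]
    return json.dumps({"images": images})


# Placeholder bytes standing in for the output of cv2.imencode('.jpg', ...).
fake_jpeg = b"\xff\xd8\xff\xe0 placeholder bytes"
body = build_payload([fake_jpeg])

# Decoding the payload recovers the original bytes.
decoded = base64.b64decode(json.loads(body)["images"][0])
print(decoded == fake_jpeg)
```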
## V. Release Notes
* 1.0.0

  Initial release
* 1.0.1

  Fixed a numpy data reading issue
  - ```shell
    $ hub install faster_rcnn_resnet50_fpn_coco2017==1.0.1
    ```
@@ -45,11 +45,18 @@ class SmoothL1Loss(object):
     def __call__(self, x, y, inside_weight=None, outside_weight=None):
         return fluid.layers.smooth_l1(
-            x, y, inside_weight=inside_weight, outside_weight=outside_weight, sigma=self.sigma)
+            x,
+            y,
+            inside_weight=inside_weight,
+            outside_weight=outside_weight,
+            sigma=self.sigma)
 class BoxCoder(object):
-    def __init__(self, prior_box_var=[0.1, 0.1, 0.2, 0.2], code_type='decode_center_size', box_normalized=False,
+    def __init__(self,
+                 prior_box_var=[0.1, 0.1, 0.2, 0.2],
+                 code_type='decode_center_size',
+                 box_normalized=False,
                  axis=1):
         super(BoxCoder, self).__init__()
         self.prior_box_var = prior_box_var
@@ -79,14 +86,16 @@ class TwoFCHead(object):
             act='relu',
             name='fc6',
             param_attr=ParamAttr(name='fc6_w', initializer=Xavier(fan_out=fan)),
-            bias_attr=ParamAttr(name='fc6_b', learning_rate=2., regularizer=L2Decay(0.)))
+            bias_attr=ParamAttr(
+                name='fc6_b', learning_rate=2., regularizer=L2Decay(0.)))
         head_feat = fluid.layers.fc(
             input=fc6,
             size=self.mlp_dim,
             act='relu',
             name='fc7',
             param_attr=ParamAttr(name='fc7_w', initializer=Xavier()),
-            bias_attr=ParamAttr(name='fc7_b', learning_rate=2., regularizer=L2Decay(0.)))
+            bias_attr=ParamAttr(
+                name='fc7_b', learning_rate=2., regularizer=L2Decay(0.)))
         return head_feat
@@ -104,7 +113,12 @@ class BBoxHead(object):
     __inject__ = ['head', 'box_coder', 'nms', 'bbox_loss']
     __shared__ = ['num_classes']
-    def __init__(self, head, box_coder=BoxCoder(), nms=MultiClassNMS(), bbox_loss=SmoothL1Loss(), num_classes=81):
+    def __init__(self,
+                 head,
+                 box_coder=BoxCoder(),
+                 nms=MultiClassNMS(),
+                 bbox_loss=SmoothL1Loss(),
+                 num_classes=81):
         super(BBoxHead, self).__init__()
         self.head = head
         self.num_classes = num_classes
@@ -141,24 +155,30 @@ class BBoxHead(object):
         head_feat = self.get_head_feat(roi_feat)
         # when ResNetC5 output a single feature map
         if not isinstance(self.head, TwoFCHead):
-            head_feat = fluid.layers.pool2d(head_feat, pool_type='avg', global_pooling=True)
+            head_feat = fluid.layers.pool2d(
+                head_feat, pool_type='avg', global_pooling=True)
         cls_score = fluid.layers.fc(
             input=head_feat,
             size=self.num_classes,
             act=None,
             name='cls_score',
-            param_attr=ParamAttr(name='cls_score_w', initializer=Normal(loc=0.0, scale=0.01)),
-            bias_attr=ParamAttr(name='cls_score_b', learning_rate=2., regularizer=L2Decay(0.)))
+            param_attr=ParamAttr(
+                name='cls_score_w', initializer=Normal(loc=0.0, scale=0.01)),
+            bias_attr=ParamAttr(
+                name='cls_score_b', learning_rate=2., regularizer=L2Decay(0.)))
         bbox_pred = fluid.layers.fc(
             input=head_feat,
             size=4 * self.num_classes,
             act=None,
             name='bbox_pred',
-            param_attr=ParamAttr(name='bbox_pred_w', initializer=Normal(loc=0.0, scale=0.001)),
-            bias_attr=ParamAttr(name='bbox_pred_b', learning_rate=2., regularizer=L2Decay(0.)))
+            param_attr=ParamAttr(
+                name='bbox_pred_w', initializer=Normal(loc=0.0, scale=0.001)),
+            bias_attr=ParamAttr(
+                name='bbox_pred_b', learning_rate=2., regularizer=L2Decay(0.)))
         return cls_score, bbox_pred
-    def get_loss(self, roi_feat, labels_int32, bbox_targets, bbox_inside_weights, bbox_outside_weights):
+    def get_loss(self, roi_feat, labels_int32, bbox_targets,
+                 bbox_inside_weights, bbox_outside_weights):
         """
         Get bbox_head loss.
@@ -187,11 +207,19 @@ class BBoxHead(object):
             logits=cls_score, label=labels_int64, numeric_stable_mode=True)
         loss_cls = fluid.layers.reduce_mean(loss_cls)
         loss_bbox = self.bbox_loss(
-            x=bbox_pred, y=bbox_targets, inside_weight=bbox_inside_weights, outside_weight=bbox_outside_weights)
+            x=bbox_pred,
+            y=bbox_targets,
+            inside_weight=bbox_inside_weights,
+            outside_weight=bbox_outside_weights)
         loss_bbox = fluid.layers.reduce_mean(loss_bbox)
         return {'loss_cls': loss_cls, 'loss_bbox': loss_bbox}
-    def get_prediction(self, roi_feat, rois, im_info, im_shape, return_box_score=False):
+    def get_prediction(self,
+                       roi_feat,
+                       rois,
+                       im_info,
+                       im_shape,
+                       return_box_score=False):
         """
         Get prediction bounding box in test stage.
......
@@ -31,7 +31,8 @@ def test_reader(paths=None, images=None):
     img_list = list()
     if paths:
         for img_path in paths:
-            assert os.path.isfile(img_path), "The {} isn't a valid file path.".format(img_path)
+            assert os.path.isfile(
+                img_path), "The {} isn't a valid file path.".format(img_path)
             img = cv2.imread(img_path).astype('float32')
             img_list.append(img)
     if images is not None:
@@ -66,7 +67,13 @@ def test_reader(paths=None, images=None):
         # im_info holds the resize info of image.
         im_info = np.array([resize_h, resize_w, im_scale]).astype('float32')
-        im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, interpolation=cv2.INTER_LINEAR)
+        im = cv2.resize(
+            im,
+            None,
+            None,
+            fx=im_scale,
+            fy=im_scale,
+            interpolation=cv2.INTER_LINEAR)
         # HWC --> CHW
         im = np.swapaxes(im, 1, 2)
@@ -75,11 +82,14 @@ def test_reader(paths=None, images=None):
 def padding_minibatch(batch_data, coarsest_stride=0, use_padded_im_info=True):
-    max_shape_org = np.array([data['image'].shape for data in batch_data]).max(axis=0)
+    max_shape_org = np.array(
+        [data['image'].shape for data in batch_data]).max(axis=0)
     if coarsest_stride > 0:
         max_shape = np.zeros((3)).astype('int32')
-        max_shape[1] = int(np.ceil(max_shape_org[1] / coarsest_stride) * coarsest_stride)
-        max_shape[2] = int(np.ceil(max_shape_org[2] / coarsest_stride) * coarsest_stride)
+        max_shape[1] = int(
+            np.ceil(max_shape_org[1] / coarsest_stride) * coarsest_stride)
+        max_shape[2] = int(
+            np.ceil(max_shape_org[2] / coarsest_stride) * coarsest_stride)
     else:
         max_shape = max_shape_org.astype('int32')
@@ -90,12 +100,15 @@ def padding_minibatch(batch_data, coarsest_stride=0, use_padded_im_info=True):
     for data in batch_data:
         im_c, im_h, im_w = data['image'].shape
         # image
-        padding_im = np.zeros((im_c, max_shape[1], max_shape[2]), dtype=np.float32)
+        padding_im = np.zeros((im_c, max_shape[1], max_shape[2]),
+                              dtype=np.float32)
         padding_im[:, 0:im_h, 0:im_w] = data['image']
         padding_image.append(padding_im)
         # im_info
-        data['im_info'][0] = max_shape[1] if use_padded_im_info else max_shape_org[1]
-        data['im_info'][1] = max_shape[2] if use_padded_im_info else max_shape_org[2]
+        data['im_info'][
+            0] = max_shape[1] if use_padded_im_info else max_shape_org[1]
+        data['im_info'][
+            1] = max_shape[2] if use_padded_im_info else max_shape_org[2]
         padding_info.append(data['im_info'])
         padding_shape.append(data['im_shape'])
......
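The stride rounding in `padding_minibatch` above can be sketched in isolation as follows. This version uses `math.ceil` instead of NumPy so that it runs on its own. The helper name `pad_to_stride` is hypothetical, and the stride of 32 with an 800 x 1333 input is only an example value, not a setting confirmed by this commit.

```python
import math


def pad_to_stride(height, width, coarsest_stride=0):
    """Round height and width up to the next multiple of coarsest_stride."""
    if coarsest_stride > 0:
        height = int(math.ceil(height / coarsest_stride) * coarsest_stride)
        width = int(math.ceil(width / coarsest_stride) * coarsest_stride)
    return height, width


print(pad_to_stride(800, 1333, coarsest_stride=32))  # (800, 1344)
print(pad_to_stride(800, 1333))                      # (800, 1333), no rounding
```

Rounding up keeps every padded image divisible by the coarsest feature-map stride, so all images in a batch share one padded shape.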
@@ -52,13 +52,22 @@ def ConvNorm(input,
         dilation=dilation,
         groups=groups,
         act=None,
-        param_attr=ParamAttr(name=name + "_weights", initializer=initializer, learning_rate=lr_scale),
+        param_attr=ParamAttr(
+            name=name + "_weights",
+            initializer=initializer,
+            learning_rate=lr_scale),
         bias_attr=False,
         name=name + '.conv2d.output.1')
     norm_lr = 0. if freeze_norm else 1.
-    pattr = ParamAttr(name=norm_name + '_scale', learning_rate=norm_lr * lr_scale, regularizer=L2Decay(norm_decay))
-    battr = ParamAttr(name=norm_name + '_offset', learning_rate=norm_lr * lr_scale, regularizer=L2Decay(norm_decay))
+    pattr = ParamAttr(
+        name=norm_name + '_scale',
+        learning_rate=norm_lr * lr_scale,
+        regularizer=L2Decay(norm_decay))
+    battr = ParamAttr(
+        name=norm_name + '_offset',
+        learning_rate=norm_lr * lr_scale,
+        regularizer=L2Decay(norm_decay))
     if norm_type in ['bn', 'sync_bn']:
         global_stats = True if freeze_norm else False
@@ -75,15 +84,27 @@ def ConvNorm(input,
         bias = fluid.framework._get_var(battr.name)
     elif norm_type == 'gn':
         out = fluid.layers.group_norm(
-            input=conv, act=act, name=norm_name + '.output.1', groups=norm_groups, param_attr=pattr, bias_attr=battr)
+            input=conv,
+            act=act,
+            name=norm_name + '.output.1',
+            groups=norm_groups,
+            param_attr=pattr,
+            bias_attr=battr)
         scale = fluid.framework._get_var(pattr.name)
         bias = fluid.framework._get_var(battr.name)
     elif norm_type == 'affine_channel':
         scale = fluid.layers.create_parameter(
-            shape=[conv.shape[1]], dtype=conv.dtype, attr=pattr, default_initializer=fluid.initializer.Constant(1.))
+            shape=[conv.shape[1]],
+            dtype=conv.dtype,
+            attr=pattr,
+            default_initializer=fluid.initializer.Constant(1.))
         bias = fluid.layers.create_parameter(
-            shape=[conv.shape[1]], dtype=conv.dtype, attr=battr, default_initializer=fluid.initializer.Constant(0.))
-        out = fluid.layers.affine_channel(x=conv, scale=scale, bias=bias, act=act)
+            shape=[conv.shape[1]],
+            dtype=conv.dtype,
+            attr=battr,
+            default_initializer=fluid.initializer.Constant(0.))
+        out = fluid.layers.affine_channel(
+            x=conv, scale=scale, bias=bias, act=act)
     if freeze_norm:
         scale.stop_gradient = True
         bias.stop_gradient = True
@@ -140,10 +161,15 @@ class FPN(object):
             body_input,
             self.num_chan,
             1,
-            param_attr=ParamAttr(name=lateral_name + "_w", initializer=Xavier(fan_out=fan)),
-            bias_attr=ParamAttr(name=lateral_name + "_b", learning_rate=2., regularizer=L2Decay(0.)),
+            param_attr=ParamAttr(
+                name=lateral_name + "_w", initializer=Xavier(fan_out=fan)),
+            bias_attr=ParamAttr(
+                name=lateral_name + "_b",
+                learning_rate=2.,
+                regularizer=L2Decay(0.)),
             name=lateral_name)
-        topdown = fluid.layers.resize_nearest(upper_output, scale=2., name=topdown_name)
+        topdown = fluid.layers.resize_nearest(
+            upper_output, scale=2., name=topdown_name)
         return lateral + topdown
     def get_output(self, body_dict):
@@ -182,14 +208,20 @@ class FPN(object):
             body_input,
             self.num_chan,
             1,
-            param_attr=ParamAttr(name=fpn_inner_name + "_w", initializer=Xavier(fan_out=fan)),
-            bias_attr=ParamAttr(name=fpn_inner_name + "_b", learning_rate=2., regularizer=L2Decay(0.)),
+            param_attr=ParamAttr(
+                name=fpn_inner_name + "_w",
+                initializer=Xavier(fan_out=fan)),
+            bias_attr=ParamAttr(
+                name=fpn_inner_name + "_b",
+                learning_rate=2.,
+                regularizer=L2Decay(0.)),
             name=fpn_inner_name)
         for i in range(1, num_backbone_stages):
             body_name = body_name_list[i]
             body_input = body_dict[body_name]
             top_output = self.fpn_inner_output[i - 1]
-            fpn_inner_single = self._add_topdown_lateral(body_name, body_input, top_output)
+            fpn_inner_single = self._add_topdown_lateral(
+                body_name, body_input, top_output)
             self.fpn_inner_output[i] = fpn_inner_single
         fpn_dict = {}
         fpn_name_list = []
@@ -213,15 +245,24 @@ class FPN(object):
                 self.num_chan,
                 filter_size=3,
                 padding=1,
-                param_attr=ParamAttr(name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
-                bias_attr=ParamAttr(name=fpn_name + "_b", learning_rate=2., regularizer=L2Decay(0.)),
+                param_attr=ParamAttr(
+                    name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
+                bias_attr=ParamAttr(
+                    name=fpn_name + "_b",
+                    learning_rate=2.,
+                    regularizer=L2Decay(0.)),
                 name=fpn_name)
             fpn_dict[fpn_name] = fpn_output
             fpn_name_list.append(fpn_name)
-        if not self.has_extra_convs and self.max_level - self.min_level == len(spatial_scale):
+        if not self.has_extra_convs and self.max_level - self.min_level == len(
+                spatial_scale):
             body_top_name = fpn_name_list[0]
             body_top_extension = fluid.layers.pool2d(
-                fpn_dict[body_top_name], 1, 'max', pool_stride=2, name=body_top_name + '_subsampled_2x')
+                fpn_dict[body_top_name],
+                1,
+                'max',
+                pool_stride=2,
+                name=body_top_name + '_subsampled_2x')
             fpn_dict[body_top_name + '_subsampled_2x'] = body_top_extension
             fpn_name_list.insert(0, body_top_name + '_subsampled_2x')
             spatial_scale.insert(0, spatial_scale[0] * 0.5)
@@ -241,8 +282,12 @@ class FPN(object):
                     filter_size=3,
                     stride=2,
                     padding=1,
-                    param_attr=ParamAttr(name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
-                    bias_attr=ParamAttr(name=fpn_name + "_b", learning_rate=2., regularizer=L2Decay(0.)),
+                    param_attr=ParamAttr(
+                        name=fpn_name + "_w", initializer=Xavier(fan_out=fan)),
+                    bias_attr=ParamAttr(
+                        name=fpn_name + "_b",
+                        learning_rate=2.,
+                        regularizer=L2Decay(0.)),
                     name=fpn_name)
                 fpn_dict[fpn_name] = fpn_blob
                 fpn_name_list.insert(0, fpn_name)
......
@@ -30,7 +30,7 @@ from faster_rcnn_resnet50_fpn_coco2017.roi_extractor import FPNRoIAlign
 @moduleinfo(
     name="faster_rcnn_resnet50_fpn_coco2017",
-    version="1.0.0",
+    version="1.0.1",
     type="cv/object_detection",
     summary=
     "Baidu's Faster-RCNN model for object detection, whose backbone is ResNet50, processed with Feature Pyramid Networks",
@@ -39,8 +39,10 @@ from faster_rcnn_resnet50_fpn_coco2017.roi_extractor import FPNRoIAlign
 class FasterRCNNResNet50RPN(hub.Module):
     def _initialize(self):
         # default pretrained model, Faster-RCNN with backbone ResNet50, shape of input tensor is [3, 800, 1333]
-        self.default_pretrained_model_path = os.path.join(self.directory, "faster_rcnn_resnet50_fpn_model")
-        self.label_names = load_label_info(os.path.join(self.directory, "label_file.txt"))
+        self.default_pretrained_model_path = os.path.join(
+            self.directory, "faster_rcnn_resnet50_fpn_model")
+        self.label_names = load_label_info(
+            os.path.join(self.directory, "label_file.txt"))
         self._set_config()
     def _set_config(self):
@@ -64,7 +66,11 @@ class FasterRCNNResNet50RPN(hub.Module):
             gpu_config.enable_use_gpu(memory_pool_init_size_mb=500, device_id=0)
             self.gpu_predictor = create_paddle_predictor(gpu_config)
-    def context(self, num_classes=81, trainable=True, pretrained=True, phase='train'):
+    def context(self,
+                num_classes=81,
+                trainable=True,
+                pretrained=True,
+                phase='train'):
         """
         Distill the Head Features, so as to perform transfer learning.
@@ -83,15 +89,26 @@ class FasterRCNNResNet50RPN(hub.Module):
         startup_program = fluid.Program()
         with fluid.program_guard(context_prog, startup_program):
             with fluid.unique_name.guard():
-                image = fluid.layers.data(name='image', shape=[-1, 3, -1, -1], dtype='float32')
+                image = fluid.layers.data(
+                    name='image', shape=[-1, 3, -1, -1], dtype='float32')
                 # backbone
-                backbone = ResNet(norm_type='affine_channel', depth=50, feature_maps=[2, 3, 4, 5], freeze_at=2)
+                backbone = ResNet(
+                    norm_type='affine_channel',
+                    depth=50,
+                    feature_maps=[2, 3, 4, 5],
+                    freeze_at=2)
                 body_feats = backbone(image)
                 # fpn
-                fpn = FPN(max_level=6, min_level=2, num_chan=256, spatial_scale=[0.03125, 0.0625, 0.125, 0.25])
+                fpn = FPN(
+                    max_level=6,
+                    min_level=2,
+                    num_chan=256,
+                    spatial_scale=[0.03125, 0.0625, 0.125, 0.25])
                 var_prefix = '@HUB_{}@'.format(self.name)
-                im_info = fluid.layers.data(name='im_info', shape=[3], dtype='float32', lod_level=0)
-                im_shape = fluid.layers.data(name='im_shape', shape=[3], dtype='float32', lod_level=0)
+                im_info = fluid.layers.data(
+                    name='im_info', shape=[3], dtype='float32', lod_level=0)
+                im_shape = fluid.layers.data(
+                    name='im_shape', shape=[3], dtype='float32', lod_level=0)
                 body_feat_names = list(body_feats.keys())
                 body_feats, spatial_scale = fpn.get_output(body_feats)
                 # rpn_head: RPNHead
@@ -99,9 +116,12 @@ class FasterRCNNResNet50RPN(hub.Module):
                 rois = rpn_head.get_proposals(body_feats, im_info, mode=phase)
                 # train
                 if phase == 'train':
-                    gt_bbox = fluid.layers.data(name='gt_bbox', shape=[4], dtype='float32', lod_level=1)
-                    is_crowd = fluid.layers.data(name='is_crowd', shape=[1], dtype='int32', lod_level=1)
-                    gt_class = fluid.layers.data(name='gt_class', shape=[1], dtype='int32', lod_level=1)
+                    gt_bbox = fluid.layers.data(
+                        name='gt_bbox', shape=[4], dtype='float32', lod_level=1)
+                    is_crowd = fluid.layers.data(
+                        name='is_crowd', shape=[1], dtype='int32', lod_level=1)
+                    gt_class = fluid.layers.data(
+                        name='gt_class', shape=[1], dtype='int32', lod_level=1)
                     rpn_loss = rpn_head.get_loss(im_info, gt_bbox, is_crowd)
                     # bbox_assigner: BBoxAssigner
                     bbox_assigner = self.bbox_assigner(num_classes)
@@ -122,7 +142,10 @@ class FasterRCNNResNet50RPN(hub.Module):
                     rois = outs[0]
                 roi_extractor = self.roi_extractor()
-                roi_feat = roi_extractor(head_inputs=body_feats, rois=rois, spatial_scale=spatial_scale)
+                roi_feat = roi_extractor(
+                    head_inputs=body_feats,
+                    rois=rois,
+                    spatial_scale=spatial_scale)
                 # head_feat
                 bbox_head = self.bbox_head(num_classes)
                 head_feat = bbox_head.head(roi_feat)
@@ -138,13 +161,18 @@ class FasterRCNNResNet50RPN(hub.Module):
                         'is_crowd': var_prefix + is_crowd.name
                     }
                     outputs = {
-                        'head_features': var_prefix + head_feat.name,
-                        'rpn_cls_loss': var_prefix + rpn_loss['rpn_cls_loss'].name,
-                        'rpn_reg_loss': var_prefix + rpn_loss['rpn_reg_loss'].name,
-                        'generate_proposal_labels': [var_prefix + var.name for var in outs]
+                        'head_features':
+                        var_prefix + head_feat.name,
+                        'rpn_cls_loss':
+                        var_prefix + rpn_loss['rpn_cls_loss'].name,
+                        'rpn_reg_loss':
+                        var_prefix + rpn_loss['rpn_reg_loss'].name,
+                        'generate_proposal_labels':
+                        [var_prefix + var.name for var in outs]
                     }
                 elif phase == 'predict':
-                    pred = bbox_head.get_prediction(roi_feat, rois, im_info, im_shape)
+                    pred = bbox_head.get_prediction(roi_feat, rois, im_info,
+                                                    im_shape)
                     inputs = {
                         'image': var_prefix + image.name,
                         'im_info': var_prefix + im_info.name,
@@ -159,9 +187,13 @@ class FasterRCNNResNet50RPN(hub.Module):
                 add_vars_prefix(startup_program, var_prefix)
                 global_vars = context_prog.global_block().vars
-                inputs = {key: global_vars[value] for key, value in inputs.items()}
+                inputs = {
+                    key: global_vars[value]
+                    for key, value in inputs.items()
+                }
                 outputs = {
-                    key: global_vars[value] if not isinstance(value, list) else [global_vars[var] for var in value]
+                    key: global_vars[value] if not isinstance(value, list) else
+                    [global_vars[var] for var in value]
                     for key, value in outputs.items()
                 }
@@ -177,9 +209,14 @@ class FasterRCNNResNet50RPN(hub.Module):
                         if num_classes != 81:
                             if 'bbox_pred' in var.name or 'cls_score' in var.name:
                                 return False
-                        return os.path.exists(os.path.join(self.default_pretrained_model_path, var.name))
-                    fluid.io.load_vars(exe, self.default_pretrained_model_path, predicate=_if_exist)
+                        return os.path.exists(
+                            os.path.join(self.default_pretrained_model_path,
+                                         var.name))
+                    fluid.io.load_vars(
+                        exe,
+                        self.default_pretrained_model_path,
+                        predicate=_if_exist)
                 return inputs, outputs, context_prog
     def rpn_head(self):
@@ -195,8 +232,16 @@ class FasterRCNNResNet50RPN(hub.Module):
                 rpn_negative_overlap=0.3,
                 rpn_positive_overlap=0.7,
                 rpn_straddle_thresh=0.0),
-            train_proposal=GenerateProposals(min_size=0.0, nms_thresh=0.7, post_nms_top_n=2000, pre_nms_top_n=2000),
-            test_proposal=GenerateProposals(min_size=0.0, nms_thresh=0.7, post_nms_top_n=1000, pre_nms_top_n=1000),
+            train_proposal=GenerateProposals(
+                min_size=0.0,
+                nms_thresh=0.7,
+                post_nms_top_n=2000,
+                pre_nms_top_n=2000),
+            test_proposal=GenerateProposals(
+                min_size=0.0,
+                nms_thresh=0.7,
+                post_nms_top_n=1000,
+                pre_nms_top_n=1000),
             anchor_start_size=32,
             num_chan=256,
             min_level=2,
@@ -204,12 +249,18 @@ class FasterRCNNResNet50RPN(hub.Module):
     def roi_extractor(self):
         return FPNRoIAlign(
-            canconical_level=4, canonical_size=224, max_level=5, min_level=2, box_resolution=7, sampling_ratio=2)
+            canconical_level=4,
+            canonical_size=224,
+            max_level=5,
+            min_level=2,
+            box_resolution=7,
+            sampling_ratio=2)
     def bbox_head(self, num_classes):
         return BBoxHead(
             head=TwoFCHead(mlp_dim=1024),
-            nms=MultiClassNMS(keep_top_k=100, nms_threshold=0.5, score_threshold=0.05),
+            nms=MultiClassNMS(
+                keep_top_k=100, nms_threshold=0.5, score_threshold=0.05),
             num_classes=num_classes)
     def bbox_assigner(self, num_classes):
@@ -222,7 +273,11 @@ class FasterRCNNResNet50RPN(hub.Module):
             fg_thresh=0.5,
             class_nums=num_classes)
-    def save_inference_model(self, dirname, model_filename=None, params_filename=None, combined=True):
+    def save_inference_model(self,
+                             dirname,
+                             model_filename=None,
+                             params_filename=None,
+                             combined=True):
         if combined:
             model_filename = "__model__" if not model_filename else model_filename
             params_filename = "__params__" if not params_filename else params_filename
@@ -278,7 +333,7 @@ class FasterRCNNResNet50RPN(hub.Module):
                 int(_places[0])
             except:
                 raise RuntimeError(
-                    "Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id."
+                    "Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly."
                 )
         paths = paths if paths else list()
@@ -308,7 +363,9 @@ class FasterRCNNResNet50RPN(hub.Module):
             padding_image_tensor = PaddleTensor(padding_image.copy())
             padding_info_tensor = PaddleTensor(padding_info.copy())
             padding_shape_tensor = PaddleTensor(padding_shape.copy())
-            feed_list = [padding_image_tensor, padding_info_tensor, padding_shape_tensor]
+            feed_list = [
+                padding_image_tensor, padding_info_tensor, padding_shape_tensor
+            ]
             if use_gpu:
                 data_out = self.gpu_predictor.run(feed_list)
@@ -333,17 +390,29 @@ class FasterRCNNResNet50RPN(hub.Module):
         Add the command config options
         """
         self.arg_config_group.add_argument(
-            '--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not")
+            '--use_gpu',
+            type=ast.literal_eval,
+            default=False,
+            help="whether use GPU or not")
-        self.arg_config_group.add_argument('--batch_size', type=int, default=1, help="batch size for prediction")
+        self.arg_config_group.add_argument(
+            '--batch_size',
+            type=int,
+            default=1,
+            help="batch size for prediction")
     def add_module_input_arg(self):
         """
         Add the command input options
         """
-        self.arg_input_group.add_argument('--input_path', type=str, default=None, help="input data")
+        self.arg_input_group.add_argument(
+            '--input_path', type=str, default=None, help="input data")
-        self.arg_input_group.add_argument('--input_file', type=str, default=None, help="file contain input data")
+        self.arg_input_group.add_argument(
+            '--input_file',
+            type=str,
+            default=None,
+            help="file contain input data")
     def check_input_data(self, args):
         input_data = []
@@ -372,9 +441,12 @@ class FasterRCNNResNet50RPN(hub.Module):
             prog="hub run {}".format(self.name),
             usage='%(prog)s',
             add_help=True)
-        self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required")
+        self.arg_input_group = self.parser.add_argument_group(
+            title="Input options", description="Input data. Required")
         self.arg_config_group = self.parser.add_argument_group(
-            title="Config options", description="Run configuration for controlling module behavior, not required.")
+            title="Config options",
+            description=
+            "Run configuration for controlling module behavior, not required.")
         self.add_module_config_arg()
         self.add_module_input_arg()
@@ -386,5 +458,7 @@ class FasterRCNNResNet50RPN(hub.Module):
         else:
             for image_path in input_data:
                 if not os.path.exists(image_path):
-                    raise RuntimeError("File %s or %s is not exist." % image_path)
-        return self.object_detection(paths=input_data, use_gpu=args.use_gpu, batch_size=args.batch_size)
+                    raise RuntimeError(
+                        "File %s or %s is not exist." % image_path)
+        return self.object_detection(
+            paths=input_data, use_gpu=args.use_gpu, batch_size=args.batch_size)
@@ -22,7 +22,8 @@ nonlocal_params = {
 }
-def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2):
+def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner,
+                   max_pool_stride=2):
     cur = input
     theta = fluid.layers.conv2d(input = cur, num_filters = dim_inner, \
                                 filter_size = [1, 1], stride = [1, 1], \
...@@ -82,7 +83,8 @@ def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2) ...@@ -82,7 +83,8 @@ def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2)
theta_phi_sc = fluid.layers.scale(theta_phi, scale=dim_inner**-.5) theta_phi_sc = fluid.layers.scale(theta_phi, scale=dim_inner**-.5)
else: else:
theta_phi_sc = theta_phi theta_phi_sc = theta_phi
p = fluid.layers.softmax(theta_phi_sc, name=prefix + '_affinity' + '_prob') p = fluid.layers.softmax(
theta_phi_sc, name=prefix + '_affinity' + '_prob')
else: else:
# unclear what xlw's code does here # unclear what xlw's code does here
p = None # not implemented p = None # not implemented
...@@ -96,7 +98,8 @@ def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2) ...@@ -96,7 +98,8 @@ def space_nonlocal(input, dim_in, dim_out, prefix, dim_inner, max_pool_stride=2)
# reshape back # reshape back
# e.g. (8, 1024, 784) => (8, 1024, 4, 14, 14) # e.g. (8, 1024, 784) => (8, 1024, 4, 14, 14)
t_shape = t.shape t_shape = t.shape
t_re = fluid.layers.reshape(t, shape=list(theta_shape), actual_shape=theta_shape_op) t_re = fluid.layers.reshape(
t, shape=list(theta_shape), actual_shape=theta_shape_op)
blob_out = t_re blob_out = t_re
blob_out = fluid.layers.conv2d(input = blob_out, num_filters = dim_out, \ blob_out = fluid.layers.conv2d(input = blob_out, num_filters = dim_out, \
filter_size = [1, 1], stride = [1, 1], padding = [0, 0], \ filter_size = [1, 1], stride = [1, 1], padding = [0, 0], \
......
...@@ -19,6 +19,12 @@ def base64_to_cv2(b64str): ...@@ -19,6 +19,12 @@ def base64_to_cv2(b64str):
data = cv2.imdecode(data, cv2.IMREAD_COLOR) data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data return data
def check_dir(dir_path):
if not os.path.exists(dir_path):
os.makedirs(dir_path)
elif os.path.isfile(dir_path):
os.remove(dir_path)
os.makedirs(dir_path)
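The `check_dir` helper added above can be tried in isolation. The following is a minimal sketch using only the standard library; the function body is copied from the diff, while the temporary paths (`detection_result`, `clash`) are hypothetical names chosen for illustration.

```python
import os
import tempfile


def check_dir(dir_path):
    # Same logic as the helper in the diff: make sure dir_path is a directory.
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)
    elif os.path.isfile(dir_path):
        # A regular file with the same name is deleted and replaced.
        os.remove(dir_path)
        os.makedirs(dir_path)


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: the path does not exist yet, so it is created.
    missing = os.path.join(tmp, "detection_result")
    check_dir(missing)
    created_missing = os.path.isdir(missing)

    # Case 2: a regular file occupies the path, so it is replaced.
    clash = os.path.join(tmp, "clash")
    with open(clash, "w") as f:
        f.write("not a directory")
    check_dir(clash)
    replaced_file = os.path.isdir(clash)
```

Because a file at the target path is removed without warning, callers may prefer to pass a dedicated output directory.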
def get_save_image_name(img, output_dir, image_path): def get_save_image_name(img, output_dir, image_path):
"""Get save image name from source image path. """Get save image name from source image path.
...@@ -48,17 +54,23 @@ def draw_bounding_box_on_image(image_path, data_list, save_dir): ...@@ -48,17 +54,23 @@ def draw_bounding_box_on_image(image_path, data_list, save_dir):
image = Image.open(image_path) image = Image.open(image_path)
draw = ImageDraw.Draw(image) draw = ImageDraw.Draw(image)
for data in data_list: for data in data_list:
left, right, top, bottom = data['left'], data['right'], data['top'], data['bottom'] left, right, top, bottom = data['left'], data['right'], data[
'top'], data['bottom']
# draw bbox # draw bbox
draw.line([(left, top), (left, bottom), (right, bottom), (right, top), (left, top)], width=2, fill='red') draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
(left, top)],
width=2,
fill='red')
# draw label # draw label
if image.mode == 'RGB': if image.mode == 'RGB':
text = data['label'] + ": %.2f%%" % (100 * data['confidence']) text = data['label'] + ": %.2f%%" % (100 * data['confidence'])
textsize_width, textsize_height = draw.textsize(text=text) textsize_width, textsize_height = draw.textsize(text=text)
draw.rectangle( draw.rectangle(
xy=(left, top - (textsize_height + 5), left + textsize_width + 10, top), fill=(255, 255, 255)) xy=(left, top - (textsize_height + 5),
left + textsize_width + 10, top),
fill=(255, 255, 255))
draw.text(xy=(left, top - 15), text=text, fill=(0, 0, 0)) draw.text(xy=(left, top - 15), text=text, fill=(0, 0, 0))
save_name = get_save_image_name(image, save_dir, image_path) save_name = get_save_image_name(image, save_dir, image_path)
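The drawing code in this hunk builds a closed outline from five points and a label of the form `label: NN.NN%`. A small sketch of those two pieces without Pillow is shown here; `bbox_outline` and `label_text` are hypothetical helper names, not functions from the module.

```python
def bbox_outline(left, top, right, bottom):
    # Four corners followed by the first corner again, so the line closes the box.
    return [(left, top), (left, bottom), (right, bottom), (right, top),
            (left, top)]


def label_text(label, confidence):
    # Same formatting expression as in draw_bounding_box_on_image.
    return label + ": %.2f%%" % (100 * confidence)


outline = bbox_outline(10, 20, 110, 220)
text = label_text("cat", 0.5)
```

The point list is what the hunk passes to `draw.line`, and the text is what it passes to `draw.text`.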
...@@ -86,7 +98,14 @@ def load_label_info(file_path): ...@@ -86,7 +98,14 @@ def load_label_info(file_path):
return label_names return label_names
def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, handle_id, visualization=True): def postprocess(paths,
images,
data_out,
score_thresh,
label_names,
output_dir,
handle_id,
visualization=True):
""" """
postprocess the lod_tensor produced by fluid.Executor.run postprocess the lod_tensor produced by fluid.Executor.run
...@@ -115,16 +134,26 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -115,16 +134,26 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
lod = lod_tensor.lod[0] lod = lod_tensor.lod[0]
results = lod_tensor.as_ndarray() results = lod_tensor.as_ndarray()
if handle_id < len(paths): check_dir(output_dir)
unhandled_paths = paths[handle_id:]
unhandled_paths_num = len(unhandled_paths) if paths:
else: assert type(paths) is list, "type(paths) is not list."
unhandled_paths_num = 0 if handle_id < len(paths):
unhandled_paths = paths[handle_id:]
unhandled_paths_num = len(unhandled_paths)
else:
unhandled_paths_num = 0
if images is not None:
if handle_id < len(images):
unhandled_paths = None
unhandled_paths_num = len(images) - handle_id
else:
unhandled_paths_num = 0
output = [] output = []
for index in range(len(lod) - 1): for index in range(len(lod) - 1):
output_i = {'data': []} output_i = {'data': []}
if index < unhandled_paths_num: if unhandled_paths and index < unhandled_paths_num:
org_img_path = unhandled_paths[index] org_img_path = unhandled_paths[index]
org_img = Image.open(org_img_path) org_img = Image.open(org_img_path)
output_i['path'] = org_img_path output_i['path'] = org_img_path
...@@ -133,7 +162,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -133,7 +162,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
org_img = org_img.astype(np.uint8) org_img = org_img.astype(np.uint8)
org_img = Image.fromarray(org_img[:, :, ::-1]) org_img = Image.fromarray(org_img[:, :, ::-1])
if visualization: if visualization:
org_img_path = get_save_image_name(org_img, output_dir, 'image_numpy_{}'.format((handle_id + index))) org_img_path = get_save_image_name(
org_img, output_dir, 'image_numpy_{}'.format(
(handle_id + index)))
org_img.save(org_img_path) org_img.save(org_img_path)
org_img_height = org_img.height org_img_height = org_img.height
org_img_width = org_img.width org_img_width = org_img.width
...@@ -149,11 +180,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -149,11 +180,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
dt = {} dt = {}
dt['label'] = label_names[category_id] dt['label'] = label_names[category_id]
dt['confidence'] = float(confidence) dt['confidence'] = float(confidence)
dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(bbox, org_img_width, org_img_height) dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(
bbox, org_img_width, org_img_height)
output_i['data'].append(dt) output_i['data'].append(dt)
output.append(output_i) output.append(output_i)
if visualization: if visualization:
output_i['save_path'] = draw_bounding_box_on_image(org_img_path, output_i['data'], output_dir) output_i['save_path'] = draw_bounding_box_on_image(
org_img_path, output_i['data'], output_dir)
return output return output
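The new branch in `postprocess` decides how many inputs of the current batch still need handling, depending on whether `paths` or `images` was supplied. The sketch below restates that bookkeeping as a standalone function. `count_unhandled` is a hypothetical name, and unlike the diff it initialises both results up front so that they are defined on every path; that initialisation is an assumption about the intended behavior, not part of the commit.

```python
def count_unhandled(paths, images, handle_id):
    """Return (unhandled_paths, unhandled_num) for the batch starting at handle_id."""
    # Defaults (an addition of this sketch): nothing left to handle.
    unhandled_paths = None
    unhandled_num = 0
    if paths:
        assert isinstance(paths, list), "paths must be a list"
        if handle_id < len(paths):
            unhandled_paths = paths[handle_id:]
            unhandled_num = len(unhandled_paths)
    if images is not None:
        if handle_id < len(images):
            # In-memory images have no source path.
            unhandled_paths = None
            unhandled_num = len(images) - handle_id
        else:
            unhandled_num = 0
    return unhandled_paths, unhandled_num


from_paths = count_unhandled(["a.jpg", "b.jpg", "c.jpg"], None, 1)
from_images = count_unhandled(None, ["img0", "img1", "img2", "img3"], 1)
exhausted = count_unhandled(["a.jpg"], None, 5)
```

The strings standing in for images are placeholders; in the module they would be `numpy.ndarray` objects.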
...@@ -90,7 +90,13 @@ class ResNet(object): ...@@ -90,7 +90,13 @@ class ResNet(object):
self.get_prediction = get_prediction self.get_prediction = get_prediction
self.class_dim = class_dim self.class_dim = class_dim
def _conv_offset(self, input, filter_size, stride, padding, act=None, name=None): def _conv_offset(self,
input,
filter_size,
stride,
padding,
act=None,
name=None):
out_channel = filter_size * filter_size * 3 out_channel = filter_size * filter_size * 3
out = fluid.layers.conv2d( out = fluid.layers.conv2d(
input, input,
...@@ -104,7 +110,15 @@ class ResNet(object): ...@@ -104,7 +110,15 @@ class ResNet(object):
name=name) name=name)
return out return out
def _conv_norm(self, input, num_filters, filter_size, stride=1, groups=1, act=None, name=None, dcn_v2=False): def _conv_norm(self,
input,
num_filters,
filter_size,
stride=1,
groups=1,
act=None,
name=None,
dcn_v2=False):
_name = self.prefix_name + name if self.prefix_name != '' else name _name = self.prefix_name + name if self.prefix_name != '' else name
if not dcn_v2: if not dcn_v2:
conv = fluid.layers.conv2d( conv = fluid.layers.conv2d(
...@@ -129,7 +143,10 @@ class ResNet(object): ...@@ -129,7 +143,10 @@ class ResNet(object):
name=_name + "_conv_offset") name=_name + "_conv_offset")
offset_channel = filter_size**2 * 2 offset_channel = filter_size**2 * 2
mask_channel = filter_size**2 mask_channel = filter_size**2
offset, mask = fluid.layers.split(input=offset_mask, num_or_sections=[offset_channel, mask_channel], dim=1) offset, mask = fluid.layers.split(
input=offset_mask,
num_or_sections=[offset_channel, mask_channel],
dim=1)
mask = fluid.layers.sigmoid(mask) mask = fluid.layers.sigmoid(mask)
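The channel arithmetic here follows from `_conv_offset` producing `filter_size * filter_size * 3` channels: two thirds are read as offsets and one third as a mask that is passed through a sigmoid. A plain-Python sketch of that split follows. A flat list stands in for the channel axis, and `split_offset_mask` is a hypothetical helper, not a Paddle call.

```python
import math


def split_offset_mask(channels, filter_size):
    # Same channel counts as in _conv_norm.
    offset_channel = filter_size ** 2 * 2
    mask_channel = filter_size ** 2
    assert len(channels) == offset_channel + mask_channel
    offset = channels[:offset_channel]
    # The mask is squashed into (0, 1) with a sigmoid.
    mask = [1.0 / (1.0 + math.exp(-m)) for m in channels[offset_channel:]]
    return offset, mask


# A 3x3 kernel gives 27 channels: 18 offsets plus 9 mask values.
values = [0.0] * (3 * 3 * 3)
offset, mask = split_offset_mask(values, filter_size=3)
```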
conv = fluid.layers.deformable_conv( conv = fluid.layers.deformable_conv(
input=input, input=input,
...@@ -151,8 +168,14 @@ class ResNet(object): ...@@ -151,8 +168,14 @@ class ResNet(object):
norm_lr = 0. if self.freeze_norm else 1. norm_lr = 0. if self.freeze_norm else 1.
norm_decay = self.norm_decay norm_decay = self.norm_decay
pattr = ParamAttr(name=bn_name + '_scale', learning_rate=norm_lr, regularizer=L2Decay(norm_decay)) pattr = ParamAttr(
battr = ParamAttr(name=bn_name + '_offset', learning_rate=norm_lr, regularizer=L2Decay(norm_decay)) name=bn_name + '_scale',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
battr = ParamAttr(
name=bn_name + '_offset',
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay))
if self.norm_type in ['bn', 'sync_bn']: if self.norm_type in ['bn', 'sync_bn']:
global_stats = True if self.freeze_norm else False global_stats = True if self.freeze_norm else False
...@@ -169,10 +192,17 @@ class ResNet(object): ...@@ -169,10 +192,17 @@ class ResNet(object):
bias = fluid.framework._get_var(battr.name) bias = fluid.framework._get_var(battr.name)
elif self.norm_type == 'affine_channel': elif self.norm_type == 'affine_channel':
scale = fluid.layers.create_parameter( scale = fluid.layers.create_parameter(
shape=[conv.shape[1]], dtype=conv.dtype, attr=pattr, default_initializer=fluid.initializer.Constant(1.)) shape=[conv.shape[1]],
dtype=conv.dtype,
attr=pattr,
default_initializer=fluid.initializer.Constant(1.))
bias = fluid.layers.create_parameter( bias = fluid.layers.create_parameter(
shape=[conv.shape[1]], dtype=conv.dtype, attr=battr, default_initializer=fluid.initializer.Constant(0.)) shape=[conv.shape[1]],
out = fluid.layers.affine_channel(x=conv, scale=scale, bias=bias, act=act) dtype=conv.dtype,
attr=battr,
default_initializer=fluid.initializer.Constant(0.))
out = fluid.layers.affine_channel(
x=conv, scale=scale, bias=bias, act=act)
if self.freeze_norm: if self.freeze_norm:
scale.stop_gradient = True scale.stop_gradient = True
bias.stop_gradient = True bias.stop_gradient = True
...@@ -192,13 +222,24 @@ class ResNet(object): ...@@ -192,13 +222,24 @@ class ResNet(object):
return self._conv_norm(input, ch_out, 3, stride, name=name) return self._conv_norm(input, ch_out, 3, stride, name=name)
if max_pooling_in_short_cut and not is_first: if max_pooling_in_short_cut and not is_first:
input = fluid.layers.pool2d( input = fluid.layers.pool2d(
input=input, pool_size=2, pool_stride=2, pool_padding=0, ceil_mode=True, pool_type='avg') input=input,
pool_size=2,
pool_stride=2,
pool_padding=0,
ceil_mode=True,
pool_type='avg')
return self._conv_norm(input, ch_out, 1, 1, name=name) return self._conv_norm(input, ch_out, 1, 1, name=name)
return self._conv_norm(input, ch_out, 1, stride, name=name) return self._conv_norm(input, ch_out, 1, stride, name=name)
else: else:
return input return input
def bottleneck(self, input, num_filters, stride, is_first, name, dcn_v2=False): def bottleneck(self,
input,
num_filters,
stride,
is_first,
name,
dcn_v2=False):
if self.variant == 'a': if self.variant == 'a':
stride1, stride2 = stride, 1 stride1, stride2 = stride, 1
else: else:
...@@ -219,8 +260,9 @@ class ResNet(object): ...@@ -219,8 +260,9 @@ class ResNet(object):
shortcut_name = self.na.fix_bottleneck_name(name) shortcut_name = self.na.fix_bottleneck_name(name)
std_senet = getattr(self, 'std_senet', False) std_senet = getattr(self, 'std_senet', False)
if std_senet: if std_senet:
conv_def = [[int(num_filters / 2), 1, stride1, 'relu', 1, conv_name1], conv_def = [[
[num_filters, 3, stride2, 'relu', groups, conv_name2], int(num_filters / 2), 1, stride1, 'relu', 1, conv_name1
], [num_filters, 3, stride2, 'relu', groups, conv_name2],
[num_filters * expand, 1, 1, None, 1, conv_name3]] [num_filters * expand, 1, 1, None, 1, conv_name3]]
else: else:
conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1], conv_def = [[num_filters, 1, stride1, 'relu', 1, conv_name1],
...@@ -238,18 +280,42 @@ class ResNet(object): ...@@ -238,18 +280,42 @@ class ResNet(object):
groups=g, groups=g,
name=_name, name=_name,
dcn_v2=(i == 1 and dcn_v2)) dcn_v2=(i == 1 and dcn_v2))
short = self._shortcut(input, num_filters * expand, stride, is_first=is_first, name=shortcut_name) short = self._shortcut(
input,
num_filters * expand,
stride,
is_first=is_first,
name=shortcut_name)
# Squeeze-and-Excitation # Squeeze-and-Excitation
if callable(getattr(self, '_squeeze_excitation', None)): if callable(getattr(self, '_squeeze_excitation', None)):
residual = self._squeeze_excitation(input=residual, num_channels=num_filters, name='fc' + name) residual = self._squeeze_excitation(
return fluid.layers.elementwise_add(x=short, y=residual, act='relu', name=name + ".add.output.5") input=residual, num_channels=num_filters, name='fc' + name)
return fluid.layers.elementwise_add(
def basicblock(self, input, num_filters, stride, is_first, name, dcn_v2=False): x=short, y=residual, act='relu', name=name + ".add.output.5")
def basicblock(self,
input,
num_filters,
stride,
is_first,
name,
dcn_v2=False):
assert dcn_v2 is False, "Not implemented yet." assert dcn_v2 is False, "Not implemented yet."
conv0 = self._conv_norm( conv0 = self._conv_norm(
input=input, num_filters=num_filters, filter_size=3, act='relu', stride=stride, name=name + "_branch2a") input=input,
conv1 = self._conv_norm(input=conv0, num_filters=num_filters, filter_size=3, act=None, name=name + "_branch2b") num_filters=num_filters,
short = self._shortcut(input, num_filters, stride, is_first, name=name + "_branch1") filter_size=3,
act='relu',
stride=stride,
name=name + "_branch2a")
conv1 = self._conv_norm(
input=conv0,
num_filters=num_filters,
filter_size=3,
act=None,
name=name + "_branch2b")
short = self._shortcut(
input, num_filters, stride, is_first, name=name + "_branch1")
return fluid.layers.elementwise_add(x=short, y=conv1, act='relu') return fluid.layers.elementwise_add(x=short, y=conv1, act='relu')
def layer_warp(self, input, stage_num): def layer_warp(self, input, stage_num):
...@@ -272,7 +338,8 @@ class ResNet(object): ...@@ -272,7 +338,8 @@ class ResNet(object):
nonlocal_mod = 1000 nonlocal_mod = 1000
if stage_num in self.nonlocal_stages: if stage_num in self.nonlocal_stages:
nonlocal_mod = self.nonlocal_mod_cfg[self.depth] if stage_num == 4 else 2 nonlocal_mod = self.nonlocal_mod_cfg[
self.depth] if stage_num == 4 else 2
# Make the layer name and parameter name consistent # Make the layer name and parameter name consistent
# with ImageNet pre-trained model # with ImageNet pre-trained model
...@@ -293,7 +360,9 @@ class ResNet(object): ...@@ -293,7 +360,9 @@ class ResNet(object):
dim_in = conv.shape[1] dim_in = conv.shape[1]
nonlocal_name = "nonlocal_conv{}".format(stage_num) nonlocal_name = "nonlocal_conv{}".format(stage_num)
if i % nonlocal_mod == nonlocal_mod - 1: if i % nonlocal_mod == nonlocal_mod - 1:
conv = add_space_nonlocal(conv, dim_in, dim_in, nonlocal_name + '_{}'.format(i), int(dim_in / 2)) conv = add_space_nonlocal(conv, dim_in, dim_in,
nonlocal_name + '_{}'.format(i),
int(dim_in / 2))
return conv return conv
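The condition `i % nonlocal_mod == nonlocal_mod - 1` selects every `nonlocal_mod`-th block of a stage, and the default of 1000 effectively disables the non-local block. A short sketch of which block indices are selected is given below; `nonlocal_block_indices` and the block count of 6 are illustrative assumptions.

```python
def nonlocal_block_indices(count, nonlocal_mod):
    # Indices of the residual blocks that are followed by a non-local block.
    return [i for i in range(count) if i % nonlocal_mod == nonlocal_mod - 1]


every_second = nonlocal_block_indices(6, 2)
every_third = nonlocal_block_indices(6, 3)
disabled = nonlocal_block_indices(6, 1000)
```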
def c1_stage(self, input): def c1_stage(self, input):
...@@ -311,9 +380,20 @@ class ResNet(object): ...@@ -311,9 +380,20 @@ class ResNet(object):
conv_def = [[out_chan, 7, 2, conv1_name]] conv_def = [[out_chan, 7, 2, conv1_name]]
for (c, k, s, _name) in conv_def: for (c, k, s, _name) in conv_def:
input = self._conv_norm(input=input, num_filters=c, filter_size=k, stride=s, act='relu', name=_name) input = self._conv_norm(
input=input,
output = fluid.layers.pool2d(input=input, pool_size=3, pool_stride=2, pool_padding=1, pool_type='max') num_filters=c,
filter_size=k,
stride=s,
act='relu',
name=_name)
output = fluid.layers.pool2d(
input=input,
pool_size=3,
pool_stride=2,
pool_padding=1,
pool_type='max')
return output return output
def __call__(self, input): def __call__(self, input):
...@@ -337,17 +417,19 @@ class ResNet(object): ...@@ -337,17 +417,19 @@ class ResNet(object):
if self.freeze_at >= i: if self.freeze_at >= i:
res.stop_gradient = True res.stop_gradient = True
if self.get_prediction: if self.get_prediction:
pool = fluid.layers.pool2d(input=res, pool_type='avg', global_pooling=True) pool = fluid.layers.pool2d(
input=res, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0) stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc( out = fluid.layers.fc(
input=pool, input=pool,
size=self.class_dim, size=self.class_dim,
param_attr=fluid.param_attr.ParamAttr(initializer=fluid.initializer.Uniform(-stdv, stdv))) param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv)))
out = fluid.layers.softmax(out) out = fluid.layers.softmax(out)
return out return out
return OrderedDict( return OrderedDict([('res{}_sum'.format(self.feature_maps[idx]), feat)
[('res{}_sum'.format(self.feature_maps[idx]), feat) for idx, feat in enumerate(res_endpoints)]) for idx, feat in enumerate(res_endpoints)])
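The returned `OrderedDict` keys each endpoint by its stage number, in the order the stages were computed. The sketch below shows the naming scheme on placeholder values; `name_endpoints` is a hypothetical helper, and the strings stand in for feature tensors.

```python
from collections import OrderedDict


def name_endpoints(feature_maps, res_endpoints):
    # Same comprehension as in ResNet.__call__, without the instance state.
    return OrderedDict([('res{}_sum'.format(feature_maps[idx]), feat)
                        for idx, feat in enumerate(res_endpoints)])


feats = name_endpoints([2, 3, 4, 5], ['c2', 'c3', 'c4', 'c5'])
keys = list(feats)
```

Downstream code that slices the key list from the end, as `FPNRoIAlign` does with `name_list[-num_roi_lvls:]`, relies on this ordering.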
class ResNetC5(ResNet): class ResNetC5(ResNet):
...@@ -360,5 +442,6 @@ class ResNetC5(ResNet): ...@@ -360,5 +442,6 @@ class ResNetC5(ResNet):
variant='b', variant='b',
feature_maps=[5], feature_maps=[5],
weight_prefix_name=''): weight_prefix_name=''):
super(ResNetC5, self).__init__(depth, freeze_at, norm_type, freeze_norm, norm_decay, variant, feature_maps) super(ResNetC5, self).__init__(depth, freeze_at, norm_type, freeze_norm,
norm_decay, variant, feature_maps)
self.severed_head = True self.severed_head = True
...@@ -51,8 +51,8 @@ class FPNRoIAlign(object): ...@@ -51,8 +51,8 @@ class FPNRoIAlign(object):
name_list = list(head_inputs.keys()) name_list = list(head_inputs.keys())
input_name_list = name_list[-num_roi_lvls:] input_name_list = name_list[-num_roi_lvls:]
spatial_scale = spatial_scale[-num_roi_lvls:] spatial_scale = spatial_scale[-num_roi_lvls:]
rois_dist, restore_index = fluid.layers.distribute_fpn_proposals(rois, k_min, k_max, self.canconical_level, rois_dist, restore_index = fluid.layers.distribute_fpn_proposals(
self.canonical_size) rois, k_min, k_max, self.canconical_level, self.canonical_size)
# rois_dist is in ascending order # rois_dist is in ascending order
roi_out_list = [] roi_out_list = []
resolution = is_mask and self.mask_resolution or self.box_resolution resolution = is_mask and self.mask_resolution or self.box_resolution
......
...@@ -8,7 +8,10 @@ from paddle.fluid.param_attr import ParamAttr ...@@ -8,7 +8,10 @@ from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal from paddle.fluid.initializer import Normal
from paddle.fluid.regularizer import L2Decay from paddle.fluid.regularizer import L2Decay
__all__ = ['AnchorGenerator', 'RPNTargetAssign', 'GenerateProposals', 'RPNHead', 'FPNRPNHead'] __all__ = [
'AnchorGenerator', 'RPNTargetAssign', 'GenerateProposals', 'RPNHead',
'FPNRPNHead'
]
class AnchorGenerator(object): class AnchorGenerator(object):
...@@ -45,7 +48,12 @@ class RPNTargetAssign(object): ...@@ -45,7 +48,12 @@ class RPNTargetAssign(object):
class GenerateProposals(object): class GenerateProposals(object):
# __op__ = fluid.layers.generate_proposals # __op__ = fluid.layers.generate_proposals
def __init__(self, pre_nms_top_n=6000, post_nms_top_n=1000, nms_thresh=.5, min_size=.1, eta=1.): def __init__(self,
pre_nms_top_n=6000,
post_nms_top_n=1000,
nms_thresh=.5,
min_size=.1,
eta=1.):
super(GenerateProposals, self).__init__() super(GenerateProposals, self).__init__()
self.pre_nms_top_n = pre_nms_top_n self.pre_nms_top_n = pre_nms_top_n
self.post_nms_top_n = post_nms_top_n self.post_nms_top_n = post_nms_top_n
...@@ -65,9 +73,17 @@ class RPNHead(object): ...@@ -65,9 +73,17 @@ class RPNHead(object):
test_proposal (object): `GenerateProposals` instance for testing test_proposal (object): `GenerateProposals` instance for testing
num_classes (int): number of classes in rpn output num_classes (int): number of classes in rpn output
""" """
__inject__ = ['anchor_generator', 'rpn_target_assign', 'train_proposal', 'test_proposal'] __inject__ = [
'anchor_generator', 'rpn_target_assign', 'train_proposal',
'test_proposal'
]
def __init__(self, anchor_generator, rpn_target_assign, train_proposal, test_proposal, num_classes=1): def __init__(self,
anchor_generator,
rpn_target_assign,
train_proposal,
test_proposal,
num_classes=1):
super(RPNHead, self).__init__() super(RPNHead, self).__init__()
self.anchor_generator = anchor_generator self.anchor_generator = anchor_generator
self.rpn_target_assign = rpn_target_assign self.rpn_target_assign = rpn_target_assign
...@@ -95,8 +111,10 @@ class RPNHead(object): ...@@ -95,8 +111,10 @@ class RPNHead(object):
padding=1, padding=1,
act='relu', act='relu',
name='conv_rpn', name='conv_rpn',
param_attr=ParamAttr(name="conv_rpn_w", initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.))) name="conv_rpn_w", initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
# Generate anchors self.anchor_generator # Generate anchors self.anchor_generator
self.anchor, self.anchor_var = fluid.layers.anchor_generator( self.anchor, self.anchor_var = fluid.layers.anchor_generator(
input=rpn_conv, input=rpn_conv,
...@@ -115,8 +133,13 @@ class RPNHead(object): ...@@ -115,8 +133,13 @@ class RPNHead(object):
padding=0, padding=0,
act=None, act=None,
name='rpn_cls_score', name='rpn_cls_score',
param_attr=ParamAttr(name="rpn_cls_logits_w", initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name="rpn_cls_logits_b", learning_rate=2., regularizer=L2Decay(0.))) name="rpn_cls_logits_w", initializer=Normal(loc=0.,
scale=0.01)),
bias_attr=ParamAttr(
name="rpn_cls_logits_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
# Proposal bbox regression deltas # Proposal bbox regression deltas
self.rpn_bbox_pred = fluid.layers.conv2d( self.rpn_bbox_pred = fluid.layers.conv2d(
rpn_conv, rpn_conv,
...@@ -126,8 +149,12 @@ class RPNHead(object): ...@@ -126,8 +149,12 @@ class RPNHead(object):
padding=0, padding=0,
act=None, act=None,
name='rpn_bbox_pred', name='rpn_bbox_pred',
param_attr=ParamAttr(name="rpn_bbox_pred_w", initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name="rpn_bbox_pred_b", learning_rate=2., regularizer=L2Decay(0.))) name="rpn_bbox_pred_w", initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name="rpn_bbox_pred_b",
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred return self.rpn_cls_score, self.rpn_bbox_pred
def get_proposals(self, body_feats, im_info, mode='train'): def get_proposals(self, body_feats, im_info, mode='train'):
...@@ -150,15 +177,22 @@ class RPNHead(object): ...@@ -150,15 +177,22 @@ class RPNHead(object):
rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat) rpn_cls_score, rpn_bbox_pred = self._get_output(body_feat)
if self.num_classes == 1: if self.num_classes == 1:
rpn_cls_prob = fluid.layers.sigmoid(rpn_cls_score, name='rpn_cls_prob') rpn_cls_prob = fluid.layers.sigmoid(
rpn_cls_score, name='rpn_cls_prob')
else: else:
rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1]) rpn_cls_score = fluid.layers.transpose(
rpn_cls_score = fluid.layers.reshape(rpn_cls_score, shape=(0, 0, 0, -1, self.num_classes)) rpn_cls_score, perm=[0, 2, 3, 1])
rpn_cls_prob_tmp = fluid.layers.softmax(rpn_cls_score, use_cudnn=False, name='rpn_cls_prob') rpn_cls_score = fluid.layers.reshape(
rpn_cls_prob_slice = fluid.layers.slice(rpn_cls_prob_tmp, axes=[4], starts=[1], ends=[self.num_classes]) rpn_cls_score, shape=(0, 0, 0, -1, self.num_classes))
rpn_cls_prob_tmp = fluid.layers.softmax(
rpn_cls_score, use_cudnn=False, name='rpn_cls_prob')
rpn_cls_prob_slice = fluid.layers.slice(
rpn_cls_prob_tmp, axes=[4], starts=[1], ends=[self.num_classes])
rpn_cls_prob, _ = fluid.layers.topk(rpn_cls_prob_slice, 1) rpn_cls_prob, _ = fluid.layers.topk(rpn_cls_prob_slice, 1)
rpn_cls_prob = fluid.layers.reshape(rpn_cls_prob, shape=(0, 0, 0, -1)) rpn_cls_prob = fluid.layers.reshape(
rpn_cls_prob = fluid.layers.transpose(rpn_cls_prob, perm=[0, 3, 1, 2]) rpn_cls_prob, shape=(0, 0, 0, -1))
rpn_cls_prob = fluid.layers.transpose(
rpn_cls_prob, perm=[0, 3, 1, 2])
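For `num_classes > 1`, the branch above applies a softmax over the class axis, drops index 0, and keeps the largest remaining probability. A scalar sketch of that reduction for a single anchor follows. `foreground_prob` is a hypothetical helper, and treating index 0 as the background class is an assumption inferred from the slice starting at 1.

```python
import math


def foreground_prob(scores):
    # Numerically stable softmax over the per-class scores of one anchor.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Drop index 0 (assumed background) and keep the top-1 of the rest.
    return max(probs[1:])


uniform = foreground_prob([0.0, 0.0, 0.0])
confident = foreground_prob([0.0, 10.0, 0.0])
```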
prop_op = self.train_proposal if mode == 'train' else self.test_proposal prop_op = self.train_proposal if mode == 'train' else self.test_proposal
# prop_op # prop_op
rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals( rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
...@@ -174,20 +208,24 @@ class RPNHead(object): ...@@ -174,20 +208,24 @@ class RPNHead(object):
eta=prop_op.eta) eta=prop_op.eta)
return rpn_rois return rpn_rois
def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor, anchor_var): def _transform_input(self, rpn_cls_score, rpn_bbox_pred, anchor,
anchor_var):
rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1]) rpn_cls_score = fluid.layers.transpose(rpn_cls_score, perm=[0, 2, 3, 1])
rpn_bbox_pred = fluid.layers.transpose(rpn_bbox_pred, perm=[0, 2, 3, 1]) rpn_bbox_pred = fluid.layers.transpose(rpn_bbox_pred, perm=[0, 2, 3, 1])
anchor = fluid.layers.reshape(anchor, shape=(-1, 4)) anchor = fluid.layers.reshape(anchor, shape=(-1, 4))
anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4)) anchor_var = fluid.layers.reshape(anchor_var, shape=(-1, 4))
rpn_cls_score = fluid.layers.reshape(x=rpn_cls_score, shape=(0, -1, self.num_classes)) rpn_cls_score = fluid.layers.reshape(
x=rpn_cls_score, shape=(0, -1, self.num_classes))
rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4)) rpn_bbox_pred = fluid.layers.reshape(x=rpn_bbox_pred, shape=(0, -1, 4))
return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var return rpn_cls_score, rpn_bbox_pred, anchor, anchor_var
def _get_loss_input(self): def _get_loss_input(self):
for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']: for attr in ['rpn_cls_score', 'rpn_bbox_pred', 'anchor', 'anchor_var']:
if not getattr(self, attr, None): if not getattr(self, attr, None):
raise ValueError("self.{} should not be None,".format(attr), "call RPNHead.get_proposals first") raise ValueError("self.{} should not be None,".format(attr),
return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred, self.anchor, self.anchor_var) "call RPNHead.get_proposals first")
return self._transform_input(self.rpn_cls_score, self.rpn_bbox_pred,
self.anchor, self.anchor_var)
def get_loss(self, im_info, gt_box, is_crowd, gt_label=None): def get_loss(self, im_info, gt_box, is_crowd, gt_label=None):
""" """
...@@ -227,7 +265,8 @@ class RPNHead(object): ...@@ -227,7 +265,8 @@ class RPNHead(object):
use_random=self.rpn_target_assign.use_random) use_random=self.rpn_target_assign.use_random)
score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32') score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
score_tgt.stop_gradient = True score_tgt.stop_gradient = True
rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=score_pred, label=score_tgt) rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
x=score_pred, label=score_tgt)
else: else:
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \ score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
self.rpn_target_assign( self.rpn_target_assign(
...@@ -245,13 +284,19 @@ class RPNHead(object): ...@@ -245,13 +284,19 @@ class RPNHead(object):
rpn_cls_loss = fluid.layers.softmax_with_cross_entropy( rpn_cls_loss = fluid.layers.softmax_with_cross_entropy(
logits=score_pred, label=labels_int64, numeric_stable_mode=True) logits=score_pred, label=labels_int64, numeric_stable_mode=True)
rpn_cls_loss = fluid.layers.reduce_mean(rpn_cls_loss, name='loss_rpn_cls') rpn_cls_loss = fluid.layers.reduce_mean(
rpn_cls_loss, name='loss_rpn_cls')
loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32') loc_tgt = fluid.layers.cast(x=loc_tgt, dtype='float32')
loc_tgt.stop_gradient = True loc_tgt.stop_gradient = True
rpn_reg_loss = fluid.layers.smooth_l1( rpn_reg_loss = fluid.layers.smooth_l1(
x=loc_pred, y=loc_tgt, sigma=3.0, inside_weight=bbox_weight, outside_weight=bbox_weight) x=loc_pred,
rpn_reg_loss = fluid.layers.reduce_sum(rpn_reg_loss, name='loss_rpn_bbox') y=loc_tgt,
sigma=3.0,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
rpn_reg_loss = fluid.layers.reduce_sum(
rpn_reg_loss, name='loss_rpn_bbox')
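The regression loss uses a smooth L1 term with `sigma=3.0`, weighted by `bbox_weight` on both the inside and the outside. The sketch below assumes the common sigma-parameterised definition (quadratic below `1 / sigma**2`, linear above). It is not taken from this diff, and the exact Paddle kernel may differ in detail. Both function names are hypothetical.

```python
def smooth_l1(diff, sigma=3.0):
    # Quadratic near zero, linear elsewhere; the pieces meet at 1 / sigma**2.
    sigma2 = sigma * sigma
    abs_diff = abs(diff)
    if abs_diff < 1.0 / sigma2:
        return 0.5 * sigma2 * diff * diff
    return abs_diff - 0.5 / sigma2


def weighted_smooth_l1(pred, target, inside_weight, outside_weight, sigma=3.0):
    # Inside weights scale the difference, outside weights scale the loss (assumed).
    return sum(ow * smooth_l1(iw * (p - t), sigma)
               for p, t, iw, ow in zip(pred, target, inside_weight,
                                       outside_weight))


small = smooth_l1(0.1)
large = smooth_l1(1.0)
masked = weighted_smooth_l1([1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

With zero weights, as for anchors that are not assigned a target, the term contributes nothing to the sum.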
score_shape = fluid.layers.shape(score_tgt) score_shape = fluid.layers.shape(score_tgt)
score_shape = fluid.layers.cast(x=score_shape, dtype='float32') score_shape = fluid.layers.cast(x=score_shape, dtype='float32')
norm = fluid.layers.reduce_prod(score_shape) norm = fluid.layers.reduce_prod(score_shape)
...@@ -286,7 +331,8 @@ class FPNRPNHead(RPNHead): ...@@ -286,7 +331,8 @@ class FPNRPNHead(RPNHead):
min_level=2, min_level=2,
max_level=6, max_level=6,
num_classes=1): num_classes=1):
super(FPNRPNHead, self).__init__(anchor_generator, rpn_target_assign, train_proposal, test_proposal) super(FPNRPNHead, self).__init__(anchor_generator, rpn_target_assign,
train_proposal, test_proposal)
self.anchor_start_size = anchor_start_size self.anchor_start_size = anchor_start_size
self.num_chan = num_chan self.num_chan = num_chan
self.min_level = min_level self.min_level = min_level
...@@ -328,13 +374,19 @@ class FPNRPNHead(RPNHead): ...@@ -328,13 +374,19 @@ class FPNRPNHead(RPNHead):
padding=1, padding=1,
act='relu', act='relu',
name=conv_name, name=conv_name,
param_attr=ParamAttr(name=conv_share_name + '_w', initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name=conv_share_name + '_b', learning_rate=2., regularizer=L2Decay(0.))) name=conv_share_name + '_w',
initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=conv_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
# self.anchor_generator # self.anchor_generator
self.anchors, self.anchor_var = fluid.layers.anchor_generator( self.anchors, self.anchor_var = fluid.layers.anchor_generator(
input=conv_rpn_fpn, input=conv_rpn_fpn,
anchor_sizes=(self.anchor_start_size * 2.**(feat_lvl - self.min_level), ), anchor_sizes=(self.anchor_start_size * 2.**
(feat_lvl - self.min_level), ),
stride=(2.**feat_lvl, 2.**feat_lvl), stride=(2.**feat_lvl, 2.**feat_lvl),
aspect_ratios=self.anchor_generator.aspect_ratios, aspect_ratios=self.anchor_generator.aspect_ratios,
variance=self.anchor_generator.variance) variance=self.anchor_generator.variance)
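The anchor size and stride passed to `anchor_generator` both double with each pyramid level. The sketch below tabulates them for the default `min_level=2` and `max_level=6`. `fpn_anchor_config` is a hypothetical helper, and the `anchor_start_size` of 32 is an assumed value, since the default is not visible in this hunk.

```python
def fpn_anchor_config(anchor_start_size, min_level, max_level):
    # Map each FPN level to (anchor_size, stride), using the same expressions
    # as the anchor_generator call in FPNRPNHead._get_output.
    config = {}
    for feat_lvl in range(min_level, max_level + 1):
        anchor_size = anchor_start_size * 2.**(feat_lvl - min_level)
        stride = 2.**feat_lvl
        config[feat_lvl] = (anchor_size, stride)
    return config


config = fpn_anchor_config(anchor_start_size=32, min_level=2, max_level=6)
```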
...@@ -346,16 +398,26 @@ class FPNRPNHead(RPNHead): ...@@ -346,16 +398,26 @@ class FPNRPNHead(RPNHead):
filter_size=1, filter_size=1,
act=None, act=None,
name=cls_name, name=cls_name,
param_attr=ParamAttr(name=cls_share_name + '_w', initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name=cls_share_name + '_b', learning_rate=2., regularizer=L2Decay(0.))) name=cls_share_name + '_w',
initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=cls_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
self.rpn_bbox_pred = fluid.layers.conv2d( self.rpn_bbox_pred = fluid.layers.conv2d(
input=conv_rpn_fpn, input=conv_rpn_fpn,
num_filters=num_anchors * 4, num_filters=num_anchors * 4,
filter_size=1, filter_size=1,
act=None, act=None,
name=bbox_name, name=bbox_name,
param_attr=ParamAttr(name=bbox_share_name + '_w', initializer=Normal(loc=0., scale=0.01)), param_attr=ParamAttr(
bias_attr=ParamAttr(name=bbox_share_name + '_b', learning_rate=2., regularizer=L2Decay(0.))) name=bbox_share_name + '_w',
initializer=Normal(loc=0., scale=0.01)),
bias_attr=ParamAttr(
name=bbox_share_name + '_b',
learning_rate=2.,
regularizer=L2Decay(0.)))
return self.rpn_cls_score, self.rpn_bbox_pred return self.rpn_cls_score, self.rpn_bbox_pred
def _get_single_proposals(self, body_feat, im_info, feat_lvl, mode='train'): def _get_single_proposals(self, body_feat, im_info, feat_lvl, mode='train'):
...@@ -375,20 +437,29 @@ class FPNRPNHead(RPNHead): ...@@ -375,20 +437,29 @@ class FPNRPNHead(RPNHead):
shape of (rois_num, 1). shape of (rois_num, 1).
""" """
rpn_cls_score_fpn, rpn_bbox_pred_fpn = self._get_output(body_feat, feat_lvl) rpn_cls_score_fpn, rpn_bbox_pred_fpn = self._get_output(
body_feat, feat_lvl)
prop_op = self.train_proposal if mode == 'train' else self.test_proposal prop_op = self.train_proposal if mode == 'train' else self.test_proposal
if self.num_classes == 1: if self.num_classes == 1:
rpn_cls_prob_fpn = fluid.layers.sigmoid(rpn_cls_score_fpn, name='rpn_cls_prob_fpn' + str(feat_lvl)) rpn_cls_prob_fpn = fluid.layers.sigmoid(
rpn_cls_score_fpn, name='rpn_cls_prob_fpn' + str(feat_lvl))
else: else:
rpn_cls_score_fpn = fluid.layers.transpose(rpn_cls_score_fpn, perm=[0, 2, 3, 1]) rpn_cls_score_fpn = fluid.layers.transpose(
rpn_cls_score_fpn = fluid.layers.reshape(rpn_cls_score_fpn, shape=(0, 0, 0, -1, self.num_classes)) rpn_cls_score_fpn, perm=[0, 2, 3, 1])
rpn_cls_score_fpn = fluid.layers.reshape(
rpn_cls_score_fpn, shape=(0, 0, 0, -1, self.num_classes))
rpn_cls_prob_fpn = fluid.layers.softmax( rpn_cls_prob_fpn = fluid.layers.softmax(
rpn_cls_score_fpn, use_cudnn=False, name='rpn_cls_prob_fpn' + str(feat_lvl)) rpn_cls_score_fpn,
rpn_cls_prob_fpn = fluid.layers.slice(rpn_cls_prob_fpn, axes=[4], starts=[1], ends=[self.num_classes]) use_cudnn=False,
name='rpn_cls_prob_fpn' + str(feat_lvl))
rpn_cls_prob_fpn = fluid.layers.slice(
rpn_cls_prob_fpn, axes=[4], starts=[1], ends=[self.num_classes])
rpn_cls_prob_fpn, _ = fluid.layers.topk(rpn_cls_prob_fpn, 1) rpn_cls_prob_fpn, _ = fluid.layers.topk(rpn_cls_prob_fpn, 1)
rpn_cls_prob_fpn = fluid.layers.reshape(rpn_cls_prob_fpn, shape=(0, 0, 0, -1)) rpn_cls_prob_fpn = fluid.layers.reshape(
rpn_cls_prob_fpn = fluid.layers.transpose(rpn_cls_prob_fpn, perm=[0, 3, 1, 2]) rpn_cls_prob_fpn, shape=(0, 0, 0, -1))
rpn_cls_prob_fpn = fluid.layers.transpose(
rpn_cls_prob_fpn, perm=[0, 3, 1, 2])
# prop_op # prop_op
rpn_rois_fpn, rpn_roi_prob_fpn = fluid.layers.generate_proposals( rpn_rois_fpn, rpn_roi_prob_fpn = fluid.layers.generate_proposals(
scores=rpn_cls_prob_fpn, scores=rpn_cls_prob_fpn,
...@@ -423,7 +494,8 @@ class FPNRPNHead(RPNHead): ...@@ -423,7 +494,8 @@ class FPNRPNHead(RPNHead):
for lvl in range(self.min_level, self.max_level + 1): for lvl in range(self.min_level, self.max_level + 1):
fpn_feat_name = fpn_feat_names[self.max_level - lvl] fpn_feat_name = fpn_feat_names[self.max_level - lvl]
fpn_feat = fpn_feats[fpn_feat_name] fpn_feat = fpn_feats[fpn_feat_name]
rois_fpn, roi_probs_fpn = self._get_single_proposals(fpn_feat, im_info, lvl, mode) rois_fpn, roi_probs_fpn = self._get_single_proposals(
fpn_feat, im_info, lvl, mode)
self.fpn_rpn_list.append((self.rpn_cls_score, self.rpn_bbox_pred)) self.fpn_rpn_list.append((self.rpn_cls_score, self.rpn_bbox_pred))
rois_list.append(rois_fpn) rois_list.append(rois_fpn)
roi_probs_list.append(roi_probs_fpn) roi_probs_list.append(roi_probs_fpn)
...@@ -432,7 +504,12 @@ class FPNRPNHead(RPNHead): ...@@ -432,7 +504,12 @@ class FPNRPNHead(RPNHead):
prop_op = self.train_proposal if mode == 'train' else self.test_proposal prop_op = self.train_proposal if mode == 'train' else self.test_proposal
post_nms_top_n = prop_op.post_nms_top_n post_nms_top_n = prop_op.post_nms_top_n
rois_collect = fluid.layers.collect_fpn_proposals( rois_collect = fluid.layers.collect_fpn_proposals(
rois_list, roi_probs_list, self.min_level, self.max_level, post_nms_top_n, name='collect') rois_list,
roi_probs_list,
self.min_level,
self.max_level,
post_nms_top_n,
name='collect')
return rois_collect return rois_collect
def _get_loss_input(self): def _get_loss_input(self):
...@@ -441,8 +518,9 @@ class FPNRPNHead(RPNHead): ...@@ -441,8 +518,9 @@ class FPNRPNHead(RPNHead):
anchors = [] anchors = []
anchor_vars = [] anchor_vars = []
for i in range(len(self.fpn_rpn_list)): for i in range(len(self.fpn_rpn_list)):
single_input = self._transform_input(self.fpn_rpn_list[i][0], self.fpn_rpn_list[i][1], self.anchors_list[i], single_input = self._transform_input(
self.anchor_var_list[i]) self.fpn_rpn_list[i][0], self.fpn_rpn_list[i][1],
self.anchors_list[i], self.anchor_var_list[i])
rpn_clses.append(single_input[0]) rpn_clses.append(single_input[0])
rpn_bboxes.append(single_input[1]) rpn_bboxes.append(single_input[1])
anchors.append(single_input[2]) anchors.append(single_input[2])
......
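As a rough illustration of the `anchor_sizes` and `stride` arguments passed to `anchor_generator` in the hunk above, the sketch below evaluates the same two formulas for each pyramid level. The values `anchor_start_size=32`, `min_level=2`, and `max_level=6` are assumptions chosen for illustration (they are not shown in this diff), and `fpn_anchor_params` is a hypothetical helper name.

```python
def fpn_anchor_params(anchor_start_size, min_level, max_level):
    """Return {level: (anchor_size, stride)} using the formulas from the diff."""
    params = {}
    for feat_lvl in range(min_level, max_level + 1):
        # anchor_sizes=(anchor_start_size * 2.**(feat_lvl - min_level), )
        anchor_size = anchor_start_size * 2.0 ** (feat_lvl - min_level)
        # stride=(2.**feat_lvl, 2.**feat_lvl)
        stride = 2.0 ** feat_lvl
        params[feat_lvl] = (anchor_size, stride)
    return params


# Assumed configuration, for illustration only.
levels = fpn_anchor_params(anchor_start_size=32, min_level=2, max_level=6)
for lvl, (size, stride) in sorted(levels.items()):
    print(lvl, size, stride)
```

Under these assumed settings each level doubles both the anchor size and the stride of the previous one, so coarser feature maps are responsible for larger objects.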
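## Command-line prediction # faster_rcnn_resnet50_fpn_venus
```shell |Model name|faster_rcnn_resnet50_fpn_venus|
$ hub run faster_rcnn_resnet50_fpn_venus --input_path "/PATH/TO/IMAGE" | :--- | :---: |
``` |Category|Image - object detection|
|Network|faster_rcnn|
## API |Dataset|Baidu in-house dataset|
|Fine-tuning supported|Yes|
```python |Model size|317MB|
def context(num_classes=81, |Last updated|2021-02-26|
trainable=True, |Metrics|-|
pretrained=True,
phase='train')
``` ## 1. Basic model information
Extracts features for transfer learning. - ### Model introduction
**Parameters** - Faster_RCNN is a two-stage object detector: it generates region proposals for an image, extracts features, classifies them, and refines the proposal boxes. The overall network has four parts: (1) ResNet-50 as the base convolutional layers, (2) the region proposal network, (3) RoI Align, and (4) the detection head. This PaddleHub Module is a large-scale general-purpose detection model trained on 800+ tags, 1.7 million images, and 10 million+ bounding boxes. Across 8 datasets, mAP improves by 2.06% on average and accuracy at IoU=0.5 improves by 1.78% on average. Compared with other general-purpose detection models, fine-tuning from this Module converges faster and reaches better results.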
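* num\_classes (int): number of classes;
* trainable (bool): whether the parameters are trainable; ## 2. Installation
* pretrained (bool): whether to load the pretrained model;
* phase (str): either 'train' or 'predict'; 'train' is for training, 'predict' is for prediction. - ### 1. Environment dependencies
**Returns** - paddlepaddle >= 1.6.2
* inputs (dict): the model inputs, with the following values: - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
When phase is 'train', it contains:
* image (Variable): image variable - ### 2. Installation
* im\_size (Variable): image size
* im\_info (Variable): image scaling information - ```shell
* gt\_class (Variable): bounding-box class $ hub install faster_rcnn_resnet50_fpn_venus
* gt\_box (Variable): bounding-box coordinates ```
* is\_crowd (Variable): whether a single box contains multiple objects - If you run into problems during installation, see: [Windows quick start for beginners](../../../../docs/docs_ch/get_start/windows_quickstart.md)
When phase is 'predict', it contains: | [Linux quick start for beginners](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quick start for beginners](../../../../docs/docs_ch/get_start/mac_quickstart.md)
* image (Variable): image variable
* im\_size (Variable): image size ## 3. Model API prediction
* im\_info (Variable): image scaling information
* outputs (dict): the model outputs, with the following values: - ### 1. API
When phase is 'train', it contains: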
* head_features (Variable): the extracted features - ```python
* rpn\_cls\_loss (Variable): bounding-box classification loss def context(num_classes=81,
* rpn\_reg\_loss (Variable): bounding-box regression loss trainable=True,
* generate\_proposal\_labels (Variable): image information pretrained=True,
When phase is 'predict', it contains: phase='train')
* head_features (Variable): the extracted features ```
* rois (Variable): the extracted RoIs
* bbox\_out (Variable): prediction results - Extracts features for transfer learning.
* context\_prog (Program): the Program used for transfer learning.
- **Parameters**
```python - num\_classes (int): number of classes;<br/>
def save_inference_model(dirname, - trainable (bool): whether the parameters are trainable;<br/>
model_filename=None, - pretrained (bool): whether to load the pretrained model;<br/>
params_filename=None, - phase (str): either 'train' or 'predict'; 'train' is for training, 'predict' is for prediction.
combined=True)
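``` - **Returns**
- inputs (dict): the model inputs, with the following values:
Saves the model to the specified path. When phase is 'train', it contains:
- image (Variable): image variable
**Parameters** - im\_size (Variable): image size
- im\_info (Variable): image scaling information
* dirname: name of the directory where the model is saved - gt\_class (Variable): bounding-box class
* model\_filename: model file name, defaults to \_\_model\_\_ - gt\_box (Variable): bounding-box coordinates
* params\_filename: parameter file name, defaults to \_\_params\_\_ (only takes effect when `combined` is True) - is\_crowd (Variable): whether a single box contains multiple objects
* combined: whether to save the parameters into a single file When phase is 'predict', it contains:
- image (Variable): image variable
### Dependencies - im\_size (Variable): image size
- im\_info (Variable): image scaling information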
paddlepaddle >= 1.6.2 - outputs (dict): the model outputs, with the following values:
When phase is 'train', it contains:
paddlehub >= 1.6.0 - head_features (Variable): the extracted features
- rpn\_cls\_loss (Variable): bounding-box classification loss
- rpn\_reg\_loss (Variable): bounding-box regression loss
- generate\_proposal\_labels (Variable): image information
When phase is 'predict', it contains:
- head_features (Variable): the extracted features
- rois (Variable): the extracted RoIs
- bbox\_out (Variable): prediction results
- context\_prog (Program): the Program used for transfer learning
- ```python
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
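- Saves the model to the specified path.
- **Parameters**
- dirname: name of the directory where the model is saved; <br/>
- model\_filename: model file name, defaults to \_\_model\_\_; <br/>
- params\_filename: parameter file name, defaults to \_\_params\_\_ (only takes effect when `combined` is True);<br/>
- combined: whether to save the parameters into a single file.
## 4. Release history
* 1.0.0
Initial release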
- ```shell
$ hub install faster_rcnn_resnet50_fpn_venus==1.0.0
```
## Command-line prediction # ssd_mobilenet_v1_pascal
```shell |Model name|ssd_mobilenet_v1_pascal|
$ hub run ssd_mobilenet_v1_pascal --input_path "/PATH/TO/IMAGE" | :--- | :---: |
``` |Category|Image - object detection|
|Network|SSD|
|Dataset|PASCAL VOC|
|Fine-tuning supported|No|
|Model size|24MB|
|Last updated|2021-02-26|
|Metrics|-|
## API
```python ## 1. Basic model information
def context(trainable=True,
pretrained=True,
get_prediction=False)
```
Extracts features for transfer learning. - ### Demonstration of results
- Sample result:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/131504887-d024c7e5-fc09-4d6b-92b8-4d0c965949d0.jpg" width='50%' hspace='10'/>
<br />
</p>
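**Parameters** - ### Model introduction
* trainable (bool): whether the parameters are trainable; - Single Shot MultiBox Detector (SSD) is a single-stage object detector. Unlike two-stage detection methods, single-stage detection performs no region proposal step; it regresses bounding boxes and class probabilities directly from the feature maps. SSD applies this single-stage idea and improves on it by detecting objects of each scale on a feature map of the corresponding scale. The backbone of this PaddleHub Module is MobileNet-v1, pretrained on the Pascal dataset; it currently supports prediction only.
* pretrained (bool): whether to load the pretrained model;
* get\_prediction (bool): whether to run prediction.
**Returns**
* inputs (dict): the model inputs; the keys are 'image' and 'im\_size', with the following values: ## 2. Installation
* image (Variable): image variable
* im\_size (Variable): image size
* outputs (dict): the model outputs. If get\_prediction is False, the output is 'body\_features'; otherwise it is 'bbox\_out'.
* context\_prog (Program): the Program used for transfer learning.
```python - ### 1. Environment dependencies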
def object_detection(paths=None,
images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True)
```
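Prediction API; detects the locations of all objects in the input images. - paddlepaddle >= 1.6.2
**Parameters** - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
* paths (list\[str\]): image paths; - ### 2. Installation
* images (list\[numpy.ndarray\]): image data; ndarray.shape is \[H, W, C\], in BGR format;
* batch\_size (int): batch size;
* use\_gpu (bool): whether to use the GPU;
* score\_thresh (float): confidence threshold for detections;
* visualization (bool): whether to save the detection results as image files;
* output\_dir (str): directory where images are saved, defaults to detection\_result;
**Returns** - ```shell
$ hub install ssd_mobilenet_v1_pascal
```
- If you run into problems during installation, see: [Windows quick start for beginners](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quick start for beginners](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quick start for beginners](../../../../docs/docs_ch/get_start/mac_quickstart.md)
* res (list\[dict\]): list of detection results; each element is a dict with the following fields: ## 3. Model API prediction
* data (list): detection results; each element of the list is a dict with the following fields:
* confidence (float): detection confidence;
* label (str): label;
* left (int): x coordinate of the top-left corner of the bounding box;
* top (int): y coordinate of the top-left corner of the bounding box;
* right (int): x coordinate of the bottom-right corner of the bounding box;
* bottom (int): y coordinate of the bottom-right corner of the bounding box;
* save\_path (str, optional): path where the result is saved (present only when visualization=True).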
```python - ### 1. Command-line prediction
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
Saves the model to the specified path. - ```shell
$ hub run ssd_mobilenet_v1_pascal --input_path "/PATH/TO/IMAGE"
```
- This invokes the object detection model from the command line; for more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code example
**Parameters** - ```python
import paddlehub as hub
import cv2
* dirname: name of the directory where the model is saved object_detector = hub.Module(name="ssd_mobilenet_v1_pascal")
* model\_filename: model file name, defaults to \_\_model\_\_ result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
* params\_filename: parameter file name, defaults to \_\_params\_\_ (only takes effect when `combined` is True) # or
* combined: whether to save the parameters into a single file # result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
## Code example - ### 3. API
```python - ```python
import paddlehub as hub def object_detection(paths=None,
import cv2 images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True,
)
```
object_detector = hub.Module(name="ssd_mobilenet_v1_pascal") - Prediction API; detects the locations of all objects in the input images.
result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
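## Service deployment - **Parameters**
PaddleHub Serving can deploy an online object detection service. - paths (list\[str\]): image paths; <br/>
- images (list\[numpy.ndarray\]): image data; ndarray.shape is \[H, W, C\], in BGR format; <br/>
- batch\_size (int): batch size;<br/>
- use\_gpu (bool): whether to use the GPU;<br/>
- output\_dir (str): directory where images are saved, defaults to detection\_result; <br/>
- score\_thresh (float): confidence threshold for detections;<br/>
- visualization (bool): whether to save the detection results as image files.
## Step 1: Start PaddleHub Serving **NOTE:** supply data through exactly one of the paths and images parameters
Run the start command: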
```shell
$ hub serving start -m ssd_mobilenet_v1_pascal
```
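This deploys an object detection service API; the default port is 8866. - **Returns**
**NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set. - res (list\[dict\]): list of detection results; each element is a dict with the following fields:
- data (list): detection results; each element of the list is a dict with the following fields:
- confidence (float): detection confidence
- label (str): label
- left (int): x coordinate of the top-left corner of the bounding box
- top (int): y coordinate of the top-left corner of the bounding box
- right (int): x coordinate of the bottom-right corner of the bounding box
- bottom (int): y coordinate of the bottom-right corner of the bounding box
- save\_path (str, optional): path where the result is saved (present only when visualization=True)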
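The return structure listed above can be consumed with plain Python. The following is a minimal sketch that works on a hand-written sample shaped like the documented `res` value; the sample values and the helper name `boxes_above` are made up for illustration and are not output from the module.

```python
def boxes_above(results, min_confidence):
    """Collect (label, confidence, width, height) for detections at or above a threshold."""
    kept = []
    for image_result in results:
        # Each image result is a dict whose 'data' entry lists the detections.
        for det in image_result.get("data", []):
            if det["confidence"] >= min_confidence:
                width = det["right"] - det["left"]
                height = det["bottom"] - det["top"]
                kept.append((det["label"], det["confidence"], width, height))
    return kept


# Hypothetical sample that follows the documented field names.
sample = [{
    "data": [
        {"label": "dog", "confidence": 0.92,
         "left": 10, "top": 20, "right": 110, "bottom": 220},
        {"label": "cat", "confidence": 0.41,
         "left": 5, "top": 5, "right": 50, "bottom": 60},
    ],
    "save_path": "detection_result/example.jpg",
}]

kept = boxes_above(sample, min_confidence=0.5)
print(kept)
```

Filtering after the call like this is independent of the `score_thresh` argument, which is applied by the module itself.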
## Step 2: Send a prediction request - ```python
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
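- Saves the model to the specified path.
With the server configured, the few lines of code below send a prediction request and retrieve the result - **Parameters**
```python - dirname: name of the directory where the model is saved; <br/>
import requests - model\_filename: model file name, defaults to \_\_model\_\_; <br/>
import json - params\_filename: parameter file name, defaults to \_\_params\_\_ (only takes effect when `combined` is True);<br/>
import cv2 - combined: whether to save the parameters into a single file.
import base64
def cv2_to_base64(image): ## 4. Service deployment
data = cv2.imencode('.jpg', image)[1]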
return base64.b64encode(data.tobytes()).decode('utf8')
- PaddleHub Serving can deploy an online object detection service.
# Send the HTTP request - ### Step 1: Start PaddleHub Serving
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ssd_mobilenet_v1_pascal"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result - Run the start command:
print(r.json()["results"]) - ```shell
``` $ hub serving start -m ssd_mobilenet_v1_pascal
```
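### Dependencies - This deploys an object detection service API; the default port is 8866.
paddlepaddle >= 1.6.2 - **NOTE:** To predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise it does not need to be set.
paddlehub >= 1.6.0 - ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and retrieve the result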
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ssd_mobilenet_v1_pascal"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
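The request body built by the client above is a JSON object with an `images` list of base64 strings. The sketch below reproduces only that encoding step with the standard library, so it can be checked without OpenCV or a running server; `build_predict_payload` is a hypothetical helper name, and the byte string stands in for real JPEG data.

```python
import base64
import json


def build_predict_payload(image_bytes_list):
    """Encode raw image bytes as base64 strings under the 'images' key."""
    images = [base64.b64encode(b).decode("utf8") for b in image_bytes_list]
    return json.dumps({"images": images})


# Placeholder bytes for illustration; a real client would pass encoded JPEG data.
fake_jpeg = b"\xff\xd8\xff\xe0 placeholder, not a real image"
payload = build_predict_payload([fake_jpeg])

# Decoding the payload recovers the original bytes unchanged.
roundtrip = base64.b64decode(json.loads(payload)["images"][0])
print(roundtrip == fake_jpeg)
```

Base64 is used because JSON cannot carry raw binary data; the cost is a payload roughly one third larger than the image itself.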
## 5. Release history
* 1.0.0
Initial release
* 1.1.2
Fixed an issue with reading numpy data
- ```shell
$ hub install ssd_mobilenet_v1_pascal==1.1.2
```
...@@ -34,7 +34,11 @@ class DecodeImage(object): ...@@ -34,7 +34,11 @@ class DecodeImage(object):
class ResizeImage(object): class ResizeImage(object):
def __init__(self, target_size=0, max_size=0, interp=cv2.INTER_LINEAR, use_cv2=True): def __init__(self,
target_size=0,
max_size=0,
interp=cv2.INTER_LINEAR,
use_cv2=True):
""" """
Rescale image to the specified target size, and capped at max_size Rescale image to the specified target size, and capped at max_size
if max_size != 0. if max_size != 0.
...@@ -88,11 +92,18 @@ class ResizeImage(object): ...@@ -88,11 +92,18 @@ class ResizeImage(object):
resize_h = selected_size resize_h = selected_size
if self.use_cv2: if self.use_cv2:
im = cv2.resize(im, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=self.interp) im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
else: else:
if self.max_size != 0: if self.max_size != 0:
raise TypeError('If you set max_size to cap the maximum size of image,' raise TypeError(
'please set use_cv2 to True to resize the image.') 'If you set max_size to cap the maximum size of image,'
'please set use_cv2 to True to resize the image.')
im = im.astype('uint8') im = im.astype('uint8')
im = Image.fromarray(im) im = Image.fromarray(im)
im = im.resize((int(resize_w), int(resize_h)), self.interp) im = im.resize((int(resize_w), int(resize_h)), self.interp)
...@@ -102,7 +113,11 @@ class ResizeImage(object): ...@@ -102,7 +113,11 @@ class ResizeImage(object):
class NormalizeImage(object): class NormalizeImage(object):
def __init__(self, mean=[0.485, 0.456, 0.406], std=[1, 1, 1], is_scale=True, is_channel_first=True): def __init__(self,
mean=[0.485, 0.456, 0.406],
std=[1, 1, 1],
is_scale=True,
is_channel_first=True):
""" """
Args: Args:
mean (list): the pixel mean mean (list): the pixel mean
...@@ -158,9 +173,11 @@ class Permute(object): ...@@ -158,9 +173,11 @@ class Permute(object):
def reader(paths=[], def reader(paths=[],
images=None, images=None,
decode_image=DecodeImage(to_rgb=True, with_mixup=False), decode_image=DecodeImage(to_rgb=True, with_mixup=False),
resize_image=ResizeImage(target_size=512, interp=1, max_size=0, use_cv2=False), resize_image=ResizeImage(
target_size=512, interp=1, max_size=0, use_cv2=False),
permute_image=Permute(to_bgr=False), permute_image=Permute(to_bgr=False),
normalize_image=NormalizeImage(mean=[104, 117, 123], std=[1, 1, 1], is_scale=False)): normalize_image=NormalizeImage(
mean=[104, 117, 123], std=[1, 1, 1], is_scale=False)):
""" """
data generator data generator
...@@ -176,7 +193,8 @@ def reader(paths=[], ...@@ -176,7 +193,8 @@ def reader(paths=[],
if paths is not None: if paths is not None:
assert type(paths) is list, "type(paths) is not list." assert type(paths) is list, "type(paths) is not list."
for img_path in paths: for img_path in paths:
assert os.path.isfile(img_path), "The {} isn't a valid file path.".format(img_path) assert os.path.isfile(
img_path), "The {} isn't a valid file path.".format(img_path)
img = cv2.imread(img_path).astype('float32') img = cv2.imread(img_path).astype('float32')
img_list.append(img) img_list.append(img)
if images is not None: if images is not None:
...@@ -184,10 +202,13 @@ def reader(paths=[], ...@@ -184,10 +202,13 @@ def reader(paths=[],
img_list.append(img) img_list.append(img)
decode_image = DecodeImage(to_rgb=True, with_mixup=False) decode_image = DecodeImage(to_rgb=True, with_mixup=False)
resize_image = ResizeImage(target_size=300, interp=1, max_size=0, use_cv2=False) resize_image = ResizeImage(
target_size=300, interp=1, max_size=0, use_cv2=False)
permute_image = Permute() permute_image = Permute()
normalize_image = NormalizeImage( normalize_image = NormalizeImage(
mean=[127.5, 127.5, 127.5], std=[127.502231, 127.502231, 127.502231], is_scale=False) mean=[127.5, 127.5, 127.5],
std=[127.502231, 127.502231, 127.502231],
is_scale=False)
for img in img_list: for img in img_list:
preprocessed_img = decode_image(img) preprocessed_img = decode_image(img)
......
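To make the `NormalizeImage(mean=[127.5, ...], std=[127.502231, ...], is_scale=False)` settings in the hunk above more concrete, here is a small arithmetic sketch. It assumes the transform computes `(pixel - mean) / std` per channel and that `is_scale=True` would first divide by 255; the body of `NormalizeImage` is not shown in this diff, so both points are assumptions, and `normalize_pixel` is a hypothetical name.

```python
def normalize_pixel(value, mean=127.5, std=127.502231, is_scale=False):
    """Assumed per-pixel normalization: optional /255 scaling, then (x - mean) / std."""
    if is_scale:
        value = value / 255.0
    return (value - mean) / std


# Extremes of an 8-bit pixel range under the assumed formula.
lo = normalize_pixel(0.0)
hi = normalize_pixel(255.0)
print(lo, hi)
```

Under that assumption the 0 to 255 range maps to values just inside -1 and 1, symmetric around zero.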
...@@ -31,7 +31,8 @@ class MobileNet(object): ...@@ -31,7 +31,8 @@ class MobileNet(object):
conv_group_scale=1, conv_group_scale=1,
conv_learning_rate=1.0, conv_learning_rate=1.0,
with_extra_blocks=False, with_extra_blocks=False,
extra_block_filters=[[256, 512], [128, 256], [128, 256], [64, 128]], extra_block_filters=[[256, 512], [128, 256], [128, 256],
[64, 128]],
weight_prefix_name='', weight_prefix_name='',
class_dim=1000, class_dim=1000,
yolo_v3=False): yolo_v3=False):
...@@ -56,7 +57,9 @@ class MobileNet(object): ...@@ -56,7 +57,9 @@ class MobileNet(object):
use_cudnn=True, use_cudnn=True,
name=None): name=None):
parameter_attr = ParamAttr( parameter_attr = ParamAttr(
learning_rate=self.conv_learning_rate, initializer=fluid.initializer.MSRA(), name=name + "_weights") learning_rate=self.conv_learning_rate,
initializer=fluid.initializer.MSRA(),
name=name + "_weights")
conv = fluid.layers.conv2d( conv = fluid.layers.conv2d(
input=input, input=input,
num_filters=num_filters, num_filters=num_filters,
...@@ -71,8 +74,10 @@ class MobileNet(object): ...@@ -71,8 +74,10 @@ class MobileNet(object):
bn_name = name + "_bn" bn_name = name + "_bn"
norm_decay = self.norm_decay norm_decay = self.norm_decay
bn_param_attr = ParamAttr(regularizer=L2Decay(norm_decay), name=bn_name + '_scale') bn_param_attr = ParamAttr(
bn_bias_attr = ParamAttr(regularizer=L2Decay(norm_decay), name=bn_name + '_offset') regularizer=L2Decay(norm_decay), name=bn_name + '_scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(norm_decay), name=bn_name + '_offset')
return fluid.layers.batch_norm( return fluid.layers.batch_norm(
input=conv, input=conv,
act=act, act=act,
...@@ -81,7 +86,14 @@ class MobileNet(object): ...@@ -81,7 +86,14 @@ class MobileNet(object):
moving_mean_name=bn_name + '_mean', moving_mean_name=bn_name + '_mean',
moving_variance_name=bn_name + '_variance') moving_variance_name=bn_name + '_variance')
def depthwise_separable(self, input, num_filters1, num_filters2, num_groups, stride, scale, name=None): def depthwise_separable(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
scale,
name=None):
depthwise_conv = self._conv_norm( depthwise_conv = self._conv_norm(
input=input, input=input,
filter_size=3, filter_size=3,
...@@ -101,7 +113,13 @@ class MobileNet(object): ...@@ -101,7 +113,13 @@ class MobileNet(object):
name=name + "_sep") name=name + "_sep")
return pointwise_conv return pointwise_conv
def _extra_block(self, input, num_filters1, num_filters2, num_groups, stride, name=None): def _extra_block(self,
input,
num_filters1,
num_filters2,
num_groups,
stride,
name=None):
pointwise_conv = self._conv_norm( pointwise_conv = self._conv_norm(
input=input, input=input,
filter_size=1, filter_size=1,
...@@ -124,44 +142,70 @@ class MobileNet(object): ...@@ -124,44 +142,70 @@ class MobileNet(object):
scale = self.conv_group_scale scale = self.conv_group_scale
blocks = [] blocks = []
# input 1/1 # input 1/1
out = self._conv_norm(input, 3, int(32 * scale), 2, 1, name=self.prefix_name + "conv1") out = self._conv_norm(
input, 3, int(32 * scale), 2, 1, name=self.prefix_name + "conv1")
# 1/2 # 1/2
out = self.depthwise_separable(out, 32, 64, 32, 1, scale, name=self.prefix_name + "conv2_1") out = self.depthwise_separable(
out = self.depthwise_separable(out, 64, 128, 64, 2, scale, name=self.prefix_name + "conv2_2") out, 32, 64, 32, 1, scale, name=self.prefix_name + "conv2_1")
out = self.depthwise_separable(
out, 64, 128, 64, 2, scale, name=self.prefix_name + "conv2_2")
# 1/4 # 1/4
out = self.depthwise_separable(out, 128, 128, 128, 1, scale, name=self.prefix_name + "conv3_1") out = self.depthwise_separable(
out = self.depthwise_separable(out, 128, 256, 128, 2, scale, name=self.prefix_name + "conv3_2") out, 128, 128, 128, 1, scale, name=self.prefix_name + "conv3_1")
out = self.depthwise_separable(
out, 128, 256, 128, 2, scale, name=self.prefix_name + "conv3_2")
# 1/8 # 1/8
blocks.append(out) blocks.append(out)
out = self.depthwise_separable(out, 256, 256, 256, 1, scale, name=self.prefix_name + "conv4_1") out = self.depthwise_separable(
out = self.depthwise_separable(out, 256, 512, 256, 2, scale, name=self.prefix_name + "conv4_2") out, 256, 256, 256, 1, scale, name=self.prefix_name + "conv4_1")
out = self.depthwise_separable(
out, 256, 512, 256, 2, scale, name=self.prefix_name + "conv4_2")
# 1/16 # 1/16
blocks.append(out) blocks.append(out)
for i in range(5): for i in range(5):
out = self.depthwise_separable(out, 512, 512, 512, 1, scale, name=self.prefix_name + "conv5_" + str(i + 1)) out = self.depthwise_separable(
out,
512,
512,
512,
1,
scale,
name=self.prefix_name + "conv5_" + str(i + 1))
module11 = out module11 = out
out = self.depthwise_separable(out, 512, 1024, 512, 2, scale, name=self.prefix_name + "conv5_6") out = self.depthwise_separable(
out, 512, 1024, 512, 2, scale, name=self.prefix_name + "conv5_6")
# 1/32 # 1/32
out = self.depthwise_separable(out, 1024, 1024, 1024, 1, scale, name=self.prefix_name + "conv6") out = self.depthwise_separable(
out, 1024, 1024, 1024, 1, scale, name=self.prefix_name + "conv6")
module13 = out module13 = out
blocks.append(out) blocks.append(out)
if self.yolo_v3: if self.yolo_v3:
return blocks return blocks
if not self.with_extra_blocks: if not self.with_extra_blocks:
out = fluid.layers.pool2d(input=out, pool_type='avg', global_pooling=True) out = fluid.layers.pool2d(
input=out, pool_type='avg', global_pooling=True)
out = fluid.layers.fc( out = fluid.layers.fc(
input=out, input=out,
size=self.class_dim, size=self.class_dim,
param_attr=ParamAttr(initializer=fluid.initializer.MSRA(), name="fc7_weights"), param_attr=ParamAttr(
initializer=fluid.initializer.MSRA(), name="fc7_weights"),
bias_attr=ParamAttr(name="fc7_offset")) bias_attr=ParamAttr(name="fc7_offset"))
out = fluid.layers.softmax(out) out = fluid.layers.softmax(out)
blocks.append(out) blocks.append(out)
return blocks return blocks
num_filters = self.extra_block_filters num_filters = self.extra_block_filters
module14 = self._extra_block(module13, num_filters[0][0], num_filters[0][1], 1, 2, self.prefix_name + "conv7_1") module14 = self._extra_block(module13, num_filters[0][0],
module15 = self._extra_block(module14, num_filters[1][0], num_filters[1][1], 1, 2, self.prefix_name + "conv7_2") num_filters[0][1], 1, 2,
module16 = self._extra_block(module15, num_filters[2][0], num_filters[2][1], 1, 2, self.prefix_name + "conv7_3") self.prefix_name + "conv7_1")
module17 = self._extra_block(module16, num_filters[3][0], num_filters[3][1], 1, 2, self.prefix_name + "conv7_4") module15 = self._extra_block(module14, num_filters[1][0],
num_filters[1][1], 1, 2,
self.prefix_name + "conv7_2")
module16 = self._extra_block(module15, num_filters[2][0],
num_filters[2][1], 1, 2,
self.prefix_name + "conv7_3")
module17 = self._extra_block(module16, num_filters[3][0],
num_filters[3][1], 1, 2,
self.prefix_name + "conv7_4")
return module11, module13, module14, module15, module16, module17 return module11, module13, module14, module15, module16, module17
...@@ -21,15 +21,17 @@ from ssd_mobilenet_v1_pascal.data_feed import reader ...@@ -21,15 +21,17 @@ from ssd_mobilenet_v1_pascal.data_feed import reader
@moduleinfo( @moduleinfo(
name="ssd_mobilenet_v1_pascal", name="ssd_mobilenet_v1_pascal",
version="1.1.1", version="1.1.2",
type="cv/object_detection", type="cv/object_detection",
summary="SSD with backbone MobileNet_V1, trained with dataset Pascal VOC.", summary="SSD with backbone MobileNet_V1, trained with dataset Pascal VOC.",
author="paddlepaddle", author="paddlepaddle",
author_email="paddle-dev@baidu.com") author_email="paddle-dev@baidu.com")
class SSDMobileNetv1(hub.Module): class SSDMobileNetv1(hub.Module):
def _initialize(self): def _initialize(self):
self.default_pretrained_model_path = os.path.join(self.directory, "ssd_mobilenet_v1_model") self.default_pretrained_model_path = os.path.join(
self.label_names = load_label_info(os.path.join(self.directory, "label_file.txt")) self.directory, "ssd_mobilenet_v1_model")
self.label_names = load_label_info(
os.path.join(self.directory, "label_file.txt"))
self.model_config = None self.model_config = None
self._set_config() self._set_config()
...@@ -81,34 +83,55 @@ class SSDMobileNetv1(hub.Module): ...@@ -81,34 +83,55 @@ class SSDMobileNetv1(hub.Module):
with fluid.program_guard(context_prog, startup_program): with fluid.program_guard(context_prog, startup_program):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
# image # image
image = fluid.layers.data(name='image', shape=[3, 300, 300], dtype='float32') image = fluid.layers.data(
name='image', shape=[3, 300, 300], dtype='float32')
# backbone # backbone
backbone = MobileNet(**self.mobilenet_config) backbone = MobileNet(**self.mobilenet_config)
# body_feats # body_feats
body_feats = backbone(image) body_feats = backbone(image)
# im_size # im_size
im_size = fluid.layers.data(name='im_size', shape=[2], dtype='int32') im_size = fluid.layers.data(
name='im_size', shape=[2], dtype='int32')
# var_prefix # var_prefix
var_prefix = '@HUB_{}@'.format(self.name) var_prefix = '@HUB_{}@'.format(self.name)
# names of inputs # names of inputs
inputs = {'image': var_prefix + image.name, 'im_size': var_prefix + im_size.name} inputs = {
'image': var_prefix + image.name,
'im_size': var_prefix + im_size.name
}
# names of outputs # names of outputs
if get_prediction: if get_prediction:
locs, confs, box, box_var = fluid.layers.multi_box_head( locs, confs, box, box_var = fluid.layers.multi_box_head(
inputs=body_feats, image=image, num_classes=21, **self.multi_box_head_config) inputs=body_feats,
image=image,
num_classes=21,
**self.multi_box_head_config)
pred = fluid.layers.detection_output( pred = fluid.layers.detection_output(
loc=locs, scores=confs, prior_box=box, prior_box_var=box_var, **self.output_decoder_config) loc=locs,
scores=confs,
prior_box=box,
prior_box_var=box_var,
**self.output_decoder_config)
outputs = {'bbox_out': [var_prefix + pred.name]} outputs = {'bbox_out': [var_prefix + pred.name]}
else: else:
outputs = {'body_features': [var_prefix + var.name for var in body_feats]} outputs = {
'body_features':
[var_prefix + var.name for var in body_feats]
}
# add_vars_prefix # add_vars_prefix
add_vars_prefix(context_prog, var_prefix) add_vars_prefix(context_prog, var_prefix)
add_vars_prefix(fluid.default_startup_program(), var_prefix) add_vars_prefix(fluid.default_startup_program(), var_prefix)
# inputs # inputs
inputs = {key: context_prog.global_block().vars[value] for key, value in inputs.items()} inputs = {
key: context_prog.global_block().vars[value]
for key, value in inputs.items()
}
outputs = { outputs = {
out_key: [context_prog.global_block().vars[varname] for varname in out_value] out_key: [
context_prog.global_block().vars[varname]
for varname in out_value
]
for out_key, out_value in outputs.items() for out_key, out_value in outputs.items()
} }
# trainable # trainable
...@@ -121,9 +144,14 @@ class SSDMobileNetv1(hub.Module): ...@@ -121,9 +144,14 @@ class SSDMobileNetv1(hub.Module):
if pretrained: if pretrained:
def _if_exist(var): def _if_exist(var):
return os.path.exists(os.path.join(self.default_pretrained_model_path, var.name)) return os.path.exists(
os.path.join(self.default_pretrained_model_path,
var.name))
fluid.io.load_vars(exe, self.default_pretrained_model_path, predicate=_if_exist) fluid.io.load_vars(
exe,
self.default_pretrained_model_path,
predicate=_if_exist)
else: else:
exe.run(startup_program) exe.run(startup_program)
...@@ -166,7 +194,7 @@ class SSDMobileNetv1(hub.Module): ...@@ -166,7 +194,7 @@ class SSDMobileNetv1(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id." "Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -196,7 +224,11 @@ class SSDMobileNetv1(hub.Module): ...@@ -196,7 +224,11 @@ class SSDMobileNetv1(hub.Module):
res.extend(output) res.extend(output)
return res return res
def save_inference_model(self, dirname, model_filename=None, params_filename=None, combined=True): def save_inference_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined: if combined:
model_filename = "__model__" if not model_filename else model_filename model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename params_filename = "__params__" if not params_filename else params_filename
...@@ -234,9 +266,12 @@ class SSDMobileNetv1(hub.Module): ...@@ -234,9 +266,12 @@ class SSDMobileNetv1(hub.Module):
prog='hub run {}'.format(self.name), prog='hub run {}'.format(self.name),
usage='%(prog)s', usage='%(prog)s',
add_help=True) add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required") self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group( self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.") title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
...@@ -254,17 +289,34 @@ class SSDMobileNetv1(hub.Module): ...@@ -254,17 +289,34 @@ class SSDMobileNetv1(hub.Module):
Add the command config options. Add the command config options.
""" """
self.arg_config_group.add_argument( self.arg_config_group.add_argument(
'--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not") '--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument( self.arg_config_group.add_argument(
'--output_dir', type=str, default='detection_result', help="The directory to save output images.") '--output_dir',
type=str,
default='detection_result',
help="The directory to save output images.")
self.arg_config_group.add_argument( self.arg_config_group.add_argument(
'--visualization', type=ast.literal_eval, default=False, help="whether to save output as images.") '--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self): def add_module_input_arg(self):
""" """
Add the command input options. Add the command input options.
""" """
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument('--batch_size', type=ast.literal_eval, default=1, help="batch size.")
self.arg_input_group.add_argument( self.arg_input_group.add_argument(
'--score_thresh', type=ast.literal_eval, default=0.5, help="threshold for object detection.") '--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument(
'--batch_size',
type=ast.literal_eval,
default=1,
help="batch size.")
self.arg_input_group.add_argument(
'--score_thresh',
type=ast.literal_eval,
default=0.5,
help="threshold for object detection.")
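The options above pass `type=ast.literal_eval`, so flag values given on the command line are parsed as Python literals (`True`, `0.5`, `1`) instead of staying strings. A minimal standalone sketch of that behavior follows; the parser is a hypothetical stand-in for the module's `self.parser`, and only the argument definitions mirror the diff.

```python
import argparse
import ast

# Hypothetical stand-in parser; the real module builds self.parser inside run_cmd.
parser = argparse.ArgumentParser(prog="hub run demo")
parser.add_argument("--use_gpu", type=ast.literal_eval, default=False)
parser.add_argument("--score_thresh", type=ast.literal_eval, default=0.5)
parser.add_argument("--batch_size", type=ast.literal_eval, default=1)

# The strings "True" and "0.3" are converted to a bool and a float.
args = parser.parse_args(["--use_gpu", "True", "--score_thresh", "0.3"])
print(type(args.use_gpu).__name__, type(args.score_thresh).__name__, args.batch_size)
```

With a plain `type=bool`, any non-empty string (including `"False"`) would evaluate to `True`, which is presumably why the literal parser is used for the boolean flags.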
...@@ -15,6 +15,12 @@ def base64_to_cv2(b64str): ...@@ -15,6 +15,12 @@ def base64_to_cv2(b64str):
data = cv2.imdecode(data, cv2.IMREAD_COLOR) data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data return data
def check_dir(dir_path):
if not os.path.exists(dir_path):
os.makedirs(dir_path)
elif os.path.isfile(dir_path):
os.remove(dir_path)
os.makedirs(dir_path)
def get_save_image_name(img, output_dir, image_path): def get_save_image_name(img, output_dir, image_path):
""" """
...@@ -44,17 +50,23 @@ def draw_bounding_box_on_image(image_path, data_list, save_dir): ...@@ -44,17 +50,23 @@ def draw_bounding_box_on_image(image_path, data_list, save_dir):
image = Image.open(image_path) image = Image.open(image_path)
draw = ImageDraw.Draw(image) draw = ImageDraw.Draw(image)
for data in data_list: for data in data_list:
left, right, top, bottom = data['left'], data['right'], data['top'], data['bottom'] left, right, top, bottom = data['left'], data['right'], data[
'top'], data['bottom']
# draw bbox # draw bbox
draw.line([(left, top), (left, bottom), (right, bottom), (right, top), (left, top)], width=2, fill='red') draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
(left, top)],
width=2,
fill='red')
# draw label # draw label
if image.mode == 'RGB': if image.mode == 'RGB':
text = data['label'] + ": %.2f%%" % (100 * data['confidence']) text = data['label'] + ": %.2f%%" % (100 * data['confidence'])
textsize_width, textsize_height = draw.textsize(text=text) textsize_width, textsize_height = draw.textsize(text=text)
draw.rectangle( draw.rectangle(
xy=(left, top - (textsize_height + 5), left + textsize_width + 10, top), fill=(255, 255, 255)) xy=(left, top - (textsize_height + 5),
left + textsize_width + 10, top),
fill=(255, 255, 255))
draw.text(xy=(left, top - 15), text=text, fill=(0, 0, 0)) draw.text(xy=(left, top - 15), text=text, fill=(0, 0, 0))
save_name = get_save_image_name(image, save_dir, image_path) save_name = get_save_image_name(image, save_dir, image_path)
...@@ -83,7 +95,14 @@ def load_label_info(file_path): ...@@ -83,7 +95,14 @@ def load_label_info(file_path):
return label_names return label_names
def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, handle_id, visualization=True): def postprocess(paths,
images,
data_out,
score_thresh,
label_names,
output_dir,
handle_id,
visualization=True):
""" """
postprocess the lod_tensor produced by fluid.Executor.run postprocess the lod_tensor produced by fluid.Executor.run
...@@ -111,16 +130,27 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -111,16 +130,27 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
lod_tensor = data_out[0] lod_tensor = data_out[0]
lod = lod_tensor.lod[0] lod = lod_tensor.lod[0]
results = lod_tensor.as_ndarray() results = lod_tensor.as_ndarray()
if handle_id < len(paths):
unhandled_paths = paths[handle_id:] check_dir(output_dir)
unhandled_paths_num = len(unhandled_paths)
else: if paths:
unhandled_paths_num = 0 assert type(paths) is list, "type(paths) is not list."
if handle_id < len(paths):
unhandled_paths = paths[handle_id:]
unhandled_paths_num = len(unhandled_paths)
else:
unhandled_paths_num = 0
if images is not None:
if handle_id < len(images):
unhandled_paths = None
unhandled_paths_num = len(images) - handle_id
else:
unhandled_paths_num = 0
output = [] output = []
for index in range(len(lod) - 1): for index in range(len(lod) - 1):
output_i = {'data': []} output_i = {'data': []}
if index < unhandled_paths_num: if unhandled_paths and index < unhandled_paths_num:
org_img_path = unhandled_paths[index] org_img_path = unhandled_paths[index]
org_img = Image.open(org_img_path) org_img = Image.open(org_img_path)
output_i['path'] = org_img_path output_i['path'] = org_img_path
...@@ -129,7 +159,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -129,7 +159,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
org_img = org_img.astype(np.uint8) org_img = org_img.astype(np.uint8)
org_img = Image.fromarray(org_img[:, :, ::-1]) org_img = Image.fromarray(org_img[:, :, ::-1])
if visualization: if visualization:
org_img_path = get_save_image_name(org_img, output_dir, 'image_numpy_{}'.format((handle_id + index))) org_img_path = get_save_image_name(
org_img, output_dir, 'image_numpy_{}'.format(
(handle_id + index)))
org_img.save(org_img_path) org_img.save(org_img_path)
org_img_height = org_img.height org_img_height = org_img.height
org_img_width = org_img.width org_img_width = org_img.width
...@@ -149,11 +181,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -149,11 +181,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
dt = {} dt = {}
dt['label'] = label_names[category_id] dt['label'] = label_names[category_id]
dt['confidence'] = float(confidence) dt['confidence'] = float(confidence)
dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(bbox, org_img_width, org_img_height) dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(
bbox, org_img_width, org_img_height)
output_i['data'].append(dt) output_i['data'].append(dt)
output.append(output_i) output.append(output_i)
if visualization: if visualization:
output_i['save_path'] = draw_bounding_box_on_image(org_img_path, output_i['data'], output_dir) output_i['save_path'] = draw_bounding_box_on_image(
org_img_path, output_i['data'], output_dir)
return output return output
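The `check_dir` helper added in this file ensures the output directory exists before any image is saved. It also deletes a regular file that occupies the same path. The sketch below copies the helper from the diff and exercises both branches inside a temporary directory; the directory name `detection_result` is just the documented default.

```python
import os
import tempfile


def check_dir(dir_path):
    # Same logic as the helper added in this commit.
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)
    elif os.path.isfile(dir_path):
        # A regular file with this name is removed and replaced by a directory.
        os.remove(dir_path)
        os.makedirs(dir_path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "detection_result")

    # Case 1: a regular file occupies the path.
    with open(target, "w") as f:
        f.write("placeholder")
    check_dir(target)
    replaced_file = os.path.isdir(target)

    # Case 2: the directory already exists, so the call changes nothing.
    check_dir(target)
    still_dir = os.path.isdir(target)

print(replaced_file, still_dir)
```

Callers that pass an `output_dir` naming an existing file should expect that file to be removed.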
## Command-line prediction # ssd_vgg16_512_coco2017
```shell |Model name|ssd_vgg16_512_coco2017|
$ hub run ssd_vgg16_512_coco2017 --input_path "/PATH/TO/IMAGE" | :--- | :---: |
``` |Category|Image - Object detection|
|Network|SSD|
|Dataset|COCO2017|
|Fine-tuning supported|No|
|Model size|139MB|
|Last updated|2021-03-15|
|Metrics|-|
## API
```python ## I. Basic model information
def context(trainable=True,
pretrained=True,
get_prediction=False)
```
Extracts features for transfer learning. - ### Demo results
- Sample result:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/131506781-b4ecb77b-5ab1-4795-88da-5f547f7f7f9c.jpg" width='50%' hspace='10'/>
<br />
</p>
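**Parameters** - ### Model introduction
* trainable (bool): whether the parameters are trainable; - Single Shot MultiBox Detector (SSD) is a single-stage object detector. Unlike two-stage detection methods, a single-stage detector does not perform region proposal; it regresses bounding boxes and class probabilities directly from the feature maps. SSD adopts this single-stage approach and improves on it by detecting objects of matching scale on feature maps of different scales. The backbone of this PaddleHub Module is VGG16, pretrained on the Pascal dataset; currently only prediction is supported.
* pretrained (bool): whether to load the pretrained model;
* get\_prediction (bool): whether to run prediction.
**Returns**
* inputs (dict): the model inputs; keys include 'image' and 'im\_size', with the following values: ## II. Installation
* image (Variable): the image variable
* im\_size (Variable): the image size
* outputs (dict): the model outputs. If get\_prediction is False, the output is 'head\_features'; otherwise it is 'bbox\_out'.
* context\_prog (Program): the Program used for transfer learning.
```python - ### 1. Dependencies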
def object_detection(paths=None,
images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True)
```
Prediction API; detects the locations of all objects in the input images. - paddlepaddle >= 1.6.2
**Parameters** - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
* paths (list\[str\]): image paths; - ### 2. Installation
* images (list\[numpy.ndarray\]): image data; ndarray.shape is \[H, W, C\], in BGR format;
* batch\_size (int): the batch size;
* use\_gpu (bool): whether to use the GPU;
* score\_thresh (float): confidence threshold for detections;
* visualization (bool): whether to save the detection results as image files;
* output\_dir (str): directory for saved images; defaults to detection\_result;
**Returns** - ```shell
$ hub install ssd_vgg16_512_coco2017
```
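- If you run into problems during installation, see: [Windows quick start](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quick start](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quick start](../../../../docs/docs_ch/get_start/mac_quickstart.md)
* res (list\[dict\]): list of detection results; each element is a dict with the following fields: ## III. Model API prediction
* data (list): detection results; each element is a dict with the following fields:
* confidence (float): detection confidence;
* label (str): label;
* left (int): x coordinate of the bounding box's top-left corner;
* top (int): y coordinate of the bounding box's top-left corner;
* right (int): x coordinate of the bounding box's bottom-right corner;
* bottom (int): y coordinate of the bounding box's bottom-right corner;
* save\_path (str, optional): path where the result is saved (present only when visualization=True).
```python - ### 1. Command-line prediction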
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
Saves the model to the specified path. - ```shell
$ hub run ssd_vgg16_512_coco2017 --input_path "/PATH/TO/IMAGE"
```
- Invokes the object detection model from the command line; for more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code example
**Parameters** - ```python
import paddlehub as hub
import cv2
* dirname: name of the directory where the model is saved object_detector = hub.Module(name="ssd_vgg16_512_coco2017")
* model\_filename: model file name; defaults to \_\_model\_\_ result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
* params\_filename: parameter file name; defaults to \_\_params\_\_ (takes effect only when `combined` is True) # or
* combined: whether to save all parameters into a single file # result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
## Code example - ### 3. API
```python - ```python
import paddlehub as hub def object_detection(paths=None,
import cv2 images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True)
```
object_detector = hub.Module(name="ssd_vgg16_512_coco2017") - Prediction API; detects the locations of all objects in the input images.
result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
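## Service deployment - **Parameters**
PaddleHub Serving can deploy an online object detection service. - paths (list\[str\]): image paths; <br/>
- images (list\[numpy.ndarray\]): image data; ndarray.shape is \[H, W, C\], in BGR format; <br/>
- batch\_size (int): the batch size;<br/>
- use\_gpu (bool): whether to use the GPU;<br/>
- output\_dir (str): directory for saved images; defaults to detection\_result;<br/>
- score\_thresh (float): confidence threshold for detections;<br/>
- visualization (bool): whether to save the detection results as image files.
## Step 1: Start PaddleHub Serving **NOTE:** provide data through either paths or images, not both
Run the start command: - **Returns**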
```shell
$ hub serving start -m ssd_vgg16_512_coco2017
```
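This deploys an object detection service API; the default port is 8866. - res (list\[dict\]): list of detection results; each element is a dict with the following fields:
- data (list): detection results; each element is a dict with the following fields:
- confidence (float): detection confidence
- label (str): label
- left (int): x coordinate of the bounding box's top-left corner
- top (int): y coordinate of the bounding box's top-left corner
- right (int): x coordinate of the bounding box's bottom-right corner
- bottom (int): y coordinate of the bounding box's bottom-right corner
- save\_path (str, optional): path where the result is saved (present only when visualization=True)
**NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed. - ```python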
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
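- Saves the model to the specified path.
## Step 2: Send a prediction request - **Parameters**
With the server configured, the few lines of code below send a prediction request and fetch the result - dirname: name of the directory where the model is saved; <br/>
- model\_filename: model file name; defaults to \_\_model\_\_; <br/>
- params\_filename: parameter file name; defaults to \_\_params\_\_ (takes effect only when `combined` is True);<br/>
- combined: whether to save all parameters into a single file.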
```python
import requests
import json
import cv2
import base64
## IV. Service deployment
def cv2_to_base64(image): - PaddleHub Serving can deploy an online object detection service.
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
- ### Step 1: Start PaddleHub Serving
# Send the HTTP request - Run the start command:
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} - ```shell
headers = {"Content-type": "application/json"} $ hub serving start -m ssd_vgg16_512_coco2017
url = "http://127.0.0.1:8866/predict/ssd_vgg16_512_coco2017" ```
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result - This deploys an object detection service API; the default port is 8866.
print(r.json()["results"])
```
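### Dependencies - **NOTE:** To predict on a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
paddlepaddle >= 1.6.2 - ### Step 2: Send a prediction request
paddlehub >= 1.6.0 - With the server configured, the few lines of code below send a prediction request and fetch the result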
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/ssd_vgg16_512_coco2017"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
## V. Release history
* 1.0.0
Initial release
* 1.0.2
Fixed a problem reading numpy data
- ```shell
$ hub install ssd_vgg16_512_coco2017==1.0.2
```
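The README describes the return value of `object_detection` as a list of dicts, each holding a `data` list of boxes and an optional `save_path`. A minimal sketch of consuming that structure is shown below. The sample values, the save path, and the helper name `summarize` are made up for illustration; only the field names come from the README.

```python
# Hypothetical sample shaped like the documented return value; the numbers are invented.
sample_res = [{
    "data": [
        {"label": "cat", "confidence": 0.97,
         "left": 12, "top": 30, "right": 210, "bottom": 300},
        {"label": "dog", "confidence": 0.55,
         "left": 220, "top": 40, "right": 400, "bottom": 310},
    ],
    "save_path": "detection_result/image_numpy_0.jpg",
}]


def summarize(res, min_confidence=0.6):
    """Return (label, confidence, width, height) for boxes at or above min_confidence."""
    rows = []
    for item in res:
        for det in item["data"]:
            if det["confidence"] >= min_confidence:
                rows.append((det["label"], det["confidence"],
                             det["right"] - det["left"],
                             det["bottom"] - det["top"]))
    return rows


print(summarize(sample_res))
```

Because `save_path` is present only when `visualization=True`, code that reads it should use `item.get("save_path")` rather than indexing directly.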
...@@ -34,7 +34,11 @@ class DecodeImage(object): ...@@ -34,7 +34,11 @@ class DecodeImage(object):
class ResizeImage(object): class ResizeImage(object):
def __init__(self, target_size=0, max_size=0, interp=cv2.INTER_LINEAR, use_cv2=True): def __init__(self,
target_size=0,
max_size=0,
interp=cv2.INTER_LINEAR,
use_cv2=True):
""" """
Rescale image to the specified target size, and capped at max_size Rescale image to the specified target size, and capped at max_size
if max_size != 0. if max_size != 0.
...@@ -88,11 +92,18 @@ class ResizeImage(object): ...@@ -88,11 +92,18 @@ class ResizeImage(object):
resize_h = selected_size resize_h = selected_size
if self.use_cv2: if self.use_cv2:
im = cv2.resize(im, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=self.interp) im = cv2.resize(
im,
None,
None,
fx=im_scale_x,
fy=im_scale_y,
interpolation=self.interp)
else: else:
if self.max_size != 0: if self.max_size != 0:
raise TypeError('If you set max_size to cap the maximum size of image,' raise TypeError(
'please set use_cv2 to True to resize the image.') 'If you set max_size to cap the maximum size of image,'
'please set use_cv2 to True to resize the image.')
im = im.astype('uint8') im = im.astype('uint8')
im = Image.fromarray(im) im = Image.fromarray(im)
im = im.resize((int(resize_w), int(resize_h)), self.interp) im = im.resize((int(resize_w), int(resize_h)), self.interp)
...@@ -102,7 +113,11 @@ class ResizeImage(object): ...@@ -102,7 +113,11 @@ class ResizeImage(object):
class NormalizeImage(object): class NormalizeImage(object):
def __init__(self, mean=[0.485, 0.456, 0.406], std=[1, 1, 1], is_scale=True, is_channel_first=True): def __init__(self,
mean=[0.485, 0.456, 0.406],
std=[1, 1, 1],
is_scale=True,
is_channel_first=True):
""" """
Args: Args:
mean (list): the pixel mean mean (list): the pixel mean
...@@ -158,9 +173,11 @@ class Permute(object): ...@@ -158,9 +173,11 @@ class Permute(object):
def reader(paths=[], def reader(paths=[],
images=None, images=None,
decode_image=DecodeImage(to_rgb=True, with_mixup=False), decode_image=DecodeImage(to_rgb=True, with_mixup=False),
resize_image=ResizeImage(target_size=512, interp=1, max_size=0, use_cv2=False), resize_image=ResizeImage(
target_size=512, interp=1, max_size=0, use_cv2=False),
permute_image=Permute(to_bgr=False), permute_image=Permute(to_bgr=False),
normalize_image=NormalizeImage(mean=[104, 117, 123], std=[1, 1, 1], is_scale=False)): normalize_image=NormalizeImage(
mean=[104, 117, 123], std=[1, 1, 1], is_scale=False)):
""" """
data generator data generator
...@@ -176,7 +193,8 @@ def reader(paths=[], ...@@ -176,7 +193,8 @@ def reader(paths=[],
if paths is not None: if paths is not None:
assert type(paths) is list, "type(paths) is not list." assert type(paths) is list, "type(paths) is not list."
for img_path in paths: for img_path in paths:
assert os.path.isfile(img_path), "The {} isn't a valid file path.".format(img_path) assert os.path.isfile(
img_path), "The {} isn't a valid file path.".format(img_path)
img = cv2.imread(img_path).astype('float32') img = cv2.imread(img_path).astype('float32')
img_list.append(img) img_list.append(img)
if images is not None: if images is not None:
......
...@@ -21,15 +21,17 @@ from ssd_vgg16_512_coco2017.data_feed import reader ...@@ -21,15 +21,17 @@ from ssd_vgg16_512_coco2017.data_feed import reader
@moduleinfo( @moduleinfo(
name="ssd_vgg16_512_coco2017", name="ssd_vgg16_512_coco2017",
version="1.0.1", version="1.0.2",
type="cv/object_detection", type="cv/object_detection",
summary="SSD with backbone VGG16, trained with dataset COCO.", summary="SSD with backbone VGG16, trained with dataset COCO.",
author="paddlepaddle", author="paddlepaddle",
author_email="paddle-dev@baidu.com") author_email="paddle-dev@baidu.com")
class SSDVGG16_512(hub.Module): class SSDVGG16_512(hub.Module):
def _initialize(self): def _initialize(self):
self.default_pretrained_model_path = os.path.join(self.directory, "ssd_vgg16_512_model") self.default_pretrained_model_path = os.path.join(
self.label_names = load_label_info(os.path.join(self.directory, "label_file.txt")) self.directory, "ssd_vgg16_512_model")
self.label_names = load_label_info(
os.path.join(self.directory, "label_file.txt"))
self.model_config = None self.model_config = None
self._set_config() self._set_config()
...@@ -80,39 +82,63 @@ class SSDVGG16_512(hub.Module): ...@@ -80,39 +82,63 @@ class SSDVGG16_512(hub.Module):
with fluid.program_guard(context_prog, startup_program): with fluid.program_guard(context_prog, startup_program):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
# image # image
image = fluid.layers.data(name='image', shape=[3, 512, 512], dtype='float32') image = fluid.layers.data(
name='image', shape=[3, 512, 512], dtype='float32')
# backbone # backbone
backbone = VGG( backbone = VGG(
depth=16, depth=16,
with_extra_blocks=True, with_extra_blocks=True,
normalizations=[20., -1, -1, -1, -1, -1, -1], normalizations=[20., -1, -1, -1, -1, -1, -1],
extra_block_filters=[[256, 512, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 1, 2, 3], extra_block_filters=[[256, 512, 1, 2,
[128, 256, 1, 2, 3], [128, 256, 1, 1, 4]]) 3], [128, 256, 1, 2, 3],
[128, 256, 1, 2,
3], [128, 256, 1, 2, 3],
[128, 256, 1, 1, 4]])
# body_feats # body_feats
body_feats = backbone(image) body_feats = backbone(image)
# im_size # im_size
im_size = fluid.layers.data(name='im_size', shape=[2], dtype='int32') im_size = fluid.layers.data(
name='im_size', shape=[2], dtype='int32')
# var_prefix # var_prefix
var_prefix = '@HUB_{}@'.format(self.name) var_prefix = '@HUB_{}@'.format(self.name)
# names of inputs # names of inputs
inputs = {'image': var_prefix + image.name, 'im_size': var_prefix + im_size.name} inputs = {
'image': var_prefix + image.name,
'im_size': var_prefix + im_size.name
}
# names of outputs # names of outputs
if get_prediction: if get_prediction:
locs, confs, box, box_var = fluid.layers.multi_box_head( locs, confs, box, box_var = fluid.layers.multi_box_head(
inputs=body_feats, image=image, num_classes=81, **self.multi_box_head_config) inputs=body_feats,
image=image,
num_classes=81,
**self.multi_box_head_config)
pred = fluid.layers.detection_output( pred = fluid.layers.detection_output(
loc=locs, scores=confs, prior_box=box, prior_box_var=box_var, **self.output_decoder_config) loc=locs,
scores=confs,
prior_box=box,
prior_box_var=box_var,
**self.output_decoder_config)
outputs = {'bbox_out': [var_prefix + pred.name]} outputs = {'bbox_out': [var_prefix + pred.name]}
else: else:
outputs = {'body_features': [var_prefix + var.name for var in body_feats]} outputs = {
'body_features':
[var_prefix + var.name for var in body_feats]
}
# add_vars_prefix # add_vars_prefix
add_vars_prefix(context_prog, var_prefix) add_vars_prefix(context_prog, var_prefix)
add_vars_prefix(fluid.default_startup_program(), var_prefix) add_vars_prefix(fluid.default_startup_program(), var_prefix)
# inputs # inputs
inputs = {key: context_prog.global_block().vars[value] for key, value in inputs.items()} inputs = {
key: context_prog.global_block().vars[value]
for key, value in inputs.items()
}
outputs = { outputs = {
out_key: [context_prog.global_block().vars[varname] for varname in out_value] out_key: [
context_prog.global_block().vars[varname]
for varname in out_value
]
for out_key, out_value in outputs.items() for out_key, out_value in outputs.items()
} }
# trainable # trainable
...@@ -125,9 +151,14 @@ class SSDVGG16_512(hub.Module): ...@@ -125,9 +151,14 @@ class SSDVGG16_512(hub.Module):
if pretrained: if pretrained:
def _if_exist(var): def _if_exist(var):
return os.path.exists(os.path.join(self.default_pretrained_model_path, var.name)) return os.path.exists(
os.path.join(self.default_pretrained_model_path,
var.name))
fluid.io.load_vars(exe, self.default_pretrained_model_path, predicate=_if_exist) fluid.io.load_vars(
exe,
self.default_pretrained_model_path,
predicate=_if_exist)
else: else:
exe.run(startup_program) exe.run(startup_program)
...@@ -169,7 +200,7 @@ class SSDVGG16_512(hub.Module): ...@@ -169,7 +200,7 @@ class SSDVGG16_512(hub.Module):
int(_places[0]) int(_places[0])
except: except:
raise RuntimeError( raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES as cuda_device_id." "Attempt to use GPU for prediction, but environment variable CUDA_VISIBLE_DEVICES was not set correctly."
) )
paths = paths if paths else list() paths = paths if paths else list()
...@@ -196,7 +227,11 @@ class SSDVGG16_512(hub.Module): ...@@ -196,7 +227,11 @@ class SSDVGG16_512(hub.Module):
res.extend(output) res.extend(output)
return res return res
def save_inference_model(self, dirname, model_filename=None, params_filename=None, combined=True): def save_inference_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined: if combined:
model_filename = "__model__" if not model_filename else model_filename model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename params_filename = "__params__" if not params_filename else params_filename
...@@ -234,9 +269,12 @@ class SSDVGG16_512(hub.Module): ...@@ -234,9 +269,12 @@ class SSDVGG16_512(hub.Module):
prog='hub run {}'.format(self.name), prog='hub run {}'.format(self.name),
usage='%(prog)s', usage='%(prog)s',
add_help=True) add_help=True)
self.arg_input_group = self.parser.add_argument_group(title="Input options", description="Input data. Required") self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group( self.arg_config_group = self.parser.add_argument_group(
title="Config options", description="Run configuration for controlling module behavior, not required.") title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg() self.add_module_config_arg()
self.add_module_input_arg() self.add_module_input_arg()
args = self.parser.parse_args(argvs) args = self.parser.parse_args(argvs)
...@@ -254,17 +292,34 @@ class SSDVGG16_512(hub.Module): ...@@ -254,17 +292,34 @@ class SSDVGG16_512(hub.Module):
Add the command config options. Add the command config options.
""" """
self.arg_config_group.add_argument( self.arg_config_group.add_argument(
'--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not") '--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument( self.arg_config_group.add_argument(
'--output_dir', type=str, default='detection_result', help="The directory to save output images.") '--output_dir',
type=str,
default='detection_result',
help="The directory to save output images.")
self.arg_config_group.add_argument( self.arg_config_group.add_argument(
'--visualization', type=ast.literal_eval, default=False, help="whether to save output as images.") '--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self): def add_module_input_arg(self):
""" """
Add the command input options. Add the command input options.
""" """
self.arg_input_group.add_argument('--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument('--batch_size', type=ast.literal_eval, default=1, help="batch size.")
self.arg_input_group.add_argument( self.arg_input_group.add_argument(
'--score_thresh', type=ast.literal_eval, default=0.5, help="threshold for object detection.") '--input_path', type=str, help="path to image.")
self.arg_input_group.add_argument(
'--batch_size',
type=ast.literal_eval,
default=1,
help="batch size.")
self.arg_input_group.add_argument(
'--score_thresh',
type=ast.literal_eval,
default=0.5,
help="threshold for object detection.")
...@@ -15,6 +15,12 @@ def base64_to_cv2(b64str): ...@@ -15,6 +15,12 @@ def base64_to_cv2(b64str):
data = cv2.imdecode(data, cv2.IMREAD_COLOR) data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data return data
def check_dir(dir_path):
if not os.path.exists(dir_path):
os.makedirs(dir_path)
elif os.path.isfile(dir_path):
os.remove(dir_path)
os.makedirs(dir_path)
def get_save_image_name(img, output_dir, image_path): def get_save_image_name(img, output_dir, image_path):
""" """
...@@ -44,17 +50,23 @@ def draw_bounding_box_on_image(image_path, data_list, save_dir): ...@@ -44,17 +50,23 @@ def draw_bounding_box_on_image(image_path, data_list, save_dir):
image = Image.open(image_path) image = Image.open(image_path)
draw = ImageDraw.Draw(image) draw = ImageDraw.Draw(image)
for data in data_list: for data in data_list:
left, right, top, bottom = data['left'], data['right'], data['top'], data['bottom'] left, right, top, bottom = data['left'], data['right'], data[
'top'], data['bottom']
# draw bbox # draw bbox
draw.line([(left, top), (left, bottom), (right, bottom), (right, top), (left, top)], width=2, fill='red') draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
(left, top)],
width=2,
fill='red')
# draw label # draw label
if image.mode == 'RGB': if image.mode == 'RGB':
text = data['label'] + ": %.2f%%" % (100 * data['confidence']) text = data['label'] + ": %.2f%%" % (100 * data['confidence'])
textsize_width, textsize_height = draw.textsize(text=text) textsize_width, textsize_height = draw.textsize(text=text)
draw.rectangle( draw.rectangle(
xy=(left, top - (textsize_height + 5), left + textsize_width + 10, top), fill=(255, 255, 255)) xy=(left, top - (textsize_height + 5),
left + textsize_width + 10, top),
fill=(255, 255, 255))
draw.text(xy=(left, top - 15), text=text, fill=(0, 0, 0)) draw.text(xy=(left, top - 15), text=text, fill=(0, 0, 0))
save_name = get_save_image_name(image, save_dir, image_path) save_name = get_save_image_name(image, save_dir, image_path)
...@@ -83,7 +95,14 @@ def load_label_info(file_path): ...@@ -83,7 +95,14 @@ def load_label_info(file_path):
return label_names return label_names
def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, handle_id, visualization=True): def postprocess(paths,
images,
data_out,
score_thresh,
label_names,
output_dir,
handle_id,
visualization=True):
""" """
postprocess the lod_tensor produced by fluid.Executor.run postprocess the lod_tensor produced by fluid.Executor.run
...@@ -111,16 +130,27 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -111,16 +130,27 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
lod_tensor = data_out[0] lod_tensor = data_out[0]
lod = lod_tensor.lod[0] lod = lod_tensor.lod[0]
results = lod_tensor.as_ndarray() results = lod_tensor.as_ndarray()
if handle_id < len(paths):
unhandled_paths = paths[handle_id:] check_dir(output_dir)
unhandled_paths_num = len(unhandled_paths)
else: if paths:
unhandled_paths_num = 0 assert type(paths) is list, "type(paths) is not list."
if handle_id < len(paths):
unhandled_paths = paths[handle_id:]
unhandled_paths_num = len(unhandled_paths)
else:
unhandled_paths_num = 0
if images is not None:
if handle_id < len(images):
unhandled_paths = None
unhandled_paths_num = len(images) - handle_id
else:
unhandled_paths_num = 0
output = [] output = []
for index in range(len(lod) - 1): for index in range(len(lod) - 1):
output_i = {'data': []} output_i = {'data': []}
if index < unhandled_paths_num: if unhandled_paths and index < unhandled_paths_num:
org_img_path = unhandled_paths[index] org_img_path = unhandled_paths[index]
org_img = Image.open(org_img_path) org_img = Image.open(org_img_path)
output_i['path'] = org_img_path output_i['path'] = org_img_path
...@@ -129,7 +159,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -129,7 +159,9 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
org_img = org_img.astype(np.uint8) org_img = org_img.astype(np.uint8)
org_img = Image.fromarray(org_img[:, :, ::-1]) org_img = Image.fromarray(org_img[:, :, ::-1])
if visualization: if visualization:
org_img_path = get_save_image_name(org_img, output_dir, 'image_numpy_{}'.format((handle_id + index))) org_img_path = get_save_image_name(
org_img, output_dir, 'image_numpy_{}'.format(
(handle_id + index)))
org_img.save(org_img_path) org_img.save(org_img_path)
org_img_height = org_img.height org_img_height = org_img.height
org_img_width = org_img.width org_img_width = org_img.width
...@@ -149,11 +181,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir, ...@@ -149,11 +181,13 @@ def postprocess(paths, images, data_out, score_thresh, label_names, output_dir,
dt = {} dt = {}
dt['label'] = label_names[category_id] dt['label'] = label_names[category_id]
dt['confidence'] = float(confidence) dt['confidence'] = float(confidence)
dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(bbox, org_img_width, org_img_height) dt['left'], dt['top'], dt['right'], dt['bottom'] = clip_bbox(
bbox, org_img_width, org_img_height)
output_i['data'].append(dt) output_i['data'].append(dt)
output.append(output_i) output.append(output_i)
if visualization: if visualization:
output_i['save_path'] = draw_bounding_box_on_image(org_img_path, output_i['data'], output_dir) output_i['save_path'] = draw_bounding_box_on_image(
org_img_path, output_i['data'], output_dir)
return output return output
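`postprocess` loops over `range(len(lod) - 1)`, which treats the level-0 LoD as a list of row offsets: image `i` owns the result rows from `lod[i]` up to `lod[i + 1]`. The sketch below illustrates that grouping on plain Python lists. The row layout `[category_id, confidence, xmin, ymin, xmax, ymax]`, the `>=` comparison against `score_thresh`, and the helper name `split_by_lod` are assumptions, since the relevant lines are elided in the diff.

```python
def split_by_lod(lod, rows):
    """Group flat detection rows per image using level-0 LoD offsets."""
    return [rows[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]


# Assumed row layout: [category_id, confidence, xmin, ymin, xmax, ymax].
rows = [
    [1, 0.9, 10, 10, 50, 50],
    [3, 0.4, 20, 20, 60, 60],
    [2, 0.8, 5, 5, 30, 30],
]
# Offsets: image 0 owns rows 0..1, image 1 has no detections, image 2 owns row 2.
lod = [0, 2, 2, 3]

per_image = split_by_lod(lod, rows)

# Assumed filtering rule: keep rows whose confidence reaches the threshold.
score_thresh = 0.5
kept = [[r for r in group if r[1] >= score_thresh] for group in per_image]
print([len(group) for group in per_image], kept)
```

An image with no detections still produces an entry (an empty group), which matches the diff's behavior of appending one `output_i` per LoD interval.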
...@@ -27,8 +27,8 @@ class VGG(object): ...@@ -27,8 +27,8 @@ class VGG(object):
depth=16, depth=16,
with_extra_blocks=False, with_extra_blocks=False,
normalizations=[20., -1, -1, -1, -1, -1], normalizations=[20., -1, -1, -1, -1, -1],
extra_block_filters=[[256, 512, 1, 2, 3], [128, 256, 1, 2, 3], [128, 256, 0, 1, 3], extra_block_filters=[[256, 512, 1, 2, 3], [128, 256, 1, 2, 3],
[128, 256, 0, 1, 3]], [128, 256, 0, 1, 3], [128, 256, 0, 1, 3]],
class_dim=1000): class_dim=1000):
assert depth in [16, 19], "depth {} not in [16, 19]" assert depth in [16, 19], "depth {} not in [16, 19]"
self.depth = depth self.depth = depth
...@@ -60,7 +60,8 @@ class VGG(object): ...@@ -60,7 +60,8 @@ class VGG(object):
res_layer = [] res_layer = []
layers = [] layers = []
for k, v in enumerate(vgg_base): for k, v in enumerate(vgg_base):
conv = self._conv_block(conv, v, nums[k], name="conv{}_".format(k + 1)) conv = self._conv_block(
conv, v, nums[k], name="conv{}_".format(k + 1))
layers.append(conv) layers.append(conv)
if self.with_extra_blocks: if self.with_extra_blocks:
if k == 4: if k == 4:
...@@ -76,19 +77,25 @@ class VGG(object): ...@@ -76,19 +77,25 @@ class VGG(object):
input=conv, input=conv,
size=fc_dim, size=fc_dim,
act='relu', act='relu',
param_attr=fluid.param_attr.ParamAttr(name=fc_name[0] + "_weights"), param_attr=fluid.param_attr.ParamAttr(
bias_attr=fluid.param_attr.ParamAttr(name=fc_name[0] + "_offset")) name=fc_name[0] + "_weights"),
bias_attr=fluid.param_attr.ParamAttr(
name=fc_name[0] + "_offset"))
fc2 = fluid.layers.fc( fc2 = fluid.layers.fc(
input=fc1, input=fc1,
size=fc_dim, size=fc_dim,
act='relu', act='relu',
param_attr=fluid.param_attr.ParamAttr(name=fc_name[1] + "_weights"), param_attr=fluid.param_attr.ParamAttr(
bias_attr=fluid.param_attr.ParamAttr(name=fc_name[1] + "_offset")) name=fc_name[1] + "_weights"),
bias_attr=fluid.param_attr.ParamAttr(
name=fc_name[1] + "_offset"))
out = fluid.layers.fc( out = fluid.layers.fc(
input=fc2, input=fc2,
size=self.class_dim, size=self.class_dim,
param_attr=fluid.param_attr.ParamAttr(name=fc_name[2] + "_weights"), param_attr=fluid.param_attr.ParamAttr(
bias_attr=fluid.param_attr.ParamAttr(name=fc_name[2] + "_offset")) name=fc_name[2] + "_weights"),
bias_attr=fluid.param_attr.ParamAttr(
name=fc_name[2] + "_offset"))
out = fluid.layers.softmax(out) out = fluid.layers.softmax(out)
res_layer.append(out) res_layer.append(out)
return [out] return [out]
...@@ -103,7 +110,14 @@ class VGG(object): ...@@ -103,7 +110,14 @@ class VGG(object):
layers = [] layers = []
for k, v in enumerate(cfg): for k, v in enumerate(cfg):
assert len(v) == 5, "extra_block_filters size not fix" assert len(v) == 5, "extra_block_filters size not fix"
conv = self._extra_block(conv, v[0], v[1], v[2], v[3], v[4], name="conv{}_".format(6 + k)) conv = self._extra_block(
conv,
v[0],
v[1],
v[2],
v[3],
v[4],
name="conv{}_".format(6 + k))
layers.append(conv) layers.append(conv)
return layers return layers
...@@ -121,10 +135,23 @@ class VGG(object): ...@@ -121,10 +135,23 @@ class VGG(object):
name=name + str(i + 1)) name=name + str(i + 1))
return conv return conv
def _extra_block(self, input, num_filters1, num_filters2, padding_size, stride_size, filter_size, name=None): def _extra_block(self,
input,
num_filters1,
num_filters2,
padding_size,
stride_size,
filter_size,
name=None):
# 1x1 conv # 1x1 conv
conv_1 = self._conv_layer( conv_1 = self._conv_layer(
input=input, num_filters=int(num_filters1), filter_size=1, stride=1, act='relu', padding=0, name=name + "1") input=input,
num_filters=int(num_filters1),
filter_size=1,
stride=1,
act='relu',
padding=0,
name=name + "1")
# 3x3 conv # 3x3 conv
conv_2 = self._conv_layer( conv_2 = self._conv_layer(
...@@ -157,11 +184,17 @@ class VGG(object): ...@@ -157,11 +184,17 @@ class VGG(object):
act=act, act=act,
use_cudnn=use_cudnn, use_cudnn=use_cudnn,
param_attr=ParamAttr(name=name + "_weights"), param_attr=ParamAttr(name=name + "_weights"),
bias_attr=ParamAttr(name=name + "_biases") if self.with_extra_blocks else False, bias_attr=ParamAttr(
name=name + "_biases") if self.with_extra_blocks else False,
name=name + '.conv2d.output.1') name=name + '.conv2d.output.1')
return conv return conv
def _pooling_block(self, conv, pool_size, pool_stride, pool_padding=0, ceil_mode=True): def _pooling_block(self,
conv,
pool_size,
pool_stride,
pool_padding=0,
ceil_mode=True):
pool = fluid.layers.pool2d( pool = fluid.layers.pool2d(
input=conv, input=conv,
pool_size=pool_size, pool_size=pool_size,
...@@ -175,10 +208,17 @@ class VGG(object): ...@@ -175,10 +208,17 @@ class VGG(object):
from paddle.fluid.layer_helper import LayerHelper from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.initializer import Constant from paddle.fluid.initializer import Constant
helper = LayerHelper("Scale") helper = LayerHelper("Scale")
l2_norm = fluid.layers.l2_normalize(input, axis=1) # l2 norm along channel l2_norm = fluid.layers.l2_normalize(
input, axis=1) # l2 norm along channel
shape = [1] if channel_shared else [input.shape[1]] shape = [1] if channel_shared else [input.shape[1]]
scale = helper.create_parameter( scale = helper.create_parameter(
attr=helper.param_attr, shape=shape, dtype=input.dtype, default_initializer=Constant(init_scale)) attr=helper.param_attr,
shape=shape,
dtype=input.dtype,
default_initializer=Constant(init_scale))
out = fluid.layers.elementwise_mul( out = fluid.layers.elementwise_mul(
x=l2_norm, y=scale, axis=-1 if channel_shared else 1, name="conv4_3_norm_scale") x=l2_norm,
y=scale,
axis=-1 if channel_shared else 1,
name="conv4_3_norm_scale")
return out return out
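The hunk above L2-normalizes the feature map along the channel axis and then multiplies it by a learnable scale. The arithmetic can be sketched for a single spatial location in plain Python. This is an illustration only: `l2_norm_scale_pixel` is a hypothetical name, the epsilon guard is an assumption, and the example scale of 20.0 is simply borrowed from the first entry of `normalizations` in the constructor shown earlier:

```python
import math


def l2_norm_scale_pixel(channels, scale, channel_shared=False):
    """Normalize one pixel's channel vector to unit L2 norm, then rescale it.

    channels: list of floats, one value per channel.
    scale: a list with one value (channel_shared=True) or one value per channel.
    """
    norm = math.sqrt(sum(v * v for v in channels))
    norm = max(norm, 1e-12)  # assumed guard against division by zero
    unit = [v / norm for v in channels]
    if channel_shared:
        return [u * scale[0] for u in unit]
    return [u * s for u, s in zip(unit, scale)]


scaled = l2_norm_scale_pixel([3.0, 4.0], [20.0, 20.0])
shared = l2_norm_scale_pixel([3.0, 4.0], [20.0], channel_shared=True)
```

With `channel_shared=True` the parameter has shape `[1]` and every channel is multiplied by the same value; otherwise each channel has its own scale, matching the `shape` expression in the hunk.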
## Command-line prediction # yolov3_darknet53_coco2017
```shell |Model name|yolov3_darknet53_coco2017|
$ hub run yolov3_darknet53_coco2017 --input_path "/PATH/TO/IMAGE" | :--- | :---: |
``` |Category|Image - Object detection|
|Network|YOLOv3|
|Dataset|COCO2017|
|Fine-tuning supported|No|
|Model size|239MB|
|Last updated|2021-02-26|
|Metrics|-|
## API
```python ## I. Basic model information
def context(trainable=True,
pretrained=True,
get_prediction=False)
```
Extracts features for transfer learning. - ### Demo results
- Sample result:
<p align="center">
<img src="https://user-images.githubusercontent.com/22424850/131506781-b4ecb77b-5ab1-4795-88da-5f547f7f7f9c.jpg" width='50%' hspace='10'/>
<br />
</p>
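**Parameters** - ### Model introduction
* trainable(bool): whether the parameters are trainable; - YOLOv3 is a single-stage detector proposed by Joseph Redmon and Ali Farhadi. Compared with traditional object detection methods that reach the same accuracy, its inference is nearly twice as fast. YOLOv3 divides the input image into a grid and predicts bounding boxes for each grid cell. Its loss function has three parts: location error, confidence error, and classification error. This PaddleHub Module is pretrained on COCO2017 and currently supports prediction only.
* pretrained (bool): whether to load the pretrained model;
* get\_prediction (bool): whether to run prediction.
**Returns**
* inputs (dict): the model inputs; keys include 'image' and 'im\_size', with the following values: ## II. Installation
* image (Variable): the image variable
* im\_size (Variable): the image size
* outputs (dict): the model outputs. If get\_prediction is False, the outputs are 'head\_features' and 'body\_features'; otherwise the output is 'bbox\_out'.
* context\_prog (Program): the Program used for transfer learning.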
```python - ### 1. Environment dependencies
def object_detection(paths=None,
images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True)
```
Prediction API; detects the locations of all objects in the input images. - paddlepaddle >= 1.6.2
**Parameters** - paddlehub >= 1.6.0 | [How to install paddlehub](../../../../docs/docs_ch/get_start/installation.rst)
* paths (list\[str\]): image paths; - ### 2. Installation
* images (list\[numpy.ndarray\]): image data; ndarray.shape is \[H, W, C\], in BGR format;
* batch\_size (int): batch size;
* use\_gpu (bool): whether to use the GPU;
* score\_thresh (float): confidence threshold for detections;
* visualization (bool): whether to save the detection results as image files;
* output\_dir (str): directory where images are saved; defaults to detection\_result;
**Returns** - ```shell
$ hub install yolov3_darknet53_coco2017
```
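- If you run into problems during installation, see: [Windows installation from scratch](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux installation from scratch](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation from scratch](../../../../docs/docs_ch/get_start/mac_quickstart.md)
* res (list\[dict\]): list of detection results; each element is a dict with the following fields: ## III. Model API prediction
* data (list): detection results; each element is a dict with the following fields:
* confidence (float): detection confidence;
* label (str): label;
* left (int): x coordinate of the top-left corner of the bounding box;
* top (int): y coordinate of the top-left corner of the bounding box;
* right (int): x coordinate of the bottom-right corner of the bounding box;
* bottom (int): y coordinate of the bottom-right corner of the bounding box;
* save\_path (str, optional): path where the result is saved (present only when visualization=True).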
```python - ### 1. Command-line prediction
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
Saves the model to the specified path. - ```shell
$ hub run yolov3_darknet53_coco2017 --input_path "/PATH/TO/IMAGE"
```
- This invokes the object detection model from the command line; for more, see [PaddleHub command-line instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code example
**Parameters** - ```python
import paddlehub as hub
import cv2
* dirname: name of the directory in which to save the model object_detector = hub.Module(name="yolov3_darknet53_coco2017")
* model\_filename: model file name; defaults to \_\_model\_\_ result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
* params\_filename: parameter file name; defaults to \_\_params\_\_ (only takes effect when `combined` is True) # or
* combined: whether to save the parameters into a single file # result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
## Code example - ### 3. API
```python - ```python
import paddlehub as hub def object_detection(paths=None,
import cv2 images=None,
batch_size=1,
use_gpu=False,
output_dir='detection_result',
score_thresh=0.5,
visualization=True)
```
object_detector = hub.Module(name="yolov3_darknet53_coco2017") - Prediction API; detects the locations of all objects in the input images.
result = object_detector.object_detection(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = object_detector.object_detection(paths=['/PATH/TO/IMAGE'])
```
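## Service deployment - **Parameters**
PaddleHub Serving can deploy an online object detection service. - paths (list\[str\]): image paths; <br/>
- images (list\[numpy.ndarray\]): image data; ndarray.shape is \[H, W, C\], in BGR format; <br/>
- batch\_size (int): batch size;<br/>
- use\_gpu (bool): whether to use the GPU;<br/>
- output\_dir (str): directory where images are saved; defaults to detection\_result;<br/>
- score\_thresh (float): confidence threshold for detections;<br/>
- visualization (bool): whether to save the detection results as image files.
## Step 1: Start PaddleHub Serving **NOTE:** provide the data through one of the two parameters, paths or images
Run the start command: - **Returns**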
```shell
$ hub serving start -m yolov3_darknet53_coco2017
```
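This deploys an object detection service API; the default port is 8866. - res (list\[dict\]): list of detection results; each element is a dict with the following fields:
- data (list): detection results; each element is a dict with the following fields:
- confidence (float): detection confidence
- label (str): label
- left (int): x coordinate of the top-left corner of the bounding box
- top (int): y coordinate of the top-left corner of the bounding box
- right (int): x coordinate of the bottom-right corner of the bounding box
- bottom (int): y coordinate of the bottom-right corner of the bounding box
- save\_path (str, optional): path where the result is saved (present only when visualization=True)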
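A short sketch of how the returned structure might be consumed, assuming only the field names listed above. The function name `boxes_above` and the sample values are hypothetical; a real `res` would come from `object_detection`:

```python
def boxes_above(results, min_confidence):
    """Collect (label, width, height) for detections at or above a confidence."""
    kept = []
    for image_result in results:           # one dict per input image
        for det in image_result["data"]:   # one dict per detected object
            if det["confidence"] >= min_confidence:
                width = det["right"] - det["left"]
                height = det["bottom"] - det["top"]
                kept.append((det["label"], width, height))
    return kept


# Hypothetical result for a single image, shaped like the fields documented above.
sample = [{
    "data": [
        {"label": "dog", "confidence": 0.92,
         "left": 10, "top": 20, "right": 110, "bottom": 220},
        {"label": "cat", "confidence": 0.55,
         "left": 0, "top": 0, "right": 50, "bottom": 40},
    ],
    "save_path": "detection_result/example.jpg",
}]

confident = boxes_above(sample, 0.8)
```

Because `save_path` is present only when `visualization=True`, code that reads it is safer using `image_result.get("save_path")`.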
**NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed. - ```python
def save_inference_model(dirname,
model_filename=None,
params_filename=None,
combined=True)
```
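- Saves the model to the specified path.
## Step 2: Send a prediction request - **Parameters**
With the server configured, the few lines of code below send a prediction request and fetch the result - dirname: name of the directory in which to save the model; <br/>
- model\_filename: model file name; defaults to \_\_model\_\_; <br/>
- params\_filename: parameter file name; defaults to \_\_params\_\_ (only takes effect when `combined` is True);<br/>
- combined: whether to save the parameters into a single file.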
```python
import requests
import json
import cv2
import base64
## IV. Service deployment
def cv2_to_base64(image): - PaddleHub Serving can deploy an online object detection service.
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
- ### Step 1: Start PaddleHub Serving
# Send the HTTP request - Run the start command:
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]} - ```shell
headers = {"Content-type": "application/json"} $ hub serving start -m yolov3_darknet53_coco2017
url = "http://127.0.0.1:8866/predict/yolov3_darknet53_coco2017" ```
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result - This deploys an object detection service API; the default port is 8866.
print(r.json()["results"])
```
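### Dependencies - **NOTE:** To predict on the GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
paddlepaddle >= 1.6.2 - ### Step 2: Send a prediction request
paddlehub >= 1.6.0 - With the server configured, the few lines of code below send a prediction request and fetch the result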
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
data = cv2.imencode('.jpg', image)[1]
return base64.b64encode(data.tobytes()).decode('utf8')
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/yolov3_darknet53_coco2017"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction result
print(r.json()["results"])
```
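The request body in the example above is a JSON object whose `images` field holds base64-encoded image bytes. A minimal sketch of that encoding step using only the standard library, so it runs without OpenCV or a server. `build_payload` and the placeholder bytes are hypothetical and stand in for the output of `cv2.imencode`:

```python
import base64
import json


def build_payload(image_bytes):
    """Return the JSON request body for one already-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps({"images": [encoded]})


# Placeholder bytes standing in for a real JPEG buffer.
fake_jpeg = b"\xff\xd8\xff\xe0example-bytes"
body = build_payload(fake_jpeg)

# Decoding the field again recovers the original bytes.
decoded = base64.b64decode(json.loads(body)["images"][0])
```

Base64 is needed here because JSON cannot carry raw binary data; the round trip shows that no bytes are lost in transit.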
## V. Release history
* 1.0.0
Initial release
* 1.1.1
Fixed a numpy data reading issue
- ```shell
$ hub install yolov3_darknet53_coco2017==1.1.1
```
...@@ -39,7 +39,14 @@ class DarkNet(object): ...@@ -39,7 +39,14 @@ class DarkNet(object):
self.class_dim = class_dim self.class_dim = class_dim
self.get_prediction = get_prediction self.get_prediction = get_prediction
def _conv_norm(self, input, ch_out, filter_size, stride, padding, act='leaky', name=None): def _conv_norm(self,
input,
ch_out,
filter_size,
stride,
padding,
act='leaky',
name=None):
conv = fluid.layers.conv2d( conv = fluid.layers.conv2d(
input=input, input=input,
num_filters=ch_out, num_filters=ch_out,
...@@ -51,8 +58,12 @@ class DarkNet(object): ...@@ -51,8 +58,12 @@ class DarkNet(object):
bias_attr=False) bias_attr=False)
bn_name = name + ".bn" bn_name = name + ".bn"
bn_param_attr = ParamAttr(regularizer=L2Decay(float(self.norm_decay)), name=bn_name + '.scale') bn_param_attr = ParamAttr(
bn_bias_attr = ParamAttr(regularizer=L2Decay(float(self.norm_decay)), name=bn_name + '.offset') regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.scale')
bn_bias_attr = ParamAttr(
regularizer=L2Decay(float(self.norm_decay)),
name=bn_name + '.offset')
out = fluid.layers.batch_norm( out = fluid.layers.batch_norm(
input=conv, input=conv,
...@@ -69,12 +80,36 @@ class DarkNet(object): ...@@ -69,12 +80,36 @@ class DarkNet(object):
return out return out
def _downsample(self, input, ch_out, filter_size=3, stride=2, padding=1, name=None): def _downsample(self,
return self._conv_norm(input, ch_out=ch_out, filter_size=filter_size, stride=stride, padding=padding, name=name) input,
ch_out,
filter_size=3,
stride=2,
padding=1,
name=None):
return self._conv_norm(
input,
ch_out=ch_out,
filter_size=filter_size,
stride=stride,
padding=padding,
name=name)
def basicblock(self, input, ch_out, name=None): def basicblock(self, input, ch_out, name=None):
conv1 = self._conv_norm(input, ch_out=ch_out, filter_size=1, stride=1, padding=0, name=name + ".0") conv1 = self._conv_norm(
conv2 = self._conv_norm(conv1, ch_out=ch_out * 2, filter_size=3, stride=1, padding=1, name=name + ".1") input,
ch_out=ch_out,
filter_size=1,
stride=1,
padding=0,
name=name + ".0")
conv2 = self._conv_norm(
conv1,
ch_out=ch_out * 2,
filter_size=3,
stride=1,
padding=1,
name=name + ".1")
out = fluid.layers.elementwise_add(x=input, y=conv2, act=None) out = fluid.layers.elementwise_add(x=input, y=conv2, act=None)
return out return out
...@@ -94,9 +129,16 @@ class DarkNet(object): ...@@ -94,9 +129,16 @@ class DarkNet(object):
stages, block_func = self.depth_cfg[self.depth] stages, block_func = self.depth_cfg[self.depth]
stages = stages[0:5] stages = stages[0:5]
conv = self._conv_norm( conv = self._conv_norm(
input=input, ch_out=32, filter_size=3, stride=1, padding=1, name=self.prefix_name + "yolo_input") input=input,
ch_out=32,
filter_size=3,
stride=1,
padding=1,
name=self.prefix_name + "yolo_input")
downsample_ = self._downsample( downsample_ = self._downsample(
input=conv, ch_out=conv.shape[1] * 2, name=self.prefix_name + "yolo_input.downsample") input=conv,
ch_out=conv.shape[1] * 2,
name=self.prefix_name + "yolo_input.downsample")
blocks = [] blocks = []
for i, stage in enumerate(stages): for i, stage in enumerate(stages):
block = self.layer_warp( block = self.layer_warp(
...@@ -108,14 +150,19 @@ class DarkNet(object): ...@@ -108,14 +150,19 @@ class DarkNet(object):
blocks.append(block) blocks.append(block)
if i < len(stages) - 1: # do not downsample in the last stage if i < len(stages) - 1: # do not downsample in the last stage
downsample_ = self._downsample( downsample_ = self._downsample(
input=block, ch_out=block.shape[1] * 2, name=self.prefix_name + "stage.{}.downsample".format(i)) input=block,
ch_out=block.shape[1] * 2,
name=self.prefix_name + "stage.{}.downsample".format(i))
if self.get_prediction: if self.get_prediction:
pool = fluid.layers.pool2d(input=block, pool_type='avg', global_pooling=True) pool = fluid.layers.pool2d(
input=block, pool_type='avg', global_pooling=True)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0) stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc( out = fluid.layers.fc(
input=pool, input=pool,
size=self.class_dim, size=self.class_dim,
param_attr=ParamAttr(initializer=fluid.initializer.Uniform(-stdv, stdv), name='fc_weights'), param_attr=ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name='fc_weights'),
bias_attr=ParamAttr(name='fc_offset')) bias_attr=ParamAttr(name='fc_offset'))
out = fluid.layers.softmax(out) out = fluid.layers.softmax(out)
return out return out
......
...@@ -26,7 +26,8 @@ def reader(paths=[], images=None): ...@@ -26,7 +26,8 @@ def reader(paths=[], images=None):
if paths: if paths:
assert type(paths) is list, "type(paths) is not list." assert type(paths) is list, "type(paths) is not list."
for img_path in paths: for img_path in paths:
assert os.path.isfile(img_path), "The {} isn't a valid file path.".format(img_path) assert os.path.isfile(
img_path), "The {} isn't a valid file path.".format(img_path)
img = cv2.imread(img_path).astype('float32') img = cv2.imread(img_path).astype('float32')
img_list.append(img) img_list.append(img)
if images is not None: if images is not None:
...@@ -50,7 +51,8 @@ def reader(paths=[], images=None): ...@@ -50,7 +51,8 @@ def reader(paths=[], images=None):
im_scale_x = float(target_size) / float(im_shape[1]) im_scale_x = float(target_size) / float(im_shape[1])
im_scale_y = float(target_size) / float(im_shape[0]) im_scale_y = float(target_size) / float(im_shape[0])
im = cv2.resize(im, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=2) im = cv2.resize(
im, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=2)
# normalize image # normalize image
mean = [0.485, 0.456, 0.406] mean = [0.485, 0.456, 0.406]
......
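The `reader` hunk above computes separate horizontal and vertical scale factors so that every image is resized to a square of side `target_size`. The arithmetic can be sketched without OpenCV. The value 608 below is an assumed example, since `target_size` is not shown in this diff, and `resize_scales` is a hypothetical name:

```python
def resize_scales(im_shape, target_size):
    """Return (scale_x, scale_y) mapping an (H, W, ...) shape to a square."""
    im_scale_x = float(target_size) / float(im_shape[1])  # width is shape[1]
    im_scale_y = float(target_size) / float(im_shape[0])  # height is shape[0]
    return im_scale_x, im_scale_y


# A 480x640 (H x W) image resized to an assumed 608x608 input.
scale_x, scale_y = resize_scales((480, 640, 3), 608)
new_width = round(640 * scale_x)
new_height = round(480 * scale_y)
```

Because the two factors differ whenever the input is not square, this resize does not preserve the aspect ratio.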
...@@ -26,7 +26,8 @@ def reader(paths=[], images=None): ...@@ -26,7 +26,8 @@ def reader(paths=[], images=None):
if paths: if paths:
assert type(paths) is list, "type(paths) is not list." assert type(paths) is list, "type(paths) is not list."
for img_path in paths: for img_path in paths:
assert os.path.isfile(img_path), "The {} isn't a valid file path.".format(img_path) assert os.path.isfile(
img_path), "The {} isn't a valid file path.".format(img_path)
img = cv2.imread(img_path).astype('float32') img = cv2.imread(img_path).astype('float32')
img_list.append(img) img_list.append(img)
if images is not None: if images is not None:
...@@ -50,7 +51,8 @@ def reader(paths=[], images=None): ...@@ -50,7 +51,8 @@ def reader(paths=[], images=None):
im_scale_x = float(target_size) / float(im_shape[1]) im_scale_x = float(target_size) / float(im_shape[1])
im_scale_y = float(target_size) / float(im_shape[0]) im_scale_y = float(target_size) / float(im_shape[0])
im = cv2.resize(im, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=2) im = cv2.resize(
im, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=2)
# normalize image # normalize image
mean = [0.485, 0.456, 0.406] mean = [0.485, 0.456, 0.406]
......