Exporting the model fails
Created by: miemie2013
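Hello, developer team! I reimplemented YOLACT following the code structure of PaddleDetection. Prediction with the infer.py script works normally, but I run into an error with the exported model. The network contains custom ops. Unlike YOLOv3, which directly outputs a single [M, 6] tensor, my network outputs 4 tensors. The export step itself (the export_model.py script) runs without problems and ends with this message:
2020-03-29 16:11:12,077-INFO: Export inference model to ./inference_model\yolact, input: ['image', 'im_size'], output: ['_generated_var_5', '_generated_var_4', '_generated_var_0', '_generated_var_1']...
However, prediction with the cpp_infer.py script fails:
Traceback (most recent call last):
  File "tools/cpp_infer.py", line 327, in <module>
    infer()
  File "tools/cpp_infer.py", line 259, in infer
    outs1, outs2, outs3, outs4 = predict.run(inputs)
paddle.fluid.core_avx.EnforceNotMet: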
C++ Call Stacks (More useful to developers):
Windows not support stack backtrace yet.
Python Call Stacks (More useful to users):
  File "D:\Python36\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "D:\Python36\lib\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "D:\Python36\lib\site-packages\paddle\fluid\layers\nn.py", line 12640, in py_func
    'backward_skip_vars': list(backward_skip_vars)
  File "D:\PycharmProjects\yolactpp\tools..\ppdet\modeling\ops.py", line 735, in __call__
    func=_fast_nms, x=[bboxes, scores, mcf, im_size, proto_out], out=outs)
  File "D:\PycharmProjects\yolactpp\tools..\ppdet\modeling\anchor_heads\yolact_head.py", line 406, in get_prediction
    im_size=im_size, proto_out=proto_out[0])
  File "D:\PycharmProjects\yolactpp\tools..\ppdet\modeling\architectures\yolact.py", line 102, in build
    return self.yolact_head.get_prediction(body_feats, im_size)
  File "D:\PycharmProjects\yolactpp\tools..\ppdet\modeling\architectures\yolact.py", line 187, in test
    return self.build(feed_vars, mode='test')
  File "tools/export_model.py", line 105, in main
    test_fetches = model.test(feed_vars)
  File "tools/export_model.py", line 122, in <module>
    main()
Error Message Summary:
Error: Invalid python callable id [Hint: Expected i < g_py_callables.size(), but received i:0 >= g_py_callables.size():0.] at (D:\1.7.1\paddle\paddle\fluid\operators\py_func_op.cc:45) [operator < py_func > error]
The source code of the custom op is as follows:
@register
@serializable
class FastNMS(object):
    def __init__(
            self,
            score_threshold=0.01,
            nms_top_k=100,
            keep_top_k=100,
            nms_threshold=0.45):
        super(FastNMS, self).__init__()
        self.score_threshold = score_threshold
        self.nms_top_k = nms_top_k
        self.keep_top_k = keep_top_k
        self.nms_threshold = nms_threshold

    def __call__(self, bboxes, scores, mcf, im_size, proto_out):
        # Only batch size 1 is supported for now.
        # bboxes: tensor of shape [-1, 4], all predicted boxes of this image (before score filtering).
        # scores: tensor of shape [-1, 80], the score of each box for every class.
        # mcf: tensor of shape [-1, 32], the 32 mask coefficients of each box.
        # im_size: tensor of shape [1, 2], the height and width of this image.
        # proto_out: tensor of shape [-1, -1, 32], the 32 mask prototypes of this image.
        def create_tmp_var(program, name, dtype, shape, lod_level):
            return program.current_block().create_var(
                name=name, dtype=dtype, shape=shape, lod_level=lod_level)
        def _ious(boxes):
            '''
            The fastest of the approaches I tried (about 20 ms). It computes only the
            upper-triangular part of the IoU matrix, somewhat like bubble sort, so there
            is no need to build an upper-triangular mask afterwards.
            '''
            n = boxes.shape[0]
            A = boxes.shape[1]
            r = np.zeros((n, A, A))
            for j in range(A - 1):
                tx1 = boxes[:, j, 0:1]
                ty1 = boxes[:, j, 1:2]
                tx2 = boxes[:, j, 2:3]
                ty2 = boxes[:, j, 3:4]
                x1 = boxes[:, j + 1:, 0]
                y1 = boxes[:, j + 1:, 1]
                x2 = boxes[:, j + 1:, 2]
                y2 = boxes[:, j + 1:, 3]
                areas = (x2 - x1) * (y2 - y1)
                xx1 = np.maximum(tx1, x1)
                yy1 = np.maximum(ty1, y1)
                xx2 = np.minimum(tx2, x2)
                yy2 = np.minimum(ty2, y2)
                w = np.maximum(0.0, xx2 - xx1)
                h = np.maximum(0.0, yy2 - yy1)
                inter = w * h
                ious = inter / (areas + (tx2 - tx1) * (ty2 - ty1) - inter + 1e-9)
                r[:, j, j + 1:] = ious
            return r
        def _fast_nms(bboxes, scores, mcf, im_size, proto_out):
            # Reaching this point has taken 96 ms.
            # Mark 1: the steps below take almost no time.
            bboxes = np.array(bboxes)  # shape [M, 4], M is the number of bounding boxes; dtype float32 or float64
            scores = np.array(scores)  # 2-D LoDTensor of shape [M, C]; M is the number of boxes, C the number of classes
            mcf = np.array(mcf)  # 2-D LoDTensor of shape [M, 32]; M is the number of boxes
            im_size = np.array(im_size)  # [-1, 2]
            proto_out = np.array(proto_out)  # [-1, -1, 32]
            # Outputs
            cls_tensor = fluid.LoDTensor()
            score_tensor = fluid.LoDTensor()
            bbox_tensor = fluid.LoDTensor()
            mask_tensor = fluid.LoDTensor()
            class_nums = scores.shape[-1]  # C=80
            thresh = self.score_threshold
            iou_threshold = self.nms_threshold
            nms_top_k = self.nms_top_k
            keep_top_k = self.keep_top_k
            # Score filtering: find boxes whose best class score exceeds the threshold
            scores_tr = scores.transpose(1, 0)  # [80, M]
            conf_scores = np.max(scores_tr, axis=0)
            keep = np.where(conf_scores > thresh)[0]
            if len(keep) == 0:
                result_c = np.array([[]], dtype=np.int32)
                result_s = np.array([[]], dtype=np.float32)
                result_b = np.zeros((1, 4)).astype(np.float32)
                mask = np.zeros((1, 1, 1)).astype(np.float32)
                cls_tensor.set_lod([[0, result_c.shape[0]]])
                score_tensor.set_lod([[0, result_c.shape[0]]])
                bbox_tensor.set_lod([[0, result_c.shape[0]]])
                mask_tensor.set_lod([[0, result_c.shape[0]]])
                cls_tensor.set(result_c, fluid.CPUPlace())
                score_tensor.set(result_s, fluid.CPUPlace())
                bbox_tensor.set(result_b, fluid.CPUPlace())
                mask_tensor.set(mask, fluid.CPUPlace())
                return cls_tensor, score_tensor, bbox_tensor, mask_tensor
            # Score filtering: keep only the selected boxes
            scores = scores[keep]
            scores = scores.transpose(1, 0)  # [80, ?]
            boxes = bboxes[keep]  # [?, 4]
            masks = mcf[keep]  # [?, 32]
            # fastnms
            # For each class, sort all boxes (those whose max score exceeds the threshold) in descending order of score
            scores_sorted = np.sort(scores, axis=-1)
            scores_sorted = scores_sorted[:, ::-1]
            idx = np.argsort(-scores, axis=-1)
            idx = idx[:, :nms_top_k]
            scores_sorted = scores_sorted[:, :nms_top_k]
            num_dets = idx.shape[1]
            idx = np.reshape(idx, (-1,))  # [80 * ?, ]
            boxes = boxes[idx]  # [80 * ?, 4]
            boxes = np.reshape(boxes, (class_nums, num_dets, 4))  # [80, ?, 4]
            masks = masks[idx]  # [80 * ?, 32]
            masks = np.reshape(masks, (class_nums, num_dets, -1))  # [80, ?, 32]
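            # Mark 1: the steps above take almost no time.
            # This step is still fairly expensive. Lowering nms_top_k from 500 to 100 saves about 12 ms.
            # Compute a c×n×n IoU matrix, where each n×n matrix holds the pairwise IoU of the n candidate boxes of that class.
            # A box is kept as long as its highest score > threshold. In this matrix, however, each box is
            # actually duplicated 80 times, and each copy stands for an object of a different class.
            iou = _ious(boxes)
            # The IoU values left in a column are those between that column's box and the boxes scoring
            # higher than it; if any of them is too large, the box of that column should be discarded.
            iou_max = np.max(iou, axis=1)  # [80, ?] column-wise maximum of each class's matrix
            # Keep only boxes whose maximum is below the threshold. NMS ends here.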
            result_b = []
            result_s = []
            result_c = []
            result_m = []
            start_idx = 0
            for j in range(start_idx, class_nums):  # iterate over every class j; 80 iterations
                # Now just filter out the ones higher than the threshold
                keep = np.where(iou_max[j] <= iou_threshold)[0]
                b = boxes[j][keep]  # [?, 4]
                s = scores_sorted[j][keep]  # [?, ]
                c = (np.zeros((len(s),)) + j).astype(np.int32)  # [?, ]
                m = masks[j][keep]  # [?, 32]
                result_b.append(b)
                result_s.append(s[:, np.newaxis])
                result_c.append(c[:, np.newaxis])
                result_m.append(m)
            # Concatenate the results of the 80 classes
            result_b = np.vstack(result_b)  # [?, 4]
            result_s = np.vstack(result_s)  # [?, 1]
            result_c = np.vstack(result_c)  # [?, 1]
            result_m = np.vstack(result_m)  # [?, 32]
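            # Filter by score once more. As noted above, a box is kept as long as its highest score > threshold,
            # but when the matrix above is computed each box is duplicated 80 times, one copy per class.
            # For the classes other than its best one, the score may be below the threshold and must be filtered out.
            # So fastnms has this side effect: if a box's highest score > threshold and one of its other class
            # scores also exceeds the threshold, both copies may survive in the end, and the two boxes have
            # the same xywh and the same mask coefficients (the boxes coincide and so do the masks).
            # Other NMS algorithms do not duplicate a box 80 times; they only take the highest score.
            keep = np.where(result_s > thresh)[0]
            # Score filtering
            result_b = result_b[keep]  # [?, 4]
            result_s = result_s[keep]  # [?, 1]
            result_c = result_c[keep]  # [?, 1]
            result_m = result_m[keep]  # [?, 32]
            # Limit to max_per_image detections **over all classes**
            image_scores = result_s[:, 0]
            if len(image_scores) > keep_top_k:
                image_thresh = np.sort(image_scores)[-keep_top_k]
                keep = np.where(result_s[:, 0] >= image_thresh)[0]
                result_c = result_c[keep, :]  # [?, 1]
                result_s = result_s[keep, :]  # [?, 1]
                result_b = result_b[keep, :]  # [?, 4]
                result_m = result_m[keep, :]  # [?, 32]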
            mask = np.matmul(proto_out, result_m.transpose(1, 0))
            # sigmoid(). Applying sigmoid() here costs 7 ms, which is really not worth it.
            # mask = 1.0 / (1.0 + np.exp(-mask))
            cls_tensor.set_lod([[0, result_c.shape[0]]])
            score_tensor.set_lod([[0, result_c.shape[0]]])
            bbox_tensor.set_lod([[0, result_c.shape[0]]])
            mask_tensor.set_lod([[0, result_c.shape[0]]])
            cls_tensor.set(result_c, fluid.CPUPlace())
            score_tensor.set(result_s, fluid.CPUPlace())
            bbox_tensor.set(result_b, fluid.CPUPlace())
            mask_tensor.set(mask, fluid.CPUPlace())
            return cls_tensor, score_tensor, bbox_tensor, mask_tensor
        cls = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='int32',
            shape=[-1, 1],
            lod_level=1)
        score = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='float32',
            shape=[-1, 1],
            lod_level=1)
        bbox = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='float32',
            shape=[-1, 4],
            lod_level=1)
        mask = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='float32',
            shape=[-1, -1, -1],
            lod_level=1)
        outs = [cls, score, bbox, mask]
        fluid.layers.py_func(
            func=_fast_nms, x=[bboxes, scores, mcf, im_size, proto_out], out=outs)
        return outs
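The suppression rule used in `_fast_nms` above (sort by score, compute IoU only against higher-scoring boxes, drop a box whose largest such IoU exceeds the threshold) can be isolated in a small single-class NumPy sketch. It uses `np.triu` on a full pairwise matrix instead of the row-by-row loop in `_ious`; the function name `fast_nms_single_class` and the sample boxes are made up for illustration and are not part of the original op.

```python
import numpy as np

def fast_nms_single_class(boxes, scores, iou_threshold=0.45):
    """Return indices (into the input arrays) of the boxes kept by Fast NMS.

    boxes: [N, 4] array of (x1, y1, x2, y2); scores: [N] array.
    """
    order = np.argsort(-scores)          # highest score first
    b = boxes[order]
    # Pairwise intersection via broadcasting: [N, 1] against [1, N].
    x1 = np.maximum(b[:, None, 0], b[None, :, 0])
    y1 = np.maximum(b[:, None, 1], b[None, :, 1])
    x2 = np.minimum(b[:, None, 2], b[None, :, 2])
    y2 = np.minimum(b[:, None, 3], b[None, :, 3])
    inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
    area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    iou = inter / (area[:, None] + area[None, :] - inter + 1e-9)
    # Keep only IoU with higher-scoring boxes (strict upper triangle).
    iou = np.triu(iou, k=1)
    # Column j holds the IoU of box j with every box ranked above it.
    iou_max = iou.max(axis=0)
    return order[iou_max <= iou_threshold]

boxes = np.array([[0, 0, 10, 10],
                  [1, 1, 11, 11],      # overlaps heavily with the first box
                  [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
kept = fast_nms_single_class(boxes, scores)
```

In this sample the second box has an IoU of roughly 0.68 with the first and is dropped. Unlike conventional NMS, a box that is itself suppressed can still suppress lower-ranked boxes, which is what lets the whole step be written as matrix operations.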
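After this op there is one more step that crops the masks with the predicted boxes; it is also implemented as a custom op.
The cpp_demo.yml config file is as follows: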
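For reference, the two mask steps (the `np.matmul(proto_out, result_m.transpose(1, 0))` line inside `_fast_nms` and the box cropping that follows it) can be sketched in NumPy as below. This is a sketch under assumptions, not the code from the repository: it assumes masks of shape [H, W, N], boxes given as (x1, y1, x2, y2) in pixel coordinates at mask resolution, and the helper name `crop_masks` is hypothetical.

```python
import numpy as np

def crop_masks(masks, boxes):
    """Zero out every mask pixel that lies outside its own box.

    masks: [H, W, N] float array; boxes: [N, 4] as (x1, y1, x2, y2) in pixels.
    """
    h, w, n = masks.shape
    cols = np.arange(w).reshape(1, w, 1)   # x coordinate of each pixel
    rows = np.arange(h).reshape(h, 1, 1)   # y coordinate of each pixel
    x1, y1, x2, y2 = (boxes[:, k].reshape(1, 1, n) for k in range(4))
    inside = (cols >= x1) & (cols < x2) & (rows >= y1) & (rows < y2)
    return masks * inside

# Prototypes [H, W, K] times coefficients [N, K]^T gives masks [H, W, N].
rng = np.random.RandomState(0)
proto = rng.rand(8, 8, 32).astype(np.float32)
coeffs = rng.rand(2, 32).astype(np.float32)
masks = 1.0 / (1.0 + np.exp(-np.matmul(proto, coeffs.T)))  # sigmoid

boxes = np.array([[0, 0, 4, 4], [2, 3, 8, 8]], dtype=np.float32)
cropped = crop_masks(masks, boxes)
```

Because both steps are plain array arithmetic, they are candidates for being rewritten with built-in operators rather than a Python callback, if equivalent operators are available in the framework version in use.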
# demo for cpp_infer.py
use_python_inference: false # whether to use python inference
mode: fluid # trt_fp32, trt_fp16, trt_int8, fluid
arch: YOLACT
min_subgraph_size: 4 # need 3 for YOLO arch
# visualize the predicted image
metric: COCO # COCO, VOC
draw_threshold: 0.5
Preprocess:
- type: Resize
  target_size: 512
  max_size: 512
- type: Normalize
  mean:
  - 0.485
  - 0.456
  - 0.406
  std:
  - 0.229
  - 0.224
  - 0.225
  is_scale: True
- type: Permute
  to_bgr: False
- type: PadStride
  stride: 0 # set 32 on FPN and 128 on RetinaNet
There are several parameters I don't really understand, such as min_subgraph_size. How can this problem be solved?