PaddlePaddle / PaddleDetection
Opened Mar 29, 2020 by saxon_zh (@saxon_zh, Guest)

Model export fails

Created by: miemie2013

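Hello, developer team! Following PaddleDetection's code structure, I reimplemented YOLACT. Prediction with the infer.py script works fine, but the exported model does not run. The network contains custom ops. Unlike YOLOv3, which directly outputs a single [M, 6] tensor, my network outputs four tensors. The export itself (the export_model.py script) finishes without problems and ends with this message:

    2020-03-29 16:11:12,077-INFO: Export inference model to ./inference_model\yolact, input: ['image', 'im_size'], output: ['_generated_var_5', '_generated_var_4', '_generated_var_0', '_generated_var_1']...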

However, prediction with the cpp_infer.py script fails:

    Traceback (most recent call last):
      File "tools/cpp_infer.py", line 327, in <module>
        infer()
      File "tools/cpp_infer.py", line 259, in infer
        outs1, outs2, outs3, outs4 = predict.run(inputs)
    paddle.fluid.core_avx.EnforceNotMet:

    C++ Call Stacks (More useful to developers):
    Windows not support stack backtrace yet.

    Python Call Stacks (More useful to users):
      File "D:\Python36\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
        attrs=kwargs.get("attrs", None))
      File "D:\Python36\lib\site-packages\paddle\fluid\layer_helper.py", line 43, in append_op
        return self.main_program.current_block().append_op(*args, **kwargs)
      File "D:\Python36\lib\site-packages\paddle\fluid\layers\nn.py", line 12640, in py_func
        'backward_skip_vars': list(backward_skip_vars)
      File "D:\PycharmProjects\yolactpp\tools\..\ppdet\modeling\ops.py", line 735, in __call__
        func=_fast_nms, x=[bboxes, scores, mcf, im_size, proto_out], out=outs)
      File "D:\PycharmProjects\yolactpp\tools\..\ppdet\modeling\anchor_heads\yolact_head.py", line 406, in get_prediction
        im_size=im_size, proto_out=proto_out[0])
      File "D:\PycharmProjects\yolactpp\tools\..\ppdet\modeling\architectures\yolact.py", line 102, in build
        return self.yolact_head.get_prediction(body_feats, im_size)
      File "D:\PycharmProjects\yolactpp\tools\..\ppdet\modeling\architectures\yolact.py", line 187, in test
        return self.build(feed_vars, mode='test')
      File "tools/export_model.py", line 105, in main
        test_fetches = model.test(feed_vars)
      File "tools/export_model.py", line 122, in <module>
        main()

    Error Message Summary:
    Error: Invalid python callable id
      [Hint: Expected i < g_py_callables.size(), but received i:0 >= g_py_callables.size():0.] at (D:\1.7.1\paddle\paddle\fluid\operators\py_func_op.cc:45)
      [operator < py_func > error]
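
The hint in the error summary says the py_func op asked for callable id 0 while g_py_callables was empty. A plausible reading, and an assumption rather than something confirmed in this thread, is that the Python function lives in an in-process table and the saved program only records its index, so a fresh process that loads the exported model finds nothing at that index. The toy model below mimics that behaviour in plain Python. `CallableRegistry`, `register` and `call` are hypothetical names for illustration, not Paddle APIs.

```python
class CallableRegistry:
    """Toy stand-in for an in-process table of Python callables (hypothetical)."""

    def __init__(self):
        self._callables = []

    def register(self, func):
        # Return the index that a serialized op would store as an attribute.
        self._callables.append(func)
        return len(self._callables) - 1

    def call(self, callable_id, *args):
        # Mirrors the bounds check quoted in the error summary.
        if callable_id >= len(self._callables):
            raise IndexError(
                "Invalid python callable id: %d >= %d"
                % (callable_id, len(self._callables)))
        return self._callables[callable_id](*args)


# "Export" process: the function is registered and only its id is saved.
export_registry = CallableRegistry()
saved_id = export_registry.register(lambda x: x * 2)
assert export_registry.call(saved_id, 21) == 42

# "Inference" process: a fresh, empty registry, but the same saved id.
inference_registry = CallableRegistry()
try:
    inference_registry.call(saved_id, 21)
    lookup_failed = False
except IndexError as err:
    lookup_failed = True
    message = str(err)
```

Under this reading, infer.py works because it builds the network and runs it in the same process, while cpp_infer.py only loads the saved program.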

The source code of the custom op is as follows:

import numpy as np
import paddle.fluid as fluid

from ppdet.core.workspace import register, serializable


@register
@serializable
class FastNMS(object):
    def __init__(self,
                 score_threshold=0.01,
                 nms_top_k=100,
                 keep_top_k=100,
                 nms_threshold=0.45):
        super(FastNMS, self).__init__()
        self.score_threshold = score_threshold
        self.nms_top_k = nms_top_k
        self.keep_top_k = keep_top_k
        self.nms_threshold = nms_threshold

    def __call__(self, bboxes, scores, mcf, im_size, proto_out):
        # Only a batch size of 1 is supported for now.
        # bboxes:    [-1, 4] tensor, every predicted box of this image (not yet score-filtered)
        # scores:    [-1, 80] tensor, the per-class scores of each box
        # mcf:       [-1, 32] tensor, the 32 mask coefficients of each box
        # im_size:   [1, 2] tensor, height and width of this image
        # proto_out: [-1, -1, 32] tensor, the 32 mask prototypes of this image
        def create_tmp_var(program, name, dtype, shape, lod_level):
            return program.current_block().create_var(
                name=name, dtype=dtype, shape=shape, lod_level=lod_level)

        def _pack(result_c, result_s, result_b, mask):
            # Wrap the NumPy results in LoDTensors that hold a single sequence.
            n = result_c.shape[0]
            tensors = []
            for arr in (result_c, result_s, result_b, mask):
                tensor = fluid.LoDTensor()
                tensor.set_lod([[0, n]])
                tensor.set(arr, fluid.CPUPlace())
                tensors.append(tensor)
            return tuple(tensors)

        def _ious(boxes):
            '''
            The fastest of the approaches I tried (about 20 ms). Only the
            upper-triangular IoUs are computed, a bit like bubble sort, so
            no upper-triangular mask has to be built afterwards.
            '''
            n = boxes.shape[0]
            A = boxes.shape[1]
            r = np.zeros((n, A, A))
            for j in range(A - 1):
                # Box j of every class, kept 2-D so it broadcasts against the rest
                tx1 = boxes[:, j, 0:1]
                ty1 = boxes[:, j, 1:2]
                tx2 = boxes[:, j, 2:3]
                ty2 = boxes[:, j, 3:4]
                # All lower-scored boxes j+1.. of every class
                x1 = boxes[:, j + 1:, 0]
                y1 = boxes[:, j + 1:, 1]
                x2 = boxes[:, j + 1:, 2]
                y2 = boxes[:, j + 1:, 3]
                areas = (x2 - x1) * (y2 - y1)
                xx1 = np.maximum(tx1, x1)
                yy1 = np.maximum(ty1, y1)
                xx2 = np.minimum(tx2, x2)
                yy2 = np.minimum(ty2, y2)
                w = np.maximum(0.0, xx2 - xx1)
                h = np.maximum(0.0, yy2 - yy1)
                inter = w * h
                ious = inter / (areas + (tx2 - tx1) * (ty2 - ty1) - inter + 1e-9)
                r[:, j, j + 1:] = ious
            return r

        def _fast_nms(bboxes, scores, mcf, im_size, proto_out):
            # Reaching this point has already taken 96 ms.
            # Marker 1: the steps from here to the matching marker take almost no time.
            bboxes = np.array(bboxes)  # [M, 4], M is the number of boxes; float32 or float64
            scores = np.array(scores)  # 2-D LoDTensor of shape [M, C]; C is the number of classes
            mcf = np.array(mcf)  # 2-D LoDTensor of shape [M, 32]
            im_size = np.array(im_size)  # [-1, 2]
            proto_out = np.array(proto_out)  # [-1, -1, 32]

            class_nums = scores.shape[-1]  # C=80

            thresh = self.score_threshold
            iou_threshold = self.nms_threshold
            nms_top_k = self.nms_top_k
            keep_top_k = self.keep_top_k

            # Score filtering: keep boxes whose best class score exceeds the threshold
            scores_tr = scores.transpose(1, 0)  # [80, M]
            conf_scores = np.max(scores_tr, axis=0)
            keep = np.where(conf_scores > thresh)[0]
            if len(keep) == 0:
                # Nothing passed the threshold: return placeholder outputs
                result_c = np.array([[]], dtype=np.int32)
                result_s = np.array([[]], dtype=np.float32)
                result_b = np.zeros((1, 4)).astype(np.float32)
                mask = np.zeros((1, 1, 1)).astype(np.float32)
                return _pack(result_c, result_s, result_b, mask)

            scores = scores[keep]
            scores = scores.transpose(1, 0)  # [80, ?]
            boxes = bboxes[keep]  # [?, 4]
            masks = mcf[keep]  # [?, 32]

            # Fast NMS
            # For each class, sort all boxes (those whose best score exceeds
            # the threshold) by descending score
            scores_sorted = np.sort(scores, axis=-1)
            scores_sorted = scores_sorted[:, ::-1]
            idx = np.argsort(-scores, axis=-1)
            idx = idx[:, :nms_top_k]
            scores_sorted = scores_sorted[:, :nms_top_k]

            num_dets = idx.shape[1]

            idx = np.reshape(idx, (-1,))  # [80 * ?, ]
            boxes = boxes[idx]  # [80 * ?, 4]
            boxes = np.reshape(boxes, (class_nums, num_dets, 4))  # [80, ?, 4]
            masks = masks[idx]  # [80 * ?, 32]
            masks = np.reshape(masks, (class_nums, num_dets, -1))  # [80, ?, 32]
            # Marker 1: the steps above take almost no time.

            # This step is still fairly slow. Lowering nms_top_k from 500 to 100 saves about 12 ms.
            # Compute a c x n x n IoU matrix; each n x n slice holds the pairwise
            # IoUs of that class's n candidate boxes.
            # A box is kept whenever its best score > threshold, but in this matrix
            # it is effectively duplicated 80 times, one copy per class.
            iou = _ious(boxes)

            # The remaining IoUs in a column are between that column's box and the
            # boxes scored above it. If any is too large, that box should be dropped.
            iou_max = np.max(iou, axis=1)  # [80, ?], column-wise max of each class's matrix

            # Keep only boxes whose max IoU is below the threshold. NMS ends here.
            result_b = []
            result_s = []
            result_c = []
            result_m = []
            for j in range(class_nums):  # loop over each class j (80 iterations)
                # Now just filter out the ones higher than the threshold
                keep = np.where(iou_max[j] <= iou_threshold)[0]

                b = boxes[j][keep]  # [?, 4]
                s = scores_sorted[j][keep]  # [?, ]
                c = (np.zeros((len(s),)) + j).astype(np.int32)  # [?, ]
                m = masks[j][keep]  # [?, 32]

                result_b.append(b)
                result_s.append(s[:, np.newaxis])
                result_c.append(c[:, np.newaxis])
                result_m.append(m)
            # Concatenate the results of the 80 classes
            result_b = np.vstack(result_b)  # [?, 4]
            result_s = np.vstack(result_s)  # [?, 1]
            result_c = np.vstack(result_c)  # [?, 1]
            result_m = np.vstack(result_m)  # [?, 32]

            # Filter by score once more. As noted above, a box is kept whenever its
            # best score > threshold, yet it was duplicated 80 times in the matrix,
            # one copy per class. Its scores for the other classes may be below the
            # threshold and must be filtered out.
            # So Fast NMS has this quirk: if a box's best score > threshold and one
            # of its other class scores also exceeds the threshold, both copies may
            # survive, with identical xywh and identical mask coefficients
            # (overlapping boxes and overlapping masks).
            # Other NMS algorithms do not duplicate a box 80 times; they only use
            # the best score.
            keep = np.where(result_s > thresh)[0]
            result_b = result_b[keep]  # [?, 4]
            result_s = result_s[keep]  # [?, 1]
            result_c = result_c[keep]  # [?, 1]
            result_m = result_m[keep]  # [?, 32]

            # Limit to max_per_image detections **over all classes**
            image_scores = result_s[:, 0]
            if len(image_scores) > keep_top_k:
                image_thresh = np.sort(image_scores)[-keep_top_k]
                keep = np.where(result_s[:, 0] >= image_thresh)[0]
                result_c = result_c[keep, :]    # [?, 1]
                result_s = result_s[keep, :]    # [?, 1]
                result_b = result_b[keep, :]    # [?, 4]
                result_m = result_m[keep, :]    # [?, 32]

            # Linear combination of the prototypes with each box's coefficients
            mask = np.matmul(proto_out, result_m.transpose(1, 0))

            # sigmoid(). Applying it here costs 7 ms, which is not worth it.
            # mask = 1.0 / (1.0 + np.exp(-mask))

            return _pack(result_c, result_s, result_b, mask)

        cls = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='int32',
            shape=[-1, 1],
            lod_level=1)
        score = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='float32',
            shape=[-1, 1],
            lod_level=1)
        bbox = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='float32',
            shape=[-1, 4],
            lod_level=1)
        mask = create_tmp_var(
            fluid.default_main_program(),
            name=None,
            dtype='float32',
            shape=[-1, -1, -1],
            lod_level=1)
        outs = [cls, score, bbox, mask]
        fluid.layers.py_func(
            func=_fast_nms, x=[bboxes, scores, mcf, im_size, proto_out], out=outs)
        return outs
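
The core of _fast_nms above (sort by score, build a pairwise IoU matrix, keep only the upper triangle, take the column-wise maximum, threshold) can be sketched for a single class as a plain NumPy function. This is a minimal sketch under assumptions: boxes are [x1, y1, x2, y2], and `pairwise_iou` and `fast_nms_single_class` are hypothetical helper names, not part of PaddleDetection. It uses `np.triu` instead of the loop in _ious, which is shorter but has not been timed against it.

```python
import numpy as np


def pairwise_iou(boxes):
    """IoU between every pair of [x1, y1, x2, y2] boxes; returns an [n, n] array."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    inter_w = np.maximum(
        0.0, np.minimum(x2[:, None], x2[None, :]) - np.maximum(x1[:, None], x1[None, :]))
    inter_h = np.maximum(
        0.0, np.minimum(y2[:, None], y2[None, :]) - np.maximum(y1[:, None], y1[None, :]))
    inter = inter_w * inter_h
    return inter / (areas[:, None] + areas[None, :] - inter + 1e-9)


def fast_nms_single_class(boxes, scores, iou_threshold=0.45):
    """Return the indices (into the inputs) of the boxes that Fast NMS keeps."""
    order = np.argsort(-scores)  # highest score first
    if len(order) == 0:
        return order
    # Entry [i, j] with i < j is the IoU of box j with a higher-scored box i.
    iou = np.triu(pairwise_iou(boxes[order]), k=1)
    iou_max = iou.max(axis=0)  # worst overlap of each box with any better box
    return order[iou_max <= iou_threshold]


boxes = np.array([[0, 0, 10, 10],
                  [1, 1, 11, 11],
                  [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.8, 0.9, 0.7], dtype=np.float32)
kept = fast_nms_single_class(boxes, scores)  # box 0 overlaps the better box 1
```

Because it depends only on NumPy, a function like this could in principle be applied to the arrays fetched from the predictor instead of running inside a py_func op. Whether that suits this model is not confirmed in this thread.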

Later in this script there is another step that crops the masks with the predicted boxes. It is also implemented with a custom op.
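
The cropping op itself is not shown in this issue. As a rough illustration only, the sketch below zeroes every mask pixel that falls outside its box, assuming masks shaped [H, W, N] (the layout produced by the np.matmul in _fast_nms) and boxes given as [x1, y1, x2, y2] in mask pixel coordinates. `crop_masks` is a hypothetical name, and the author's actual op may differ.

```python
import numpy as np


def crop_masks(masks, boxes):
    """Zero out mask values outside each detection's box.

    masks: [H, W, N] float array, one channel per detection.
    boxes: [N, 4] array of [x1, y1, x2, y2] in mask pixel coordinates.
    """
    h, w, _ = masks.shape
    cols = np.arange(w).reshape(1, w, 1)  # x coordinate of every pixel
    rows = np.arange(h).reshape(h, 1, 1)  # y coordinate of every pixel
    x1, y1, x2, y2 = [boxes[:, i].reshape(1, 1, -1) for i in range(4)]
    # Broadcasts to [H, W, N]: True where the pixel lies inside detection n's box
    inside = (cols >= x1) & (cols < x2) & (rows >= y1) & (rows < y2)
    return masks * inside


masks = np.ones((4, 4, 1), dtype=np.float32)
boxes = np.array([[1, 1, 3, 3]], dtype=np.float32)
cropped = crop_masks(masks, boxes)  # only the 2 x 2 block inside the box survives
```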

The cpp_demo.yml config file is as follows:

# demo for cpp_infer.py

use_python_inference: false # whether to use python inference
mode: fluid # trt_fp32, trt_fp16, trt_int8, fluid
arch: YOLACT
min_subgraph_size: 4 # need 3 for YOLO arch

# visualize the predicted image
metric: COCO # COCO, VOC
draw_threshold: 0.5

Preprocess:
- type: Resize
  target_size: 512
  max_size: 512
- type: Normalize
  mean:
  - 0.485
  - 0.456
  - 0.406
  std:
  - 0.229
  - 0.224
  - 0.225
  is_scale: True
- type: Permute
  to_bgr: False
- type: PadStride
  stride: 0 # set 32 on FPN and 128 on RetinaNet
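
Read literally, the Normalize and Permute entries above describe scaling pixels to [0, 1], normalizing each channel with the listed mean and std, and moving the channel axis first. The NumPy sketch below follows that reading. It assumes an RGB uint8 image in HWC layout, skips Resize, and treats stride: 0 as no padding. `normalize_and_permute` is a hypothetical helper, not the code in cpp_infer.py.

```python
import numpy as np

# Values copied from the Normalize entry of the config
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_and_permute(image, is_scale=True):
    """Normalize an HWC RGB image per channel and return it in CHW layout."""
    im = image.astype(np.float32)
    if is_scale:
        im = im / 255.0  # is_scale: True maps [0, 255] to [0, 1]
    im = (im - MEAN) / STD  # broadcasts over the last (channel) axis
    return im.transpose(2, 0, 1)  # HWC -> CHW


image = np.full((512, 512, 3), 255, dtype=np.uint8)  # all-white test image
chw = normalize_and_permute(image)
```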

There are several parameters I don't really understand, such as min_subgraph_size. How can this problem be solved?

Reference: paddlepaddle/PaddleDetection#412