---
template: hub1
title: SimpleBaseline
summary:
    en_US: SimpleBaseline on COCO
    zh_CN: SimpleBaseline (COCO pretrained weights)
author: MegEngine Team
tags: [vision, keypoints]
github-link: https://github.com/MegEngine/Models/tree/master/official/vision/keypoints
---

```python3
import megengine.hub
model = megengine.hub.load('megengine/models', 'simplebaseline_res50', pretrained=True)
# or any of these variants
# model = megengine.hub.load('megengine/models', 'simplebaseline_res101', pretrained=True)
# model = megengine.hub.load('megengine/models', 'simplebaseline_res152', pretrained=True)
model.eval()
```
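
SimpleBaseline predicts one heatmap per joint and takes the peak of each heatmap as the keypoint location. The snippet below is a minimal, illustrative sketch of that decoding step using only NumPy; it assumes the network output has already been converted to an array of shape (N, K, H, W) for a 256x192 input crop, and it is not the hub model's official post-processing API.

```python3
# Minimal sketch of heatmap decoding (an assumption for illustration, not the
# official post-processing): take the per-joint argmax and rescale it to the
# 192x256 (width x height) input crop.
import numpy as np

def decode_heatmaps(heatmaps, input_size=(192, 256)):
    """Return an (N, K, 3) array of (x, y, score) in input-crop coordinates."""
    n, k, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, k, -1)
    idx = flat.argmax(axis=2)                 # peak location per joint
    scores = flat.max(axis=2)                 # peak value as confidence
    xs = (idx % w) * (input_size[0] / w)      # column -> x in the input crop
    ys = (idx // w) * (input_size[1] / h)     # row    -> y in the input crop
    return np.stack([xs, ys, scores], axis=2)

# Random data standing in for real network output (17 COCO joints, 64x48 maps):
fake_heatmaps = np.random.rand(1, 17, 64, 48).astype("float32")
print(decode_heatmaps(fake_heatmaps).shape)   # (1, 17, 3)
```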
<!-- section: zh_CN -->
SimpleBaseline is a single-person keypoint detection model; in multi-person scenes it must be combined with a human detector. A complete multi-person inference example can be found in [inference.py](https://github.com/MegEngine/Models/blob/master/official/vision/keypoints/inference.py).

For a single image, here is an example that uses RetinaNet to detect persons and then SimpleBaseline to estimate their keypoints:
```python3
from megengine import hub, jit

# Download a test image
import urllib.request
url, filename = ("https://data.megengine.org.cn/images/cat.jpg", "cat.jpg")
urllib.request.urlretrieve(url, filename)

# Read the image
import cv2
image = cv2.imread("cat.jpg")

# Human detector: RetinaNet from the MegEngine Models repository
import official.vision.detection.retinanet_res50_coco_1x_800size as Det
detector = Det.retinanet_res50_1x_800size(pretrained=True)

# Keypoint model: SimpleBaseline loaded from the hub, as shown above
model = hub.load("megengine/models", "simplebaseline_res50", pretrained=True)
model.eval()

# Inference utilities (KeypointEvaluator) from the Models repository
models_api = hub.import_module(
    "megengine/models",
    git_host="github.com",
)

@jit.trace(symbolic=True)
def det_func():
    pred = detector(detector.inputs)
    return pred

@jit.trace(symbolic=True)
def keypoint_func():
    pred = model.predict()
    return pred

evaluator = models_api.KeypointEvaluator(
    detector,
    det_func,
    model,
    keypoint_func
    )

print("Detecting Persons")
person_boxes = evaluator.detect_persons(image)

print("Detecting Keypoints")
all_keypoints = evaluator.predict(image, person_boxes)

print("Visualizing")
canvas = evaluator.vis_skeletons(image, all_keypoints)
cv2.imwrite("vis_skeleton.jpg", canvas)
```
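
KeypointEvaluator implements a top-down pipeline: each detected person box is cropped out of the image, resized to the 256x192 network input, and only then passed through SimpleBaseline. The sketch below illustrates that cropping step in isolation; the helper name `crop_person` and the zero-padding strategy are assumptions made for illustration, while the actual preprocessing lives in [inference.py](https://github.com/MegEngine/Models/blob/master/official/vision/keypoints/inference.py).

```python3
# Illustrative sketch of the person-cropping step in a top-down pipeline.
# `crop_person` is a hypothetical helper, not part of the official API.
import cv2

def crop_person(image, box, out_w=192, out_h=256):
    """Crop a person box, pad it to a 192:256 aspect ratio, and resize."""
    x0, y0, x1, y1 = (max(int(v), 0) for v in box[:4])
    patch = image[y0:y1, x0:x1]
    h, w = patch.shape[:2]
    target_ratio = out_h / out_w
    if h < w * target_ratio:   # box is too wide: pad top and bottom
        pad = int((w * target_ratio - h) / 2)
        patch = cv2.copyMakeBorder(patch, pad, pad, 0, 0, cv2.BORDER_CONSTANT)
    else:                      # box is too tall: pad left and right
        pad = int((h / target_ratio - w) / 2)
        patch = cv2.copyMakeBorder(patch, 0, 0, pad, pad, cv2.BORDER_CONSTANT)
    return cv2.resize(patch, (out_w, out_h))

# Usage with a hypothetical detection box (x0, y0, x1, y1):
image = cv2.imread("cat.jpg")
crop = crop_person(image, (50, 30, 150, 300))
print(crop.shape)  # (256, 192, 3)
```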

### Model Description
Using human detection results with 56.4 AP on COCO val2017, the keypoint estimation results on COCO val2017 are:
|Methods|Backbone|Input Size| AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|:---:|---|---|---|---|---|---|---|---|---|---|---|
| SimpleBaseline |Res50 |256x192| 0.712 | 0.887 | 0.779 | 0.673 | 0.785 | 0.782 | 0.932 | 0.839 | 0.730 | 0.854 |
| SimpleBaseline |Res101|256x192| 0.722 | 0.891 | 0.795 | 0.687 | 0.795 | 0.794 | 0.936 | 0.855 | 0.745 | 0.863 |
| SimpleBaseline |Res152|256x192| 0.724 | 0.888 | 0.794 | 0.688 | 0.795 | 0.795 | 0.934 | 0.856 | 0.746 | 0.863 |

### References
- [Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/pdf/1804.06208.pdf), Bin Xiao, Haiping Wu, and Yichen Wei

<!-- section: en_US -->
SimpleBaseline is a classical network for single-person pose estimation. It can also be applied to multi-person cases when combined with a human detector. The details of this pipeline can be found in [inference.py](https://github.com/MegEngine/Models/blob/master/official/vision/keypoints/inference.py).

For a single image, here is a sample pipeline in which RetinaNet detects persons and SimpleBaseline then estimates their keypoints:

```python3
from megengine import hub, jit

# Download a test image
import urllib.request
url, filename = ("https://data.megengine.org.cn/images/cat.jpg", "cat.jpg")
urllib.request.urlretrieve(url, filename)

# Read the image
import cv2
image = cv2.imread("cat.jpg")

# Human detector: RetinaNet from the MegEngine Models repository
import official.vision.detection.retinanet_res50_coco_1x_800size as Det
detector = Det.retinanet_res50_1x_800size(pretrained=True)

# Keypoint model: SimpleBaseline loaded from the hub, as shown above
model = hub.load("megengine/models", "simplebaseline_res50", pretrained=True)
model.eval()

# Inference utilities (KeypointEvaluator) from the Models repository
models_api = hub.import_module(
    "megengine/models",
    git_host="github.com",
)

@jit.trace(symbolic=True)
def det_func():
    pred = detector(detector.inputs)
    return pred

@jit.trace(symbolic=True)
def keypoint_func():
    pred = model.predict()
    return pred

evaluator = models_api.KeypointEvaluator(
    detector,
    det_func,
    model,
    keypoint_func
    )

print("Detecting Persons")
person_boxes = evaluator.detect_persons(image)

print("Detecting Keypoints")
all_keypoints = evaluator.predict(image, person_boxes)

print("Visualizing")
canvas = evaluator.vis_skeletons(image, all_keypoints)
cv2.imwrite("vis_skeleton.jpg", canvas)
```
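
`evaluator.vis_skeletons` renders the predicted keypoints back onto the image. As a rough illustration of what such a visualization involves, the sketch below draws circles for confident joints and lines for a few limb pairs with plain OpenCV; the per-person keypoint format (17 rows of x, y, score in COCO joint order) and the limb list are assumptions for illustration, not the evaluator's actual implementation.

```python3
# Rough illustration of skeleton drawing with plain OpenCV; the real rendering
# is done by KeypointEvaluator.vis_skeletons.
import cv2
import numpy as np

# A few COCO limb pairs (indices into the 17 keypoints), for illustration only.
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10), (5, 6), (11, 12),
         (11, 13), (13, 15), (12, 14), (14, 16)]

def draw_skeleton(image, keypoints, thr=0.3):
    """keypoints: (17, 3) array of (x, y, score) for one person."""
    canvas = image.copy()
    for x, y, s in keypoints:
        if s > thr:
            cv2.circle(canvas, (int(x), int(y)), 3, (0, 255, 0), -1)
    for a, b in LIMBS:
        if keypoints[a, 2] > thr and keypoints[b, 2] > thr:
            pa = (int(keypoints[a, 0]), int(keypoints[a, 1]))
            pb = (int(keypoints[b, 0]), int(keypoints[b, 1]))
            cv2.line(canvas, pa, pb, (0, 0, 255), 2)
    return canvas

# Usage with random stand-in keypoints for one person:
image = cv2.imread("cat.jpg")
fake_kpts = np.hstack([np.random.rand(17, 2) * 200, np.random.rand(17, 1)])
cv2.imwrite("vis_sketch.jpg", draw_skeleton(image, fake_kpts))
```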
### Model Description

Using human detection results with 56.4 AP on COCO val2017, SimpleBaseline achieves the following keypoint estimation results on COCO val2017:

|Methods|Backbone|Input Size| AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|:---:|---|---|---|---|---|---|---|---|---|---|---|
| SimpleBaseline |Res50 |256x192| 0.712 | 0.887 | 0.779 | 0.673 | 0.785 | 0.782 | 0.932 | 0.839 | 0.730 | 0.854 |
| SimpleBaseline |Res101|256x192| 0.722 | 0.891 | 0.795 | 0.687 | 0.795 | 0.794 | 0.936 | 0.855 | 0.745 | 0.863 |
| SimpleBaseline |Res152|256x192| 0.724 | 0.888 | 0.794 | 0.688 | 0.795 | 0.795 | 0.934 | 0.856 | 0.746 | 0.863 |

### References
- [Simple Baselines for Human Pose Estimation and Tracking](https://arxiv.org/pdf/1804.06208.pdf), Bin Xiao, Haiping Wu, and Yichen Wei