---
template: hub1
title: DeepLabV3plus
summary:
    en_US: DeepLabV3plus pre-trained on VOC
    zh_CN: DeepLabV3plus (VOC预训练权重)
author: MegEngine Team
tags: [vision]
github-link: https://github.com/MegEngine/Models/tree/master/official/vision/segmentation
---

```python
from megengine import hub
model = hub.load(
    "megengine/models",
    "deeplabv3plus_res101",
    pretrained=True,
)
model.eval()
```
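If your MegEngine version also provides the `hub.list` and `hub.help` helpers (an assumption here; they mirror the `hub.load` call above), you can inspect the repository before loading a model:

```python
from megengine import hub

# Assumes hub.list / hub.help exist in your MegEngine version; they query the
# same "megengine/models" repository used by hub.load above.
print(hub.list("megengine/models"))                          # available entry points
print(hub.help("megengine/models", "deeplabv3plus_res101"))  # docstring of this entry
```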
<!-- section: zh_CN -->

All pre-trained models expect their input to be pre-processed in the same way. The model takes BGR images, which should be resized to 512x512 and then normalized (mean: `[103.530, 116.280, 123.675]`, std: `[57.375, 57.120, 58.395]`).
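
For reference, this is what the pre-processing amounts to when written out with plain cv2/numpy; a minimal sketch only, the sample below uses `megengine.data.transform` instead:

```python
import cv2
import numpy as np

# A sketch of the pre-processing described above, written with plain cv2/numpy.
# The sample code below uses megengine.data.transform for the same steps.
def preprocess(path):
    img = cv2.imread(path)                                         # OpenCV reads BGR
    img = cv2.resize(img, (512, 512)).astype("float32")
    mean = np.array([103.530, 116.280, 123.675], dtype="float32")  # BGR mean
    std = np.array([57.375, 57.120, 58.395], dtype="float32")      # BGR std
    img = (img - mean) / std                                       # per-channel normalization
    return img.transpose(2, 0, 1)[np.newaxis]                      # HWC -> 1CHW
```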


The following sample code processes a single image.

```python
# Download an example image from the megengine data website
import urllib.request
url, filename = ("https://data.megengine.org.cn/images/cat.jpg", "cat.jpg")
urllib.request.urlretrieve(url, filename)

# Read and pre-process the image
import cv2
import numpy as np
import megengine.data.transform as T
import megengine.functional as F

import megengine.jit as jit
# Trace the inference function into a static graph for faster execution
@jit.trace(symbolic=True, opt_level=2)
def pred_fun(data, net=None):
    net.eval()
    pred = net(data)
    return pred

image = cv2.imread("cat.jpg")
orih, oriw = image.shape[:2]
transform = T.Compose([
    T.Resize((512, 512)),
    T.Normalize(mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395]),  # BGR
    T.ToMode("CHW"),  # HWC -> CHW
])
processed_img = transform.apply(image)[np.newaxis]  # CHW -> 1CHW
pred = pred_fun(processed_img, net=model)

pred = pred.numpy().squeeze().argmax(axis=0)
# Nearest-neighbor interpolation keeps class indices intact (no blending between labels)
pred = cv2.resize(pred.astype("uint8"), (oriw, orih), interpolation=cv2.INTER_NEAREST)
```
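
The resulting `pred` is an (H, W) array of class indices in `[0, 20]`. A quick way to see which categories were detected (the class-name list below is the standard set of 21 PASCAL VOC categories):

```python
import numpy as np

# The 21 PASCAL VOC categories (background + 20 object classes).
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# Report which classes appear in the prediction and how many pixels each covers.
ids, counts = np.unique(pred, return_counts=True)
for i, n in zip(ids, counts):
    print(f"{VOC_CLASSES[i]}: {n} pixels")
```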

### Model Description

We currently provide a pre-trained deeplabv3plus model; its performance on the VOC validation set is listed below:

 Methods     | Backbone    | TrainSet  | EvalSet | mIoU_single   | mIoU_multi  |
 :--:        |:--:         |:--:       |:--:     |:--:           |:--:         |
 DeepLab v3+ | ResNet101   | train_aug | val     | 79.0          | 79.8        |
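
mIoU is the intersection-over-union averaged over classes. A minimal numpy sketch of the metric (illustrative only, not the evaluation script behind the numbers above):

```python
import numpy as np

# Per-class IoU averaged over the classes that appear in either prediction or
# label; illustrative only, not the official VOC evaluation script.
def mean_iou(pred, label, num_classes=21):
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, label == c).sum()
        union = np.logical_or(pred == c, label == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```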

### References

- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611.pdf), Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and
Hartwig Adam; ECCV, 2018

<!-- section: en_US -->

All pre-trained models expect input images normalized in the same way: 3-channel BGR images of shape (H x W x 3), resized to (512 x 512), then normalized using mean = [103.530, 116.280, 123.675] and std = [57.375, 57.120, 58.395].
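
When several images are processed at once, the same pre-processing can be applied per image and the results stacked into a single batch; a minimal sketch, with placeholder file names, assuming every image is resized to the 512 x 512 input size:

```python
import cv2
import numpy as np

mean = np.array([103.530, 116.280, 123.675], dtype="float32")  # BGR mean
std = np.array([57.375, 57.120, 58.395], dtype="float32")      # BGR std

def to_chw(path):
    # Resize, normalize, and convert one BGR image from HWC to CHW layout.
    img = cv2.resize(cv2.imread(path), (512, 512)).astype("float32")
    return ((img - mean) / std).transpose(2, 0, 1)

# File names are placeholders; stack the images into an (N, 3, 512, 512) batch.
batch = np.stack([to_chw(p) for p in ["cat.jpg", "dog.jpg"]])
```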

Here's a sample execution.

```python
# Download an example image from the megengine data website
import urllib.request
url, filename = ("https://data.megengine.org.cn/images/cat.jpg", "cat.jpg")
urllib.request.urlretrieve(url, filename)

# Read and pre-process the image
import cv2
import numpy as np
import megengine.data.transform as T
import megengine.functional as F

import megengine.jit as jit
# Trace the inference function into a static graph for faster execution
@jit.trace(symbolic=True, opt_level=2)
def pred_fun(data, net=None):
    net.eval()
    pred = net(data)
    return pred

image = cv2.imread("cat.jpg")
orih, oriw = image.shape[:2]
transform = T.Compose([
    T.Resize((512, 512)),
    T.Normalize(mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395]),  # BGR
    T.ToMode("CHW"),  # HWC -> CHW
])
processed_img = transform.apply(image)[np.newaxis, :]  # CHW -> 1CHW
pred = pred_fun(processed_img, net=model)

pred = pred.numpy().squeeze().argmax(axis=0)
# Nearest-neighbor interpolation keeps class indices intact (no blending between labels)
pred = cv2.resize(pred.astype("uint8"), (oriw, orih), interpolation=cv2.INTER_NEAREST)
```
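
For a quick visual check, the predicted mask can be blended onto the original image; a minimal sketch using a random per-class color table (illustrative, not the official VOC palette):

```python
import cv2
import numpy as np

# Blend the class mask onto the original image for inspection. The color
# table is an illustrative random one, not the official VOC palette.
colors = np.random.RandomState(0).randint(0, 256, size=(21, 3), dtype=np.uint8)
overlay = cv2.addWeighted(image, 0.6, colors[pred], 0.4, 0)
cv2.imwrite("cat_overlay.png", overlay)
```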

### Model Description

 Methods     | Backbone    | TrainSet  | EvalSet | mIoU_single   | mIoU_multi  |
 :--:        |:--:         |:--:       |:--:     |:--:           |:--:         |
 DeepLab v3+ | ResNet101   | train_aug | val     | 79.0          | 79.8        |
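
The two mIoU columns are commonly read as single-scale versus multi-scale testing. Below is a conceptual sketch of multi-scale testing built around a hypothetical `predict_logits(image)` helper that pre-processes one BGR image and returns a per-class score map at that image's resolution; the exact protocol behind the reported numbers is not specified here:

```python
import cv2
import numpy as np

def multiscale_predict(image, predict_logits, scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    # Average per-class score maps over several rescaled copies of the image.
    # `predict_logits` is a hypothetical helper returning (num_classes, h, w).
    h, w = image.shape[:2]
    acc = None
    for s in scales:
        scaled = cv2.resize(image, (int(w * s), int(h * s)))
        logits = predict_logits(scaled)                             # (C, h*s, w*s)
        logits = np.stack([cv2.resize(c, (w, h)) for c in logits])  # back to (C, H, W)
        acc = logits if acc is None else acc + logits
    return acc.argmax(axis=0)                                       # (H, W) class indices
```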

### References

- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611.pdf), Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and
Hartwig Adam; ECCV, 2018