---
template: hub1
title: DeepLabV3plus
summary:
    en_US: DeepLabV3plus pre-trained on VOC
    zh_CN: DeepLabV3plus (VOC预训练权重)
author: MegEngine Team
tags: [vision]
github-link: https://github.com/megengine/models
---

```python
from megengine import hub
model = hub.load(
    "megengine/models",
    "deeplabv3plus_res101",
    pretrained=True,
)
model.eval()
```
<!-- section: zh_CN --> 

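All pre-trained models expect correctly pre-processed input. The model takes BGR images, preferably resized to 512x512, and then normalized (mean: `[103.530, 116.280, 123.675]`, standard deviation: `[57.375, 57.120, 58.395]`).

The following sample code processes a single image.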

```python 
# Download an example image from the MegEngine data website
import urllib.request
url, filename = ("https://data.megengine.org.cn/images/cat.jpg", "cat.jpg")
urllib.request.urlretrieve(url, filename)

# Read and pre-process the image
import cv2
import numpy as np
import megengine.data.transform as T
import megengine.jit as jit

# Trace the forward pass so that it runs as a static graph
@jit.trace(symbolic=True, opt_level=2)
def pred_fun(data, net=None):
    net.eval()
    pred = net(data)
    return pred

image = cv2.imread("cat.jpg")
orih, oriw = image.shape[:2]
transform = T.Compose([
    T.Resize((512, 512)),
    T.Normalize(mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395]),  # BGR
    T.ToMode(),
])
processed_img = transform.apply(image)[np.newaxis]  # CHW -> 1CHW
pred = pred_fun(processed_img, net=model)

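# Per-pixel class index: 1CHW -> CHW -> HW
pred = pred.numpy().squeeze().argmax(axis=0)
# Resize the label map to the original size; nearest-neighbor keeps class indices valid
pred = cv2.resize(pred.astype("uint8"), (oriw, orih), interpolation=cv2.INTER_NEAREST)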
```

### Model Description

We currently provide a pre-trained deeplabv3plus model. Its performance on the VOC validation set is as follows:

 Methods     | Backbone    | TrainSet  | EvalSet | mIoU_single   | mIoU_multi  |
 :--:        |:--:         |:--:       |:--:     |:--:           |:--:         |
 DeepLab v3+ | ResNet101   | train_aug | val     | 79.0          | 79.8        |

### References

- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611), Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam; ECCV, 2018

<!-- section: en_US --> 

All pre-trained models expect input images normalized in the same way. Input images must be 3-channel BGR images of shape (H x W x 3), resized to (512 x 512), and then normalized using mean = `[103.530, 116.280, 123.675]` and std = `[57.375, 57.120, 58.395]`.
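
The normalization arithmetic described above can be sketched in plain NumPy, which may help when checking inputs outside of MegEngine. This is a minimal illustration, not the library's implementation: the helper name `normalize_bgr` and the dummy image are assumptions, and the image is assumed to be already resized to 512 x 512.

```python
import numpy as np

# Per-channel statistics quoted in the text, in BGR order.
MEAN_BGR = np.array([103.530, 116.280, 123.675], dtype=np.float32)
STD_BGR = np.array([57.375, 57.120, 58.395], dtype=np.float32)

def normalize_bgr(image_hwc):
    """Hypothetical helper: normalize an H x W x 3 BGR image, return a 1 x 3 x H x W batch."""
    x = image_hwc.astype(np.float32)
    x = (x - MEAN_BGR) / STD_BGR   # broadcasts over the last (channel) axis
    x = x.transpose(2, 0, 1)       # HWC -> CHW
    return x[np.newaxis]           # CHW -> 1CHW

# Stand-in for a resized image: every pixel equals the channel means,
# so the normalized result should be all zeros.
dummy = np.broadcast_to(MEAN_BGR, (512, 512, 3))
batch = normalize_bgr(dummy)
print(batch.shape)
```

Subtracting the mean and dividing by the standard deviation happens while the channel axis is still last, which is why the transpose to channel-first layout comes afterwards.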

Here's a sample execution.

```python 
# Download an example image from the MegEngine data website
import urllib.request
url, filename = ("https://data.megengine.org.cn/images/cat.jpg", "cat.jpg")
urllib.request.urlretrieve(url, filename)

# Read and pre-process the image
import cv2
import numpy as np
import megengine.data.transform as T
import megengine.jit as jit

# Trace the forward pass so that it runs as a static graph
@jit.trace(symbolic=True, opt_level=2)
def pred_fun(data, net=None):
    net.eval()
    pred = net(data)
    return pred

image = cv2.imread("cat.jpg")
orih, oriw = image.shape[:2]
transform = T.Compose([
    T.Resize((512, 512)),
    T.Normalize(mean=[103.530, 116.280, 123.675], std=[57.375, 57.120, 58.395]),  # BGR
    T.ToMode(),
])
processed_img = transform.apply(image)[np.newaxis, :]  # CHW -> 1CHW
pred = pred_fun(processed_img, net=model)

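# Per-pixel class index: 1CHW -> CHW -> HW
pred = pred.numpy().squeeze().argmax(axis=0)
# Resize the label map to the original size; nearest-neighbor keeps class indices valid
pred = cv2.resize(pred.astype("uint8"), (oriw, orih), interpolation=cv2.INTER_NEAREST)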
```
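
The sample above ends with `pred`, a 2-D array of per-pixel class indices, which is hard to inspect directly. One common way to view it is to map each index to a color using the standard PASCAL VOC color map. The sketch below assumes the model predicts the 21 VOC classes (20 objects plus background); `voc_palette` and the toy array are illustrative names, not part of MegEngine.

```python
import numpy as np

def voc_palette(num_classes=21):
    """Build the standard PASCAL VOC color map as a (num_classes, 3) RGB array."""
    palette = np.zeros((num_classes, 3), dtype=np.uint8)
    for idx in range(num_classes):
        c, r, g, b = idx, 0, 0, 0
        # Spread the bits of the class index over the high bits of R, G and B.
        for shift in range(7, -1, -1):
            r |= (c & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        palette[idx] = (r, g, b)
    return palette

palette = voc_palette()

# Toy stand-in for `pred`: background (0), cat (8) and person (15) pixels.
toy_pred = np.array([[0, 8], [15, 8]], dtype=np.uint8)
color_rgb = palette[toy_pred]                              # H x W x 3, RGB
color_bgr = np.ascontiguousarray(color_rgb[..., ::-1])     # BGR order for OpenCV
```

Applying the same indexing to the real `pred` would yield an image that can be written with `cv2.imwrite`, provided the class count assumption holds.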

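### Model Description

A pre-trained deeplabv3plus model is currently provided. Its performance on the VOC validation set is as follows: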

 Methods     | Backbone    | TrainSet  | EvalSet | mIoU_single   | mIoU_multi  |
 :--:        |:--:         |:--:       |:--:     |:--:           |:--:         |
 DeepLab v3+ | ResNet101   | train_aug | val     | 79.0          | 79.8        |
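
The table reports mean intersection-over-union (mIoU). As a rough illustration of how such a score is typically computed, the sketch below accumulates a confusion matrix and averages the per-class IoU. The function names, the `ignore_index=255` convention (the VOC void label), and the toy arrays are assumptions for illustration; this is not the evaluation code that produced the numbers above. The `_single` and `_multi` columns presumably refer to single-scale and multi-scale testing, which this sketch does not cover.

```python
import numpy as np

def confusion_matrix(pred, label, num_classes, ignore_index=255):
    """Count (label, pred) pairs; rows are ground truth, columns are predictions."""
    mask = label != ignore_index
    idx = num_classes * label[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Average IoU over the classes that appear in the labels or the predictions."""
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())

# Toy 2-class example; the last pixel is void and is excluded.
label = np.array([[0, 0], [1, 1], [1, 255]])
pred = np.array([[0, 1], [1, 1], [1, 0]])

conf = confusion_matrix(pred, label, num_classes=2)
miou = mean_iou(conf)
print(conf)
print(miou)
```

In a full evaluation the confusion matrix would be summed over all validation images before `mean_iou` is called once, and the result multiplied by 100 to match the percentage scale of the table.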

### References

- [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611), Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam; ECCV, 2018