# `torchvision` Object Detection Finetuning Tutorial

> Source: <https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html>

Tip

To get the most out of this tutorial, we suggest using this [Colab version](https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb). It will let you experiment with the material presented below.

In this tutorial, we will be finetuning a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on the [Penn-Fudan Database](https://www.cis.upenn.edu/~jshi/ped_html/) for pedestrian detection and segmentation. It contains 170 images with 345 instances of pedestrians, and we will use it to illustrate how to use the new features in `torchvision` in order to train an instance segmentation model on a custom dataset.

## Defining the Dataset

The reference scripts for training object detection, instance segmentation, and person keypoint detection make it easy to add new custom datasets. The dataset should inherit from the standard `torch.utils.data.Dataset` class and implement `__len__` and `__getitem__`.

The only specificity we require is that the dataset's `__getitem__` should return:

*   image: a PIL Image of size `(H, W)`
*   target: a dict containing the following fields
    *   `boxes (FloatTensor[N, 4])`: the coordinates of the `N` bounding boxes in `[x0, y0, x1, y1]` format, ranging from `0` to `W` and from `0` to `H`
    *   `labels (Int64Tensor[N])`: the label for each bounding box. `0` always represents the background class.
    *   `image_id (Int64Tensor[1])`: an image identifier. It should be unique across all images in the dataset and is used during evaluation
    *   `area (Tensor[N])`: the area of each bounding box. This is used during evaluation with the COCO metric to separate the metric scores between small, medium and large boxes.
    *   `iscrowd (UInt8Tensor[N])`: instances with `iscrowd = True` will be ignored during evaluation.
    *   (optionally) `masks (UInt8Tensor[N, H, W])`: the segmentation masks for each object
    *   (optionally) `keypoints (FloatTensor[N, K, 3])`: for each of the `N` objects, the `K` keypoints in `[x, y, visibility]` format that define the object. `visibility = 0` means the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint depends on the data representation, and you should probably adapt `references/detection/transforms.py` to your new keypoint representation

If your dataset returns the above, the model will work for both training and evaluation, and will use the evaluation scripts from `pycocotools`.
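
A rough skeletal sketch of this return format (not from the original tutorial; the box coordinates and label below are placeholders for a single hypothetical object) could look like this:

```py
import torch

# A skeletal example of a target dict for one hypothetical object;
# the box coordinates are placeholders, not real annotations.
boxes = torch.tensor([[10.0, 20.0, 100.0, 200.0]])  # [x0, y0, x1, y1]
target = {
    "boxes": boxes,                                   # FloatTensor[N, 4]
    "labels": torch.tensor([1], dtype=torch.int64),   # 0 is reserved for the background class
    "image_id": torch.tensor([0]),                    # unique per image
    "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
    "iscrowd": torch.zeros((1,), dtype=torch.int64),  # not a crowd region
}
```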

Note

On Windows, install `pycocotools` from [`gautamchitnis`](https://github.com/gautamchitnis/cocoapi) with the command

`pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI`

One note on the `labels`: the model treats class `0` as background. If your dataset does not contain a background class, you should not have `0` in your `labels`. For example, assuming you have just two classes, *cat* and *dog*, you can define `1` (not `0`) to represent *cats* and `2` to represent *dogs*. So if one of the images contains both classes, your `labels` tensor should look like `[1, 2]`.

Additionally, if you want to use aspect ratio grouping during training (so that each batch only contains images with similar aspect ratios), it is recommended to also implement a `get_height_and_width` method that returns the height and the width of an image. If this method is not provided, we query all elements of the dataset via `__getitem__`, which loads the images into memory and is slower than providing a custom method.
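
A possible sketch of such a method is shown here; it assumes the reference scripts expect a `get_height_and_width(self, idx)` method returning `(height, width)`, and a dataset laid out like the `PennFudanDataset` written in the next section:

```py
import os
from PIL import Image

# A possible sketch: read only the image header via PIL instead of decoding
# the full image, and return (height, width) as assumed above.
def get_height_and_width(self, idx):
    img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
    with Image.open(img_path) as img:
        width, height = img.size  # PIL opens lazily, so the pixels are not decoded here
    return height, width
```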

### Writing a Custom Dataset for PennFudan

Let's write a dataset for the PennFudan dataset. After [downloading and extracting the zip file](https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip), we have the following folder structure:

```py
PennFudanPed/
  PedMasks/
    FudanPed00001_mask.png
    FudanPed00002_mask.png
    FudanPed00003_mask.png
    FudanPed00004_mask.png
    ...
  PNGImages/
    FudanPed00001.png
    FudanPed00002.png
    FudanPed00003.png
    FudanPed00004.png

```
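
If you would rather fetch the data from Python than download it manually, a minimal sketch (assuming the download URL above is still reachable) could look like this:

```py
import urllib.request
import zipfile

# Download and extract the Penn-Fudan dataset into the current directory.
url = "https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip"
urllib.request.urlretrieve(url, "PennFudanPed.zip")
with zipfile.ZipFile("PennFudanPed.zip") as zf:
    zf.extractall(".")
```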

Here is one example of an image and its segmentation mask:

![intermediate/../../_static/img/tv_tutorial/tv_image01.png](img/342d5d0add3b5754dae73ff222bbc543.png) ![intermediate/../../_static/img/tv_tutorial/tv_image02.png](img/c814c5c2350e00cf5fc0d883acf0843c.png)

So each image has a corresponding segmentation mask, where each color corresponds to a different instance. Let's write a `torch.utils.data.Dataset` class for this dataset.

```py
import os
import numpy as np
import torch
from PIL import Image

class PennFudanDataset(object):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)

```
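
As a quick sanity check (not part of the original tutorial), you can instantiate the dataset without transforms and inspect one sample:

```py
# Load one sample and inspect the target fields.
dataset = PennFudanDataset('PennFudanPed', transforms=None)
img, target = dataset[0]
print(img.size)               # PIL image size (W, H)
print(target["boxes"].shape)  # torch.Size([N, 4])
print(target["masks"].shape)  # torch.Size([N, H, W])
print(target["labels"])       # all ones: pedestrians are the only foreground class
```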

That's all for the dataset. Now let's define a model that can perform predictions on it.

## Defining the Model

In this tutorial, we will be using [Mask R-CNN](https://arxiv.org/abs/1703.06870), which is built on top of [Faster R-CNN](https://arxiv.org/abs/1506.01497). Faster R-CNN is a model that predicts both bounding boxes and class scores for potential objects in the image.

![intermediate/../../_static/img/tv_tutorial/tv_image03.png](img/611c2725bdfb89e258da9a99fca53433.png)

Mask R-CNN adds an extra branch to Faster R-CNN that also predicts a segmentation mask for each instance.

![intermediate/../../_static/img/tv_tutorial/tv_image04.png](img/afd408b97567c661cc8cb8a80c7c777c.png)

There are two common situations where one might want to modify one of the available models in the `torchvision` model zoo. The first is when we want to start from a pre-trained model and just finetune the last layer. The other is when we want to replace the backbone of the model with a different one (for faster predictions, for example).

Let's see how we would do one or the other in the following sections.

### 1 - Finetuning from a Pretrained Model

Let's suppose that you want to start from a model pre-trained on COCO and finetune it for your particular classes. Here is a possible way of doing it:

```py
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2  # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

```

### 2 - Modifying the Model to Add a Different Backbone

```py
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be [0]. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

```

### An Instance Segmentation Model for the PennFudan Dataset

In our case, we want to finetune from a pre-trained model, given that our dataset is very small, so we will follow approach 1.

Here we also want to compute the instance segmentation masks, so we will be using Mask R-CNN:

```py
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model

```

That's it; this will make `model` ready to be trained and evaluated on your custom dataset.

## Putting Everything Together

In `references/detection/`, we provide a number of helper functions to simplify training and evaluating detection models. Here, we will use `references/detection/engine.py`, `references/detection/utils.py` and `references/detection/transforms.py`. Just copy them to your folder and use them here.

Let's write some helper functions for data augmentation/transformation:

```py
import transforms as T

def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)

```

## Testing the `forward()` Method (Optional)

Before iterating over the dataset, it's good to see what the model expects during training and at inference time on sample data.

```py
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, num_workers=4,
    collate_fn=utils.collate_fn)
# For Training
images, targets = next(iter(data_loader))
images = list(image for image in images)
targets = [{k: v for k, v in t.items()} for t in targets]
output = model(images, targets)   # Returns losses and detections
# For inference
model.eval()
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
predictions = model(x)           # Returns predictions

```

Now let's write the `main` function, which performs the training and the validation:

```py
from engine import train_one_epoch, evaluate
import utils

def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset in train and test set
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)

    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    print("That's it!")

```
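
To actually launch the training, call `main()`, typically behind the usual import guard so that the `DataLoader` worker processes can re-import the script safely:

```py
if __name__ == "__main__":
    main()
```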

You should get the following output for the first epoch:

```py
Epoch: [0]  [ 0/60]  eta: 0:01:18  lr: 0.000090  loss: 2.5213 (2.5213)  loss_classifier: 0.8025 (0.8025)  loss_box_reg: 0.2634 (0.2634)  loss_mask: 1.4265 (1.4265)  loss_objectness: 0.0190 (0.0190)  loss_rpn_box_reg: 0.0099 (0.0099)  time: 1.3121  data: 0.3024  max mem: 3485
Epoch: [0]  [10/60]  eta: 0:00:20  lr: 0.000936  loss: 1.3007 (1.5313)  loss_classifier: 0.3979 (0.4719)  loss_box_reg: 0.2454 (0.2272)  loss_mask: 0.6089 (0.7953)  loss_objectness: 0.0197 (0.0228)  loss_rpn_box_reg: 0.0121 (0.0141)  time: 0.4198  data: 0.0298  max mem: 5081
Epoch: [0]  [20/60]  eta: 0:00:15  lr: 0.001783  loss: 0.7567 (1.1056)  loss_classifier: 0.2221 (0.3319)  loss_box_reg: 0.2002 (0.2106)  loss_mask: 0.2904 (0.5332)  loss_objectness: 0.0146 (0.0176)  loss_rpn_box_reg: 0.0094 (0.0123)  time: 0.3293  data: 0.0035  max mem: 5081
Epoch: [0]  [30/60]  eta: 0:00:11  lr: 0.002629  loss: 0.4705 (0.8935)  loss_classifier: 0.0991 (0.2517)  loss_box_reg: 0.1578 (0.1957)  loss_mask: 0.1970 (0.4204)  loss_objectness: 0.0061 (0.0140)  loss_rpn_box_reg: 0.0075 (0.0118)  time: 0.3403  data: 0.0044  max mem: 5081
Epoch: [0]  [40/60]  eta: 0:00:07  lr: 0.003476  loss: 0.3901 (0.7568)  loss_classifier: 0.0648 (0.2022)  loss_box_reg: 0.1207 (0.1736)  loss_mask: 0.1705 (0.3585)  loss_objectness: 0.0018 (0.0113)  loss_rpn_box_reg: 0.0075 (0.0112)  time: 0.3407  data: 0.0044  max mem: 5081
Epoch: [0]  [50/60]  eta: 0:00:03  lr: 0.004323  loss: 0.3237 (0.6703)  loss_classifier: 0.0474 (0.1731)  loss_box_reg: 0.1109 (0.1561)  loss_mask: 0.1658 (0.3201)  loss_objectness: 0.0015 (0.0093)  loss_rpn_box_reg: 0.0093 (0.0116)  time: 0.3379  data: 0.0043  max mem: 5081
Epoch: [0]  [59/60]  eta: 0:00:00  lr: 0.005000  loss: 0.2540 (0.6082)  loss_classifier: 0.0309 (0.1526)  loss_box_reg: 0.0463 (0.1405)  loss_mask: 0.1568 (0.2945)  loss_objectness: 0.0012 (0.0083)  loss_rpn_box_reg: 0.0093 (0.0123)  time: 0.3489  data: 0.0042  max mem: 5081
Epoch: [0] Total time: 0:00:21 (0.3570 s / it)
creating index...
index created!
Test:  [ 0/50]  eta: 0:00:19  model_time: 0.2152 (0.2152)  evaluator_time: 0.0133 (0.0133)  time: 0.4000  data: 0.1701  max mem: 5081
Test:  [49/50]  eta: 0:00:00  model_time: 0.0628 (0.0687)  evaluator_time: 0.0039 (0.0064)  time: 0.0735  data: 0.0022  max mem: 5081
Test: Total time: 0:00:04 (0.0828 s / it)
Averaged stats: model_time: 0.0628 (0.0687)  evaluator_time: 0.0039 (0.0064)
Accumulating evaluation results...
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.01s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.606
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.984
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.780
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.582
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.612
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.270
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.672
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.672
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.650
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.755
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.664
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.704
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.979
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.871
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.325
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.488
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.727
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.316
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.748
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.749
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.650
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.758

```

So after one epoch of training, we obtain a COCO-style mAP of 60.6 and a mask mAP of 70.4.

After training for 10 epochs, I got the following metrics:

```py
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.799
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.969
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.935
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.349
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.592
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.831
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.324
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.844
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.844
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.777
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.870
IoU metric: segm
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.761
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.969
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.919
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.341
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.464
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.788
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.303
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.799
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.799
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.400
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.769
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.818

```

But what do the predictions look like? Let's take one image from the dataset and check.
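
A sketch of how such a prediction might be obtained, following the same pattern as the forward-pass test above (it assumes `dataset_test`, `model` and `device` from `main` are available in scope):

```py
# Pick one image from the test set and run the trained model on it.
img, _ = dataset_test[0]
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])

# Convert the image tensor and the top predicted mask back to PIL images for viewing.
Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())
Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())
```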

![intermediate/../../_static/img/tv_tutorial/tv_image05.png](img/85fee85630aaace1c60fe5ba0df8c795.png)

The trained model predicts 9 instances of person in this image; let's look at a couple of them:

![intermediate/../../_static/img/tv_tutorial/tv_image06.png](img/c9d3ddd13da5858e2cb03b53753ece3c.png) ![intermediate/../../_static/img/tv_tutorial/tv_image07.png](img/5c33a15f9b0da3f9377dc63f70bb58a7.png)

The results look pretty good!

## Wrapping Up

In this tutorial, you learned how to create your own training pipeline for an instance segmentation model on a custom dataset. For that, you wrote a `torch.utils.data.Dataset` class that returns the images together with the ground-truth boxes and segmentation masks. You also leveraged a Mask R-CNN model pre-trained on COCO train2017 in order to perform transfer learning on this new dataset.

For a more complete example, which includes multi-machine / multi-GPU training, check `references/detection/train.py`, which is present in the `torchvision` repository.

[You can download the full source file for this tutorial here](https://pytorch.org/tutorials/_static/tv-training-code.py).