Unverified commit da3bf2b4, authored by Jianfeng Wang, committed by GitHub

feat(detection): several enhancement and reformat (#43)

* feat(detection): several enhancement and reformat

* feat(detection): use multi-scale training

* chore(detection): update weights and results
Parent 0023ce55
__pycache__/
*log*/
*.so
......@@ -73,12 +73,12 @@ export PYTHONPATH=/path/to/models:$PYTHONPATH
### Object Detection
Object detection is likewise a common computer-vision task. We provide a classic detection model, [retinanet](./official/vision/detection); its results on the **COCO validation set** are as follows:
Object detection is likewise a common computer-vision task. We provide two classic detection models, [RetinaNet](./official/vision/detection/model/retinanet) and [Faster R-CNN](./official/vision/detection/model/faster_rcnn); their results on the **COCO validation set** are as follows:
| Model | mAP<br>@5-95 |
| :---: | :---: |
| retinanet-res50-1x-800size | 36.0 |
| faster-rcnn-fpn-res50-1x-800size | 37.3 |
| Model | mAP<br>@5-95 |
| :---: | :---: |
| retinanet-res50-1x-800size | 36.4 |
| faster-rcnn-res50-1x-800size | 38.8 |
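For reference, these pretrained entries are typically consumed through MegEngine's model hub; a minimal sketch, assuming the `hub.load` API and the entry names registered in this repo's `hubconf.py` (not part of this diff):

```python
import megengine.hub as hub

# repo path and entry name are assumptions based on this commit's hubconf imports
model = hub.load("megengine/models", "retinanet_res50_coco_1x_800size", pretrained=True)
model.eval()
```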
### Image Segmentation
......
from official.nlp.bert.model import (
cased_L_12_H_768_A_12,
cased_L_24_H_1024_A_16,
chinese_L_12_H_768_A_12,
multi_cased_L_12_H_768_A_12,
uncased_L_12_H_768_A_12,
uncased_L_24_H_1024_A_16,
wwm_cased_L_24_H_1024_A_16,
wwm_uncased_L_24_H_1024_A_16,
)
from official.quantization.models import quantized_resnet18
from official.vision.classification.resnet.model import (
BasicBlock,
Bottleneck,
......@@ -16,54 +27,22 @@ from official.vision.classification.shufflenet.model import (
shufflenet_v2_x1_5,
shufflenet_v2_x2_0,
)
from official.nlp.bert.model import (
uncased_L_12_H_768_A_12,
cased_L_12_H_768_A_12,
uncased_L_24_H_1024_A_16,
cased_L_24_H_1024_A_16,
chinese_L_12_H_768_A_12,
multi_cased_L_12_H_768_A_12,
wwm_uncased_L_24_H_1024_A_16,
wwm_cased_L_24_H_1024_A_16,
)
from official.vision.detection.faster_rcnn_fpn_res50_coco_1x_800size import (
faster_rcnn_fpn_res50_coco_1x_800size,
)
from official.vision.detection.faster_rcnn_fpn_res50_coco_1x_800size_syncbn import (
faster_rcnn_fpn_res50_coco_1x_800size_syncbn,
)
from official.vision.detection.retinanet_res50_coco_1x_800size import (
from official.vision.detection.configs import (
faster_rcnn_res50_coco_1x_800size,
faster_rcnn_res50_coco_1x_800size_syncbn,
retinanet_res50_coco_1x_800size,
)
from official.vision.detection.retinanet_res50_coco_1x_800size_syncbn import (
retinanet_res50_coco_1x_800size_syncbn,
)
# TODO: need pretrained weights
# from official.vision.detection.retinanet_res50_objects365_1x_800size import (
# retinanet_res50_objects365_1x_800size,
# )
# from official.vision.detection.retinanet_res50_voc_1x_800size import (
# retinanet_res50_voc_1x_800size,
# )
from official.vision.detection.models import FasterRCNN, RetinaNet
from official.vision.detection.tools.test import DetEvaluator
from official.vision.segmentation.deeplabv3plus import (
deeplabv3plus_res101,
DeepLabV3Plus,
)
from official.vision.detection.tools.utils import DetEvaluator
from official.vision.keypoints.inference import KeypointEvaluator
from official.vision.keypoints.models import (
mspn_4stage,
simplebaseline_res50,
simplebaseline_res101,
simplebaseline_res152,
mspn_4stage
)
from official.vision.keypoints.inference import KeypointEvaluator
from official.quantization.models import quantized_resnet18
from official.vision.segmentation.deeplabv3plus import (
DeepLabV3Plus,
deeplabv3plus_res101,
)
......@@ -2,14 +2,16 @@
## Introduction
This directory contains classic network architectures implemented with MegEngine, including [RetinaNet](https://arxiv.org/pdf/1708.02002) and [Faster R-CNN with FPN](https://arxiv.org/pdf/1612.03144.pdf), along with complete training and testing code on the COCO2017 dataset.
This directory contains classic network architectures implemented with MegEngine, including [RetinaNet](https://arxiv.org/pdf/1708.02002) and [Faster R-CNN](https://arxiv.org/pdf/1612.03144.pdf), along with complete training and testing code on the COCO2017 dataset.
The performance of these networks on the COCO2017 dataset is as follows:
| Model | mAP<br>@5-95 | batch<br>/gpu | gpu | training speed<br>(8gpu) | training speed<br>(1gpu) |
| --- | --- | --- | --- | --- | --- |
| retinanet-res50-coco-1x-800size | 36.0 | 2 | 2080Ti | 2.27(it/s) | 3.7(it/s) |
| faster-rcnn-fpn-res50-coco-1x-800size | 37.3 | 2 | 2080Ti | 1.9(it/s) | 3.1(it/s) |
| Model | mAP<br>@5-95 | batch<br>/gpu | gpu | training speed<br>(8gpu) |
| --- | :---: | :---: | :---: | :---: |
| retinanet-res50-coco-1x-800size | 36.4 | 2 | 2080Ti | 3.1(it/s) |
| retinanet-res50-coco-1x-800size-syncbn | 37.1 | 2 | 2080Ti | 1.7(it/s) |
| faster-rcnn-res50-coco-1x-800size | 38.8 | 2 | 2080Ti | 3.3(it/s) |
| faster-rcnn-res50-coco-1x-800size-syncbn | 39.3 | 2 | 2080Ti | 1.8(it/s) |
* MegEngine v0.4.0
......@@ -18,16 +20,16 @@
Taking RetinaNet as an example, once the model is trained you can test a single image with the following command:
```bash
python3 tools/inference.py -f retinanet_res50_coco_1x_800size.py \
    -i ../../assets/cat.jpg \
    -m /path/to/retinanet_weights.pkl
```

```bash
python3 tools/inference.py -f configs/retinanet_res50_coco_1x_800size.py \
    -w /path/to/retinanet_weights.pkl \
    -i ../../assets/cat.jpg
```
`tools/inference.py` takes the following command-line options:

- `-f`, the network description file to test.
- `-m`, the trained weights matching the network file; pretrained detector weights can be downloaded from the table at the top.
- `-i`, the sample image to test.
- `-w`, the trained weights matching the network file; pretrained detector weights can be downloaded from the table at the top.

The result of testing with the default image and default model is shown below:
......@@ -53,10 +55,7 @@ python3 tools/inference.py -f retinanet_res50_coco_1x_800size.py \
4. Start training:
```bash
python3 tools/train.py -f retinanet_res50_coco_1x_800size.py \
-n 8 \
--batch_size 2 \
-w /path/to/pretrain.pkl
python3 tools/train.py -f configs/retinanet_res50_coco_1x_800size.py -n 8
```
`tools/train.py` provides flexible command-line options, including:
......@@ -64,8 +63,8 @@ python3 tools/train.py -f retinanet_res50_coco_1x_800size.py \
- `-f`, the network description file to train, e.g. RetinaNet, Faster R-CNN, etc.
- `-n`, number of devices (GPUs) used for training; defaults to all available GPUs.
- `-w`, path to the pretrained backbone weights.
- `--batch_size`, the `batch size` used in training; defaults to 2, i.e. 2 images per GPU.
- `--dataset-dir`, parent directory of the COCO2017 dataset; defaults to `/data/datasets`.
- `-b`, the `batch size` used in training; defaults to 2, i.e. 2 images per GPU.
- `-d`, parent directory of the COCO2017 dataset; defaults to `/data/datasets`.
By default, model checkpoints are saved under the `log-of-<model name>` directory.
......@@ -90,18 +89,16 @@ nvcc -I $MGE/_internal/include -shared -o lib_nms.so -Xcompiler "-fno-strict-ali
After the trained model is saved, you can evaluate it on the `COCO2017` validation set with tools/test.py:
```bash
python3 tools/test.py -f retinanet_res50_coco_1x_800size.py \
-n 8 \
--model /path/to/retinanet_weights.pt \
--dataset_dir /data/datasets
python3 tools/test.py -f configs/retinanet_res50_coco_1x_800size.py -n 8 \
-w /path/to/retinanet_weights.pt
```
`tools/test.py` takes the following command-line options:

- `-f`, the network description file to test.
- `-n`, number of devices (GPUs) used for testing; defaults to 1.
- `--model`, the model to test; you can download pretrained detector weights from the table at the top, or use weights you trained yourself.
- `--dataset_dir`, parent directory of the COCO2017 dataset; defaults to `/data/datasets`.
- `-w`, the model to test; you can download pretrained detector weights from the table at the top, or use weights you trained yourself.
- `-d`, parent directory of the COCO2017 dataset; defaults to `/data/datasets`.
## References
......
from .faster_rcnn_res50_coco_1x_800size import faster_rcnn_res50_coco_1x_800size
from .faster_rcnn_res50_coco_1x_800size_syncbn import faster_rcnn_res50_coco_1x_800size_syncbn
from .retinanet_res50_coco_1x_800size import retinanet_res50_coco_1x_800size
from .retinanet_res50_coco_1x_800size_syncbn import retinanet_res50_coco_1x_800size_syncbn
_EXCLUDE = {}
__all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith("_")]
......@@ -13,9 +13,9 @@ from official.vision.detection import models
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"faster_rcnn_fpn_ec2e80b9_res50_1x_800size_37dot3.pkl"
"faster_rcnn_res50_coco_1x_800size_38dot8_5e195d80.pkl"
)
def faster_rcnn_fpn_res50_coco_1x_800size(batch_size=1, **kwargs):
def faster_rcnn_res50_coco_1x_800size(batch_size=1, **kwargs):
r"""
Faster-RCNN FPN trained from COCO dataset.
`"Faster-RCNN" <https://arxiv.org/abs/1506.01497>`_
......
......@@ -11,7 +11,7 @@ from megengine import hub
from official.vision.detection import models
class CustomFasterRCNNFPNConfig(models.FasterRCNNConfig):
class CustomFasterRCNNConfig(models.FasterRCNNConfig):
def __init__(self):
super().__init__()
......@@ -22,9 +22,9 @@ class CustomFasterRCNNFPNConfig(models.FasterRCNNConfig):
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"faster_rcnn_fpn_cf5c020b_res50_1x_800size_syncbn_37dot6.pkl"
"faster_rcnn_res50_coco_1x_800size_syncbn_39dot3_09b99bce.pkl"
)
def faster_rcnn_fpn_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
def faster_rcnn_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
r"""
Faster-RCNN FPN trained from COCO dataset.
`"Faster-RCNN" <https://arxiv.org/abs/1506.01497>`_
......@@ -32,8 +32,8 @@ def faster_rcnn_fpn_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
`"COCO" <https://arxiv.org/abs/1405.0312>`_
`"SyncBN" <https://arxiv.org/abs/1711.07240>`_
"""
return models.FasterRCNN(CustomFasterRCNNFPNConfig(), batch_size=batch_size, **kwargs)
return models.FasterRCNN(CustomFasterRCNNConfig(), batch_size=batch_size, **kwargs)
Net = models.FasterRCNN
Cfg = CustomFasterRCNNFPNConfig
Cfg = CustomFasterRCNNConfig
......@@ -13,7 +13,7 @@ from official.vision.detection import models
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"retinanet_d3f58dce_res50_1x_800size_36dot0.pkl"
"retinanet_res50_coco_1x_800size_36dot4_b782a619.pkl"
)
def retinanet_res50_coco_1x_800size(batch_size=1, **kwargs):
r"""
......
......@@ -6,6 +6,7 @@
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from megengine import hub
from official.vision.detection import models
......@@ -19,6 +20,10 @@ class CustomRetinaNetConfig(models.RetinaNetConfig):
self.backbone_freeze_at = 0
@hub.pretrained(
"https://data.megengine.org.cn/models/weights/"
"retinanet_res50_coco_1x_800size_syncbn_37dot1_35cedcdf.pkl"
)
def retinanet_res50_coco_1x_800size_syncbn(batch_size=1, **kwargs):
r"""
RetinaNet with SyncBN trained from COCO dataset.
......
......@@ -63,14 +63,13 @@ def get_norm(norm, out_channels=None):
Returns:
M.Module or None: the normalization layer
"""
if isinstance(norm, str):
if len(norm) == 0:
return None
norm = {
"BN": M.BatchNorm2d,
"SyncBN": M.SyncBatchNorm,
"FrozenBN": FrozenBatchNorm2d
}[norm]
if norm is None:
return None
norm = {
"BN": M.BatchNorm2d,
"SyncBN": M.SyncBatchNorm,
"FrozenBN": FrozenBatchNorm2d,
}[norm]
if out_channels is not None:
return norm(out_channels)
else:
......
......@@ -42,14 +42,14 @@ class DefaultAnchorGenerator(BaseAnchorGenerator):
def __init__(
self,
base_size=8,
anchor_scales: np.ndarray = np.array([2, 3, 4]),
anchor_ratios: np.ndarray = np.array([0.5, 1, 2]),
anchor_scales: list = [2, 3, 4],
anchor_ratios: list = [0.5, 1, 2],
offset: float = 0,
):
super().__init__()
self.base_size = base_size
self.anchor_scales = anchor_scales
self.anchor_ratios = anchor_ratios
self.anchor_scales = np.array(anchor_scales)
self.anchor_ratios = np.array(anchor_ratios)
self.offset = offset
def _whctrs(self, anchor):
......@@ -111,7 +111,7 @@ class DefaultAnchorGenerator(BaseAnchorGenerator):
flatten_shift_y = F.add_axis(broad_shift_y.reshape(-1), 1)
centers = F.concat(
[flatten_shift_x, flatten_shift_y, flatten_shift_x, flatten_shift_y, ],
[flatten_shift_x, flatten_shift_y, flatten_shift_x, flatten_shift_y,],
axis=1,
)
centers = centers + self.offset * self.base_size
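For intuition about the `anchor_scales`/`anchor_ratios` arguments of `DefaultAnchorGenerator`, here is a NumPy sketch (a hypothetical helper, assuming the usual R-CNN convention of one origin-centered anchor per scale-ratio pair, with ratio = h/w):

```python
import numpy as np

def make_cell_anchors(base_size=8, scales=(2, 3, 4), ratios=(0.5, 1, 2)):
    """Enumerate (x1, y1, x2, y2) anchors centered at the origin."""
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)  # ratio = h / w, w * h = area
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

print(make_cell_anchors().shape)  # (9, 4): one anchor per (scale, ratio) pair
```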
......
......@@ -8,6 +8,8 @@
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from abc import ABCMeta, abstractmethod
import numpy as np
import megengine.functional as F
from megengine.core import Tensor
......@@ -29,15 +31,19 @@ class BoxCoderBase(metaclass=ABCMeta):
class BoxCoder(BoxCoderBase, metaclass=ABCMeta):
def __init__(self, reg_mean=None, reg_std=None):
def __init__(
self,
reg_mean=[0.0, 0.0, 0.0, 0.0],
reg_std=[1.0, 1.0, 1.0, 1.0],
):
"""
Args:
reg_mean(np.ndarray): [x0_mean, x1_mean, y0_mean, y1_mean] or None
reg_std(np.ndarray): [x0_std, x1_std, y0_std, y1_std] or None
"""
self.reg_mean = reg_mean[None, :] if reg_mean is not None else None
self.reg_std = reg_std[None, :] if reg_std is not None else None
self.reg_mean = np.array(reg_mean)[None, :]
self.reg_std = np.array(reg_std)[None, :]
super().__init__()
@staticmethod
......@@ -82,17 +88,13 @@ class BoxCoder(BoxCoderBase, metaclass=ABCMeta):
target_dh = F.log(gt_height / bbox_height)
target = self._concat_new_axis(target_dx, target_dy, target_dw, target_dh)
if self.reg_mean is not None:
target -= self.reg_mean
if self.reg_std is not None:
target /= self.reg_std
target -= self.reg_mean
target /= self.reg_std
return target
def decode(self, anchors: Tensor, deltas: Tensor) -> Tensor:
if self.reg_std is not None:
deltas *= self.reg_std
if self.reg_mean is not None:
deltas += self.reg_mean
deltas *= self.reg_std
deltas += self.reg_mean
(
anchor_width,
......@@ -158,7 +160,7 @@ def get_iou(boxes1: Tensor, boxes2: Tensor, return_ignore=False) -> Tensor:
if return_ignore:
overlaps_ignore = F.maximum(inter / b_area_box, 0)
gt_ignore_mask = F.add_axis((gt[:, 4] == -1), 0).broadcast(*area_target_shape)
overlaps *= (1 - gt_ignore_mask)
overlaps *= 1 - gt_ignore_mask
overlaps_ignore *= gt_ignore_mask
return overlaps, overlaps_ignore
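Aside on the `BoxCoder` changes above: with `reg_mean`/`reg_std` now always set, encode/decode apply the normalization unconditionally. A self-contained NumPy sketch of the standard (dx, dy, dw, dh) round trip these methods implement (hypothetical helper names, not this repo's exact code):

```python
import numpy as np

def encode(anchors, gt, mean, std):
    # anchors, gt: (N, 4) boxes as (x1, y1, x2, y2)
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + aw / 2, anchors[:, 1] + ah / 2
    gw, gh = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    gx, gy = gt[:, 0] + gw / 2, gt[:, 1] + gh / 2
    deltas = np.stack(
        [(gx - ax) / aw, (gy - ay) / ah, np.log(gw / aw), np.log(gh / ah)], axis=1
    )
    return (deltas - mean) / std

def decode(anchors, deltas, mean, std):
    deltas = deltas * std + mean
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + aw / 2, anchors[:, 1] + ah / 2
    cx, cy = ax + deltas[:, 0] * aw, ay + deltas[:, 1] * ah
    w, h = aw * np.exp(deltas[:, 2]), ah * np.exp(deltas[:, 3])
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

# decode(anchors, encode(anchors, gt, m, s), m, s) recovers gt exactly
```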
......
......@@ -45,7 +45,7 @@ class FPN(M.Module):
bottom_up: M.Module,
in_features: List[str],
out_channels: int = 256,
norm: str = "",
norm: str = None,
top_block: M.Module = None,
strides=[8, 16, 32],
channels=[512, 1024, 2048],
......
......@@ -53,7 +53,9 @@ def get_focal_loss(
neg_part = score ** gamma * F.log(F.clamp(1 - score, 1e-8))
pos_loss = -(label == class_range) * pos_part * alpha
neg_loss = -(label != class_range) * (label != ignore_label) * neg_part * (1 - alpha)
neg_loss = (
-(label != class_range) * (label != ignore_label) * neg_part * (1 - alpha)
)
loss = pos_loss + neg_loss
if norm_type == "fg":
......@@ -69,10 +71,9 @@ def get_smooth_l1_loss(
pred_bbox: Tensor,
gt_bbox: Tensor,
label: Tensor,
sigma: int = 3,
beta: int = 1,
background: int = 0,
ignore_label: int = -1,
fix_smooth_l1: bool = False,
norm_type: str = "fg",
) -> Tensor:
r"""Smooth l1 loss used in RetinaNet.
......@@ -84,14 +85,12 @@ def get_smooth_l1_loss(
the ground-truth bbox with the shape of :math:`(B, A, 4)`
label (Tensor):
the assigned label of boxes with shape of :math:`(B, A)`
sigma (int):
beta (int):
the parameter of smooth l1 loss. Default: 1
background (int):
the value of background class. Default: 0
ignore_label (int):
the value of ignore class. Default: -1
fix_smooth_l1 (bool):
is to use huber loss, default is False to use original smooth-l1
norm_type (str): current support 'fg', 'all', 'none':
'fg': loss will be normalized by number of fore-ground samples
'all': loss will be normalized by number of all samples
......@@ -105,11 +104,11 @@ def get_smooth_l1_loss(
fg_mask = (label != background) * (label != ignore_label)
losses = get_smooth_l1_base(pred_bbox, gt_bbox, sigma, is_fix=fix_smooth_l1)
losses = get_smooth_l1_base(pred_bbox, gt_bbox, beta)
if norm_type == "fg":
loss = (losses.sum(axis=1) * fg_mask).sum() / F.maximum(fg_mask.sum(), 1)
elif norm_type == "all":
all_mask = (label != ignore_label)
all_mask = label != ignore_label
loss = (losses.sum(axis=1) * fg_mask).sum() / F.maximum(all_mask.sum(), 1)
else:
raise NotImplementedError
......@@ -118,7 +117,7 @@ def get_smooth_l1_loss(
def get_smooth_l1_base(
pred_bbox: Tensor, gt_bbox: Tensor, sigma: float, is_fix: bool = False,
pred_bbox: Tensor, gt_bbox: Tensor, beta: float,
):
r"""
......@@ -127,34 +126,24 @@ def get_smooth_l1_base(
the predicted bbox with the shape of :math:`(N, 4)`
gt_bbox (Tensor):
the ground-truth bbox with the shape of :math:`(N, 4)`
sigma (int):
beta (int):
the parameter of smooth l1 loss.
is_fix (bool):
is to use huber loss, default is False to use original smooth-l1
Returns:
the calculated smooth l1 loss.
"""
if is_fix:
sigma = 1 / sigma
cond_point = sigma
x = pred_bbox - gt_bbox
abs_x = F.abs(x)
in_loss = 0.5 * x ** 2
out_loss = sigma * abs_x - 0.5 * sigma ** 2
x = pred_bbox - gt_bbox
abs_x = F.abs(x)
if beta < 1e-5:
loss = abs_x
else:
sigma2 = sigma ** 2
cond_point = 1 / sigma2
x = pred_bbox - gt_bbox
abs_x = F.abs(x)
in_loss = 0.5 * x ** 2 * sigma2
out_loss = abs_x - 0.5 / sigma2
# FIXME: F.where cannot handle 0-shape tensor yet
# loss = F.where(abs_x < cond_point, in_loss, out_loss)
in_mask = abs_x < cond_point
out_mask = 1 - in_mask
loss = in_loss * in_mask + out_loss * out_mask
in_loss = 0.5 * x ** 2 / beta
out_loss = abs_x - 0.5 * beta
# FIXME: F.where cannot handle 0-shape tensor yet
# loss = F.where(abs_x < beta, in_loss, out_loss)
in_mask = abs_x < beta
loss = in_loss * in_mask + out_loss * (1 - in_mask)
return loss
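The rewritten `get_smooth_l1_base` replaces the old `sigma`/`is_fix` parameterization with a single `beta`: quadratic inside `|x| < beta`, linear outside, and degenerating to plain L1 as `beta -> 0` (which is why the configs below set `smooth_l1_beta = 0`). A scalar NumPy sketch of the same rule:

```python
import numpy as np

def smooth_l1(x, beta):
    # quadratic within |x| < beta, linear outside; beta < 1e-5 is plain L1,
    # matching get_smooth_l1_base above
    abs_x = np.abs(x)
    if beta < 1e-5:
        return abs_x
    return np.where(abs_x < beta, 0.5 * x ** 2 / beta, abs_x - 0.5 * beta)

print(smooth_l1(np.array([-2.0, 0.5, 0.0]), beta=1.0))  # [1.5, 0.125, 0.0]
```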
......@@ -162,7 +151,7 @@ def softmax_loss(score, label, ignore_label=-1):
max_score = F.zero_grad(score.max(axis=1, keepdims=True))
score -= max_score
log_prob = score - F.log(F.exp(score).sum(axis=1, keepdims=True))
mask = (label != ignore_label)
mask = label != ignore_label
vlabel = label * mask
loss = -(F.indexing_one_hot(log_prob, vlabel.astype("int32"), 1) * mask).sum()
loss = loss / F.maximum(mask.sum(), 1)
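For readers skimming the diff: `softmax_loss` above is a numerically stabilized cross-entropy that masks out `ignore_label` rows. A NumPy rendering with the same semantics (a hypothetical helper, not part of this commit):

```python
import numpy as np

def softmax_loss_np(score, label, ignore_label=-1):
    score = score - score.max(axis=1, keepdims=True)  # stabilize exp
    log_prob = score - np.log(np.exp(score).sum(axis=1, keepdims=True))
    mask = label != ignore_label
    vlabel = label * mask  # ignored rows point at class 0 but are masked out
    loss = -(log_prob[np.arange(len(label)), vlabel] * mask).sum()
    return loss / max(mask.sum(), 1)
```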
......
......@@ -15,7 +15,7 @@ import megengine.functional as F
def roi_pool(
rpn_fms, rois, stride, pool_shape, roi_type='roi_align',
rpn_fms, rois, stride, pool_shape, roi_type="roi_align",
):
assert len(stride) == len(rpn_fms)
canonical_level = 4
......@@ -40,18 +40,22 @@ def roi_pool(
pool_list, inds_list = [], []
for i in range(num_fms):
mask = (level_assignments == i)
mask = level_assignments == i
_, inds = F.cond_take(mask == 1, mask)
level_rois = rois.ai[inds]
if roi_type == 'roi_pool':
if roi_type == "roi_pool":
pool_fm = F.roi_pooling(
rpn_fms[i], level_rois, pool_shape,
mode='max', scale=1.0/stride[i]
rpn_fms[i], level_rois, pool_shape, mode="max", scale=1.0 / stride[i]
)
elif roi_type == 'roi_align':
elif roi_type == "roi_align":
pool_fm = F.roi_align(
rpn_fms[i], level_rois, pool_shape, mode='average',
spatial_scale=1.0/stride[i], sample_points=2, aligned=True
rpn_fms[i],
level_rois,
pool_shape,
mode="average",
spatial_scale=1.0 / stride[i],
sample_points=2,
aligned=True,
)
pool_list.append(pool_fm)
inds_list.append(inds)
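The elided head of `roi_pool` computes `level_assignments`, i.e. which pyramid level each RoI is pooled from. The `canonical_level = 4` constant suggests the standard FPN rule (Eq. 1 of the FPN paper); a hedged NumPy sketch, not this repo's exact code:

```python
import numpy as np

def assign_fpn_level(boxes, canonical_level=4, canonical_size=224, lmin=2, lmax=5):
    # Larger RoIs are pooled from coarser levels; p2..p5 matches
    # rcnn_in_features in the config below.
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    level = np.floor(canonical_level + np.log2(np.sqrt(w * h) / canonical_size))
    return np.clip(level, lmin, lmax).astype(int)
```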
......
......@@ -14,14 +14,10 @@ from official.vision.detection import layers
class RCNN(M.Module):
def __init__(self, cfg):
super().__init__()
self.cfg = cfg
self.box_coder = layers.BoxCoder(
reg_mean=cfg.rcnn_reg_mean,
reg_std=cfg.rcnn_reg_std
)
self.box_coder = layers.BoxCoder(cfg.rcnn_reg_mean, cfg.rcnn_reg_std)
# roi head
self.in_features = cfg.rcnn_in_features
......@@ -44,12 +40,13 @@ class RCNN(M.Module):
M.init.fill_(l.bias, 0)
def forward(self, fpn_fms, rcnn_rois, im_info=None, gt_boxes=None):
rcnn_rois, labels, bbox_targets = self.get_ground_truth(rcnn_rois, im_info, gt_boxes)
rcnn_rois, labels, bbox_targets = self.get_ground_truth(
rcnn_rois, im_info, gt_boxes
)
fpn_fms = [fpn_fms[x] for x in self.in_features]
pool_features = layers.roi_pool(
fpn_fms, rcnn_rois, self.stride,
self.pooling_size, self.pooling_method,
fpn_fms, rcnn_rois, self.stride, self.pooling_size, self.pooling_method,
)
flatten_feature = F.flatten(pool_features, start_axis=1)
roi_feature = F.relu(self.fc1(flatten_feature))
......@@ -67,14 +64,13 @@ class RCNN(M.Module):
pred_delta = F.indexing_one_hot(pred_delta, vlabels, axis=1)
loss_rcnn_loc = layers.get_smooth_l1_loss(
pred_delta, bbox_targets, labels,
pred_delta,
bbox_targets,
labels,
self.cfg.rcnn_smooth_l1_beta,
norm_type="all",
)
loss_dict = {
'loss_rcnn_cls': loss_rcnn_cls,
'loss_rcnn_loc': loss_rcnn_loc
}
loss_dict = {"loss_rcnn_cls": loss_rcnn_cls, "loss_rcnn_loc": loss_rcnn_loc}
return loss_dict
else:
# slice 1 for removing background
......@@ -82,7 +78,9 @@ class RCNN(M.Module):
pred_delta = pred_delta[:, 4:].reshape(-1, 4)
target_shape = (rcnn_rois.shapeof(0), self.cfg.num_classes, 4)
# rois (N, 4) -> (N, 1, 4) -> (N, 80, 4) -> (N * 80, 4)
base_rois = F.add_axis(rcnn_rois[:, 1:5], 1).broadcast(target_shape).reshape(-1, 4)
base_rois = (
F.add_axis(rcnn_rois[:, 1:5], 1).broadcast(target_shape).reshape(-1, 4)
)
pred_bbox = self.box_coder.decode(base_rois, pred_delta)
return pred_bbox, pred_scores
......@@ -101,7 +99,7 @@ class RCNN(M.Module):
batch_inds = mge.ones((gt_boxes_per_img.shapeof(0), 1)) * bid
# if config.proposal_append_gt:
gt_rois = F.concat([batch_inds, gt_boxes_per_img[:, :4]], axis=1)
batch_roi_mask = (rpn_rois[:, 0] == bid)
batch_roi_mask = rpn_rois[:, 0] == bid
_, batch_roi_inds = F.cond_take(batch_roi_mask == 1, batch_roi_mask)
# all_rois : [batch_id, x1, y1, x2, y2]
all_rois = F.concat([rpn_rois.ai[batch_roi_inds], gt_rois])
......@@ -117,22 +115,26 @@ class RCNN(M.Module):
gt_assignment_ignore = F.argmax(overlaps_ignore, axis=1)
ignore_assign_mask = (max_overlaps_normal < self.cfg.fg_threshold) * (
max_overlaps_ignore > max_overlaps_normal)
max_overlaps_ignore > max_overlaps_normal
)
max_overlaps = (
max_overlaps_normal * (1 - ignore_assign_mask) +
max_overlaps_ignore * ignore_assign_mask
max_overlaps_normal * (1 - ignore_assign_mask)
+ max_overlaps_ignore * ignore_assign_mask
)
gt_assignment = (
gt_assignment_normal * (1 - ignore_assign_mask) +
gt_assignment_ignore * ignore_assign_mask
gt_assignment_normal * (1 - ignore_assign_mask)
+ gt_assignment_ignore * ignore_assign_mask
)
gt_assignment = gt_assignment.astype("int32")
labels = gt_boxes_per_img.ai[gt_assignment, 4]
# ---------------- get the fg/bg labels for each roi ---------------#
fg_mask = (max_overlaps >= self.cfg.fg_threshold) * (labels != self.cfg.ignore_label)
fg_mask = (max_overlaps >= self.cfg.fg_threshold) * (
labels != self.cfg.ignore_label
)
bg_mask = (max_overlaps < self.cfg.bg_threshold_high) * (
max_overlaps >= self.cfg.bg_threshold_low)
max_overlaps >= self.cfg.bg_threshold_low
)
num_fg_rois = self.cfg.num_rois * self.cfg.fg_ratio
......@@ -145,7 +147,7 @@ class RCNN(M.Module):
keep_mask = fg_inds_mask + bg_inds_mask
_, keep_inds = F.cond_take(keep_mask == 1, keep_mask)
# Add next line to avoid memory exceed
keep_inds = keep_inds[:F.minimum(self.cfg.num_rois, keep_inds.shapeof(0))]
keep_inds = keep_inds[: F.minimum(self.cfg.num_rois, keep_inds.shapeof(0))]
# labels
labels = labels.ai[keep_inds].astype("int32")
rois = all_rois.ai[keep_inds]
......@@ -160,12 +162,12 @@ class RCNN(M.Module):
return (
F.zero_grad(F.concat(return_rois, axis=0)),
F.zero_grad(F.concat(return_labels, axis=0)),
F.zero_grad(F.concat(return_bbox_targets, axis=0))
F.zero_grad(F.concat(return_bbox_targets, axis=0)),
)
def _bernoulli_sample_masks(self, masks, num_samples, sample_value):
""" Using the bernoulli sampling method"""
sample_mask = (masks == sample_value)
sample_mask = masks == sample_value
num_mask = sample_mask.sum()
num_final_samples = F.minimum(num_mask, num_samples)
# here, we use the bernoulli probability to sample the anchors
......
......@@ -27,7 +27,7 @@ class RetinaNetHead(M.Module):
num_classes = cfg.num_classes
num_convs = 4
prior_prob = cfg.cls_prior_prob
num_anchors = [len(cfg.anchor_ratios) * len(cfg.anchor_scales)] * len(
num_anchors = [len(cfg.anchor_scales) * len(cfg.anchor_ratios)] * len(
input_shape
)
......
......@@ -6,6 +6,8 @@
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
import numpy as np
import megengine as mge
import megengine.functional as F
import megengine.module as M
......@@ -16,29 +18,27 @@ from official.vision.detection.tools.gpu_nms import batched_nms
class RPN(M.Module):
def __init__(self, cfg):
super().__init__()
self.cfg = cfg
self.box_coder = layers.BoxCoder()
self.box_coder = layers.BoxCoder(cfg.rpn_reg_mean, cfg.rpn_reg_std)
self.num_cell_anchors = len(cfg.anchor_scales) * len(cfg.anchor_ratios)
self.stride_list = cfg.rpn_stride
self.stride_list = np.array(cfg.rpn_stride).astype(np.float32)
rpn_channel = cfg.rpn_channel
self.in_features = cfg.rpn_in_features
self.anchors_generator = layers.DefaultAnchorGenerator(
cfg.anchor_base_size,
cfg.anchor_scales,
cfg.anchor_aspect_ratios,
cfg.anchor_ratios,
cfg.anchor_offset,
)
self.rpn_conv = M.Conv2d(256, rpn_channel, kernel_size=3, stride=1, padding=1)
self.rpn_cls_score = M.Conv2d(
rpn_channel, cfg.num_cell_anchors * 2,
kernel_size=1, stride=1
rpn_channel, self.num_cell_anchors * 2, kernel_size=1, stride=1
)
self.rpn_bbox_offsets = M.Conv2d(
rpn_channel, cfg.num_cell_anchors * 4,
kernel_size=1, stride=1
rpn_channel, self.num_cell_anchors * 4, kernel_size=1, stride=1
)
for l in [self.rpn_conv, self.rpn_cls_score, self.rpn_bbox_offsets]:
......@@ -62,26 +62,32 @@ class RPN(M.Module):
scores = self.rpn_cls_score(t)
pred_cls_score_list.append(
scores.reshape(
scores.shape[0], 2, self.cfg.num_cell_anchors,
scores.shape[2], scores.shape[3]
scores.shape[0],
2,
self.num_cell_anchors,
scores.shape[2],
scores.shape[3],
)
)
bbox_offsets = self.rpn_bbox_offsets(t)
pred_bbox_offsets_list.append(
bbox_offsets.reshape(
bbox_offsets.shape[0], self.cfg.num_cell_anchors, 4,
bbox_offsets.shape[2], bbox_offsets.shape[3]
bbox_offsets.shape[0],
self.num_cell_anchors,
4,
bbox_offsets.shape[2],
bbox_offsets.shape[3],
)
)
# sample from the predictions
rpn_rois = self.find_top_rpn_proposals(
pred_bbox_offsets_list, pred_cls_score_list,
all_anchors_list, im_info
pred_bbox_offsets_list, pred_cls_score_list, all_anchors_list, im_info
)
if self.training:
rpn_labels, rpn_bbox_targets = self.get_ground_truth(
boxes, im_info, all_anchors_list)
boxes, im_info, all_anchors_list
)
pred_cls_score, pred_bbox_offsets = self.merge_rpn_score_box(
pred_cls_score_list, pred_bbox_offsets_list
)
......@@ -93,24 +99,26 @@ class RPN(M.Module):
rpn_bbox_targets,
rpn_labels,
self.cfg.rpn_smooth_l1_beta,
norm_type="all"
norm_type="all",
)
loss_dict = {
"loss_rpn_cls": loss_rpn_cls,
"loss_rpn_loc": loss_rpn_loc
}
loss_dict = {"loss_rpn_cls": loss_rpn_cls, "loss_rpn_loc": loss_rpn_loc}
return rpn_rois, loss_dict
else:
return rpn_rois
def find_top_rpn_proposals(
self, rpn_bbox_offsets_list, rpn_cls_prob_list,
all_anchors_list, im_info
self, rpn_bbox_offsets_list, rpn_cls_prob_list, all_anchors_list, im_info
):
prev_nms_top_n = self.cfg.train_prev_nms_top_n \
if self.training else self.cfg.test_prev_nms_top_n
post_nms_top_n = self.cfg.train_post_nms_top_n \
if self.training else self.cfg.test_post_nms_top_n
prev_nms_top_n = (
self.cfg.train_prev_nms_top_n
if self.training
else self.cfg.test_prev_nms_top_n
)
post_nms_top_n = (
self.cfg.train_post_nms_top_n
if self.training
else self.cfg.test_post_nms_top_n
)
batch_per_gpu = self.cfg.batch_per_gpu if self.training else 1
nms_threshold = self.cfg.rpn_nms_threshold
......@@ -125,7 +133,9 @@ class RPN(M.Module):
batch_level_list = []
for l in range(list_size):
# get proposals and probs
offsets = rpn_bbox_offsets_list[l][bid].dimshuffle(2, 3, 0, 1).reshape(-1, 4)
offsets = (
rpn_bbox_offsets_list[l][bid].dimshuffle(2, 3, 0, 1).reshape(-1, 4)
)
all_anchors = all_anchors_list[l]
proposals = self.box_coder.decode(all_anchors, offsets)
......@@ -162,7 +172,9 @@ class RPN(M.Module):
# apply total level nms
rois = F.concat([proposals, scores.reshape(-1, 1)], axis=1)
keep_inds = batched_nms(proposals, scores, level, nms_threshold, post_nms_top_n)
keep_inds = batched_nms(
proposals, scores, level, nms_threshold, post_nms_top_n
)
rois = rois.ai[keep_inds]
# rois shape (N, 5), info [batch_id, x1, y1, x2, y2]
......@@ -181,10 +193,12 @@ class RPN(M.Module):
batch_rpn_bbox_offsets_list = []
for i in range(len(self.in_features)):
rpn_cls_score = rpn_cls_score_list[i][bid] \
.dimshuffle(2, 3, 1, 0).reshape(-1, 2)
rpn_bbox_offsets = rpn_bbox_offsets_list[i][bid] \
.dimshuffle(2, 3, 0, 1).reshape(-1, 4)
rpn_cls_score = (
rpn_cls_score_list[i][bid].dimshuffle(2, 3, 1, 0).reshape(-1, 2)
)
rpn_bbox_offsets = (
rpn_bbox_offsets_list[i][bid].dimshuffle(2, 3, 0, 1).reshape(-1, 4)
)
batch_rpn_cls_score_list.append(rpn_cls_score)
batch_rpn_bbox_offsets_list.append(rpn_bbox_offsets)
......@@ -199,12 +213,10 @@ class RPN(M.Module):
final_rpn_bbox_offsets = F.concat(final_rpn_bbox_offsets_list, axis=0)
return final_rpn_cls_score, final_rpn_bbox_offsets
def per_level_gt(
self, gt_boxes, im_info, anchors, allow_low_quality_matches=True
):
def per_level_gt(self, gt_boxes, im_info, anchors, allow_low_quality_matches=True):
ignore_label = self.cfg.ignore_label
# get the gt boxes
valid_gt_boxes = gt_boxes[:im_info[4], :]
valid_gt_boxes = gt_boxes[: im_info[4], :]
# compute the iou matrix
overlaps = layers.get_iou(anchors, valid_gt_boxes[:, :4])
# match the dtboxes
......@@ -216,7 +228,7 @@ class RPN(M.Module):
# set negative ones
labels = labels * (max_overlaps >= self.cfg.rpn_negative_overlap)
# set positive ones
fg_mask = (max_overlaps >= self.cfg.rpn_positive_overlap)
fg_mask = max_overlaps >= self.cfg.rpn_positive_overlap
const_one = mge.tensor(1.0)
if allow_low_quality_matches:
# make sure that max iou of gt matched
......@@ -224,10 +236,10 @@ class RPN(M.Module):
num_valid_boxes = valid_gt_boxes.shapeof(0)
gt_id = F.linspace(0, num_valid_boxes - 1, num_valid_boxes).astype("int32")
argmax_overlaps = argmax_overlaps.set_ai(gt_id)[gt_argmax_overlaps]
max_overlaps = max_overlaps.set_ai(
const_one.broadcast(num_valid_boxes)
)[gt_argmax_overlaps]
fg_mask = (max_overlaps >= self.cfg.rpn_positive_overlap)
max_overlaps = max_overlaps.set_ai(const_one.broadcast(num_valid_boxes))[
gt_argmax_overlaps
]
fg_mask = max_overlaps >= self.cfg.rpn_positive_overlap
# set positive ones
_, fg_mask_ind = F.cond_take(fg_mask == 1, fg_mask)
labels = labels.set_ai(const_one.broadcast(fg_mask_ind.shapeof(0)))[fg_mask_ind]
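A simplified NumPy sketch of the anchor labeling that `per_level_gt` performs, including the `allow_low_quality_matches` fix-up. This omits the ignore-region handling above, and the thresholds stand in for `cfg.rpn_positive_overlap` / `cfg.rpn_negative_overlap`; it is an illustration of the rule, not the repo's exact code:

```python
import numpy as np

def assign_rpn_labels(overlaps, pos_thr=0.7, neg_thr=0.3, allow_low_quality=True):
    # overlaps: (num_anchors, num_gt) IoU matrix; 1 = fg, 0 = bg, -1 = ignore
    max_ov = overlaps.max(axis=1)
    labels = np.full(overlaps.shape[0], -1, dtype=np.int32)
    labels[max_ov < neg_thr] = 0
    labels[max_ov >= pos_thr] = 1
    if allow_low_quality:
        # guarantee every gt box at least one positive anchor: its best match
        labels[overlaps.argmax(axis=0)] = 1
    return labels
```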
......@@ -258,15 +270,13 @@ class RPN(M.Module):
num_positive = self.cfg.num_sample_anchors * self.cfg.positive_anchor_ratio
# sample positive
concated_batch_labels = self._bernoulli_sample_labels(
concated_batch_labels,
num_positive, 1, self.cfg.ignore_label
concated_batch_labels, num_positive, 1, self.cfg.ignore_label
)
# sample negative
num_positive = (concated_batch_labels == 1).sum()
num_negative = self.cfg.num_sample_anchors - num_positive
concated_batch_labels = self._bernoulli_sample_labels(
concated_batch_labels,
num_negative, 0, self.cfg.ignore_label
concated_batch_labels, num_negative, 0, self.cfg.ignore_label
)
final_labels_list.append(concated_batch_labels)
......@@ -279,7 +289,7 @@ class RPN(M.Module):
self, labels, num_samples, sample_value, ignore_label=-1
):
""" Using the bernoulli sampling method"""
sample_label_mask = (labels == sample_value)
sample_label_mask = labels == sample_value
num_mask = sample_label_mask.sum()
num_final_samples = F.minimum(num_mask, num_samples)
# here, we use the bernoulli probability to sample the anchors
......
......@@ -6,7 +6,7 @@
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from .faster_rcnn_fpn import *
from .faster_rcnn import *
from .retinanet import *
_EXCLUDE = {}
......
......@@ -12,19 +12,20 @@ import megengine as mge
import megengine.functional as F
import megengine.module as M
from official.vision.classification.resnet.model import resnet50
import official.vision.classification.resnet.model as resnet
from official.vision.detection import layers
class FasterRCNN(M.Module):
def __init__(self, cfg, batch_size):
super().__init__()
self.cfg = cfg
cfg.batch_per_gpu = batch_size
self.batch_size = batch_size
# ----------------------- build the backbone ------------------------ #
bottom_up = resnet50(norm=layers.get_norm(cfg.resnet_norm))
bottom_up = getattr(resnet, cfg.backbone)(
norm=layers.get_norm(cfg.resnet_norm), pretrained=cfg.backbone_pretrained
)
# ------------ freeze the weights of resnet stage1 and stage 2 ------ #
if self.cfg.backbone_freeze_at >= 1:
......@@ -67,14 +68,14 @@ class FasterRCNN(M.Module):
def preprocess_image(self, image):
normed_image = (
image - self.cfg.img_mean[None, :, None, None]
) / self.cfg.img_std[None, :, None, None]
image - np.array(self.cfg.img_mean)[None, :, None, None]
) / np.array(self.cfg.img_std)[None, :, None, None]
return layers.get_padded_tensor(normed_image, 32, 0.0)
def forward(self, inputs):
images = inputs['image']
im_info = inputs['im_info']
gt_boxes = inputs['gt_boxes']
images = inputs["image"]
im_info = inputs["im_info"]
gt_boxes = inputs["gt_boxes"]
# process the images
normed_images = self.preprocess_image(images)
# normed_images = images
......@@ -89,10 +90,10 @@ class FasterRCNN(M.Module):
rpn_rois, rpn_losses = self.RPN(fpn_features, im_info, gt_boxes)
rcnn_losses = self.RCNN(fpn_features, rpn_rois, im_info, gt_boxes)
loss_rpn_cls = rpn_losses['loss_rpn_cls']
loss_rpn_loc = rpn_losses['loss_rpn_loc']
loss_rcnn_cls = rcnn_losses['loss_rcnn_cls']
loss_rcnn_loc = rcnn_losses['loss_rcnn_loc']
loss_rpn_cls = rpn_losses["loss_rpn_cls"]
loss_rpn_loc = rpn_losses["loss_rpn_loc"]
loss_rcnn_cls = rcnn_losses["loss_rcnn_cls"]
loss_rcnn_loc = rcnn_losses["loss_rcnn_loc"]
total_loss = loss_rpn_cls + loss_rpn_loc + loss_rcnn_cls + loss_rcnn_loc
loss_dict = {
......@@ -100,7 +101,7 @@ class FasterRCNN(M.Module):
"rpn_cls": loss_rpn_cls,
"rpn_loc": loss_rpn_loc,
"rcnn_cls": loss_rcnn_cls,
"rcnn_loc": loss_rcnn_loc
"rcnn_loc": loss_rcnn_loc,
}
self.cfg.losses_keys = list(loss_dict.keys())
return loss_dict
......@@ -112,20 +113,20 @@ class FasterRCNN(M.Module):
pred_boxes = pred_boxes.reshape(-1, 4)
scale_w = im_info[0, 1] / im_info[0, 3]
scale_h = im_info[0, 0] / im_info[0, 2]
pred_boxes = pred_boxes / F.concat(
[scale_w, scale_h, scale_w, scale_h], axis=0
)
pred_boxes = pred_boxes / F.concat([scale_w, scale_h, scale_w, scale_h], axis=0)
clipped_boxes = layers.get_clipped_box(
pred_boxes, im_info[0, 2:4]
).reshape(-1, self.cfg.num_classes, 4)
clipped_boxes = layers.get_clipped_box(pred_boxes, im_info[0, 2:4]).reshape(
-1, self.cfg.num_classes, 4
)
return pred_score, clipped_boxes
class FasterRCNNConfig:
def __init__(self):
self.backbone = "resnet50"
self.backbone_pretrained = True
self.resnet_norm = "FrozenBN"
self.fpn_norm = ""
self.fpn_norm = None
self.backbone_freeze_at = 2
# ------------------------ data cfg -------------------------- #
......@@ -142,17 +143,18 @@ class FasterRCNNConfig:
remove_images_without_annotations=False,
)
self.num_classes = 80
self.img_mean = np.array([103.530, 116.280, 123.675]) # BGR
self.img_std = np.array([57.375, 57.120, 58.395])
self.img_mean = [103.530, 116.280, 123.675] # BGR
self.img_std = [57.375, 57.120, 58.395]
# ----------------------- rpn cfg ------------------------- #
self.anchor_base_size = 16
self.anchor_scales = np.array([0.5])
self.anchor_aspect_ratios = [0.5, 1, 2]
self.anchor_scales = [0.5]
self.anchor_ratios = [0.5, 1, 2]
self.anchor_offset = -0.5
self.num_cell_anchors = len(self.anchor_aspect_ratios)
self.rpn_stride = np.array([4, 8, 16, 32, 64]).astype(np.float32)
self.rpn_stride = [4, 8, 16, 32, 64]
self.rpn_reg_mean = [0.0, 0.0, 0.0, 0.0]
self.rpn_reg_std = [1.0, 1.0, 1.0, 1.0]
self.rpn_in_features = ["p2", "p3", "p4", "p5", "p6"]
self.rpn_channel = 256
......@@ -165,7 +167,7 @@ class FasterRCNNConfig:
self.ignore_label = -1
# ----------------------- rcnn cfg ------------------------- #
self.pooling_method = 'roi_align'
self.pooling_method = "roi_align"
self.pooling_size = (7, 7)
self.num_rois = 512
......@@ -174,18 +176,18 @@ class FasterRCNNConfig:
self.bg_threshold_high = 0.5
self.bg_threshold_low = 0.0
self.rcnn_reg_mean = None
self.rcnn_reg_std = np.array([0.1, 0.1, 0.2, 0.2])
self.rcnn_reg_mean = [0.0, 0.0, 0.0, 0.0]
self.rcnn_reg_std = [0.1, 0.1, 0.2, 0.2]
self.rcnn_in_features = ["p2", "p3", "p4", "p5"]
self.rcnn_stride = [4, 8, 16, 32]
# ------------------------ loss cfg -------------------------- #
self.rpn_smooth_l1_beta = 3
self.rcnn_smooth_l1_beta = 1
self.rpn_smooth_l1_beta = 0 # use L1 loss
self.rcnn_smooth_l1_beta = 0 # use L1 loss
self.num_losses = 5
# ------------------------ training cfg ---------------------- #
self.train_image_short_size = 800
self.train_image_short_size = (640, 672, 704, 736, 768, 800)
self.train_image_max_size = 1333
self.train_prev_nms_top_n = 2000
self.train_post_nms_top_n = 1000
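`train_image_short_size` is now a tuple, in line with the commit's multi-scale training message; presumably the (elided) data pipeline samples one short size per iteration, along these lines:

```python
import random

# hypothetical sampling step; the actual transform code is elided in this diff
short_size = random.choice((640, 672, 704, 736, 768, 800))
# the image's short side is resized to short_size, while the long side is
# capped at train_image_max_size (1333)
```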
......
......@@ -12,7 +12,7 @@ import megengine as mge
import megengine.functional as F
import megengine.module as M
from official.vision.classification.resnet.model import resnet50
import official.vision.classification.resnet.model as resnet
from official.vision.detection import layers
......@@ -31,13 +31,15 @@ class RetinaNet(M.Module):
anchor_scales=self.cfg.anchor_scales,
anchor_ratios=self.cfg.anchor_ratios,
)
self.box_coder = layers.BoxCoder(reg_mean=cfg.reg_mean, reg_std=cfg.reg_std)
self.box_coder = layers.BoxCoder(cfg.reg_mean, cfg.reg_std)
self.stride_list = np.array([8, 16, 32, 64, 128]).astype(np.float32)
self.stride_list = np.array(cfg.stride).astype(np.float32)
self.in_features = ["p3", "p4", "p5", "p6", "p7"]
# ----------------------- build the backbone ------------------------ #
bottom_up = resnet50(norm=layers.get_norm(cfg.resnet_norm))
bottom_up = getattr(resnet, cfg.backbone)(
norm=layers.get_norm(cfg.resnet_norm), pretrained=cfg.backbone_pretrained
)
# ------------ freeze the weights of resnet stage1 and stage 2 ------ #
if self.cfg.backbone_freeze_at >= 1:
......@@ -78,8 +80,8 @@ class RetinaNet(M.Module):
def preprocess_image(self, image):
normed_image = (
image - self.cfg.img_mean[None, :, None, None]
) / self.cfg.img_std[None, :, None, None]
image - np.array(self.cfg.img_mean)[None, :, None, None]
) / np.array(self.cfg.img_std)[None, :, None, None]
return layers.get_padded_tensor(normed_image, 32, 0.0)
def forward(self, inputs):
......@@ -119,7 +121,12 @@ class RetinaNet(M.Module):
gamma=self.cfg.focal_loss_gamma,
)
rpn_bbox_loss = (
layers.get_smooth_l1_loss(all_level_box_delta, box_gt_delta, box_gt_cls)
layers.get_smooth_l1_loss(
all_level_box_delta,
box_gt_delta,
box_gt_cls,
self.cfg.smooth_l1_beta,
)
* self.cfg.reg_loss_weight
)
......@@ -127,7 +134,7 @@ class RetinaNet(M.Module):
loss_dict = {
"total_loss": total,
"loss_cls": rpn_cls_loss,
"loss_loc": rpn_bbox_loss
"loss_loc": rpn_bbox_loss,
}
self.cfg.losses_keys = list(loss_dict.keys())
return loss_dict
......@@ -203,8 +210,10 @@ class RetinaNet(M.Module):
class RetinaNetConfig:
def __init__(self):
self.backbone = "resnet50"
self.backbone_pretrained = True
self.resnet_norm = "FrozenBN"
self.fpn_norm = ""
self.fpn_norm = None
self.backbone_freeze_at = 2
# ------------------------ data cfg -------------------------- #
......@@ -221,13 +230,14 @@ class RetinaNetConfig:
remove_images_without_annotations=False,
)
self.num_classes = 80
self.img_mean = np.array([103.530, 116.280, 123.675]) # BGR
self.img_std = np.array([57.375, 57.120, 58.395])
self.reg_mean = None
self.reg_std = np.array([0.1, 0.1, 0.2, 0.2])
self.anchor_ratios = np.array([0.5, 1, 2])
self.anchor_scales = np.array([2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)])
self.img_mean = [103.530, 116.280, 123.675] # BGR
self.img_std = [57.375, 57.120, 58.395]
self.stride = [8, 16, 32, 64, 128]
self.reg_mean = [0.0, 0.0, 0.0, 0.0]
self.reg_std = [1.0, 1.0, 1.0, 1.0]
self.anchor_scales = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]
self.anchor_ratios = [0.5, 1, 2]
self.negative_thresh = 0.4
self.positive_thresh = 0.5
self.allow_low_quality = True
......@@ -237,11 +247,12 @@ class RetinaNetConfig:
# ------------------------ loss cfg -------------------------- #
self.focal_loss_alpha = 0.25
self.focal_loss_gamma = 2
self.reg_loss_weight = 1.0 / 4.0
self.smooth_l1_beta = 0 # use L1 loss
self.reg_loss_weight = 1.0
self.num_losses = 3
# ------------------------ training cfg ---------------------- #
self.train_image_short_size = 800
self.train_image_short_size = (640, 672, 704, 736, 768, 800)
self.train_image_max_size = 1333
self.basic_lr = 0.01 / 16.0 # The basic learning rate for single-image
......
# -*- coding: utf-8 -*-
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from official.vision.detection import models
class CustomRetinaNetConfig(models.RetinaNetConfig):
def __init__(self):
super().__init__()
# ------------------------ data cfg -------------------------- #
self.train_dataset = dict(
name="objects365",
root="train",
ann_file="annotations/objects365_train_20190423.json",
remove_images_without_annotations=True,
)
self.test_dataset = dict(
name="objects365",
root="val",
ann_file="annotations/objects365_val_20190423.json",
remove_images_without_annotations=False,
)
self.num_classes = 365
# ------------------------ training cfg ---------------------- #
self.nr_images_epoch = 400000
def retinanet_res50_objects365_1x_800size(batch_size=1, **kwargs):
r"""
RetinaNet trained from Objects365 dataset.
`"RetinaNet" <https://arxiv.org/abs/1708.02002>`_
"""
return models.RetinaNet(CustomRetinaNetConfig(), batch_size=batch_size, **kwargs)
Net = models.RetinaNet
Cfg = CustomRetinaNetConfig
# -*- coding: utf-8 -*-
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
from megengine import hub
from official.vision.detection import models
class CustomRetinaNetConfig(models.RetinaNetConfig):
def __init__(self):
super().__init__()
# ------------------------ data cfg -------------------------- #
self.train_dataset = dict(
name="voc",
root="VOCdevkit/VOC2012",
image_set="train",
)
self.test_dataset = dict(
name="voc",
root="VOCdevkit/VOC2012",
image_set="val",
)
self.num_classes = 20
# ------------------------ training cfg ---------------------- #
self.nr_images_epoch = 16000
def retinanet_res50_voc_1x_800size(batch_size=1, **kwargs):
r"""
RetinaNet trained from VOC dataset.
`"RetinaNet" <https://arxiv.org/abs/1708.02002>`_
"""
return models.RetinaNet(CustomRetinaNetConfig(), batch_size=batch_size, **kwargs)
Net = models.RetinaNet
Cfg = CustomRetinaNetConfig
......@@ -11,7 +11,7 @@ import megengine.functional as F
from megengine._internal.craniotome import CraniotomeBase
from megengine.core.tensor import wrap_io_tensor
_so_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'lib_nms.so')
_so_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "lib_nms.so")
_so_lib = ctypes.CDLL(_so_path)
_TYPE_POINTER = ctypes.c_void_p
......@@ -50,10 +50,13 @@ class NMSCran(CraniotomeBase):
mask_tensor_ptr = outputs[2].pubapi_dev_tensor_ptr
_so_lib.NMSForwardGpu(
box_tensor_ptr, mask_tensor_ptr,
output_tensor_ptr, output_num_tensor_ptr,
self._iou_threshold, self._max_output,
self._host_device
box_tensor_ptr,
mask_tensor_ptr,
output_tensor_ptr,
output_num_tensor_ptr,
self._iou_threshold,
self._max_output,
self._host_device,
)
def grad(self, wrt_idx, inputs, outputs, out_grad):
......@@ -63,7 +66,7 @@ class NMSCran(CraniotomeBase):
return [np.int32, np.int32, np.int32]
def get_serialize_params(self):
return ('nms', struct.pack('fi', self._iou_threshold, self._max_output))
return ("nms", struct.pack("fi", self._iou_threshold, self._max_output))
def infer_shape(self, inp_shapes):
nr_box = inp_shapes[0][0]
......@@ -72,9 +75,10 @@ class NMSCran(CraniotomeBase):
# here we compute the number of int32 used in mask_outputs.
# In original version, we compute the bytes only.
mask_size = int(
nr_box * (
nr_box // threadsPerBlock + int((nr_box % threadsPerBlock) > 0)
) * 8 / 4
nr_box
* (nr_box // threadsPerBlock + int((nr_box % threadsPerBlock) > 0))
* 8
/ 4
)
return [[output_size], [1], [mask_size]]
......@@ -87,9 +91,11 @@ def gpu_nms(box, iou_threshold, max_output):
def batched_nms(boxes, scores, idxs, iou_threshold, num_keep, use_offset=False):
if use_offset:
boxes_offset = mge.tensor(
[0, 0, 1, 1], device=boxes.device
).reshape(1, 4).broadcast(boxes.shapeof(0), 4)
boxes_offset = (
mge.tensor([0, 0, 1, 1], device=boxes.device)
.reshape(1, 4)
.broadcast(boxes.shapeof(0), 4)
)
boxes = boxes - boxes_offset
max_coordinate = boxes.max()
offsets = idxs * (max_coordinate + 1)
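The `idxs * (max_coordinate + 1)` trick above runs per-class NMS in a single class-agnostic call: each class's boxes are translated into a disjoint coordinate range, so boxes of different classes can never overlap and thus never suppress each other. A tiny NumPy illustration:

```python
import numpy as np

boxes = np.array([[0, 0, 10, 10], [1, 1, 9, 9]], dtype=np.float32)
idxs = np.array([0, 1])  # class ids
shifted = boxes + (idxs * (boxes.max() + 1))[:, None]
# shifted[1] becomes [12, 12, 20, 20]: no overlap with shifted[0],
# so one class-agnostic NMS pass cannot suppress across classes
```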
......
......@@ -18,7 +18,7 @@ import megengine as mge
from megengine import jit
from megengine.data.dataset import COCO
from official.vision.detection.tools.test import DetEvaluator
from official.vision.detection.tools.utils import DetEvaluator
logger = mge.get_logger(__name__)
......@@ -28,8 +28,10 @@ def make_parser():
parser.add_argument(
"-f", "--file", default="net.py", type=str, help="net description file"
)
parser.add_argument(
"-w", "--weight_file", default=None, type=str, help="weights file",
)
parser.add_argument("-i", "--image", default="example.jpg", type=str)
parser.add_argument("-m", "--model", default=None, type=str)
return parser
......@@ -37,7 +39,7 @@ def main():
parser = make_parser()
args = parser.parse_args()
logger.info("Load Model : %s completed", args.model)
logger.info("Load Model : %s completed", args.weight_file)
@jit.trace(symbolic=True)
def val_func():
......@@ -48,7 +50,7 @@ def main():
current_network = importlib.import_module(os.path.basename(args.file).split(".")[0])
model = current_network.Net(current_network.Cfg(), batch_size=1)
model.eval()
state_dict = mge.load(args.model)
state_dict = mge.load(args.weight_file)
if "state_dict" in state_dict:
state_dict = state_dict["state_dict"]
model.load_state_dict(state_dict)
......
......@@ -10,12 +10,10 @@ import argparse
import importlib
import json
import os
import random
import sys
from multiprocessing import Process, Queue
from tqdm import tqdm
import cv2
import numpy as np
import megengine as mge
......@@ -23,254 +21,30 @@ from megengine import jit
from megengine.data import DataLoader, SequentialSampler
from official.vision.detection.tools.data_mapper import data_mapper
from official.vision.detection.tools.nms import py_cpu_nms
from official.vision.detection.tools.utils import DetEvaluator
logger = mge.get_logger(__name__)
class DetEvaluator:
def __init__(self, model):
self.model = model
@staticmethod
def get_hw_by_short_size(im_height, im_width, short_size, max_size):
"""get height and width by short size
Args:
im_height(int): height of original image, e.g. 800
im_width(int): width of original image, e.g. 1000
short_size(int): short size of transformed image. e.g. 800
max_size(int): max size of transformed image. e.g. 1333
Returns:
resized_height(int): height of transformed image
resized_width(int): width of transformed image
"""
im_size_min = np.min([im_height, im_width])
im_size_max = np.max([im_height, im_width])
scale = (short_size + 0.0) / im_size_min
if scale * im_size_max > max_size:
scale = (max_size + 0.0) / im_size_max
resized_height, resized_width = (
int(round(im_height * scale)),
int(round(im_width * scale)),
)
return resized_height, resized_width
@staticmethod
def process_inputs(img, short_size, max_size, flip=False):
original_height, original_width, _ = img.shape
resized_height, resized_width = DetEvaluator.get_hw_by_short_size(
original_height, original_width, short_size, max_size
)
resized_img = cv2.resize(
img, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR,
)
resized_img = cv2.flip(resized_img, 1) if flip else resized_img
trans_img = np.ascontiguousarray(
resized_img.transpose(2, 0, 1)[None, :, :, :], dtype=np.uint8
)
im_info = np.array(
[(resized_height, resized_width, original_height, original_width)],
dtype=np.float32,
)
return trans_img, im_info
def predict(self, val_func):
"""
Args:
val_func(callable): model inference function
Returns:
results boxes: detection model output
"""
model = self.model
box_cls, box_delta = val_func()
box_cls, box_delta = box_cls.numpy(), box_delta.numpy()
dtboxes_all = list()
all_inds = np.where(box_cls > model.cfg.test_cls_threshold)
for c in range(0, model.cfg.num_classes):
inds = np.where(all_inds[1] == c)[0]
inds = all_inds[0][inds]
scores = box_cls[inds, c]
if model.cfg.class_aware_box:
bboxes = box_delta[inds, c, :]
else:
bboxes = box_delta[inds, :]
dtboxes = np.hstack((bboxes, scores[:, np.newaxis])).astype(np.float32)
if dtboxes.size > 0:
keep = py_cpu_nms(dtboxes, model.cfg.test_nms)
dtboxes = np.hstack(
(dtboxes[keep], np.ones((len(keep), 1), np.float32) * c)
).astype(np.float32)
dtboxes_all.extend(dtboxes)
if len(dtboxes_all) > model.cfg.test_max_boxes_per_image:
dtboxes_all = sorted(dtboxes_all, reverse=True, key=lambda i: i[4])[
: model.cfg.test_max_boxes_per_image
]
dtboxes_all = np.array(dtboxes_all, dtype=np.float)
return dtboxes_all
@staticmethod
def format(results, cfg):
dataset_class = data_mapper[cfg.test_dataset["name"]]
all_results = []
for record in results:
image_filename = record["image_id"]
boxes = record["det_res"]
if len(boxes) <= 0:
continue
boxes[:, 2:4] = boxes[:, 2:4] - boxes[:, 0:2]
for box in boxes:
elem = dict()
elem["image_id"] = image_filename
elem["bbox"] = box[:4].tolist()
elem["score"] = box[4]
elem["category_id"] = dataset_class.classes_originID[
dataset_class.class_names[int(box[5])]
]
all_results.append(elem)
return all_results
@staticmethod
def vis_det(
img,
dets,
is_show_label=True,
classes=None,
thresh=0.3,
name="detection",
return_img=True,
):
img = np.array(img)
colors = dict()
font = cv2.FONT_HERSHEY_SIMPLEX
for det in dets:
bb = det[:4].astype(int)
if is_show_label:
cls_id = int(det[5])
score = det[4]
if cls_id == 0:
continue
if score > thresh:
if cls_id not in colors:
colors[cls_id] = (
random.random() * 255,
random.random() * 255,
random.random() * 255,
)
cv2.rectangle(
img, (bb[0], bb[1]), (bb[2], bb[3]), colors[cls_id], 3
)
if classes and len(classes) > cls_id:
cls_name = classes[cls_id]
else:
cls_name = str(cls_id)
cv2.putText(
img,
"{:s} {:.3f}".format(cls_name, score),
(bb[0], bb[1] - 2),
font,
0.5,
(255, 255, 255),
1,
)
else:
cv2.rectangle(img, (bb[0], bb[1]), (bb[2], bb[3]), (0, 0, 255), 2)
if return_img:
return img
cv2.imshow(name, img)
while True:
c = cv2.waitKey(100000)
if c == ord("d"):
return None
elif c == ord("n"):
break
def build_dataloader(rank, world_size, data_dir, cfg):
val_dataset = data_mapper[cfg.test_dataset["name"]](
os.path.join(data_dir, cfg.test_dataset["name"], cfg.test_dataset["root"]),
os.path.join(data_dir, cfg.test_dataset["name"], cfg.test_dataset["ann_file"]),
order=["image", "info"],
)
val_sampler = SequentialSampler(val_dataset, 1, world_size=world_size, rank=rank)
val_dataloader = DataLoader(val_dataset, sampler=val_sampler, num_workers=2)
return val_dataloader
def worker(
net_file, model_file, data_dir, worker_id, total_worker, result_queue,
):
"""
:param net_file: network description file
:param model_file: file of dump weights
:param data_dir: the dataset directory
:param worker_id: the index of the worker
:param total_worker: number of gpu for evaluation
:param result_queue: processing queue
"""
os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_id)
@jit.trace(symbolic=True, opt_level=2)
def val_func():
pred = model(model.inputs)
return pred
sys.path.insert(0, os.path.dirname(net_file))
current_network = importlib.import_module(os.path.basename(net_file).split(".")[0])
model = current_network.Net(current_network.Cfg(), batch_size=1)
model.eval()
evaluator = DetEvaluator(model)
state_dict = mge.load(model_file)
if "state_dict" in state_dict:
state_dict = state_dict["state_dict"]
model.load_state_dict(state_dict)
loader = build_dataloader(worker_id, total_worker, data_dir, model.cfg)
for data_dict in loader:
data, im_info = DetEvaluator.process_inputs(
data_dict[0][0],
model.cfg.test_image_short_size,
model.cfg.test_image_max_size,
)
model.inputs["im_info"].set_value(im_info)
model.inputs["image"].set_value(data.astype(np.float32))
pred_res = evaluator.predict(val_func)
result_queue.put_nowait(
{
"det_res": pred_res,
"image_id": int(data_dict[1][2][0].split(".")[0].split("_")[-1]),
}
)
def make_parser():
parser = argparse.ArgumentParser()
parser.add_argument("-b", "--batch_size", default=1, type=int)
parser.add_argument("-n", "--ngpus", default=1, type=int)
parser.add_argument(
"-f", "--file", default="net.py", type=str, help="net description file"
)
parser.add_argument("-d", "--dataset_dir", default="/data/datasets", type=str)
parser.add_argument(
"-w", "--weight_file", default=None, type=str, help="weights file",
)
parser.add_argument(
"-n", "--ngpus", default=1, type=int, help="total number of gpus for testing",
)
parser.add_argument(
"-b", "--batch_size", default=1, type=int, help="batchsize for testing",
)
parser.add_argument(
"-d", "--dataset_dir", default="/data/datasets", type=str,
)
parser.add_argument("-se", "--start_epoch", default=-1, type=int)
parser.add_argument("-ee", "--end_epoch", default=-1, type=int)
parser.add_argument("-m", "--model", default=None, type=str)
return parser
......@@ -286,8 +60,8 @@ def main():
args.end_epoch = args.start_epoch
for epoch_num in range(args.start_epoch, args.end_epoch + 1):
if args.model:
model_file = args.model
if args.weight_file:
model_file = args.weight_file
else:
model_file = "log-of-{}/epoch_{}.pkl".format(
os.path.basename(args.file).split(".")[0], epoch_num
......@@ -312,16 +86,18 @@ def main():
proc.start()
procs.append(proc)
for _ in tqdm(range(5000)):
results_list.append(result_queue.get())
for p in procs:
p.join()
sys.path.insert(0, os.path.dirname(args.file))
current_network = importlib.import_module(
os.path.basename(args.file).split(".")[0]
)
cfg = current_network.Cfg()
num_imgs = dict(coco=5000, objects365=30000)
for _ in tqdm(range(num_imgs[cfg.test_dataset["name"]])):
results_list.append(result_queue.get())
for p in procs:
p.join()
all_results = DetEvaluator.format(results_list, cfg)
json_path = "log-of-{}/epoch_{}.json".format(
os.path.basename(args.file).split(".")[0], epoch_num
......@@ -362,5 +138,63 @@ def main():
logger.info("-" * 32)
def worker(
net_file, model_file, data_dir, worker_id, total_worker, result_queue,
):
"""
:param net_file: network description file
:param model_file: file of dump weights
:param data_dir: the dataset directory
:param worker_id: the index of the worker
:param total_worker: number of gpu for evaluation
:param result_queue: processing queue
"""
os.environ["CUDA_VISIBLE_DEVICES"] = str(worker_id)
@jit.trace(symbolic=True)
def val_func():
pred = model(model.inputs)
return pred
sys.path.insert(0, os.path.dirname(net_file))
current_network = importlib.import_module(os.path.basename(net_file).split(".")[0])
model = current_network.Net(current_network.Cfg(), batch_size=1)
model.eval()
evaluator = DetEvaluator(model)
state_dict = mge.load(model_file)
if "state_dict" in state_dict:
state_dict = state_dict["state_dict"]
model.load_state_dict(state_dict)
loader = build_dataloader(worker_id, total_worker, data_dir, model.cfg)
for data_dict in loader:
data, im_info = DetEvaluator.process_inputs(
data_dict[0][0],
model.cfg.test_image_short_size,
model.cfg.test_image_max_size,
)
model.inputs["im_info"].set_value(im_info)
model.inputs["image"].set_value(data.astype(np.float32))
pred_res = evaluator.predict(val_func)
result_queue.put_nowait(
{
"det_res": pred_res,
"image_id": int(data_dict[1][2][0].split(".")[0].split("_")[-1]),
}
)
def build_dataloader(rank, world_size, data_dir, cfg):
val_dataset = data_mapper[cfg.test_dataset["name"]](
os.path.join(data_dir, cfg.test_dataset["name"], cfg.test_dataset["root"]),
os.path.join(data_dir, cfg.test_dataset["name"], cfg.test_dataset["ann_file"]),
order=["image", "info"],
)
val_sampler = SequentialSampler(val_dataset, 1, world_size=world_size, rank=rank)
val_dataloader = DataLoader(val_dataset, sampler=val_sampler, num_workers=2)
return val_dataloader
if __name__ == "__main__":
main()
......@@ -15,7 +15,6 @@ import multiprocessing as mp
import os
import sys
import time
from collections import defaultdict
from tabulate import tabulate
import numpy as np
......@@ -24,32 +23,74 @@ import megengine as mge
from megengine import distributed as dist
from megengine import jit
from megengine import optimizer as optim
from megengine.data import Collator, DataLoader, Infinite, RandomSampler
from megengine.data import DataLoader, Infinite, RandomSampler
from megengine.data import transform as T
from official.vision.detection.tools.data_mapper import data_mapper
from official.vision.detection.tools.utils import (
AverageMeter,
DetectionPadCollator,
GroupedRandomSampler
)
logger = mge.get_logger(__name__)
class AverageMeter:
    """Computes and stores the average and current value"""

    def __init__(self, record_len=1):
        self.record_len = record_len
        self.sum = [0 for i in range(self.record_len)]
        self.cnt = 0

    def reset(self):
        self.sum = [0 for i in range(self.record_len)]
        self.cnt = 0

    def update(self, val):
        self.sum = [s + v for s, v in zip(self.sum, val)]
        self.cnt += 1

    def average(self):
        return [s / self.cnt for s in self.sum]
def make_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-f", "--file", default="net.py", type=str, help="net description file"
    )
    parser.add_argument(
        "-w", "--weight_file", default=None, type=str, help="weights file",
    )
    parser.add_argument(
        "-n", "--ngpus", default=-1, type=int, help="total number of gpus for training",
    )
    parser.add_argument(
        "-b", "--batch_size", default=2, type=int, help="batchsize for training",
    )
    parser.add_argument(
        "-d", "--dataset_dir", default="/data/datasets", type=str,
    )
    parser.add_argument("--enable_sublinear", action="store_true")
    return parser
def main():
    parser = make_parser()
    args = parser.parse_args()

    # ------------------------ begin training -------------------------- #
    valid_nr_dev = mge.get_device_count("gpu")
    if args.ngpus == -1:
        world_size = valid_nr_dev
    else:
        if args.ngpus > valid_nr_dev:
            logger.error("do not have enough gpus for training")
            sys.exit(1)
        else:
            world_size = args.ngpus

    logger.info("Device Count = %d", world_size)

    log_dir = "log-of-{}".format(os.path.basename(args.file).split(".")[0])
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)

    if world_size > 1:
        mp.set_start_method("spawn")
        processes = list()
        for i in range(world_size):
            process = mp.Process(target=worker, args=(i, world_size, args))
            process.start()
            processes.append(process)

        for p in processes:
            p.join()
    else:
        worker(0, 1, args)
def worker(rank, world_size, args):
......@@ -85,8 +126,7 @@ def worker(rank, world_size, args):
if rank == 0:
logger.info("Prepare dataset")
loader = build_dataloader(model.batch_size, args.dataset_dir, model.cfg)
train_loader = iter(loader["train"])
train_loader = iter(build_dataloader(model.batch_size, args.dataset_dir, model.cfg))
for epoch_id in range(model.cfg.max_epoch):
for param_group in opt.param_groups:
......@@ -121,23 +161,6 @@ def worker(rank, world_size, args):
logger.info("dump weights to %s", save_path)
def adjust_learning_rate(optimizer, epoch_id, step, model, world_size):
base_lr = (
model.cfg.basic_lr
* world_size
* model.batch_size
* (
model.cfg.lr_decay_rate
** bisect.bisect_right(model.cfg.lr_decay_stages, epoch_id)
)
)
# Warm up
if epoch_id == 0 and step < model.cfg.warm_iters:
lr_factor = (step + 1.0) / model.cfg.warm_iters
for param_group in optimizer.param_groups:
param_group["lr"] = base_lr * lr_factor
def train_one_epoch(
model,
data_queue,
......@@ -150,7 +173,7 @@ def train_one_epoch(
):
sublinear_cfg = jit.SublinearMemoryConfig() if enable_sublinear else None
@jit.trace(symbolic=True, opt_level=2, sublinear_memory_config=sublinear_cfg)
@jit.trace(symbolic=True, sublinear_memory_config=sublinear_cfg)
def propagate():
loss_dict = model(model.inputs)
opt.backward(loss_dict["total_loss"])
......@@ -181,7 +204,9 @@ def train_one_epoch(
if rank == 0:
info_str = "e%d, %d/%d, lr:%f, "
loss_str = ", ".join(["{}:%f".format(loss) for loss in model.cfg.losses_keys])
loss_str = ", ".join(
["{}:%f".format(loss) for loss in model.cfg.losses_keys]
)
time_str = ", train_time:%.3fs, data_time:%.3fs"
log_info_str = info_str + loss_str + time_str
meter.update([loss.numpy() for loss in loss_list])
......@@ -200,28 +225,6 @@ def train_one_epoch(
time_meter.reset()
def make_parser():
parser = argparse.ArgumentParser()
parser.add_argument(
"-f", "--file", default="net.py", type=str, help="net description file"
)
parser.add_argument(
"-w", "--weight_file", default=None, type=str, help="pre-train weights file",
)
parser.add_argument(
"-n", "--ngpus", default=-1, type=int, help="total number of gpus for training",
)
parser.add_argument(
"-b", "--batch_size", default=2, type=int, help="batchsize for training",
)
parser.add_argument(
"-d", "--dataset_dir", default="/data/datasets", type=str,
)
parser.add_argument("--enable_sublinear", action="store_true")
return parser
def get_config_info(config):
config_table = []
for c, v in config.__dict__.items():
......@@ -237,39 +240,21 @@ def get_config_info(config):
return config_table
def main():
parser = make_parser()
args = parser.parse_args()
# ------------------------ begin training -------------------------- #
valid_nr_dev = mge.get_device_count("gpu")
if args.ngpus == -1:
world_size = valid_nr_dev
else:
if args.ngpus > valid_nr_dev:
logger.error("do not have enough gpus for training")
sys.exit(1)
else:
world_size = args.ngpus
logger.info("Device Count = %d", world_size)
log_dir = "log-of-{}".format(os.path.basename(args.file).split(".")[0])
if not os.path.isdir(log_dir):
os.makedirs(log_dir)
if world_size > 1:
mp.set_start_method("spawn")
processes = list()
for i in range(world_size):
process = mp.Process(target=worker, args=(i, world_size, args))
process.start()
processes.append(process)
for p in processes:
p.join()
else:
worker(0, 1, args)
def adjust_learning_rate(optimizer, epoch_id, step, model, world_size):
base_lr = (
model.cfg.basic_lr
* world_size
* model.batch_size
* (
model.cfg.lr_decay_rate
** bisect.bisect_right(model.cfg.lr_decay_stages, epoch_id)
)
)
# Warm up
if epoch_id == 0 and step < model.cfg.warm_iters:
lr_factor = (step + 1.0) / model.cfg.warm_iters
for param_group in optimizer.param_groups:
param_group["lr"] = base_lr * lr_factor
def build_dataset(data_dir, cfg):
......@@ -314,7 +299,9 @@ def build_dataloader(batch_size, data_dir, cfg):
transform=T.Compose(
transforms=[
T.ShortestEdgeResize(
cfg.train_image_short_size, cfg.train_image_max_size
cfg.train_image_short_size,
cfg.train_image_max_size,
sample_style="choice",
),
T.RandomHorizontalFlip(),
T.ToMode(),
......@@ -324,99 +311,7 @@ def build_dataloader(batch_size, data_dir, cfg):
collator=DetectionPadCollator(),
num_workers=2,
)
return {"train": train_dataloader}
class GroupedRandomSampler(RandomSampler):
def __init__(
self,
dataset,
batch_size,
group_ids,
indices=None,
world_size=None,
rank=None,
seed=None,
):
super().__init__(dataset, batch_size, False, indices, world_size, rank, seed)
self.group_ids = group_ids
assert len(group_ids) == len(dataset)
groups = np.unique(self.group_ids).tolist()
# buffer the indices of each group until batch size is reached
self.buffer_per_group = {k: [] for k in groups}
def batch(self):
indices = list(self.sample())
if self.world_size > 1:
indices = self.scatter(indices)
batch_index = []
for ind in indices:
group_id = self.group_ids[ind]
group_buffer = self.buffer_per_group[group_id]
group_buffer.append(ind)
if len(group_buffer) == self.batch_size:
batch_index.append(group_buffer)
self.buffer_per_group[group_id] = []
return iter(batch_index)
def __len__(self):
raise NotImplementedError("len() of GroupedRandomSampler is not well-defined.")
class DetectionPadCollator(Collator):
def __init__(self, pad_value: float = 0.0):
super().__init__()
self.pad_value = pad_value
def apply(self, inputs):
"""
assume order = ["image", "boxes", "boxes_category", "info"]
"""
batch_data = defaultdict(list)
for image, boxes, boxes_category, info in inputs:
batch_data["data"].append(image)
batch_data["gt_boxes"].append(
np.concatenate([boxes, boxes_category[:, np.newaxis]], axis=1).astype(
np.float32
)
)
_, current_height, current_width = image.shape
assert len(boxes) == len(boxes_category)
num_instances = len(boxes)
info = [
current_height,
current_width,
info[0],
info[1],
num_instances,
]
batch_data["im_info"].append(np.array(info, dtype=np.float32))
for key, value in batch_data.items():
pad_shape = list(max(s) for s in zip(*[x.shape for x in value]))
pad_value = [
np.pad(
v,
self._get_padding(v.shape, pad_shape),
constant_values=self.pad_value,
)
for v in value
]
batch_data[key] = np.ascontiguousarray(pad_value)
return batch_data
def _get_padding(self, original_shape, target_shape):
assert len(original_shape) == len(target_shape)
shape = []
for o, t in zip(original_shape, target_shape):
shape.append((0, t - o))
return tuple(shape)
return train_dataloader
if __name__ == "__main__":
......
# -*- coding: utf-8 -*-
# MegEngine is Licensed under the Apache License, Version 2.0 (the "License")
#
# Copyright (c) 2014-2020 Megvii Inc. All rights reserved.
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT ARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
import random
from collections import defaultdict
import cv2
import numpy as np
from megengine.data import Collator, RandomSampler
from official.vision.detection.tools.data_mapper import data_mapper
from official.vision.detection.tools.nms import py_cpu_nms
class AverageMeter:
"""Computes and stores the average and current value"""
def __init__(self, record_len=1):
self.record_len = record_len
self.sum = [0 for i in range(self.record_len)]
self.cnt = 0
def reset(self):
self.sum = [0 for i in range(self.record_len)]
self.cnt = 0
def update(self, val):
self.sum = [s + v for s, v in zip(self.sum, val)]
self.cnt += 1
def average(self):
return [s / self.cnt for s in self.sum]
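# Minimal usage sketch of AverageMeter (illustrative values):
#   meter = AverageMeter(record_len=2)
#   meter.update([1.0, 4.0])
#   meter.update([3.0, 6.0])
#   meter.average()  # -> [2.0, 5.0], the per-slot running means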
class GroupedRandomSampler(RandomSampler):
def __init__(
self,
dataset,
batch_size,
group_ids,
indices=None,
world_size=None,
rank=None,
seed=None,
):
super().__init__(dataset, batch_size, False, indices, world_size, rank, seed)
self.group_ids = group_ids
assert len(group_ids) == len(dataset)
groups = np.unique(self.group_ids).tolist()
# buffer the indices of each group until batch size is reached
self.buffer_per_group = {k: [] for k in groups}
def batch(self):
indices = list(self.sample())
if self.world_size > 1:
indices = self.scatter(indices)
batch_index = []
for ind in indices:
group_id = self.group_ids[ind]
group_buffer = self.buffer_per_group[group_id]
group_buffer.append(ind)
if len(group_buffer) == self.batch_size:
batch_index.append(group_buffer)
self.buffer_per_group[group_id] = []
return iter(batch_index)
def __len__(self):
raise NotImplementedError("len() of GroupedRandomSampler is not well-defined.")
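# Hedged usage sketch: in detection pipelines group_ids are typically derived
# from image aspect ratios, so each batch holds similarly shaped images and
# padding stays small. `train_dataset`, `image_sizes` and the 0/1 grouping
# below are assumptions, not code from this repo:
#   aspect_ratios = [w / h for h, w in image_sizes]
#   group_ids = np.array([0 if ar >= 1 else 1 for ar in aspect_ratios])
#   sampler = GroupedRandomSampler(train_dataset, batch_size=2, group_ids=group_ids)
#   for batch_indices in sampler.batch():
#       ...  # every index in batch_indices shares one group id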
class DetectionPadCollator(Collator):
def __init__(self, pad_value: float = 0.0):
super().__init__()
self.pad_value = pad_value
def apply(self, inputs):
"""
assume order = ["image", "boxes", "boxes_category", "info"]
"""
batch_data = defaultdict(list)
for image, boxes, boxes_category, info in inputs:
batch_data["data"].append(image)
batch_data["gt_boxes"].append(
np.concatenate([boxes, boxes_category[:, np.newaxis]], axis=1).astype(
np.float32
)
)
_, current_height, current_width = image.shape
assert len(boxes) == len(boxes_category)
num_instances = len(boxes)
info = [
current_height,
current_width,
info[0],
info[1],
num_instances,
]
batch_data["im_info"].append(np.array(info, dtype=np.float32))
for key, value in batch_data.items():
pad_shape = list(max(s) for s in zip(*[x.shape for x in value]))
pad_value = [
np.pad(
v,
self._get_padding(v.shape, pad_shape),
constant_values=self.pad_value,
)
for v in value
]
batch_data[key] = np.ascontiguousarray(pad_value)
return batch_data
def _get_padding(self, original_shape, target_shape):
assert len(original_shape) == len(target_shape)
shape = []
for o, t in zip(original_shape, target_shape):
shape.append((0, t - o))
return tuple(shape)
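# Hedged example of the padding behaviour (shapes illustrative): two samples in
# ["image", "boxes", "boxes_category", "info"] order,
#   a = (np.zeros((3, 600, 800)), np.zeros((2, 4)), np.zeros(2), (600, 800))
#   b = (np.zeros((3, 500, 900)), np.zeros((5, 4)), np.zeros(5), (500, 900))
#   batch = DetectionPadCollator().apply([a, b])
# yields batch["data"] of shape (2, 3, 600, 900) -- each image zero-padded to
# the per-batch maximum -- and batch["gt_boxes"] of shape (2, 5, 5), while
# batch["im_info"] keeps the true per-image sizes and instance counts.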
class DetEvaluator:
def __init__(self, model):
self.model = model
@staticmethod
def get_hw_by_short_size(im_height, im_width, short_size, max_size):
"""get height and width by short size
Args:
im_height(int): height of original image, e.g. 800
im_width(int): width of original image, e.g. 1000
short_size(int): short size of transformed image. e.g. 800
max_size(int): max size of transformed image. e.g. 1333
Returns:
resized_height(int): height of transformed image
resized_width(int): width of transformed image
"""
im_size_min = np.min([im_height, im_width])
im_size_max = np.max([im_height, im_width])
scale = (short_size + 0.0) / im_size_min
if scale * im_size_max > max_size:
scale = (max_size + 0.0) / im_size_max
resized_height, resized_width = (
int(round(im_height * scale)),
int(round(im_width * scale)),
)
return resized_height, resized_width
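# Worked examples of the rule above with short_size=800, max_size=1333:
#   800 x 1000 image: scale = 800 / 800 = 1.0 and 1.0 * 1000 <= 1333,
#     so the output stays 800 x 1000
#   600 x 1400 image: scale = 800 / 600 ~ 1.333, but 1.333 * 1400 ~ 1867 > 1333,
#     so scale = 1333 / 1400 ~ 0.952 and the output is 571 x 1333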
@staticmethod
def process_inputs(img, short_size, max_size, flip=False):
original_height, original_width, _ = img.shape
resized_height, resized_width = DetEvaluator.get_hw_by_short_size(
original_height, original_width, short_size, max_size
)
resized_img = cv2.resize(
img, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR,
)
resized_img = cv2.flip(resized_img, 1) if flip else resized_img
trans_img = np.ascontiguousarray(
resized_img.transpose(2, 0, 1)[None, :, :, :], dtype=np.uint8
)
im_info = np.array(
[(resized_height, resized_width, original_height, original_width)],
dtype=np.float32,
)
return trans_img, im_info
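# Hedged usage sketch ("test.jpg" and the 800/1333 sizes are placeholders):
#   img = cv2.imread("test.jpg")  # HWC, BGR, uint8
#   data, im_info = DetEvaluator.process_inputs(img, 800, 1333)
#   # data: (1, 3, resized_h, resized_w) uint8 tensor, channels first
#   # im_info: [[resized_h, resized_w, original_h, original_w]] as float32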
def predict(self, val_func):
"""
Args:
val_func(callable): model inference function
Returns:
results boxes: detection model output
"""
model = self.model
box_cls, box_delta = val_func()
box_cls, box_delta = box_cls.numpy(), box_delta.numpy()
dtboxes_all = list()
all_inds = np.where(box_cls > model.cfg.test_cls_threshold)
for c in range(0, model.cfg.num_classes):
inds = np.where(all_inds[1] == c)[0]
inds = all_inds[0][inds]
scores = box_cls[inds, c]
if model.cfg.class_aware_box:
bboxes = box_delta[inds, c, :]
else:
bboxes = box_delta[inds, :]
dtboxes = np.hstack((bboxes, scores[:, np.newaxis])).astype(np.float32)
if dtboxes.size > 0:
keep = py_cpu_nms(dtboxes, model.cfg.test_nms)
dtboxes = np.hstack(
(dtboxes[keep], np.ones((len(keep), 1), np.float32) * c)
).astype(np.float32)
dtboxes_all.extend(dtboxes)
if len(dtboxes_all) > model.cfg.test_max_boxes_per_image:
dtboxes_all = sorted(dtboxes_all, reverse=True, key=lambda i: i[4])[
: model.cfg.test_max_boxes_per_image
]
dtboxes_all = np.array(dtboxes_all, dtype=np.float64)
return dtboxes_all
@staticmethod
def format(results, cfg):
dataset_class = data_mapper[cfg.test_dataset["name"]]
all_results = []
for record in results:
image_filename = record["image_id"]
boxes = record["det_res"]
if len(boxes) <= 0:
continue
boxes[:, 2:4] = boxes[:, 2:4] - boxes[:, 0:2]
for box in boxes:
elem = dict()
elem["image_id"] = image_filename
elem["bbox"] = box[:4].tolist()
elem["score"] = box[4]
if hasattr(dataset_class, "classes_originID"):
elem["category_id"] = dataset_class.classes_originID[
dataset_class.class_names[int(box[5])]
]
else:
elem["category_id"] = int(box[5])
all_results.append(elem)
return all_results
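# The method above converts [x1, y1, x2, y2] corners into COCO-style
# [x, y, width, height]; one entry of the returned list looks like
# (values illustrative):
#   {"image_id": 139, "bbox": [412.8, 157.6, 53.1, 138.0],
#    "score": 0.91, "category_id": 62}
# which can be json.dump()-ed and fed to pycocotools' COCOeval.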
@staticmethod
def vis_det(
img,
dets,
is_show_label=True,
classes=None,
thresh=0.3,
name="detection",
return_img=True,
):
img = np.array(img)
colors = dict()
font = cv2.FONT_HERSHEY_SIMPLEX
for det in dets:
bb = det[:4].astype(int)
if is_show_label:
cls_id = int(det[5])
score = det[4]
if cls_id == 0:
continue
if score > thresh:
if cls_id not in colors:
colors[cls_id] = (
random.random() * 255,
random.random() * 255,
random.random() * 255,
)
cv2.rectangle(
img, (bb[0], bb[1]), (bb[2], bb[3]), colors[cls_id], 3
)
if classes and len(classes) > cls_id:
cls_name = classes[cls_id]
else:
cls_name = str(cls_id)
cv2.putText(
img,
"{:s} {:.3f}".format(cls_name, score),
(bb[0], bb[1] - 2),
font,
0.5,
(255, 255, 255),
1,
)
else:
cv2.rectangle(img, (bb[0], bb[1]), (bb[2], bb[3]), (0, 0, 255), 2)
if return_img:
return img
cv2.imshow(name, img)
while True:
c = cv2.waitKey(100000)
if c == ord("d"):
return None
elif c == ord("n"):
break
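# Hedged usage sketch ("demo.jpg" and the class list are placeholders; dets is
# the [x1, y1, x2, y2, score, class_id] array returned by predict()):
#   img = cv2.imread("demo.jpg")
#   vis = DetEvaluator.vis_det(img, dets, classes=["background", "person"])
#   cv2.imwrite("demo_vis.jpg", vis)  # class id 0 is skipped as background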