Unverified commit 8d0ad5fd, authored by Feng Ni, committed by GitHub

[MOT] add enhance fairmot (#4251)

* add sge attention on fpn

* add iou head

* add config, fix head

* fix iou offset weight

* fix fairmot centerhead
Parent 72eccd6e
@@ -140,14 +140,17 @@ If you use a stronger detection model, you can get better results. Each txt is t
### Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot/fairmot_enhance_hardnet85_30e_1088x608.yml) |

### Results on MOT-17 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot/fairmot_enhance_hardnet85_30e_1088x608.yml) |

**Notes:**
FairMOT enhance DLA-34 was trained on 8 GPUs with a mini-batch size of 16 per GPU for 60 epochs, and the CrowdHuman dataset was added to the training set.
FairMOT enhance HarDNet-85 was trained on 8 GPUs with a mini-batch size of 10 per GPU for 30 epochs, and the CrowdHuman dataset was added to the training set.
......
@@ -140,14 +140,17 @@ wget https://dataset.bj.bcebos.com/mot/det_results_dir.zip
### Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot/fairmot_enhance_hardnet85_30e_1088x608.yml) |

### Results on MOT-17 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot/fairmot_enhance_hardnet85_30e_1088x608.yml) |

**Notes:**
FairMOT enhance DLA-34 was trained on 8 GPUs with a batch size of 16 per GPU for 60 epochs, and the CrowdHuman dataset was added to the training set.
FairMOT enhance HarDNet-85 was trained on 8 GPUs with a batch size of 10 per GPU for 30 epochs, and the CrowdHuman dataset was added to the training set.
......
@@ -41,14 +41,17 @@ English | [简体中文](README_cn.md)
### Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |

### Results on MOT-17 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |

**Notes:**
FairMOT enhance DLA-34 was trained on 8 GPUs with a mini-batch size of 16 per GPU for 60 epochs, and the CrowdHuman dataset was added to the training set.
FairMOT enhance HarDNet-85 was trained on 8 GPUs with a mini-batch size of 10 per GPU for 30 epochs, and the CrowdHuman dataset was added to the training set.
@@ -64,7 +67,7 @@ English | [简体中文](README_cn.md)
| HRNetV2-W18 | 1088x608 | 70.7 | 65.7 | 4281 | 22485 | 138468 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_hrnetv2_w18_dlafpn_30e_1088x608.pdparams) | [config](./fairmot_hrnetv2_w18_dlafpn_30e_1088x608.yml) |

**Notes:**
FairMOT HRNetV2-W18 was trained on 8 GPUs with a mini-batch size of 4 per GPU for 30 epochs. Only the ImageNet pretrained model is used, and the optimizer is Momentum. The CrowdHuman dataset was added to the training set.

## Getting Started
......
@@ -40,14 +40,17 @@
### Results on MOT-16 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.9 | 74.7 | 1021 | 11425 | 31475 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 75.0 | 70.0 | 1050 | 11837 | 32774 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |

### Results on MOT-17 Test Set
| backbone | input shape | MOTA | IDF1 | IDS | FP | FN | FPS | download | config |
| :--------------| :------- | :----: | :----: | :----: | :----: | :----: | :------: | :----: |:-----: |
| DLA-34 | 1088x608 | 75.3 | 74.2 | 3270 | 29112 | 106749 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_dla34_60e_1088x608.pdparams) | [config](./fairmot_enhance_dla34_60e_1088x608.yml) |
| HarDNet-85 | 1088x608 | 74.7 | 70.7 | 3210 | 29790 | 109914 | - |[model](https://paddledet.bj.bcebos.com/models/mot/fairmot_enhance_hardnet85_30e_1088x608.pdparams) | [config](./fairmot_enhance_hardnet85_30e_1088x608.yml) |

**Notes:**
FairMOT enhance DLA-34 was trained on 8 GPUs with a batch size of 16 per GPU for 60 epochs, and the CrowdHuman dataset was added to the training set.
FairMOT enhance HarDNet-85 was trained on 8 GPUs with a batch size of 10 per GPU for 30 epochs, and the CrowdHuman dataset was added to the training set.
......
@@ -14,8 +14,31 @@ CenterNet:
  head: CenterNetHead
  post_process: CenterNetPostProcess

CenterNetDLAFPN:
  down_ratio: 4
  last_level: 5
  out_channel: 0
  dcn_v2: True
  with_sge: False

CenterNetHead:
  head_planes: 256
  heatmap_weight: 1
  regress_ltrb: True
  size_weight: 0.1
  size_loss: 'L1'
  offset_weight: 1
  iou_weight: 0

FairMOTEmbeddingHead:
  ch_head: 256
  ch_emb: 128
  num_identifiers: 14455 # for mix dataset (Caltech, CityPersons, CUHK-SYSU, PRW, ETHZ and MOT16)

CenterNetPostProcess:
  max_per_img: 500
  down_ratio: 4
  regress_ltrb: True

JDETracker:
  conf_thres: 0.4
......
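For reference, the head combines the branch losses with the weights configured above. A minimal sketch of that weighting with placeholder loss values (the real terms come from CTFocalLoss, the L1/GIoU size loss, and the L1 offset loss); since `iou_weight: 0`, no IoU head is built and no `iou_loss` term appears:

```python
# Sketch of CenterNetHead's det_loss weighting under this config.
# Loss values are hypothetical placeholders, not real training output.
weights = {'heatmap': 1.0, 'size': 0.1, 'offset': 1.0, 'iou': 0.0}
branch = {'heatmap_loss': 1.37, 'size_loss': 4.20, 'offset_loss': 0.25}

det_loss = (weights['heatmap'] * branch['heatmap_loss'] +
            weights['size'] * branch['size_loss'] +
            weights['offset'] * branch['offset_loss'])
# iou_weight == 0 -> the IoU head is never constructed, so no iou term
print(round(det_loss, 2))  # 2.04
```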
_BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/fairmot_dla34.yml',
  '_base_/fairmot_reader_1088x608.yml',
]

norm_type: sync_bn
use_ema: true
ema_decay: 0.9998

worker_num: 4
TrainReader:
  inputs_def:
    image_shape: [3, 608, 1088]
  sample_transforms:
    - Decode: {}
    - RGBReverse: {}
    - AugmentHSV: {}
    - LetterBoxResize: {target_size: [608, 1088]}
    - MOTRandomAffine: {reject_outside: False}
    - RandomFlip: {}
    - BboxXYXY2XYWH: {}
    - NormalizeBox: {}
    - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]}
    - RGBReverse: {}
    - Permute: {}
  batch_transforms:
    - Gt2FairMOTTarget: {}
  batch_size: 16
  shuffle: True
  drop_last: True
  use_shared_memory: True

epoch: 60
LearningRate:
  base_lr: 0.0005
  schedulers:
    - !PiecewiseDecay
      gamma: 0.1
      milestones: [40,]
      use_warmup: False

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer: NULL

weights: output/fairmot_enhance_dla34_60e_1088x608/model_final
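The schedule above is a one-step PiecewiseDecay: the learning rate holds at `base_lr` until epoch 40, then is multiplied by `gamma`. A small self-contained sketch of the resulting curve (this mimics the config values; the actual stepping is done by PaddleDetection's LearningRate builder):

```python
# Sketch of the PiecewiseDecay schedule above (not ppdet's real builder).
def lr_at_epoch(epoch, base_lr=0.0005, gamma=0.1, milestones=(40,)):
    """Learning rate used during a given (0-indexed) epoch."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

assert lr_at_epoch(0) == 0.0005              # epochs 0-39: base_lr
assert lr_at_epoch(39) == 0.0005
assert abs(lr_at_epoch(40) - 5e-05) < 1e-12  # epochs 40-59: base_lr * 0.1
```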
@@ -2,7 +2,7 @@ _BASE_: [
  '../../datasets/mot.yml',
  '../../runtime.yml',
  '_base_/optimizer_30e.yml',
  '_base_/fairmot_hardnet85.yml',
  '_base_/fairmot_reader_1088x608.yml',
]
norm_type: sync_bn
......
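The `_BASE_` list works by layered inheritance: the listed files are loaded first, and keys set in the current file (e.g. `norm_type: sync_bn`) override the inherited values. A simplified sketch of that merge on plain nested dicts, for illustration only; the real resolution lives in ppdet's config loader:

```python
# Simplified sketch of _BASE_-style config inheritance (not ppdet's loader).
def merge_config(base, override):
    """Recursively merge override into base; override's scalars win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {'norm_type': 'bn', 'TrainReader': {'batch_size': 10, 'shuffle': True}}
child = {'norm_type': 'sync_bn', 'TrainReader': {'batch_size': 16}}
print(merge_config(base, child))
# {'norm_type': 'sync_bn', 'TrainReader': {'batch_size': 16, 'shuffle': True}}
```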
@@ -70,8 +70,8 @@ class FairMOT(BaseArch):
    def _forward(self):
        loss = dict()
        # det_outs keys:
        # train: neck_feat, det_loss, heatmap_loss, size_loss, offset_loss (optional: iou_loss)
        # eval/infer: neck_feat, bbox, bbox_inds
        det_outs = self.detector(self.inputs)
        neck_feat = det_outs['neck_feat']
        if self.training:
@@ -79,12 +79,11 @@ class FairMOT(BaseArch):
            det_loss = det_outs['det_loss']
            loss = self.loss(det_loss, reid_loss)
            # report every individual loss term, including the optional iou_loss
            for k, v in det_outs.items():
                if 'loss' not in k:
                    continue
                loss.update({k: v})
            loss.update({'reid_loss': reid_loss})
            return loss
        else:
            embedding = self.reid(neck_feat, self.inputs)
......
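The rewritten `_forward` no longer hard-codes which loss keys to report: every entry of `det_outs` whose key contains `'loss'` is copied into the returned dict, which is what lets the optional `iou_loss` flow through without further changes. A self-contained sketch of that filtering, with hypothetical values:

```python
# Sketch of the loss-collection loop in FairMOT._forward (fake values).
det_outs = {
    'neck_feat': object(),  # feature tensor, not a loss -> skipped
    'det_loss': 3.1,
    'heatmap_loss': 1.4,
    'size_loss': 1.2,
    'offset_loss': 0.5,
    'iou_loss': 0.3,        # only present when the IoU head is enabled
}
reid_loss = 2.7

loss = {}
for k, v in det_outs.items():
    if 'loss' not in k:
        continue            # ignore non-loss outputs such as neck_feat
    loss.update({k: v})
loss.update({'reid_loss': reid_loss})
print(sorted(loss))
# ['det_loss', 'heatmap_loss', 'iou_loss', 'offset_loss', 'reid_loss', 'size_loss']
```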
@@ -18,7 +18,7 @@ import paddle.nn as nn
import paddle.nn.functional as F
from paddle.nn.initializer import Constant, Uniform
from ppdet.core.workspace import register
from ppdet.modeling.losses import CTFocalLoss, GIoULoss


class ConvLayer(nn.Layer):
@@ -51,7 +51,6 @@ class ConvLayer(nn.Layer):
    def forward(self, inputs):
        out = self.conv(inputs)
        return out

@@ -66,8 +65,9 @@ class CenterNetHead(nn.Layer):
        regress_ltrb (bool): whether to regress left/top/right/bottom or
            width/height for a box, true by default
        size_weight (float): the weight of box size loss, 0.1 by default.
        size_loss (str): the type of size regression loss, 'L1' by default.
        offset_weight (float): the weight of center offset loss, 1 by default.
        iou_weight (float): the weight of iou head loss, 0 by default.
    """

    __shared__ = ['num_classes']
@@ -79,13 +79,18 @@ class CenterNetHead(nn.Layer):
                 heatmap_weight=1,
                 regress_ltrb=True,
                 size_weight=0.1,
                 size_loss='L1',
                 offset_weight=1,
                 iou_weight=0):
        super(CenterNetHead, self).__init__()
        self.weights = {
            'heatmap': heatmap_weight,
            'size': size_weight,
            'offset': offset_weight,
            'iou': iou_weight
        }

        # heatmap head
        self.heatmap = nn.Sequential(
            ConvLayer(
                in_channels, head_planes, kernel_size=3, padding=1, bias=True),
@@ -99,6 +104,8 @@ class CenterNetHead(nn.Layer):
                bias=True))
        with paddle.no_grad():
            self.heatmap[2].conv.bias[:] = -2.19

        # size (ltrb or wh) head
        self.size = nn.Sequential(
            ConvLayer(
                in_channels, head_planes, kernel_size=3, padding=1, bias=True),
@@ -110,13 +117,33 @@ class CenterNetHead(nn.Layer):
                stride=1,
                padding=0,
                bias=True))
        self.size_loss = size_loss

        # offset head
        self.offset = nn.Sequential(
            ConvLayer(
                in_channels, head_planes, kernel_size=3, padding=1, bias=True),
            nn.ReLU(),
            ConvLayer(
                head_planes, 2, kernel_size=1, stride=1, padding=0, bias=True))

        # iou head (optional)
        if iou_weight > 0:
            self.iou = nn.Sequential(
                ConvLayer(
                    in_channels,
                    head_planes,
                    kernel_size=3,
                    padding=1,
                    bias=True),
                nn.ReLU(),
                ConvLayer(
                    head_planes,
                    4 if regress_ltrb else 2,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                    bias=True))

    @classmethod
    def from_config(cls, cfg, input_shape):
@@ -128,22 +155,29 @@ class CenterNetHead(nn.Layer):
        heatmap = self.heatmap(feat)
        size = self.size(feat)
        offset = self.offset(feat)
        # the iou head exists only when iou_weight > 0
        iou = self.iou(feat) if hasattr(self, 'iou') else None
        if self.training:
            loss = self.get_loss(
                inputs, self.weights, heatmap, size, offset, iou=iou)
            return loss
        else:
            heatmap = F.sigmoid(heatmap)
            head_outs = {'heatmap': heatmap, 'size': size, 'offset': offset}
            if iou is not None:
                head_outs.update({'iou': iou})
            return head_outs

    def get_loss(self, inputs, weights, heatmap, size, offset, iou=None):
        # heatmap head loss: CTFocalLoss
        heatmap_target = inputs['heatmap']
        heatmap = paddle.clip(F.sigmoid(heatmap), 1e-4, 1 - 1e-4)
        ctfocal_loss = CTFocalLoss()
        heatmap_loss = ctfocal_loss(heatmap, heatmap_target)

        # size head loss: L1 loss or GIoU loss
        index = inputs['index']
        mask = inputs['index_mask']
        size = paddle.transpose(size, perm=[0, 2, 3, 1])
        size_n, size_h, size_w, size_c = size.shape
        size = paddle.reshape(size, shape=[size_n, -1, size_c])
@@ -161,11 +195,32 @@ class CenterNetHead(nn.Layer):
        size_mask = paddle.cast(size_mask, dtype=pos_size.dtype)
        pos_num = size_mask.sum()
        size_mask.stop_gradient = True
        if self.size_loss == 'L1':
            size_target = inputs['size']
            size_target.stop_gradient = True
            size_loss = F.l1_loss(
                pos_size * size_mask, size_target * size_mask, reduction='sum')
            size_loss = size_loss / (pos_num + 1e-4)
        elif self.size_loss == 'giou':
            size_target = inputs['bbox_xys']
            size_target.stop_gradient = True
            # decode the predicted ltrb distances around the gt box centers
            # into xyxy boxes before scoring them with GIoU
            centers_x = (size_target[:, :, 0:1] + size_target[:, :, 2:3]) / 2.0
            centers_y = (size_target[:, :, 1:2] + size_target[:, :, 3:4]) / 2.0
            x1 = centers_x - pos_size[:, :, 0:1]
            y1 = centers_y - pos_size[:, :, 1:2]
            x2 = centers_x + pos_size[:, :, 2:3]
            y2 = centers_y + pos_size[:, :, 3:4]
            pred_boxes = paddle.concat([x1, y1, x2, y2], axis=-1)
            giou_loss = GIoULoss(reduction='sum')
            size_loss = giou_loss(
                pred_boxes * size_mask,
                size_target * size_mask,
                iou_weight=size_mask,
                loc_reweight=None)
            size_loss = size_loss / (pos_num + 1e-4)

        # offset head loss: L1 loss
        offset_target = inputs['offset']
        offset = paddle.transpose(offset, perm=[0, 2, 3, 1])
        offset_n, offset_h, offset_w, offset_c = offset.shape
        offset = paddle.reshape(offset, shape=[offset_n, -1, offset_c])
@@ -181,12 +236,43 @@ class CenterNetHead(nn.Layer):
            reduction='sum')
        offset_loss = offset_loss / (pos_num + 1e-4)

        # iou head loss: GIoU loss
        if iou is not None:
            iou = paddle.transpose(iou, perm=[0, 2, 3, 1])
            iou_n, iou_h, iou_w, iou_c = iou.shape
            iou = paddle.reshape(iou, shape=[iou_n, -1, iou_c])
            pos_iou = paddle.gather_nd(iou, index=index)
            iou_mask = paddle.expand_as(mask, pos_iou)
            iou_mask = paddle.cast(iou_mask, dtype=pos_iou.dtype)
            pos_num = iou_mask.sum()
            iou_mask.stop_gradient = True
            gt_bbox_xys = inputs['bbox_xys']
            gt_bbox_xys.stop_gradient = True
            # decode the predicted ltrb distances around the gt box centers
            centers_x = (gt_bbox_xys[:, :, 0:1] + gt_bbox_xys[:, :, 2:3]) / 2.0
            centers_y = (gt_bbox_xys[:, :, 1:2] + gt_bbox_xys[:, :, 3:4]) / 2.0
            x1 = centers_x - pos_size[:, :, 0:1]
            y1 = centers_y - pos_size[:, :, 1:2]
            x2 = centers_x + pos_size[:, :, 2:3]
            y2 = centers_y + pos_size[:, :, 3:4]
            pred_boxes = paddle.concat([x1, y1, x2, y2], axis=-1)
            giou_loss = GIoULoss(reduction='sum')
            iou_loss = giou_loss(
                pred_boxes * iou_mask,
                gt_bbox_xys * iou_mask,
                iou_weight=iou_mask,
                loc_reweight=None)
            iou_loss = iou_loss / (pos_num + 1e-4)

        losses = {
            'heatmap_loss': heatmap_loss,
            'size_loss': size_loss,
            'offset_loss': offset_loss,
        }
        det_loss = weights['heatmap'] * heatmap_loss + weights[
            'size'] * size_loss + weights['offset'] * offset_loss
        if iou is not None:
            losses.update({'iou_loss': iou_loss})
            det_loss = det_loss + weights['iou'] * iou_loss
        losses.update({'det_loss': det_loss})
        return losses
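Both the GIoU size loss and the IoU head follow the same recipe: decode the predicted ltrb distances around the ground-truth box centers into xyxy boxes, then score them against `bbox_xys` with GIoU. A minimal framework-free sketch of that decode-and-score step for one box (illustrative numbers; ppdet's GIoULoss adds masking and reduction on top):

```python
# Minimal sketch: decode ltrb distances around a GT center, then GIoU.
def decode_ltrb(cx, cy, l, t, r, b):
    """Center plus left/top/right/bottom distances -> xyxy box."""
    return (cx - l, cy - t, cx + r, cy + b)

def giou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (enclose - union) / enclose

gt = (10.0, 10.0, 50.0, 90.0)                       # xyxy ground truth
cx, cy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2   # GT centers, as in get_loss
pred = decode_ltrb(cx, cy, 18.0, 42.0, 22.0, 38.0)  # slightly-off ltrb output
print(1.0 - giou(gt, pred))  # the per-box GIoU loss term
```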
@@ -27,6 +27,74 @@ from ..shape_spec import ShapeSpec
__all__ = ['CenterNetDLAFPN', 'CenterNetHarDNetFPN']


# SGE attention
class BasicConv(nn.Layer):
    def __init__(self,
                 in_planes,
                 out_planes,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 relu=True,
                 bn=True,
                 bias_attr=False):
        super(BasicConv, self).__init__()
        self.out_channels = out_planes
        self.conv = nn.Conv2D(
            in_planes,
            out_planes,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias_attr=bias_attr)
        self.bn = nn.BatchNorm2D(
            out_planes,
            epsilon=1e-5,
            momentum=0.01,
            weight_attr=False,
            bias_attr=False) if bn else None
        self.relu = nn.ReLU() if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x


class ChannelPool(nn.Layer):
    def forward(self, x):
        # stack per-pixel channel max and channel mean: [N, 2, H, W]
        return paddle.concat(
            (paddle.max(x, 1).unsqueeze(1), paddle.mean(x, 1).unsqueeze(1)),
            axis=1)


class SpatialGate(nn.Layer):
    def __init__(self):
        super(SpatialGate, self).__init__()
        kernel_size = 7
        self.compress = ChannelPool()
        self.spatial = BasicConv(
            2,
            1,
            kernel_size,
            stride=1,
            padding=(kernel_size - 1) // 2,
            relu=False)

    def forward(self, x):
        x_compress = self.compress(x)
        x_out = self.spatial(x_compress)
        scale = F.sigmoid(x_out)  # broadcasting
        return x * scale
def fill_up_weights(up):
    weight = up.weight.numpy()
    f = math.ceil(weight.shape[2] / 2)
@@ -145,10 +213,10 @@ class CenterNetDLAFPN(nn.Layer):
        last_level (int): the last level of input feature fed into the upsampling block
        out_channel (int): the channel of the output feature, 0 by default means
            the channel of the input feature whose down ratio is `down_ratio`
        first_level (int|None): the first level of input feature fed into the upsampling block.
            if None, the first level stands for log2(down_ratio)
        dcn_v2 (bool): whether use the DCNv2, True by default
        with_sge (bool): whether use SGE attention, False by default
    """
    def __init__(self,
@@ -156,8 +224,9 @@ class CenterNetDLAFPN(nn.Layer):
                 down_ratio=4,
                 last_level=5,
                 out_channel=0,
                 first_level=None,
                 dcn_v2=True,
                 with_sge=False):
        super(CenterNetDLAFPN, self).__init__()
        self.first_level = int(np.log2(
            down_ratio)) if first_level is None else first_level
@@ -180,6 +249,10 @@ class CenterNetDLAFPN(nn.Layer):
            [2**i for i in range(self.last_level - self.first_level)],
            dcn_v2=dcn_v2)

        self.with_sge = with_sge
        if self.with_sge:
            self.sge_attention = SpatialGate()

    @classmethod
    def from_config(cls, cfg, input_shape):
        return {'in_channels': [i.channels for i in input_shape]}
@@ -194,7 +267,10 @@ class CenterNetDLAFPN(nn.Layer):
        self.ida_up(ida_up_feats, 0, len(ida_up_feats))

        feat = ida_up_feats[-1]
        if self.with_sge:
            feat = self.sge_attention(feat)
        return feat

    @property
    def out_shape(self):
......
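The SGE attention wired in above is a pure spatial gate: per-pixel channel max and mean are stacked, run through a 7x7 conv, and the sigmoid output rescales the FPN feature map. A short usage sketch against the `SpatialGate` defined in this file (assumes a working PaddlePaddle install and that the class is importable from this module; the shapes are arbitrary examples):

```python
# Usage sketch for SpatialGate; the import path below is an assumption based
# on where this diff places the class.
import paddle
from ppdet.modeling.necks.centernet_fpn import SpatialGate

gate = SpatialGate()
feat = paddle.rand([2, 64, 152, 272])  # e.g. DLA-FPN output at down_ratio=4
out = gate(feat)                       # elementwise gate in (0, 1) times feat
assert out.shape == feat.shape         # attention keeps the feature shape
```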