Unverified commit 866332ed, authored by wangguanzhong, committed by GitHub

add ttfnet enhance (#2609)

* add ttfnet enhance

* add doc

* fix pafnet training
Parent: ac701833
-# TTFNet
+# 1. TTFNet
 ## Introduction
@@ -15,6 +15,43 @@ TTFNet is a training-time-friendly network for real-time object detection, and for Ce
 | :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
 | DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [download](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |
+# 2. PAFNet
+## Introduction
+PAFNet (Paddle Anchor Free) is an optimized version of TTFNet developed in PaddleDetection. Its accuracy reaches the state-of-the-art (SOTA) level among anchor-free detectors, and it also provides PAFNet-Lite, a lightweight model for mobile devices.
+PAFNet improves on TTFNet in the following ways:
+- [CutMix](https://arxiv.org/abs/1905.04899)
+- A stronger backbone: ResNet50vd-DCN
+- A larger training batch size: 8 GPUs, batch_size=18 per GPU
+- Synchronized Batch Normalization
+- [Deformable Convolution](https://arxiv.org/abs/1703.06211)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- Better pretrained weights
+## Model Zoo
+| Backbone | Model | Images/GPU | LR schedule | Inference speed (FPS) | Box AP | Download | Config |
+| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| ResNet50vd | PAFNet | 18 | 10x | ---- | 42.2 | [download](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_10x_coco.yml) |
+### PAFNet-Lite
+| Backbone | Model | Images/GPU | LR schedule | Box AP | Kirin 990 latency (ms) | Model size (MB) | Download | Config |
+| :-------------- | :------------- | :-----: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [download](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) |
 ## Citations
 ```
 @article{liu2019training,
......
epoch: 120

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [80, 110]
  - !LinearWarmup
    start_factor: 0.2
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2


epoch: 240

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [160, 220]
  - !LinearWarmup
    start_factor: 0.2
    steps: 1000

OptimizerBuilder:
  clip_grad_by_norm: 35
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2
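The two optimizer settings above combine a linear warmup with piecewise (step) decay. The sketch below computes the learning rate they would produce. It rests on two assumptions that this diff does not show. First, `PiecewiseDecay` multiplies `base_lr` by `gamma` once for each milestone epoch already reached. Second, `LinearWarmup` interpolates a factor from `start_factor` to 1.0 over the first `steps` iterations. The helper `lr_at` and its `iters_per_epoch` argument are hypothetical, and this is not PaddleDetection's scheduler code.

```python
def lr_at(iteration, iters_per_epoch, base_lr=0.015, gamma=0.1,
          milestones=(80, 110), warmup_steps=500, start_factor=0.2):
    """Hypothetical helper: learning rate at a global iteration index.

    Assumes milestones are counted in epochs and warmup in iterations.
    """
    epoch = iteration // iters_per_epoch
    # PiecewiseDecay: one factor of gamma per milestone already reached.
    passed = sum(1 for m in milestones if epoch >= m)
    lr = base_lr * gamma ** passed
    # LinearWarmup: interpolate the factor from start_factor up to 1.0.
    if iteration < warmup_steps:
        alpha = iteration / warmup_steps
        lr *= start_factor * (1 - alpha) + alpha
    return lr


# 120-epoch settings; 1000 iterations per epoch is an arbitrary choice.
for it in (0, 250, 500, 90 * 1000, 115 * 1000):
    print(it, lr_at(it, iters_per_epoch=1000))

# 240-epoch settings.
print(lr_at(230 * 1000, 1000, milestones=(160, 220), warmup_steps=1000))
```

Under these assumptions, training starts at 0.2 × 0.015 = 0.003 and reaches 0.015 after warmup. The rate then drops tenfold at each milestone.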

architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998

TTFNet:
  backbone: ResNet
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

ResNet:
  depth: 50
  variant: d
  return_idx: [0, 1, 2, 3]
  freeze_at: -1
  norm_decay: 0.
  variant: d
  dcn_v2_stages: [1, 2, 3]

TTFFPN:
  planes: [256, 128, 64]
  shortcut_num: [3, 2, 1]

TTFHead:
  dcn_head: true
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4
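The ResNet50vd model config above enables `use_ema` with `ema_decay: 0.9998`. An exponential moving average keeps a shadow copy of each weight and blends in the live weight after every optimizer step. The sketch below shows only the textbook update, `shadow = decay * shadow + (1 - decay) * param`, applied to a dict of floats. PaddleDetection's own EMA implementation may add refinements, such as a step-dependent decay, that are not shown here. The names `ema_update`, `params` and `shadow` are hypothetical.

```python
def ema_update(shadow, params, decay=0.9998):
    """Blend live parameters into the EMA shadow copy, in place."""
    for name, value in params.items():
        if name not in shadow:
            shadow[name] = value  # first step: start from the live value
        else:
            shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


shadow = {}
ema_update(shadow, {"w": 1.0})
for _ in range(1000):
    ema_update(shadow, {"w": 0.0})
print(shadow["w"])  # roughly 0.82, i.e. 0.9998 ** 1000
```

With a decay this close to 1, a weight that jumps to a new value still pulls the average only about 18% of the way after 1000 steps. This smoothing is the reason to evaluate with EMA weights.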

architecture: TTFNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams
norm_type: sync_bn

TTFNet:
  backbone: MobileNetV3
  neck: TTFFPN
  ttf_head: TTFHead
  post_process: BBoxPostProcess

MobileNetV3:
  scale: 1.0
  model_name: large
  feature_maps: [5, 8, 14, 17]
  with_extra_blocks: true
  lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75]
  conv_decay: 0.00001
  norm_decay: 0.0
  extra_block_filters: []

TTFFPN:
  planes: [96, 48, 24]
  shortcut_num: [2, 2, 1]
  lite_neck: true
  fusion_method: concat

TTFHead:
  hm_head_planes: 48
  wh_head_planes: 24
  lite_head: true
  hm_loss:
    name: CTFocalLoss
    loss_weight: 1.
  wh_loss:
    name: GIoULoss
    loss_weight: 5.
    reduction: sum

BBoxPostProcess:
  decode:
    name: TTFBox
    max_per_img: 100
    score_thresh: 0.01
    down_ratio: 4

worker_num: 2
TrainReader:
  sample_transforms:
    - Decode: {}
    - Cutmix: {alpha: 1.5, beta: 1.5}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
    - RandomFlip: {}
    - GridMask: {upper_iter: 300000}
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512], random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
    - Permute: {}
    - Gt2TTFTarget: {down_ratio: 4}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 12
  shuffle: true
  drop_last: true
  use_shared_memory: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false
  drop_empty: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false
  drop_empty: false


worker_num: 2
TrainReader:
  sample_transforms:
    - Decode: {}
    - Cutmix: {alpha: 1.5, beta: 1.5}
    - RandomDistort: {random_apply: false, random_channel: true}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
    - RandomFlip: {prob: 0.5}
  batch_transforms:
    - BatchRandomResize: {target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672], keep_ratio: false}
    - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
    - Permute: {}
    - Gt2TTFTarget: {down_ratio: 4}
    - PadBatch: {pad_to_stride: 32}
  batch_size: 18
  shuffle: true
  drop_last: true
  use_shared_memory: true
  mixup_epoch: 100

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false
  drop_empty: false

TestReader:
  sample_transforms:
    - Decode: {}
    - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
    - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
    - Permute: {}
  batch_size: 1
  drop_last: false
  drop_empty: false
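Each training reader does three things to a batch. It resizes the batch to a randomly chosen square side, builds TTF targets with `down_ratio: 4`, and pads to a multiple of `pad_to_stride: 32`. The hypothetical helper below computes the padded input side and the target heatmap side. It assumes the heatmap comes from the resized image before padding, because `Gt2TTFTarget` is listed before `PadBatch` in both readers.

```python
def reader_sizes(size, pad_to_stride=32, down_ratio=4):
    """Hypothetical helper: (padded input side, target heatmap side)."""
    padded = -(-size // pad_to_stride) * pad_to_stride  # round up to stride
    return padded, size // down_ratio


lite_sizes = [320, 352, 384, 416, 448, 480, 512]
full_sizes = [416, 448, 480, 512, 544, 576, 608, 640, 672]
for s in lite_sizes + full_sizes:
    print(s, reader_sizes(s))
```

Every listed size is already a multiple of 32, so padding leaves them unchanged. The heatmap side ranges from 80 to 128 for the lite reader and from 104 to 168 for the full reader.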
@@ -14,8 +14,9 @@ DarkNet:
   norm_type: bn
   norm_decay: 0.0004

-# use default config
-# TTFFPN:
+TTFFPN:
+  planes: [256, 128, 64]
+  shortcut_num: [3, 2, 1]

 TTFHead:
   hm_loss:
......
_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_10x.yml',
  '_base_/pafnet.yml',
  '_base_/pafnet_reader.yml',
]
weights: output/pafnet_10x_coco/model_final


_BASE_: [
  '../datasets/coco_detection.yml',
  '../runtime.yml',
  '_base_/optimizer_20x.yml',
  '_base_/pafnet_lite.yml',
  '_base_/pafnet_lite_reader.yml',
]
weights: output/pafnet_lite_mobilenet_v3_10x_coco/model_final
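The two top-level configs only list `_BASE_` files and set `weights`. The sketch below illustrates this layering with a recursive dict merge. Later layers override earlier ones, and the top-level file overrides them all. That merge order is an assumption made for illustration. `deep_merge`, `load_layers` and the sample dicts are hypothetical and are not PaddleDetection's config loader.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override wins; nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_layers(layers):
    """Merge config dicts in order: later layers take priority."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical fragments using keys that appear in the configs above.
optimizer = {"epoch": 240, "LearningRate": {"base_lr": 0.015}}
model = {"architecture": "TTFNet",
         "TTFFPN": {"planes": [96, 48, 24], "lite_neck": True}}
top_level = {"weights": "output/example/model_final",
             "TTFFPN": {"lite_neck": False}}  # an override, for illustration
cfg = load_layers([optimizer, model, top_level])
print(cfg)
```

Merging nested sections key by key lets a top-level file change one field, such as `lite_neck`, without repeating the rest of the `TTFFPN` section.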
@@ -55,6 +55,7 @@ class DetDataset(Dataset):
         self.sample_num = sample_num
         self.use_default_label = use_default_label
         self._epoch = 0
+        self._curr_iter = 0

     def __len__(self, ):
         return len(self.roidbs)
@@ -76,6 +77,8 @@ class DetDataset(Dataset):
                 copy.deepcopy(self.roidbs[np.random.randint(n)])
                 for _ in range(3)
             ]
+        roidb['curr_iter'] = self._curr_iter
+        self._curr_iter += 1

         return self.transform(roidb)
......
@@ -533,6 +533,8 @@ class Gt2TTFTarget(BaseOperator):
             sample.pop('is_crowd')
             sample.pop('gt_class')
             sample.pop('gt_bbox')
+            if 'gt_score' in sample:
+                sample.pop('gt_score')
         return samples

     def draw_truncate_gaussian(self, heatmap, center, h_radius, w_radius):
......
@@ -20,7 +20,7 @@ import numpy as np
 from PIL import Image


-class GridMask(object):
+class Gridmask(object):
     def __init__(self,
                  use_h=True,
                  use_w=True,
@@ -30,7 +30,7 @@ class GridMask(object):
                  mode=1,
                  prob=0.7,
                  upper_iter=360000):
-        super(GridMask, self).__init__()
+        super(Gridmask, self).__init__()
         self.use_h = use_h
         self.use_w = use_w
         self.rotate = rotate
......
@@ -308,8 +308,8 @@ class GridMask(BaseOperator):
         self.prob = prob
         self.upper_iter = upper_iter

-        from .gridmask_utils import GridMask
-        self.gridmask_op = GridMask(
+        from .gridmask_utils import Gridmask
+        self.gridmask_op = Gridmask(
             use_h,
             use_w,
             rotate=rotate,
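`DetDataset` now stamps each sample with a running `curr_iter` counter, and the lite reader passes `upper_iter: 300000` to `GridMask`. Suppose the operator uses that counter to ramp its application probability linearly from 0 to `prob` over the first `upper_iter` iterations; the ramp itself is not part of this diff. The schedule would then look like the hypothetical helper below, which uses the `prob=0.7` default from the `Gridmask` signature.

```python
def gridmask_prob(curr_iter, prob=0.7, upper_iter=300000):
    """Hypothetical linear ramp: 0 at iteration 0, `prob` from upper_iter on."""
    return prob * min(1.0, curr_iter / upper_iter)


for it in (0, 75000, 150000, 300000, 600000):
    print(it, gridmask_prob(it))
```

A ramp like this keeps the heavy occlusion augmentation mild while the model is still learning basic features.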
@@ -1516,14 +1516,14 @@ class Cutmix(BaseOperator):
         bbx2 = np.clip(cx + cut_w // 2, 0, w - 1)
         bby2 = np.clip(cy + cut_h // 2, 0, h - 1)

-        img_1 = np.zeros((h, w, img1.shape[2]), 'float32')
-        img_1[:img1.shape[0], :img1.shape[1], :] = \
+        img_1_pad = np.zeros((h, w, img1.shape[2]), 'float32')
+        img_1_pad[:img1.shape[0], :img1.shape[1], :] = \
             img1.astype('float32')
-        img_2 = np.zeros((h, w, img2.shape[2]), 'float32')
-        img_2[:img2.shape[0], :img2.shape[1], :] = \
+        img_2_pad = np.zeros((h, w, img2.shape[2]), 'float32')
+        img_2_pad[:img2.shape[0], :img2.shape[1], :] = \
             img2.astype('float32')
-        img_1[bby1:bby2, bbx1:bbx2, :] = img2[bby1:bby2, bbx1:bbx2, :]
-        return img_1
+        img_1_pad[bby1:bby2, bbx1:bbx2, :] = img_2_pad[bby1:bby2, bbx1:bbx2, :]
+        return img_1_pad

     def __call__(self, sample, context=None):
         if not isinstance(sample, Sequence):
@@ -1546,16 +1546,27 @@ class Cutmix(BaseOperator):
         gt_class1 = sample[0]['gt_class']
         gt_class2 = sample[1]['gt_class']
         gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
-        gt_score1 = sample[0]['gt_score']
-        gt_score2 = sample[1]['gt_score']
+        gt_score1 = np.ones_like(sample[0]['gt_class'])
+        gt_score2 = np.ones_like(sample[1]['gt_class'])
         gt_score = np.concatenate(
             (gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
-        sample = sample[0]
-        sample['image'] = img
-        sample['gt_bbox'] = gt_bbox
-        sample['gt_score'] = gt_score
-        sample['gt_class'] = gt_class
-        return sample
+        result = copy.deepcopy(sample[0])
+        result['image'] = img
+        result['gt_bbox'] = gt_bbox
+        result['gt_score'] = gt_score
+        result['gt_class'] = gt_class
+        if 'is_crowd' in sample[0]:
+            is_crowd1 = sample[0]['is_crowd']
+            is_crowd2 = sample[1]['is_crowd']
+            is_crowd = np.concatenate((is_crowd1, is_crowd2), axis=0)
+            result['is_crowd'] = is_crowd
+        if 'difficult' in sample[0]:
+            is_difficult1 = sample[0]['difficult']
+            is_difficult2 = sample[1]['difficult']
+            is_difficult = np.concatenate(
+                (is_difficult1, is_difficult2), axis=0)
+            result['difficult'] = is_difficult
+        return result


 @register_op
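The first `Cutmix` hunk matters when the two images differ in size. The paste box is sampled in the padded `h × w` canvas, so it has to be read from the padded second image rather than the raw one. Slicing the smaller raw image can return a region smaller than the box, and NumPy then typically fails to broadcast it. The second hunk also builds box scores from `np.ones_like(gt_class)`, weighting image-1 boxes by `factor` and image-2 boxes by `1 - factor`.

The pure-Python sketch below reproduces both steps on single-channel images stored as nested lists. `pad_to`, `cutmix_paste` and `mixed_scores` are hypothetical names.

```python
def pad_to(img, h, w, fill=0):
    """Copy a 2-D list image into the top-left corner of an h x w canvas."""
    canvas = [[fill] * w for _ in range(h)]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            canvas[y][x] = v
    return canvas


def cutmix_paste(img1, img2, box):
    """Paste box=(x1, y1, x2, y2) from padded img2 into padded img1."""
    h = max(len(img1), len(img2))
    w = max(len(img1[0]), len(img2[0]))
    out = pad_to(img1, h, w)
    src = pad_to(img2, h, w)  # read from the padded copy, as in the fix
    x1, y1, x2, y2 = box
    for y in range(y1, y2):
        out[y][x1:x2] = src[y][x1:x2]
    return out


def mixed_scores(n1, n2, factor):
    """Image-1 boxes get score `factor`, image-2 boxes get `1 - factor`."""
    return [factor] * n1 + [1.0 - factor] * n2


img1 = [[1] * 4 for _ in range(2)]   # 2 x 4 image of ones
img2 = [[2] * 2 for _ in range(4)]   # 4 x 2 image of twos
for row in cutmix_paste(img1, img2, (0, 0, 3, 3)):
    print(row)
print(mixed_scores(2, 1, 0.75))
```

Padding first means the box coordinates are valid for both sources. Any part of the box outside the smaller image is simply filled with the padding value.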
......
@@ -330,16 +330,16 @@ class MobileNetV3(nn.Layer):
                 [3, 16, 16, False, "relu", 1],
                 [3, 64, 24, False, "relu", 2],
                 [3, 72, 24, False, "relu", 1],
-                [5, 72, 40, True, "relu", 2],
+                [5, 72, 40, True, "relu", 2],  # RCNN output
                 [5, 120, 40, True, "relu", 1],
                 [5, 120, 40, True, "relu", 1],  # YOLOv3 output
-                [3, 240, 80, False, "hard_swish", 2],
+                [3, 240, 80, False, "hard_swish", 2],  # RCNN output
                 [3, 200, 80, False, "hard_swish", 1],
                 [3, 184, 80, False, "hard_swish", 1],
                 [3, 184, 80, False, "hard_swish", 1],
                 [3, 480, 112, True, "hard_swish", 1],
                 [3, 672, 112, True, "hard_swish", 1],  # YOLOv3 output
-                [5, 672, 160, True, "hard_swish", 2],  # SSD/SSDLite output
+                [5, 672, 160, True, "hard_swish", 2],  # SSD/SSDLite/RCNN output
                 [5, 960, 160, True, "hard_swish", 1],
                 [5, 960, 160, True, "hard_swish", 1],  # YOLOv3 output
             ]
@@ -347,14 +347,14 @@ class MobileNetV3(nn.Layer):
             self.cfg = [
                 # k, exp, c, se, nl, s,
                 [3, 16, 16, True, "relu", 2],
-                [3, 72, 24, False, "relu", 2],
+                [3, 72, 24, False, "relu", 2],  # RCNN output
                 [3, 88, 24, False, "relu", 1],  # YOLOv3 output
-                [5, 96, 40, True, "hard_swish", 2],
+                [5, 96, 40, True, "hard_swish", 2],  # RCNN output
                 [5, 240, 40, True, "hard_swish", 1],
                 [5, 240, 40, True, "hard_swish", 1],
                 [5, 120, 48, True, "hard_swish", 1],
                 [5, 144, 48, True, "hard_swish", 1],  # YOLOv3 output
-                [5, 288, 96, True, "hard_swish", 2],  # SSD/SSDLite output
+                [5, 288, 96, True, "hard_swish", 2],  # SSD/SSDLite/RCNN output
                 [5, 576, 96, True, "hard_swish", 1],
                 [5, 576, 96, True, "hard_swish", 1],  # YOLOv3 output
             ]
......
@@ -19,6 +19,7 @@ from paddle import ParamAttr
 from paddle.nn.initializer import Constant, Uniform, Normal
 from paddle.regularizer import L2Decay
 from ppdet.core.workspace import register
+from ppdet.modeling.layers import DeformableConvV2, LiteConv
 import numpy as np


@@ -30,27 +31,61 @@ class HMHead(nn.Layer):
         ch_out (int): The channel number of output Tensor.
         num_classes (int): Number of classes.
         conv_num (int): The convolution number of hm_feat.
+        dcn_head(bool): whether use dcn in head. False by default.
+        lite_head(bool): whether use lite version. False by default.
+        norm_type (string): norm type, 'sync_bn', 'bn', 'gn' are optional.
+            bn by default
     Return:
         Heatmap head output
     """
-    __shared__ = ['num_classes']
+    __shared__ = ['num_classes', 'norm_type']

-    def __init__(self, ch_in, ch_out=128, num_classes=80, conv_num=2):
+    def __init__(
+            self,
+            ch_in,
+            ch_out=128,
+            num_classes=80,
+            conv_num=2,
+            dcn_head=False,
+            lite_head=False,
+            norm_type='bn', ):
         super(HMHead, self).__init__()
         head_conv = nn.Sequential()
         for i in range(conv_num):
             name = 'conv.{}'.format(i)
-            head_conv.add_sublayer(
-                name,
-                nn.Conv2D(
-                    in_channels=ch_in if i == 0 else ch_out,
-                    out_channels=ch_out,
-                    kernel_size=3,
-                    padding=1,
-                    weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
-                    bias_attr=ParamAttr(
-                        learning_rate=2., regularizer=L2Decay(0.))))
-            head_conv.add_sublayer(name + '.act', nn.ReLU())
+            if lite_head:
+                lite_name = 'hm.' + name
+                head_conv.add_sublayer(
+                    lite_name,
+                    LiteConv(
+                        in_channels=ch_in if i == 0 else ch_out,
+                        out_channels=ch_out,
+                        norm_type=norm_type,
+                        name=lite_name))
+                head_conv.add_sublayer(lite_name + '.act', nn.ReLU6())
+            else:
+                if dcn_head:
+                    head_conv.add_sublayer(
+                        name,
+                        DeformableConvV2(
+                            in_channels=ch_in if i == 0 else ch_out,
+                            out_channels=ch_out,
+                            kernel_size=3,
+                            weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+                            name='hm.' + name))
+                else:
+                    head_conv.add_sublayer(
+                        name,
+                        nn.Conv2D(
+                            in_channels=ch_in if i == 0 else ch_out,
+                            out_channels=ch_out,
+                            kernel_size=3,
+                            padding=1,
+                            weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+                            bias_attr=ParamAttr(
+                                learning_rate=2., regularizer=L2Decay(0.))))
+                head_conv.add_sublayer(name + '.act', nn.ReLU())
         self.feat = self.add_sublayer('hm_feat', head_conv)
         bias_init = float(-np.log((1 - 0.01) / 0.01))
         self.head = self.add_sublayer(
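`HMHead` initialises its output bias to `-log((1 - 0.01) / 0.01)`. This matches the usual focal-loss prior initialisation. Write p for the prior (0.01 here) and b for the bias. Then exp(-b) = (1 - p) / p, so sigmoid(b) = 1 / (1 + (1 - p) / p) = p. In other words, every heatmap location starts out predicting a 1% object probability. The check below is a standalone illustration, not repository code.

```python
import math


def prior_bias(prior=0.01):
    """Bias b with sigmoid(b) == prior, i.e. b = -log((1 - prior) / prior)."""
    return -math.log((1 - prior) / prior)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


b = prior_bias()      # the value HMHead uses, -log(99)
print(b, sigmoid(b))  # sigmoid(b) equals 0.01 up to rounding
```

Starting near the true foreground rate keeps the many background locations from producing a huge initial loss.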
@@ -78,26 +113,59 @@ class WHHead(nn.Layer):
         ch_in (int): The channel number of input Tensor.
         ch_out (int): The channel number of output Tensor.
         conv_num (int): The convolution number of wh_feat.
+        dcn_head(bool): whether use dcn in head. False by default.
+        lite_head(bool): whether use lite version. False by default.
+        norm_type (string): norm type, 'sync_bn', 'bn', 'gn' are optional.
+            bn by default
     Return:
         Width & Height head output
     """
+    __shared__ = ['norm_type']

-    def __init__(self, ch_in, ch_out=64, conv_num=2):
+    def __init__(self,
+                 ch_in,
+                 ch_out=64,
+                 conv_num=2,
+                 dcn_head=False,
+                 lite_head=False,
+                 norm_type='bn'):
         super(WHHead, self).__init__()
         head_conv = nn.Sequential()
         for i in range(conv_num):
             name = 'conv.{}'.format(i)
-            head_conv.add_sublayer(
-                name,
-                nn.Conv2D(
-                    in_channels=ch_in if i == 0 else ch_out,
-                    out_channels=ch_out,
-                    kernel_size=3,
-                    padding=1,
-                    weight_attr=ParamAttr(initializer=Normal(0, 0.001)),
-                    bias_attr=ParamAttr(
-                        learning_rate=2., regularizer=L2Decay(0.))))
-            head_conv.add_sublayer(name + '.act', nn.ReLU())
+            if lite_head:
+                lite_name = 'wh.' + name
+                head_conv.add_sublayer(
+                    lite_name,
+                    LiteConv(
+                        in_channels=ch_in if i == 0 else ch_out,
+                        out_channels=ch_out,
+                        norm_type=norm_type,
+                        name=lite_name))
+                head_conv.add_sublayer(lite_name + '.act', nn.ReLU6())
+            else:
+                if dcn_head:
+                    head_conv.add_sublayer(
+                        name,
+                        DeformableConvV2(
+                            in_channels=ch_in if i == 0 else ch_out,
+                            out_channels=ch_out,
+                            kernel_size=3,
+                            weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+                            name='wh.' + name))
+                else:
+                    head_conv.add_sublayer(
+                        name,
+                        nn.Conv2D(
+                            in_channels=ch_in if i == 0 else ch_out,
+                            out_channels=ch_out,
+                            kernel_size=3,
+                            padding=1,
+                            weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+                            bias_attr=ParamAttr(
+                                learning_rate=2., regularizer=L2Decay(0.))))
+                head_conv.add_sublayer(name + '.act', nn.ReLU())
         self.feat = self.add_sublayer('wh_feat', head_conv)
         self.head = self.add_sublayer(
             'wh_head',
@@ -137,9 +205,12 @@ class TTFHead(nn.Layer):
             16.0 by default.
         down_ratio (int): the actual down_ratio is calculated by base_down_ratio
             (default 16) and the number of upsample layers.
+        lite_head(bool): whether use lite version. False by default.
+        norm_type (string): norm type, 'sync_bn', 'bn', 'gn' are optional.
+            bn by default
     """

-    __shared__ = ['num_classes', 'down_ratio']
+    __shared__ = ['num_classes', 'down_ratio', 'norm_type']
     __inject__ = ['hm_loss', 'wh_loss']

     def __init__(self,
@@ -152,12 +223,16 @@ class TTFHead(nn.Layer):
                  hm_loss='CTFocalLoss',
                  wh_loss='GIoULoss',
                  wh_offset_base=16.,
-                 down_ratio=4):
+                 down_ratio=4,
+                 dcn_head=False,
+                 lite_head=False,
+                 norm_type='bn'):
         super(TTFHead, self).__init__()
         self.in_channels = in_channels
         self.hm_head = HMHead(in_channels, hm_head_planes, num_classes,
-                              hm_head_conv_num)
-        self.wh_head = WHHead(in_channels, wh_head_planes, wh_head_conv_num)
+                              hm_head_conv_num, dcn_head, lite_head, norm_type)
+        self.wh_head = WHHead(in_channels, wh_head_planes, wh_head_conv_num,
+                              dcn_head, lite_head, norm_type)
         self.hm_loss = hm_loss
         self.wh_loss = wh_loss
......
@@ -23,7 +23,7 @@ from paddle import ParamAttr
 from paddle import to_tensor
 from paddle.nn import Conv2D, BatchNorm2D, GroupNorm
 import paddle.nn.functional as F
-from paddle.nn.initializer import Normal, Constant
+from paddle.nn.initializer import Normal, Constant, XavierUniform
 from paddle.regularizer import L2Decay

 from ppdet.core.workspace import register, serializable
@@ -112,6 +112,7 @@ class ConvNormLayer(nn.Layer):
                  ch_out,
                  filter_size,
                  stride,
+                 groups=1,
                  norm_type='bn',
                  norm_decay=0.,
                  norm_groups=32,
@@ -142,7 +143,7 @@ class ConvNormLayer(nn.Layer):
                 kernel_size=filter_size,
                 stride=stride,
                 padding=(filter_size - 1) // 2,
-                groups=1,
+                groups=groups,
                 weight_attr=ParamAttr(
                     name=name + "_weight",
                     initializer=initializer,
@@ -158,7 +159,7 @@ class ConvNormLayer(nn.Layer):
                 kernel_size=filter_size,
                 stride=stride,
                 padding=(filter_size - 1) // 2,
-                groups=1,
+                groups=groups,
                 weight_attr=ParamAttr(
                     name=name + "_weight",
                     initializer=initializer,
@@ -197,6 +198,71 @@ class ConvNormLayer(nn.Layer):
         return out


+class LiteConv(nn.Layer):
+    def __init__(self,
+                 in_channels,
+                 out_channels,
+                 stride=1,
+                 with_act=True,
+                 norm_type='sync_bn',
+                 name=None):
+        super(LiteConv, self).__init__()
+        self.lite_conv = nn.Sequential()
+        conv1 = ConvNormLayer(
+            in_channels,
+            in_channels,
+            filter_size=5,
+            stride=stride,
+            groups=in_channels,
+            norm_type=norm_type,
+            initializer=XavierUniform(),
+            norm_name=name + '.conv1.norm',
+            name=name + '.conv1')
+        conv2 = ConvNormLayer(
+            in_channels,
+            out_channels,
+            filter_size=1,
+            stride=stride,
+            norm_type=norm_type,
+            initializer=XavierUniform(),
+            norm_name=name + '.conv2.norm',
+            name=name + '.conv2')
+        conv3 = ConvNormLayer(
+            out_channels,
+            out_channels,
+            filter_size=1,
+            stride=stride,
+            norm_type=norm_type,
+            initializer=XavierUniform(),
+            norm_name=name + '.conv3.norm',
+            name=name + '.conv3')
+        conv4 = ConvNormLayer(
+            out_channels,
+            out_channels,
+            filter_size=5,
+            stride=stride,
+            groups=out_channels,
+            norm_type=norm_type,
+            initializer=XavierUniform(),
+            norm_name=name + '.conv4.norm',
+            name=name + '.conv4')
+        conv_list = [conv1, conv2, conv3, conv4]
+        self.lite_conv.add_sublayer('conv1', conv1)
+        self.lite_conv.add_sublayer('relu6_1', nn.ReLU6())
+        self.lite_conv.add_sublayer('conv2', conv2)
+        if with_act:
+            self.lite_conv.add_sublayer('relu6_2', nn.ReLU6())
+        self.lite_conv.add_sublayer('conv3', conv3)
+        self.lite_conv.add_sublayer('relu6_3', nn.ReLU6())
+        self.lite_conv.add_sublayer('conv4', conv4)
+        if with_act:
+            self.lite_conv.add_sublayer('relu6_4', nn.ReLU6())
+
+    def forward(self, inputs):
+        out = self.lite_conv(inputs)
+        return out
+
+
 @register
 @serializable
 class AnchorGeneratorRPN(object):
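`LiteConv` chains four convolutions: a 5×5 depthwise conv, two 1×1 pointwise convs, and another 5×5 depthwise conv. A rough weight count shows why this is cheaper than the dense 3×3 `nn.Conv2D` it replaces in the lite head and neck. The arithmetic below ignores normalization parameters and assumes the `ConvNormLayer` convolutions carry no bias. The width 48 is simply the `hm_head_planes` value from the lite model config, used as an example.

```python
def liteconv_weights(c_in, c_out, k=5):
    """Weight count of the LiteConv stack (no bias, no norm parameters)."""
    dw1 = k * k * c_in    # conv1: depthwise k x k, groups=c_in
    pw1 = c_in * c_out    # conv2: pointwise 1 x 1
    pw2 = c_out * c_out   # conv3: pointwise 1 x 1
    dw2 = k * k * c_out   # conv4: depthwise k x k, groups=c_out
    return dw1 + pw1 + pw2 + dw2


def conv3x3_weights(c_in, c_out):
    """Weight count of a dense 3 x 3 convolution (no bias)."""
    return 9 * c_in * c_out


print(liteconv_weights(48, 48), conv3x3_weights(48, 48))  # 7008 20736
```

At this width, the lite block uses roughly a third of the weights of a single dense 3×3 conv. The two depthwise 5×5 convs still enlarge the receptive field.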
......
@@ -16,11 +16,11 @@ import paddle
 import paddle.nn as nn
 import paddle.nn.functional as F
 from paddle import ParamAttr
-from paddle.nn.initializer import Constant, Uniform, Normal
+from paddle.nn.initializer import Constant, Uniform, Normal, XavierUniform
 from paddle import ParamAttr
 from ppdet.core.workspace import register, serializable
 from paddle.regularizer import L2Decay
-from ppdet.modeling.layers import DeformableConvV2
+from ppdet.modeling.layers import DeformableConvV2, ConvNormLayer, LiteConv
 import math
 from ppdet.modeling.ops import batch_norm
 from ..shape_spec import ShapeSpec
@@ -29,7 +29,7 @@ __all__ = ['TTFFPN']


 class Upsample(nn.Layer):
-    def __init__(self, ch_in, ch_out, name=None):
+    def __init__(self, ch_in, ch_out, norm_type='bn', name=None):
         super(Upsample, self).__init__()
         fan_in = ch_in * 3 * 3
         stdv = 1. / math.sqrt(fan_in)
@@ -46,7 +46,7 @@ class Upsample(nn.Layer):
             regularizer=L2Decay(0.))

         self.bn = batch_norm(
-            ch_out, norm_type='bn', initializer=Constant(1.), name=name)
+            ch_out, norm_type=norm_type, initializer=Constant(1.), name=name)

     def forward(self, feat):
         dcn = self.dcn(feat)
@@ -56,28 +56,105 @@ class Upsample(nn.Layer):
         return out


+class DeConv(nn.Layer):
+    def __init__(self, ch_in, ch_out, norm_type='bn', name=None):
+        super(DeConv, self).__init__()
+        self.deconv = nn.Sequential()
+        conv1 = ConvNormLayer(
+            ch_in=ch_in,
+            ch_out=ch_out,
+            stride=1,
+            filter_size=1,
+            norm_type=norm_type,
+            initializer=XavierUniform(),
+            norm_name=name + '.conv1.norm',
+            name=name + '.conv1')
+        conv2 = nn.Conv2DTranspose(
+            in_channels=ch_out,
+            out_channels=ch_out,
+            kernel_size=4,
+            padding=1,
+            stride=2,
+            groups=ch_out,
+            weight_attr=ParamAttr(initializer=XavierUniform()),
+            bias_attr=False)
+        bn = batch_norm(
+            ch_out, norm_type=norm_type, norm_decay=0., name=name + '.bn')
+        conv3 = ConvNormLayer(
+            ch_in=ch_out,
+            ch_out=ch_out,
+            stride=1,
+            filter_size=1,
+            norm_type=norm_type,
+            initializer=XavierUniform(),
+            norm_name=name + '.conv3.norm',
+            name=name + '.conv3')
+        self.deconv.add_sublayer('conv1', conv1)
+        self.deconv.add_sublayer('relu6_1', nn.ReLU6())
+        self.deconv.add_sublayer('conv2', conv2)
+        self.deconv.add_sublayer('bn', bn)
+        self.deconv.add_sublayer('relu6_2', nn.ReLU6())
+        self.deconv.add_sublayer('conv3', conv3)
+        self.deconv.add_sublayer('relu6_3', nn.ReLU6())
+
+    def forward(self, inputs):
+        return self.deconv(inputs)
+
+
+class LiteUpsample(nn.Layer):
+    def __init__(self, ch_in, ch_out, norm_type='bn', name=None):
+        super(LiteUpsample, self).__init__()
+        self.deconv = DeConv(
+            ch_in, ch_out, norm_type=norm_type, name=name + '.deconv')
+        self.conv = LiteConv(
+            ch_in, ch_out, norm_type=norm_type, name=name + '.liteconv')
+
+    def forward(self, inputs):
+        deconv_up = self.deconv(inputs)
+        conv = self.conv(inputs)
+        interp_up = F.interpolate(conv, scale_factor=2., mode='bilinear')
+        return deconv_up + interp_up
+
+
 class ShortCut(nn.Layer):
-    def __init__(self, layer_num, ch_out, name=None):
+    def __init__(self,
+                 layer_num,
+                 ch_in,
+                 ch_out,
+                 norm_type='bn',
+                 lite_neck=False,
+                 name=None):
         super(ShortCut, self).__init__()
         shortcut_conv = nn.Sequential()
-        ch_in = ch_out * 2
         for i in range(layer_num):
             fan_out = 3 * 3 * ch_out
             std = math.sqrt(2. / fan_out)
             in_channels = ch_in if i == 0 else ch_out
             shortcut_name = name + '.conv.{}'.format(i)
-            shortcut_conv.add_sublayer(
-                shortcut_name,
-                nn.Conv2D(
-                    in_channels=in_channels,
-                    out_channels=ch_out,
-                    kernel_size=3,
-                    padding=1,
-                    weight_attr=ParamAttr(initializer=Normal(0, std)),
-                    bias_attr=ParamAttr(
-                        learning_rate=2., regularizer=L2Decay(0.))))
-            if i < layer_num - 1:
-                shortcut_conv.add_sublayer(shortcut_name + '.act', nn.ReLU())
+            if lite_neck:
+                shortcut_conv.add_sublayer(
+                    shortcut_name,
+                    LiteConv(
+                        in_channels=in_channels,
+                        out_channels=ch_out,
+                        with_act=i < layer_num - 1,
+                        norm_type=norm_type,
+                        name=shortcut_name))
+            else:
+                shortcut_conv.add_sublayer(
+                    shortcut_name,
+                    nn.Conv2D(
+                        in_channels=in_channels,
+                        out_channels=ch_out,
+                        kernel_size=3,
+                        padding=1,
+                        weight_attr=ParamAttr(initializer=Normal(0, std)),
+                        bias_attr=ParamAttr(
+                            learning_rate=2., regularizer=L2Decay(0.))))
+                if i < layer_num - 1:
+                    shortcut_conv.add_sublayer(shortcut_name + '.act',
+                                               nn.ReLU())
         self.shortcut = self.add_sublayer('short', shortcut_conv)

     def forward(self, feat):
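`LiteUpsample` sums two branches, so both must produce the same spatial size. `DeConv` upsamples with a `Conv2DTranspose` (kernel 4, stride 2, padding 1) between two 1×1 convs. The `LiteConv` branch keeps the size, since its 5×5 convs are padded by 2, and is then resized with bilinear `F.interpolate(scale_factor=2.)`. With dilation 1 and no output padding, a transposed convolution's output size is `out = (in - 1) * stride - 2 * padding + kernel`. The standalone helpers below check that both branches double the input.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output of a transposed conv (dilation 1, no output_padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel=5, stride=1, padding=2):
    """Spatial output of a regular conv (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


for s in (10, 20, 40):
    deconv_branch = conv_transpose_out(s)  # the 1x1 convs keep the size
    interp_branch = conv_out(s) * 2        # LiteConv keeps size, then x2
    print(s, deconv_branch, interp_branch)
```

Kernel 4 with stride 2 and padding 1 is the common choice for an exact 2× transposed-conv upsample. Because the transposed conv is depthwise (`groups=ch_out`), the 1×1 convs around it do the channel mixing.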
...@@ -93,35 +170,68 @@ class TTFFPN(nn.Layer): ...@@ -93,35 +170,68 @@ class TTFFPN(nn.Layer):
        in_channels (list): number of input feature channels from the backbone,
            e.g. [128, 256, 512, 1024] for the DarkNet53 backbone with
            return_idx [1, 2, 3, 4].
        planes (list): the number of output feature channels of each FPN
            upsample stage. [256, 128, 64] by default.
        shortcut_num (list): the number of convolution layers in each shortcut.
            [3, 2, 1] by default, meaning that DarkNet53 backbone return_idx_1
            has 3 convs in its shortcut, return_idx_2 has 2 convs and
            return_idx_3 has 1 conv.
        norm_type (string): norm type, one of 'sync_bn', 'bn' or 'gn'.
            'bn' by default.
        lite_neck (bool): whether to use lite convs in the TTFNet FPN.
            False by default.
        fusion_method (string): the method used to fuse the upsampled and
            lateral features, 'add' or 'concat'. 'add' by default.
    """
    __shared__ = ['norm_type']

    def __init__(self,
                 in_channels,
                 planes=[256, 128, 64],
                 shortcut_num=[3, 2, 1],
                 norm_type='bn',
                 lite_neck=False,
                 fusion_method='add'):
        super(TTFFPN, self).__init__()
        self.planes = planes
        self.shortcut_num = shortcut_num[::-1]
        self.shortcut_len = len(shortcut_num)
        self.ch_in = in_channels[::-1]
        self.fusion_method = fusion_method

        self.upsample_list = []
        self.shortcut_list = []
        self.upper_list = []
        for i, out_c in enumerate(self.planes):
            in_c = self.ch_in[i] if i == 0 else self.upper_list[-1]
            upsample_module = LiteUpsample if lite_neck else Upsample
            upsample = self.add_sublayer(
                'upsample.' + str(i),
                upsample_module(
                    in_c,
                    out_c,
                    norm_type=norm_type,
                    name='deconv_layers.' + str(i)))
            self.upsample_list.append(upsample)
            if i < self.shortcut_len:
                shortcut = self.add_sublayer(
                    'shortcut.' + str(i),
                    ShortCut(
                        self.shortcut_num[i],
                        self.ch_in[i + 1],
                        out_c,
                        norm_type=norm_type,
                        lite_neck=lite_neck,
                        name='shortcut.' + str(i)))
                self.shortcut_list.append(shortcut)
            if self.fusion_method == 'add':
                upper_c = out_c
            elif self.fusion_method == 'concat':
                upper_c = out_c * 2
            else:
                # Implicit string concatenation: no stray indentation in the message
                raise ValueError('Illegal fusion method. Expected add or '
                                 'concat, but received {}'.format(
                                     self.fusion_method))
            self.upper_list.append(upper_c)

    def forward(self, inputs):
        feat = inputs[-1]
@@ -129,7 +239,10 @@ class TTFFPN(nn.Layer):
            feat = self.upsample_list[i](feat)
            if i < self.shortcut_len:
                shortcut = self.shortcut_list[i](inputs[-i - 2])
                if self.fusion_method == 'add':
                    feat = feat + shortcut
                else:
                    feat = paddle.concat([feat, shortcut], axis=1)
        return feat

    @classmethod
@@ -138,4 +251,4 @@ class TTFFPN(nn.Layer):
    @property
    def out_shape(self):
        return [ShapeSpec(channels=self.upper_list[-1], )]
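With `fusion_method='concat'` each fused feature is `2 * out_c` channels wide, which is why the constructor now records `upper_list` and feeds `upper_list[-1]` into the next upsample stage and into `out_shape`, rather than deriving the width from the backbone channels. The sketch below mirrors that arithmetic in plain Python so it can be checked without Paddle. The helper name `ttffpn_channels` is hypothetical, and it assumes every stage has a shortcut (true for the defaults `planes=[256, 128, 64]` and `shortcut_num=[3, 2, 1]`).

```python
def ttffpn_channels(in_channels, planes=(256, 128, 64), fusion_method='add'):
    """Mirror the channel bookkeeping of TTFFPN.__init__ (hypothetical helper).

    Returns (input channels of each upsample stage, channels reported by out_shape).
    Assumes len(shortcut_num) == len(planes), i.e. every stage is fused.
    """
    ch_in = list(in_channels)[::-1]  # deepest backbone feature first
    upper_list = []  # width of the fused feature after each stage
    stage_in = []    # width fed into each upsample module
    for i, out_c in enumerate(planes):
        in_c = ch_in[i] if i == 0 else upper_list[-1]
        stage_in.append(in_c)
        if fusion_method == 'add':
            upper_c = out_c          # element-wise add keeps the width
        elif fusion_method == 'concat':
            upper_c = out_c * 2      # upsampled + lateral, both out_c wide
        else:
            raise ValueError("Illegal fusion method. Expected 'add' or "
                             "'concat', but received {}".format(fusion_method))
        upper_list.append(upper_c)
    return stage_in, upper_list[-1]


# DarkNet53 channels for return_idx [1, 2, 3, 4]
print(ttffpn_channels([128, 256, 512, 1024], fusion_method='add'))
# ([1024, 256, 128], 64)
print(ttffpn_channels([128, 256, 512, 1024], fusion_method='concat'))
# ([1024, 512, 256], 128)
```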
architecture: TTFNet
use_gpu: true
max_iters: 150000
log_smooth_window: 20
save_dir: output
snapshot_iter: 10000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
weights: output/pafnet_10x_coco/model_final
num_classes: 80
use_ema: true
ema_decay: 0.9998

TTFNet:
  backbone: ResNet
  ttf_head: TTFHead

ResNet:
  norm_type: sync_bn
  freeze_at: 0
  freeze_norm: false
  norm_decay: 0.
  depth: 50
  feature_maps: [2, 3, 4, 5]
  variant: d
  dcn_v2_stages: [3, 4, 5]

TTFHead:
  head_conv: 128
  wh_conv: 64
  hm_head_conv_num: 2
  wh_head_conv_num: 2
  wh_offset_base: 16
  wh_loss: GiouLoss
  dcn_head: True

GiouLoss:
  loss_weight: 5.
  do_average: false
  use_class_weight: false

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 112500
    - 137500
  - !LinearWarmup
    start_factor: 0.2
    steps: 500

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'ttf_heatmap', 'ttf_box_target', 'ttf_reg_weight']
  dataset:
    !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco
    with_background: false
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_cutmix: True
  - !CutmixImage
    alpha: 1.5
    beta: 1.5
  - !ColorDistort
    hue: [-18., 18., 0.5]
    saturation: [0.5, 1.5, 0.5]
    contrast: [0.5, 1.5, 0.5]
    brightness: [-32., 32., 0.5]
    random_apply: False
    hsv_format: True
    random_channel: True
  - !RandomExpand
    ratio: 4
    prob: 0.5
    fill_value: [123.675, 116.28, 103.53]
  - !RandomCrop
    aspect_ratio: NULL
    cover_all_box: True
  - !RandomFlipImage
    prob: 0.5
  batch_transforms:
  - !RandomShape
    sizes: [416, 448, 480, 512, 544, 576, 608, 640, 672]
    random_inter: True
    resize_box: True
  - !NormalizeImage
    is_channel_first: false
    is_scale: false
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
  - !Permute
    to_bgr: false
    channel_first: true
  - !Gt2TTFTarget
    num_classes: 80
    down_ratio: 4
  - !PadBatch
    pad_to_stride: 32
  batch_size: 12
  shuffle: true
  worker_num: 8
  bufsize: 2
  use_process: false
  cutmix_epoch: 100

EvalReader:
  inputs_def:
    image_shape: [3, 512, 512]
    fields: ['image', 'im_id', 'scale_factor']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    with_background: false
  sample_transforms:
  - !DecodeImage
    to_rgb: True
  - !Resize
    target_dim: 512
  - !NormalizeImage
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    is_scale: false
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: True
  batch_size: 1
  drop_empty: false
  worker_num: 8
  bufsize: 16

TestReader:
  inputs_def:
    image_shape: [3, 512, 512]
    fields: ['image', 'im_id', 'scale_factor']
  dataset:
    !ImageFolder
    anno_path: annotations/instances_val2017.json
    with_background: false
  sample_transforms:
  - !DecodeImage
    to_rgb: True
  - !Resize
    interp: 1
    target_dim: 512
  - !NormalizeImage
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    is_scale: false
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: True
  batch_size: 1
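The `LearningRate` block above combines a step decay with a linear warm-up. The sketch below computes the resulting per-iteration learning rate, assuming the static-graph builder follows the semantics of Fluid's `piecewise_decay` (multiply by `gamma` at each milestone) and `linear_lr_warmup` (ramp from `base_lr * start_factor` to `base_lr` over `steps` iterations). `lr_at` is a hypothetical helper, not a PaddleDetection API.

```python
def lr_at(step, base_lr=0.015, gamma=0.1, milestones=(112500, 137500),
          warmup_steps=500, start_factor=0.2):
    """Learning rate at a given iteration (defaults copied from the PAFNet config)."""
    if step < warmup_steps:
        # Linear warm-up from base_lr * start_factor up to base_lr.
        start_lr = base_lr * start_factor
        return start_lr + (base_lr - start_lr) * step / warmup_steps
    # Piecewise decay: multiply by gamma once for every milestone reached.
    num_passed = sum(1 for m in milestones if step >= m)
    return base_lr * gamma ** num_passed


for step in (0, 250, 500, 112500, 137500):
    print(step, lr_at(step))
```

This prints roughly 0.003, 0.009, 0.015, 0.0015 and 0.00015. For the Lite config the same helper would be called as `lr_at(step, milestones=(225000, 275000), warmup_steps=1000)`.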
architecture: TTFNet
use_gpu: true
max_iters: 300000
log_smooth_window: 20
save_dir: output
snapshot_iter: 50000
metric: COCO
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar
weights: output/pafnet_lite_mobilenet_v3_20x_coco/model_final
num_classes: 80

TTFNet:
  backbone: MobileNetV3RCNN
  ttf_head: TTFLiteHead

MobileNetV3RCNN:
  norm_type: sync_bn
  norm_decay: 0.0
  model_name: large
  scale: 1.0
  conv_decay: 0.00001
  lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75]
  freeze_norm: false

TTFLiteHead:
  head_conv: 48

GiouLoss:
  loss_weight: 5.
  do_average: false
  use_class_weight: false

LearningRate:
  base_lr: 0.015
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 225000
    - 275000
  - !LinearWarmup
    start_factor: 0.2
    steps: 1000

OptimizerBuilder:
  clip_grad_by_norm: 35
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0004
    type: L2

TrainReader:
  inputs_def:
    fields: ['image', 'ttf_heatmap', 'ttf_box_target', 'ttf_reg_weight']
  dataset:
    !COCODataSet
    image_dir: train2017
    anno_path: annotations/instances_train2017.json
    dataset_dir: dataset/coco
    with_background: false
  sample_transforms:
  - !DecodeImage
    to_rgb: true
    with_cutmix: True
  - !ColorDistort
    hue: [-18., 18., 0.5]
    saturation: [0.5, 1.5, 0.5]
    contrast: [0.5, 1.5, 0.5]
    brightness: [-32., 32., 0.5]
    random_apply: False
    hsv_format: False
    random_channel: True
  - !RandomExpand
    ratio: 4
    prob: 0.5
    fill_value: [123.675, 116.28, 103.53]
  - !RandomCrop
    aspect_ratio: NULL
    cover_all_box: True
  - !CutmixImage
    alpha: 1.5
    beta: 1.5
  - !RandomFlipImage
    prob: 0.5
  - !GridMaskOp
    use_h: true
    use_w: true
    rotate: 1
    offset: false
    ratio: 0.5
    mode: 1
    prob: 0.7
    upper_iter: 300000
  batch_transforms:
  - !RandomShape
    sizes: [320, 352, 384, 416, 448, 480, 512]
    random_inter: True
    resize_box: True
  - !NormalizeImage
    is_channel_first: false
    is_scale: false
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
  - !Permute
    to_bgr: false
    channel_first: true
  - !Gt2TTFTarget
    num_classes: 80
    down_ratio: 4
  - !PadBatch
    pad_to_stride: 32
  batch_size: 12
  shuffle: true
  worker_num: 8
  bufsize: 2
  use_process: false
  cutmix_epoch: 200

EvalReader:
  inputs_def:
    image_shape: [3, 320, 320]
    fields: ['image', 'im_id', 'scale_factor']
  dataset:
    !COCODataSet
    image_dir: val2017
    anno_path: annotations/instances_val2017.json
    dataset_dir: dataset/coco
    with_background: false
  sample_transforms:
  - !DecodeImage
    to_rgb: True
  - !Resize
    target_dim: 320
  - !NormalizeImage
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    is_scale: false
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: True
  batch_size: 1
  drop_empty: false
  worker_num: 2
  bufsize: 2

TestReader:
  inputs_def:
    image_shape: [3, 320, 320]
    fields: ['image', 'im_id', 'scale_factor']
  dataset:
    !ImageFolder
    anno_path: annotations/instances_val2017.json
    with_background: false
  sample_transforms:
  - !DecodeImage
    to_rgb: True
  - !Resize
    interp: 1
    target_dim: 320
  - !NormalizeImage
    mean: [123.675, 116.28, 103.53]
    std: [58.395, 57.12, 57.375]
    is_scale: false
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: True
  batch_size: 1
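The Lite config adds `clip_grad_by_norm: 35` to `OptimizerBuilder`. Assuming this is applied as clipping by the global L2 norm over all gradients (as Fluid's `GradientClipByGlobalNorm` does), every gradient is rescaled by `clip_norm / max(global_norm, clip_norm)`: directions are preserved and only the overall magnitude is capped. A pure-Python sketch; the function name and the list-of-lists gradient layout are illustrative assumptions.

```python
import math


def clip_by_global_norm(grads, clip_norm=35.0):
    """Rescale a list of flat gradient lists so their joint L2 norm is <= clip_norm.

    Returns the clipped gradients and the global norm measured before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Scale is exactly 1.0 when the norm is already within the limit.
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm


grads = [[30.0, 0.0], [0.0, 40.0]]        # global norm = 50
clipped, norm = clip_by_global_norm(grads)
print(norm, clipped)                      # every entry scaled by 35 / 50
```

Because a single scale factor is shared by all parameters, the relative size of each parameter's update is unchanged; only unusually large steps (for example early in training with heavy CutMix and GridMask augmentation) are shortened.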
@@ -24,10 +24,10 @@ from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Constant, Uniform, Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
from ppdet.modeling.ops import DeformConv, DropBlock, ConvNorm
from ppdet.modeling.losses import GiouLoss

__all__ = ['TTFHead', 'TTFLiteHead']


@register
@@ -65,6 +65,8 @@ class TTFHead(object):
        drop_block(bool): whether use dropblock. False by default.
        block_size(int): block_size parameter for drop_block. 3 by default.
        keep_prob(float): keep_prob parameter for drop_block. 0.9 by default.
        fusion_method (string): method used to fuse the upsampled and lateral
            branches, 'add' or 'concat'. 'add' by default.
    """

    __inject__ = ['wh_loss']
@@ -90,7 +92,8 @@ class TTFHead(object):
                 dcn_head=False,
                 drop_block=False,
                 block_size=3,
                 keep_prob=0.9,
                 fusion_method='add'):
        super(TTFHead, self).__init__()
        self.head_conv = head_conv
        self.num_classes = num_classes
@@ -115,6 +118,7 @@ class TTFHead(object):
        self.drop_block = drop_block
        self.block_size = block_size
        self.keep_prob = keep_prob
        self.fusion_method = fusion_method

    def shortcut(self, x, out_c, layer_num, kernel_size=3, padding=1,
                 name=None):
@@ -255,7 +259,14 @@ class TTFHead(object):
                    out_c,
                    self.shortcut_num[i],
                    name=name + '.shortcut_layers.' + str(i))
                if self.fusion_method == 'add':
                    feat = fluid.layers.elementwise_add(feat, shortcut)
                elif self.fusion_method == 'concat':
                    feat = fluid.layers.concat([feat, shortcut], axis=1)
                else:
                    raise ValueError(
                        "Illegal fusion method, expected 'add' or 'concat', but received {}".
                        format(self.fusion_method))

        hm = self.hm_head(feat, name=name + '.hm', is_test=is_test)
        wh = self.wh_head(feat, name=name + '.wh') * self.wh_offset_base
@@ -273,12 +284,13 @@ class TTFHead(object):
        # batch size is 1
        scores_r = fluid.layers.reshape(scores, [cat, -1])
        topk_scores, topk_inds = fluid.layers.topk(scores_r, k)
        topk_ys = topk_inds // width
        topk_xs = topk_inds % width

        topk_score_r = fluid.layers.reshape(topk_scores, [-1])
        topk_score, topk_ind = fluid.layers.topk(topk_score_r, k)
        k_t = fluid.layers.assign(np.array([k], dtype='int64'))
        topk_clses = fluid.layers.cast(topk_ind / k_t, 'float32')

        topk_inds = fluid.layers.reshape(topk_inds, [-1])
        topk_ys = fluid.layers.reshape(topk_ys, [-1, 1])
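`_topk` flattens the `[cat, height, width]` heatmap twice: first per class over `height * width` positions, then over the `cat * k` surviving scores. Row, column and class are recovered with integer division and modulo, which is why `topk_inds / width` becomes `topk_inds // width`, and why the class index divides `topk_ind` by `k` (here wrapped as the int64 tensor `k_t`). A pure-Python sketch of the same index arithmetic for batch size 1; `decode_topk` is a hypothetical helper on nested lists and assumes `k <= height * width`.

```python
def decode_topk(scores, k):
    """scores: nested list [cat][height][width]; returns (score, cls, y, x) tuples."""
    cat = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    # Stage 1: top-k positions per class over the flattened height*width map.
    candidates = []  # class-major order, k (score, flat_index) pairs per class
    for c in range(cat):
        flat = [v for row in scores[c] for v in row]
        best = sorted(range(height * width), key=lambda i: flat[i], reverse=True)[:k]
        candidates.extend((flat[i], i) for i in best)
    # Stage 2: top-k over all cat*k candidates.
    order = sorted(range(len(candidates)),
                   key=lambda j: candidates[j][0], reverse=True)[:k]
    results = []
    for j in order:
        score, ind = candidates[j]
        cls = j // k       # each class contributed a block of k candidates
        y = ind // width   # integer row index
        x = ind % width    # column index
        results.append((score, cls, y, x))
    return results


heatmap = [[[0.1, 0.9, 0.2], [0.3, 0.0, 0.05]],   # class 0
           [[0.4, 0.0, 0.0], [0.0, 0.0, 0.8]]]    # class 1
print(decode_topk(heatmap, 2))  # [(0.9, 0, 0, 1), (0.8, 1, 1, 2)]
```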
@@ -384,3 +396,172 @@ class TTFHead(object):
        ttf_loss = {'hm_loss': hm_loss, 'wh_loss': wh_loss}
        return ttf_loss


@register
class TTFLiteHead(TTFHead):
    """
    TTFLiteHead

    Lite version for TTFNet
    Args:
        head_conv(int): the default channel number of convolution in head.
            32 by default.
        num_classes(int): the number of classes, 80 by default.
        planes(tuple): the channel number of convolution in each upsample.
            (96, 48, 24) by default.
        wh_conv(int): the channel number of convolution in wh head.
            24 by default.
        wh_loss(object): `GiouLoss` instance.
        shortcut_num(tuple): the number of convolution layers in each shortcut.
            (1, 2, 2) by default.
        fusion_method (string): method used to fuse the upsampled and lateral
            branches, 'add' or 'concat'. 'concat' by default.
    """

    __inject__ = ['wh_loss']
    __shared__ = ['num_classes']

    def __init__(self,
                 head_conv=32,
                 num_classes=80,
                 planes=(96, 48, 24),
                 wh_conv=24,
                 wh_loss='GiouLoss',
                 shortcut_num=(1, 2, 2),
                 fusion_method='concat'):
        super(TTFLiteHead, self).__init__(
            head_conv=head_conv,
            num_classes=num_classes,
            planes=planes,
            wh_conv=wh_conv,
            wh_loss=wh_loss,
            shortcut_num=shortcut_num,
            fusion_method=fusion_method)

    def _lite_conv(self, x, out_c, act=None, name=None):
        conv1 = ConvNorm(
            input=x,
            num_filters=x.shape[1],
            filter_size=5,
            groups=x.shape[1],
            norm_type='bn',
            act='relu6',
            initializer=Xavier(),
            name=name + '.depthwise',
            norm_name=name + '.depthwise.bn')
        conv2 = ConvNorm(
            input=conv1,
            num_filters=out_c,
            filter_size=1,
            norm_type='bn',
            act=act,
            initializer=Xavier(),
            name=name + '.pointwise_linear',
            norm_name=name + '.pointwise_linear.bn')
        conv3 = ConvNorm(
            input=conv2,
            num_filters=out_c,
            filter_size=1,
            norm_type='bn',
            act='relu6',
            initializer=Xavier(),
            name=name + '.pointwise',
            norm_name=name + '.pointwise.bn')
        conv4 = ConvNorm(
            input=conv3,
            num_filters=out_c,
            filter_size=5,
            groups=out_c,
            norm_type='bn',
            act=act,
            initializer=Xavier(),
            name=name + '.depthwise_linear',
            norm_name=name + '.depthwise_linear.bn')
        return conv4

    def shortcut(self, x, out_c, layer_num, name=None):
        assert layer_num > 0
        for i in range(layer_num):
            param_name = name + '.layers.' + str(i * 2)
            act = 'relu6' if i < layer_num - 1 else None
            x = self._lite_conv(x, out_c, act, param_name)
        return x

    def _deconv_upsample(self, x, out_c, name=None):
        conv1 = ConvNorm(
            input=x,
            num_filters=out_c,
            filter_size=1,
            norm_type='bn',
            act='relu6',
            name=name + '.pointwise',
            initializer=Xavier(),
            norm_name=name + '.pointwise.bn')
        conv2 = fluid.layers.conv2d_transpose(
            input=conv1,
            num_filters=out_c,
            filter_size=4,
            padding=1,
            stride=2,
            groups=out_c,
            param_attr=ParamAttr(
                name=name + '.deconv.weights', initializer=Xavier()),
            bias_attr=False)
        bn = fluid.layers.batch_norm(
            input=conv2,
            act='relu6',
            param_attr=ParamAttr(
                name=name + '.deconv.bn.scale', regularizer=L2Decay(0.)),
            bias_attr=ParamAttr(
                name=name + '.deconv.bn.offset', regularizer=L2Decay(0.)),
            moving_mean_name=name + '.deconv.bn.mean',
            moving_variance_name=name + '.deconv.bn.variance')
        conv3 = ConvNorm(
            input=bn,
            num_filters=out_c,
            filter_size=1,
            norm_type='bn',
            act='relu6',
            name=name + '.normal',
            initializer=Xavier(),
            norm_name=name + '.normal.bn')
        return conv3

    def _interp_upsample(self, x, out_c, name=None):
        conv = self._lite_conv(x, out_c, 'relu6', name)
        up = fluid.layers.resize_bilinear(conv, scale=2)
        return up

    def upsample(self, x, out_c, name=None):
        deconv_up = self._deconv_upsample(x, out_c, name=name + '.dilation_up')
        interp_up = self._interp_upsample(x, out_c, name=name + '.interp_up')
        return deconv_up + interp_up

    def _head(self,
              x,
              out_c,
              conv_num=1,
              head_out_c=None,
              name=None,
              is_test=False):
        head_out_c = self.head_conv if not head_out_c else head_out_c
        for i in range(conv_num):
            conv_name = '{}.{}.conv'.format(name, i)
            x = self._lite_conv(x, head_out_c, 'relu6', conv_name)
        bias_init = float(-np.log((1 - 0.01) / 0.01)) if '.hm' in name else 0.
        conv_b_init = Constant(bias_init)
        x = fluid.layers.conv2d(
            x,
            out_c,
            1,
            param_attr=ParamAttr(name='{}.{}.weight'.format(name, conv_num)),
            bias_attr=ParamAttr(
                learning_rate=2.,
                regularizer=L2Decay(0.),
                name='{}.{}.bias'.format(name, conv_num),
                initializer=conv_b_init))
        return x
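`_lite_conv` replaces a dense convolution with a 5x5 depthwise conv, two 1x1 pointwise convs and a second 5x5 depthwise conv. A back-of-the-envelope count of convolution weights (BN parameters and any biases ignored) shows why this is much cheaper than a dense 3x3 conv of the kind `TTFHead.shortcut` uses by default (`kernel_size=3`). The helper names and the 96 to 48 channel example are illustrative assumptions, loosely echoing the default `planes=(96, 48, 24)`.

```python
def lite_conv_weights(in_c, out_c, dw_kernel=5):
    """Convolution weights in the _lite_conv stack (BN and biases not counted)."""
    depthwise = dw_kernel * dw_kernel * in_c          # groups=in_c, 5x5
    pointwise_linear = in_c * out_c                   # 1x1, in_c -> out_c
    pointwise = out_c * out_c                         # 1x1, out_c -> out_c
    depthwise_linear = dw_kernel * dw_kernel * out_c  # groups=out_c, 5x5
    return depthwise + pointwise_linear + pointwise + depthwise_linear


def dense_conv_weights(in_c, out_c, kernel=3):
    """Weights of an ordinary (groups=1) convolution."""
    return kernel * kernel * in_c * out_c


print(lite_conv_weights(96, 48), dense_conv_weights(96, 48))  # 10512 41472
```

For this example the lite stack needs roughly 4x fewer weights while still covering a 9x9 receptive field (two stacked 5x5 depthwise convs) instead of 3x3.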