s920243400 / PaddleDetection (forked from PaddlePaddle / PaddleDetection)
Commit 3f7e70d4 (unverified)
add vit + mask rcnn (#7592)
Authored Jan 05, 2023 by Wenyu; committed via GitHub on Jan 05, 2023
Parent: 9daf164b

Showing 4 changed files with 72 additions and 12 deletions (+72, -12):
- configs/vitdet/README.md (+8, -5)
- configs/vitdet/faster_rcnn_vit_base_fpn_cae_1x_coco.yml (+24, -4)
- configs/vitdet/mask_rcnn_vit_large_hrfpn_cae_1x_coco.yml (+29, -0)
- ppdet/modeling/backbones/vision_transformer.py (+11, -3)
configs/vitdet/README.md

@@ -13,11 +13,14 @@ non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.

 ## Model Zoo

-| Backbone | Pretrained | Model | Scheduler | Images/GPU | Box AP | Config | Download |
-|:------:|:--------:|:--------------:|:--------------:|:--------------:|:------:|:------:|:--------:|
-| ViT-base | CAE | Cascade RCNN | 1x | 1 | 52.7 | [config](./cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.pdparams) |
-| ViT-large | CAE | Cascade RCNN | 1x | 1 | 55.7 | [config](./cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams) |
-| ViT-base | CAE | PP-YOLOE | 36e | 2 | 52.2 | [config](./ppyoloe_vit_base_csppan_cae_36e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_vit_base_csppan_cae_36e_coco.pdparams) |
+| Model | Backbone | Pretrained | Scheduler | Images/GPU | Box AP | Mask AP | Config | Download |
+|:------:|:--------:|:--------------:|:--------------:|:--------------:|:--------------:|:------:|:------:|:--------:|
+| Cascade RCNN | ViT-base | CAE | 1x | 1 | 52.7 | - | [config](./cascade_rcnn_vit_base_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_base_hrfpn_cae_1x_coco.pdparams) |
+| Cascade RCNN | ViT-large | CAE | 1x | 1 | 55.7 | - | [config](./cascade_rcnn_vit_large_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/cascade_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams) |
+| PP-YOLOE | ViT-base | CAE | 36e | 2 | 52.2 | - | [config](./ppyoloe_vit_base_csppan_cae_36e_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/ppyoloe_vit_base_csppan_cae_36e_coco.pdparams) |
+| Mask RCNN | ViT-base | CAE | 1x | 1 | 50.6 | 44.9 | [config](./mask_rcnn_vit_base_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/mask_rcnn_vit_base_hrfpn_cae_1x_coco.pdparams) |
+| Mask RCNN | ViT-large | CAE | 1x | 1 | 54.2 | 47.4 | [config](./mask_rcnn_vit_large_hrfpn_cae_1x_coco.yml) | [model](https://bj.bcebos.com/v1/paddledet/models/mask_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams) |

 **Notes:**
 - Models are trained on the COCO train2017 dataset and evaluated on val2017; reported results are `mAP(IoU=0.5:0.95)`.
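The `model` links in the table above point to released `.pdparams` checkpoints. As a quick sanity check (not part of this commit), one of them can be fetched and inspected with `paddle.load`; the URL below is copied from the new Mask RCNN ViT-large row, everything else is illustrative.

```python
# Hedged sketch: download a released checkpoint from the Model Zoo table and
# list a few parameter names/shapes. Assumes paddlepaddle is installed and the
# URL from the table is reachable.
import urllib.request

import paddle

URL = ("https://bj.bcebos.com/v1/paddledet/models/"
       "mask_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams")
PATH = "mask_rcnn_vit_large_hrfpn_cae_1x_coco.pdparams"

urllib.request.urlretrieve(URL, PATH)
state_dict = paddle.load(PATH)  # dict mapping parameter names to weights

print(len(state_dict), "parameters")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))
```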
configs/vitdet/faster_rcnn_vit_base_fpn_cae_1x_coco.yml

@@ -2,7 +2,7 @@
 _BASE_: [
   '../datasets/coco_detection.yml',
   '../runtime.yml',
-  './_base_/reader.yml',
+  './_base_/faster_rcnn_reader.yml',
   './_base_/optimizer_base_1x.yml'
 ]

@@ -81,15 +81,30 @@ RPNHead:
     nms_thresh: 0.7
     pre_nms_top_n: 1000
     post_nms_top_n: 1000
+  loss_rpn_bbox: SmoothL1Loss
+
+SmoothL1Loss:
+  beta: 0.1111111111111111

 BBoxHead:
-  head: TwoFCHead
+  # head: TwoFCHead
+  head: XConvNormHead
   roi_extractor:
     resolution: 7
     sampling_ratio: 0
     aligned: True
   bbox_assigner: BBoxAssigner
+  loss_normalize_pos: True
+  bbox_loss: GIoULoss
+
+GIoULoss:
+  loss_weight: 10.
+  reduction: 'none'
+  eps: 0.000001 # 1e-6

 BBoxAssigner:
   batch_size_per_im: 512

@@ -98,8 +113,13 @@ BBoxAssigner:
   fg_fraction: 0.25
   use_random: True

-TwoFCHead:
-  out_channel: 1024
+# TwoFCHead:
+#   out_channel: 1024
+
+XConvNormHead:
+  num_convs: 4
+  norm_type: bn

 BBoxPostProcess:
   decode: RCNNBox
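The config above now points the RPN box loss at SmoothL1Loss (`beta: 0.1111111111111111`, i.e. 1/9) and the RCNN box loss at GIoULoss (`loss_weight: 10.`, `reduction: 'none'`, `eps: 1e-6`). As a reference for what those entries mean numerically, here is a minimal standalone sketch of the two loss formulas; it is an illustration only, not PaddleDetection's implementation, and the box values are made up.

```python
# Standalone sketch of the Smooth L1 and GIoU box losses wired in by the
# config above (illustration only). Boxes are [x1, y1, x2, y2].
import numpy as np


def smooth_l1(pred, target, beta=1.0 / 9.0):  # beta == 0.1111... from the config
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)


def giou_loss(a, b, loss_weight=10.0, eps=1e-6):  # loss_weight / eps from the config
    # intersection area
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    # union area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / (union + eps)
    # smallest enclosing box
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (enclose - union) / (enclose + eps)
    return loss_weight * (1.0 - giou)


print(smooth_l1(np.array([0.5]), np.array([0.4])))          # below-beta branch
print(giou_loss([0., 0., 10., 10.], [2., 2., 12., 12.]))    # partially overlapping boxes
```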
configs/vitdet/mask_rcnn_vit_large_hrfpn_cae_1x_coco.yml (new file, 0 → 100644)

_BASE_: ['./mask_rcnn_vit_base_hrfpn_cae_1x_coco.yml']
weights: output/mask_rcnn_vit_large_hrfpn_cae_1x_coco/model_final

depth: &depth 24
dim: &dim 1024
use_fused_allreduce_gradients: &use_checkpoint True

VisionTransformer:
  img_size: [800, 1344]
  embed_dim: *dim
  depth: *depth
  num_heads: 16
  drop_path_rate: 0.25
  out_indices: [7, 11, 15, 23]
  use_checkpoint: *use_checkpoint
  pretrained: https://bj.bcebos.com/v1/paddledet/models/pretrained/vit_large_cae_pretrained.pdparams

HRFPN:
  in_channels: [*dim, *dim, *dim, *dim]

OptimizerBuilder:
  optimizer:
    layer_decay: 0.9
    weight_decay: 0.02
    num_layers: *depth
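`&depth`, `&dim` and `&use_checkpoint` are ordinary YAML anchors, so every `*depth`/`*dim`/`*use_checkpoint` reference expands to the scalar defined at the top of the file. The small sketch below shows how the aliases resolve with plain PyYAML; note that ppdet's own config loader additionally merges the files listed under `_BASE_`, which `yaml.safe_load` alone does not do.

```python
# Sketch: resolve the YAML anchors used in the new config with plain PyYAML.
import yaml

snippet = """
depth: &depth 24
dim: &dim 1024

VisionTransformer:
  embed_dim: *dim
  depth: *depth

HRFPN:
  in_channels: [*dim, *dim, *dim, *dim]
"""

cfg = yaml.safe_load(snippet)
assert cfg["VisionTransformer"]["depth"] == 24
assert cfg["HRFPN"]["in_channels"] == [1024, 1024, 1024, 1024]
print(cfg)
```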
ppdet/modeling/backbones/vision_transformer.py

@@ -509,16 +509,24 @@ class VisionTransformer(nn.Layer):
         dim = x.shape[-1]
         # we add a small number to avoid floating point error in the interpolation
         # see discussion at https://github.com/facebookresearch/dino/issues/8
-        w0, h0 = w0 + 0.1, h0 + 0.1
+        # w0, h0 = w0 + 0.1, h0 + 0.1
+        # patch_pos_embed = nn.functional.interpolate(
+        #     patch_pos_embed.reshape([
+        #         1, self.patch_embed.num_patches_w,
+        #         self.patch_embed.num_patches_h, dim
+        #     ]).transpose((0, 3, 1, 2)),
+        #     scale_factor=(w0 / self.patch_embed.num_patches_w,
+        #                   h0 / self.patch_embed.num_patches_h),
+        #     mode='bicubic', )
         patch_pos_embed = nn.functional.interpolate(
             patch_pos_embed.reshape([
                 1, self.patch_embed.num_patches_w,
                 self.patch_embed.num_patches_h, dim
             ]).transpose((0, 3, 1, 2)),
-            scale_factor=(w0 / self.patch_embed.num_patches_w,
-                          h0 / self.patch_embed.num_patches_h),
+            (w0, h0),
             mode='bicubic', )
+        assert int(w0) == patch_pos_embed.shape[-2] and int(h0) == patch_pos_embed.shape[-1]
         patch_pos_embed = patch_pos_embed.transpose(
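The edit replaces the DINO-style `+ 0.1` / `scale_factor` workaround with an explicit target size, so the interpolated position-embedding grid is guaranteed to match the requested patch grid, which is what the new `assert` checks. Below is a minimal sketch of the difference using Paddle's `interpolate` directly; the grid sizes are made up for illustration (an 800x1344 input with 16x16 patches gives a 50x84 patch grid).

```python
# Sketch: resizing a ViT position-embedding grid with an explicit output size
# instead of a float scale factor. Shapes here are illustrative only.
import paddle
import paddle.nn.functional as F

dim = 1024                      # embedding dim (matches the ViT-large config)
src_w, src_h = 14, 14           # patch grid the position embedding was trained on
w0, h0 = 50, 84                 # patch grid of the current input

pos = paddle.randn([1, dim, src_w, src_h])   # [N, C, W, H] after reshape/transpose

# Explicit size: the output grid is exactly (w0, h0), no rounding surprises.
resized = F.interpolate(pos, size=(w0, h0), mode='bicubic')
assert resized.shape[-2] == w0 and resized.shape[-1] == h0

# Float scale_factor (the old path) can land one pixel off after rounding,
# which is why the original DINO code nudged w0/h0 by +0.1 before dividing.
resized_sf = F.interpolate(pos, scale_factor=[w0 / src_w, h0 / src_h], mode='bicubic')
print(resized.shape, resized_sf.shape)
```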