openvinotoolkit / mmaction2
Commit 4f422df0 (unverified)
Authored on Aug 20, 2020 by Jintao Lin; committed by GitHub on Aug 20, 2020.

Update nonlocal readme in TSM and I3D (#131)

Parent: 478c746a

Showing 8 changed files with 564 additions and 2 deletions (+564 −2)
- configs/recognition/i3d/README.md (+22 −0)
- configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py (+128 −0)
- configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py (+1 −1)
- configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py (+128 −0)
- configs/recognition/tsm/README.md (+20 −0)
- configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py (+132 −0)
- configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py (+1 −1)
- configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py (+132 −0)
configs/recognition/i3d/README.md
# I3D
## Introduction
```
@inproceedings{inproceedings,
author = {Carreira, J. and Zisserman, Andrew},
year = {2017},
month = {07},
pages = {4724-4733},
title = {Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset},
doi = {10.1109/CVPR.2017.502}
}
@article{NonLocal2018,
author = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He},
title = {Non-local Neural Networks},
journal = {CVPR},
year = {2018}
}
```
## Model Zoo
### Kinetics-400
...
...
@@ -12,6 +31,9 @@
|[i3d_r50_dense_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_dense_32x2x1_100e_kinetics400_rgb.py)|short-side 256|8|ResNet50|ImageNet|73.48|91.00|x|5170|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb_20200725-24eb54cc.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_dense_256p_32x2x1_100e_kinetics400_rgb/20200725_031604.log.json)|
|[i3d_r50_fast_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb.py)|340x256|8|ResNet50|ImageNet|72.32|90.72|1.8 (320x3 frames)|5170|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_32x2x1_100e_kinetics400_rgb_20200612-000e4d2a.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb/20200612_233836.log.json)|
|[i3d_r50_fast_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_r50_fast_32x2x1_100e_kinetics400_rgb.py)|short-side 256|8|ResNet50|ImageNet|73.24|90.99|x|5170|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb_20200817-4e90d1d5.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_r50_fast_256p_32x2x1_100e_kinetics400_rgb/20200725_031457.log.json)|
|[i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py)|short-side 256p|8x4|ResNet50|ImageNet|74.71|91.81|x|6438|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200813-6e6aef1b.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034054.log.json)|
|[i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py)|short-side 256p|8x4|ResNet50|ImageNet|73.37|91.26|x|4944|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb_20200815-17f84aa2.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/20200813_034909.log.json)|
|[i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb](/configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py)|short-side 256p|8x4|ResNet50|ImageNet|73.92|91.59|x|4832|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb_20200814-7c30d5bb.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/20200814_044208.log.json)|
Notes:

1. The **gpus** column indicates the number of GPUs used to obtain the checkpoint. Note that the provided configs default to 8 GPUs.
...
...
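When training with a GPU count other than the default 8, the learning rate is usually adjusted by the linear scaling rule (lr proportional to total batch size). A minimal sketch of that arithmetic; the helper `scale_lr` is ours for illustration, not part of mmaction2:

```python
def scale_lr(base_lr, base_gpus, base_videos_per_gpu, gpus, videos_per_gpu):
    """Linear scaling rule: lr is proportional to the total batch size."""
    base_batch = base_gpus * base_videos_per_gpu
    new_batch = gpus * videos_per_gpu
    return base_lr * new_batch / base_batch

# The I3D configs in this commit use lr=0.01 with videos_per_gpu=8 on 8 GPUs
# (total batch 64). Training on 4 GPUs with the same per-GPU batch halves it:
print(scale_lr(0.01, 8, 8, 4, 8))  # 0.005
```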
configs/recognition/i3d/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb.py (new file, mode 100644)
```python
# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3d',
        pretrained2d=True,
        pretrained='torchvision://resnet50',
        depth=50,
        conv_cfg=dict(type='Conv3d'),
        norm_eval=False,
        inflate=((1, 1, 1), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 1, 0)),
        non_local=((0, 0, 0), (0, 1, 0, 1), (0, 1, 0, 1, 0, 1), (0, 0, 0)),
        non_local_cfg=dict(
            sub_sample=True,
            use_scale=False,
            norm_cfg=dict(type='BN3d', requires_grad=True),
            mode='dot_product'),
        zero_init_residual=False),
    cls_head=dict(
        type='I3DHead',
        num_classes=400,
        in_channels=2048,
        spatial_type='avg',
        dropout_ratio=0.5,
        init_std=0.01))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.8),
        random_crop=False,
        max_wh_scale_gap=0),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=1,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=10,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.01, momentum=0.9,
    weight_decay=0.0001)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[40, 80])
total_epochs = 100
checkpoint_config = dict(interval=5)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/i3d_nl_dot_product_r50_32x2x1_100e_kinetics400_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]
```
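In the config above, `non_local` carries one 0/1 flag per residual block in each of the four ResNet-50 stages (3, 4, 6, 3 blocks), marking where a non-local block is inserted. A small sketch of reading that setting; the helper `count_non_local` is ours for illustration:

```python
# non_local as in the config above: flags per residual block, per stage.
non_local = ((0, 0, 0), (0, 1, 0, 1), (0, 1, 0, 1, 0, 1), (0, 0, 0))

def count_non_local(stages):
    """Return the number of non-local blocks inserted in each stage."""
    return [sum(stage) for stage in stages]

print(count_non_local(non_local))  # [0, 2, 3, 0] -> 5 non-local blocks total
```

This matches the 5-block placement (stages res3 and res4) used in the Non-local Neural Networks paper cited in the README.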
configs/recognition/i3d/i3d_nl_r50_32x2x1_100e_kinetics400_rgb.py → configs/recognition/i3d/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb.py

```diff
@@ -122,7 +122,7 @@ log_config = dict(
 # runtime settings
 dist_params = dict(backend='nccl')
 log_level = 'INFO'
-work_dir = './work_dirs/i3d_nl_r50_32x2x1_100e_kinetics400_rgb/'
+work_dir = './work_dirs/i3d_nl_embedded_gaussian_r50_32x2x1_100e_kinetics400_rgb/'  # noqa: E501
 load_from = None
 resume_from = None
 workflow = [('train', 1)]
```
configs/recognition/i3d/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb.py (new file, mode 100644)
```python
# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3d',
        pretrained2d=True,
        pretrained='torchvision://resnet50',
        depth=50,
        conv_cfg=dict(type='Conv3d'),
        norm_eval=False,
        inflate=((1, 1, 1), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 1, 0)),
        non_local=((0, 0, 0), (0, 1, 0, 1), (0, 1, 0, 1, 0, 1), (0, 0, 0)),
        non_local_cfg=dict(
            sub_sample=True,
            use_scale=False,
            norm_cfg=dict(type='BN3d', requires_grad=True),
            mode='gaussian'),
        zero_init_residual=False),
    cls_head=dict(
        type='I3DHead',
        num_classes=400,
        in_channels=2048,
        spatial_type='avg',
        dropout_ratio=0.5,
        init_std=0.01))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.8),
        random_crop=False,
        max_wh_scale_gap=0),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=1,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=32,
        frame_interval=2,
        num_clips=10,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.01, momentum=0.9,
    weight_decay=0.0001)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[40, 80])
total_epochs = 100
checkpoint_config = dict(interval=5)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/i3d_nl_gaussian_r50_32x2x1_100e_kinetics400_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]
```
configs/recognition/tsm/README.md
# TSM
## Introduction
```
@inproceedings{lin2019tsm,
title={TSM: Temporal Shift Module for Efficient Video Understanding},
author={Lin, Ji and Gan, Chuang and Han, Song},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
year={2019}
}
@article{NonLocal2018,
author = {Xiaolong Wang and Ross Girshick and Abhinav Gupta and Kaiming He},
title = {Non-local Neural Networks},
journal = {CVPR},
year = {2018}
}
```
## Model Zoo
### Kinetics-400
...
...
@@ -13,6 +30,9 @@
|[tsm_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_dense_1x1x8_100e_kinetics400_rgb.py)|short-side 256|8|ResNet50|ImageNet|73.38|91.02|x|x|x|7079|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_dense_256p_1x1x8_100e_kinetics400_rgb/tsm_r50_dense_256p_1x1x8_100e_kinetics400_rgb_20200727-e1e0c785.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_dense_256p_1x1x8_100e_kinetics400_rgb/20200725_032043.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_dense_256p_1x1x8_100e_kinetics400_rgb/20200725_032043.log.json)|
|[tsm_r50_1x1x16_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py)|340x256|8|ResNet50|ImageNet|71.69|90.4|[70.67](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_16f.sh)|[89.98](https://github.com/mit-han-lab/temporal-shift-module/blob/8d53d6fda40bea2f1b37a6095279c4b454d672bd/scripts/train_tsm_kinetics_rgb_16f.sh)|47.0 (16x1 frames)|10404|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/tsm_r50_1x1x16_50e_kinetics400_rgb_20200607-f731bffc.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20200607_221310.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb/20200607_221310.log.json)|
|[tsm_r50_1x1x16_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_r50_1x1x16_50e_kinetics400_rgb.py)|short-side 256|8|ResNet50|ImageNet|72.01|90.57|x|x|x|10398|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/tsm_r50_256p_1x1x16_50e_kinetics400_rgb_20200727-b414aa3c.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20200725_031232.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_r50_256p_1x1x16_50e_kinetics400_rgb/20200725_031232.log.json)|
|[tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py)|short-side 320|8x4|ResNet50|ImageNet|72.03|90.25|71.81|90.36|x|8931|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200724-f00f1336.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200724_120023.log.json)|
|[tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py)|short-side 320|8x4|ResNet50|ImageNet|70.70|89.90|x|x|x|10125|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb_20200816-b93fd297.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/20200815_210253.log.json)|
|[tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb](/configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py)|short-side 320|8x4|ResNet50|ImageNet|71.60|90.34|x|x|x|8358|[ckpt](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb_20200724-d8ad84d2.pth)|[log](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log)|[json](https://openmmlab.oss-accelerate.aliyuncs.com/mmaction/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb/20200723_220442.log.json)|
### Something-Something V1
...
...
configs/recognition/tsm/tsm_nl_dot_product_r50_1x1x8_50e_kinetics400_rgb.py (new file, mode 100644)
```python
# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNetTSM',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False,
        non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)),
        non_local_cfg=dict(
            sub_sample=True,
            use_scale=False,
            norm_cfg=dict(type='BN3d', requires_grad=True),
            mode='dot_product'),
        shift_div=8),
    cls_head=dict(
        type='TSMHead',
        num_classes=400,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.5,
        init_std=0.001,
        is_shift=True))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD',
    constructor='TSMOptimizerConstructor',
    paramwise_cfg=dict(fc_lr5=True),
    lr=0.01,  # this lr is used for 8 gpus
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50
checkpoint_config = dict(interval=1)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]
```
configs/recognition/tsm/tsm_nl_r50_1x1x8_50e_kinetics400_rgb.py → configs/recognition/tsm/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb.py

```diff
@@ -126,7 +126,7 @@ log_config = dict(
 # runtime settings
 dist_params = dict(backend='nccl')
 log_level = 'INFO'
-work_dir = './work_dirs/tsm_nl_r50_1x1x8_50e_kinetics400_rgb/'
+work_dir = './work_dirs/tsm_nl_embedded_gaussian_r50_1x1x8_50e_kinetics400_rgb/'  # noqa: E501
 load_from = None
 resume_from = None
 workflow = [('train', 1)]
```
configs/recognition/tsm/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb.py (new file, mode 100644)
```python
# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNetTSM',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False,
        non_local=((0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0)),
        non_local_cfg=dict(
            sub_sample=True,
            use_scale=False,
            norm_cfg=dict(type='BN3d', requires_grad=True),
            mode='gaussian'),
        shift_div=8),
    cls_head=dict(
        type='TSMHead',
        num_classes=400,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.5,
        init_std=0.001,
        is_shift=True))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD',
    constructor='TSMOptimizerConstructor',
    paramwise_cfg=dict(fc_lr5=True),
    lr=0.01,  # this lr is used for 8 gpus
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50
checkpoint_config = dict(interval=1)
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsm_nl_gaussian_r50_1x1x8_50e_kinetics400_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]
```
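The TSM configs above use `lr_config = dict(policy='step', step=[20, 40])` over 50 epochs: the learning rate is multiplied by 0.1 at each milestone epoch. A hand-rolled sketch of that schedule (the helper `step_lr` is ours for illustration, not mmcv's scheduler):

```python
def step_lr(base_lr, epoch, steps, gamma=0.1):
    """LR under a step policy: multiply by gamma at each milestone reached."""
    factor = gamma ** sum(epoch >= s for s in steps)
    return base_lr * factor

# Base lr 0.01, milestones [20, 40]: 0.01 -> 0.001 -> 0.0001.
for e in (0, 20, 40):
    print(e, step_lr(0.01, e, [20, 40]))
```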