未验证 提交 b96aab9a 编写于 作者: S su 提交者: GitHub

rebenchmark for Pyslowfast comparison (#28)

* add resolution for K400

* rebenchmark on 32G V100

* add new i3d config and i3d benchmakr

minor

minor
Co-authored-by: Nlinjintao <linjintao@sensetime.com>
上级 3aefca87
......@@ -65,12 +65,13 @@ We compare with other popular codebases and the [results](https://mmaction2.read
| Model | MMAction2 (s/iter) | MMAction (s/iter) | Temporal-Shift-Module (s/iter) | PySlowFast (s/iter) |
| :--- | :---------------: | :--------------------: | :----------------------------: | :-----------------: |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | **0.29** | 0.36 | 0.45 | x |
| [I3D (setting1)](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | **0.45** | 0.58 | x | x |
| [I3D (setting2)](/configs/recognition/i3d/i3d_r50_8x8x1_100e_kinetics400_rgb.py) | **0.32** | x | x | 0.56 |
| [I3D (video)](/configs/recognition/i3d/i3d_r50_video_8x8x1_100e_kinetics400_rgb.py) | **0.31** | x | x | 0.59 |
| [I3D (rawframe)](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | **0.45** | 0.58 | x | x |
| [TSM](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | **0.30** | x | 0.38 | x |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | **0.30** | x | x | 1.03 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py) | **0.80** | x | x | 1.40 |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py) | **0.48** | x | x | x |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | **0.27** | x | x | 0.89 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | **0.68** | x | x | 1.07 |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | **0.45** | x | x | x |
Supported methods for action recognition:
- [x] [TSN](configs/recognition/tsn/README.md)
......
# model settings
model = dict(
type='Recognizer3D',
backbone=dict(
type='ResNet3d',
pretrained2d=True,
pretrained='torchvision://resnet50',
depth=50,
conv_cfg=dict(type='Conv3d'),
norm_eval=False,
inflate=((1, 1, 1), (1, 0, 1, 0), (1, 0, 1, 0, 1, 0), (0, 1, 0)),
zero_init_residual=False),
cls_head=dict(
type='I3DHead',
num_classes=400,
in_channels=2048,
spatial_type='avg',
dropout_ratio=0.5,
init_std=0.01))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'VideoDataset'
data_root = 'data/kinetics400/videos_train'
data_root_val = 'data/kinetics400/videos_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_videos.txt'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
dict(type='DecordInit'),
dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(
type='MultiScaleCrop',
input_size=224,
scales=(1, 0.8),
random_crop=False,
max_wh_scale_gap=0),
dict(type='Resize', scale=(224, 224), keep_ratio=False),
dict(type='Flip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=8,
frame_interval=8,
num_clips=1,
test_mode=True),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='CenterCrop', crop_size=224),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
dict(type='DecordInit'),
dict(
type='SampleFrames',
clip_len=8,
frame_interval=8,
num_clips=10,
test_mode=True),
dict(type='DecordDecode'),
dict(type='Resize', scale=(-1, 256)),
dict(type='ThreeCrop', crop_size=256),
dict(type='Flip', flip_ratio=0),
dict(type='Normalize', **img_norm_cfg),
dict(type='FormatShape', input_format='NCTHW'),
dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['imgs'])
]
data = dict(
videos_per_gpu=8,
workers_per_gpu=4,
train=dict(
type=dataset_type,
ann_file=ann_file_train,
data_prefix=data_root,
pipeline=train_pipeline),
val=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=val_pipeline),
test=dict(
type=dataset_type,
ann_file=ann_file_val,
data_prefix=data_root_val,
pipeline=test_pipeline))
# optimizer
optimizer = dict(
type='SGD', lr=0.01, momentum=0.9,
weight_decay=0.0001) # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[40, 80])
total_epochs = 100
checkpoint_config = dict(interval=5)
evaluation = dict(
interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
interval=20,
hooks=[
dict(type='TextLoggerHook'),
# dict(type='TensorboardLoggerHook'),
])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/i3d_r50_video_3d_32x2x1_100e_kinetics400_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]
......@@ -8,7 +8,7 @@ Here we compare our MMAction2 repo with other video understanding toolboxes in t
by the training time per iteration. Here, we use
- commit id [7f3490d](https://github.com/open-mmlab/mmaction/tree/7f3490d3db6a67fe7b87bfef238b757403b670e3)(1/5/2020) of MMAction
- commit id [8d53d6f](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd)(5/5/2020) of Temporal-Shift-Module
- commit id [133e40f](https://github.com/facebookresearch/SlowFast/tree/133e40f8349ce37b0e6168639da0811a413579c8)(30/5/2020) of PySlowFast
- commit id [8299c98](https://github.com/facebookresearch/SlowFast/tree/8299c9862f83a067fa7114ce98120ae1568a83ec)(7/7/2020) of PySlowFast
- commit id [f13707f](https://github.com/wzmsltw/BSN-boundary-sensitive-network/tree/f13707fbc362486e93178c39f9c4d398afe2cb2f)(12/12/2018) of BSN(boundary sensitive network)
- commit id [45d0514](https://github.com/JJBOY/BMN-Boundary-Matching-Network/tree/45d05146822b85ca672b65f3d030509583d0135a)(17/10/2019) of BMN(boundary matching network)
......@@ -24,12 +24,12 @@ The training speed is measure with s/iter. The lower, the better.
| Model | MMAction2 (s/iter) | MMAction (s/iter) | Temporal-Shift-Module (s/iter) | PySlowFast (s/iter) |
| :--- | :---------------: | :--------------------: | :----------------------------: | :-----------------: |
| [TSN](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | **0.29** | 0.36 | 0.45 | x |
| [I3D (setting1)](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | **0.45** | 0.58 | x | x |
| [I3D (setting2)](/configs/recognition/i3d/i3d_r50_8x8x1_100e_kinetics400_rgb.py) | **0.32** | x | x | 0.56 |
| [I3D (video)](/configs/recognition/i3d/i3d_r50_video_8x8x1_100e_kinetics400_rgb.py) | **0.31** | x | x | 0.59 |
| [I3D (rawframe)](/configs/recognition/i3d/i3d_r50_32x2x1_100e_kinetics400_rgb.py) | **0.45** | 0.58 | x | x |
| [TSM](/configs/recognition/tsm/tsm_r50_1x1x8_50e_kinetics400_rgb.py) | **0.30** | x | 0.38 | x |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | **0.30** | x | x | 1.03 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_4x16x1_256e_kinetics400_rgb.py) | **0.80** | x | x | 1.40 |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_8x8x1_180e_kinetics400_rgb.py) | **0.48** | x | x | x |
| [Slowonly](/configs/recognition/slowonly/slowonly_r50_video_4x16x1_256e_kinetics400_rgb.py) | **0.27** | x | x | 0.89 |
| [Slowfast](/configs/recognition/slowfast/slowfast_r50_video_4x16x1_256e_kinetics400_rgb.py) | **0.68** | x | x | 1.07 |
| [R(2+1)D](/configs/recognition/r2plus1d/r2plus1d_r34_video_8x8x1_180e_kinetics400_rgb.py) | **0.45** | x | x | x |
## Localizers
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册