PaddlePaddle / models

Commit e035d917
Authored Aug 29, 2020 by LiuChaoXD
refined dynamic tsn 2020-08-30
Parent: ba0dd405

Showing 14 changed files with 112 additions and 461 deletions (+112 −461)
dygraph/tsn/README.md                                       +40  -13
dygraph/tsn/data/dataset/ucf101/README.md                    +6   -7
dygraph/tsn/data/dataset/ucf101/build_ucf101_file_list.py    +9  -26
dygraph/tsn/data/dataset/ucf101/extract_rawframes.py         +4  -12
dygraph/tsn/eval.py                                          +1   -2
dygraph/tsn/model.py                                         +0   -5
dygraph/tsn/multi_gpus_run.sh                                +0   -0
dygraph/tsn/reader/reader_utils.py                           +9   -9
dygraph/tsn/reader/ucf101_reader.py                          +5 -362
dygraph/tsn/run_eval.sh                                      +1   -1
dygraph/tsn/single_gpu_run.sh                                +0   -0
dygraph/tsn/train.py                                        +29  -15
dygraph/tsn/tsn.yaml                                         +3   -4
dygraph/tsn/tsn_test.yaml                                    +5   -5
dygraph/tsn/README.md

# TSN Video Classification Model

This directory contains a TSN video classification model implemented with the PaddlePaddle dygraph (dynamic graph) API. The model supports PaddlePaddle Fluid 1.8, GPU, and Linux.

---
## Contents

- [Model introduction](#模型简介)
- [Installation](#安装说明)
- [Data preparation](#数据准备)
- [Model training](#模型训练)
- [Model evaluation](#模型评估)

@@ -13,10 +14,39 @@

## Model introduction

Temporal Segment Network (TSN) is a classic 2D-CNN based solution for video classification. It addresses long-range temporal modeling of actions: instead of dense sampling, it sparsely samples frames across the whole video, which captures global video information while removing redundancy and lowering the computational cost. The per-frame features are then averaged into a single video-level feature that is used for classification. The model implemented here is the single-stream RGB TSN with a ResNet50 backbone.

For details, please refer to the ECCV 2016 paper [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859).
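To make the sparse sampling and average fusion concrete, here is a minimal NumPy sketch. It is illustrative only and not code from this repository; the names `sample_segment_indices`, `num_frames`, and `frame_scores` are made up for the example.

```python
import numpy as np

def sample_segment_indices(num_frames, seg_num, training=True):
    """Pick one frame index from each of seg_num equal segments (TSN sparse sampling)."""
    seg_len = num_frames // seg_num
    indices = []
    for i in range(seg_num):
        start = i * seg_len
        if training and seg_len > 0:
            # random position inside the segment during training
            indices.append(start + np.random.randint(0, seg_len))
        else:
            # center of the segment for evaluation
            indices.append(start + seg_len // 2)
    return indices

# Per-frame class scores from the 2D CNN are averaged into a video-level prediction:
frame_scores = np.random.rand(3, 101)      # (seg_num, num_classes), illustrative values
video_score = frame_scores.mean(axis=0)    # TSN's average consensus
predicted_class = int(video_score.argmax())
```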
## Installation

### Requirements:

```
python=3.7
paddlepaddle-gpu==1.8.3.post97
opencv=4.3
CUDA >= 9.0
cudnn >= 7.5
wget
numpy
```

### Installing the dependencies:

- Install PaddlePaddle (GPU build): `pip3 install paddlepaddle-gpu==1.8.3.post97 -i https://mirror.baidu.com/pypi/simple`
- Install opencv 4.3: `pip3 install opencv-python==4.3.0.36`
- Install wget: `pip3 install wget`
- Install numpy: `pip3 install numpy`

## Data preparation

TSN is trained on the UCF101 action recognition dataset. For data download and preparation, please refer to the [data documentation](./data/dataset/ucf101/README.md).
@@ -27,32 +57,30 @@

1. Multi-GPU training

```bash
bash multi_gpus_run.sh ./configs/tsn.yaml
```

(The script was renamed from `multi-gpus-run.sh` to `multi_gpus_run.sh`.)

The GPUs used for multi-GPU training can be configured as follows:

- Edit `num_gpus` in `./configs/tsn.yaml` (the default is 4, meaning 4 GPUs are used for training).
- Edit `export CUDA_VISIBLE_DEVICES=0,1,2,3` in `multi_gpus_run.sh` (the default 0,1,2,3 means GPUs 0, 1, 2 and 3 are used for training).
- Note: if you change the batch size, change the learning rate accordingly; a larger batch size takes a proportionally larger learning rate. For example, the default is batchsize=128 with lr=0.001, so with batchsize=64 use lr=0.0005.

2. Single-GPU training

```bash
bash single_gpu_run.sh ./configs/tsn.yaml
```

(The command previously read `bash run.sh ./configs/tsn.yaml`.)

The GPU used for single-GPU training can be configured as follows:

- Set `num_gpus=1` in `./configs/tsn.yaml` (train with a single GPU).
- Edit `export CUDA_VISIBLE_DEVICES=0` in `run.sh` (use GPU 0 for training).
- Note: if you change the batch size, scale the learning rate by the same factor. For example, the default batchsize=128 uses lr=0.001, so batchsize=64 uses lr=0.0005 (a small sketch of this rule follows below).
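A tiny illustrative snippet of the linear scaling rule (not repository code), using the default values quoted above:

```python
def scaled_lr(batch_size, base_batch_size=128, base_lr=0.001):
    """Linear scaling rule: learning rate grows/shrinks with the batch size."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(128))  # 0.001  (default)
print(scaled_lr(64))   # 0.0005 (halved batch -> halved lr)
print(scaled_lr(256))  # 0.002  (doubled batch -> doubled lr)
```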
## Model evaluation

The model can be evaluated as follows:

```bash
bash run_eval.sh ./configs/tsn_test.yaml ./weights/final.pdparams
```

(Previously `bash run-eval.sh ./configs/tsn-test.yaml ./weights/final.pdparams`; the script and config were renamed to `run_eval.sh` and `tsn_test.yaml`.)

- When evaluating with `run.sh`, set the `weights` parameter in the script to the weights you want to evaluate.
- `./tsn_test.yaml` (previously `./tsn-test.yaml`) is the configuration file used for evaluation; `./weights/final.pdparams` is the model file saved when training finishes.
- The evaluation results, such as TOP1\_ACC and TOP5\_ACC, are printed directly to the log.

@@ -62,7 +90,6 @@

|                            | seg\_num | Top-1  | Top-5  |
| :------------------------: | :------: | :----: | :----: |
| Pytorch TSN                |    3     | 83.88% | 96.78% |
| Paddle TSN (static graph)  |    3     | 84.00% | 97.38% |
| Paddle TSN (dynamic graph) |    3     | 84.27% | 97.27% |
dygraph/tsn/data/dataset/ucf101/README.md

@@ -14,29 +14,30 @@ bash download_annotations.sh

### Download the UCF101 video files

Again, make sure you are in the `./data/dataset/ucf101/` directory, then run the following command to download the video files:

```shell
bash download_videos.sh
```

(The command previously read `bash download_annotations.sh`; it now calls the video download script.)

After the download finishes, the video files are stored under `./data/dataset/ucf101/videos/` and take about 6.8 GB.

---
## 2. Extract frames from the video files

To speed up network training, we first extract frames from the video files (the UCF101 videos are in avi format). Compared with training directly from the video files, training from pre-extracted frames is considerably faster.
Run the following command to extract the frames of the UCF101 videos:

```python
python extract_rawframes.py ./videos/ ./rawframes/ --level 2 --ext avi
```

After extraction, the frames are stored under the `./rawframes` folder and take about 56 GB.
---
## 3. Generate the path lists for the frames and the videos

To generate the path list of the video files, run:

```python
python build_ucf101_file_list.py videos/ --level 2 --format videos --out_list_path ./
```

To generate the path list of the frame files, run:

```python
python build_ucf101_file_list.py rawframes/ --level 2 --format rawframes --out_list_path ./
```

(Both commands previously also passed `--shuffle`; that flag has been dropped here.)
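Each line of the generated rawframes list is later consumed by the reader. The following is a minimal, illustrative parser, not part of the repository; the exact "frame_dir num_frames label" column layout is an assumption based on how the reader splits each entry.

```python
def parse_rawframes_line(line):
    """Parse one entry of e.g. ucf101_train_split_1_rawframes.txt.

    Assumed layout per line: "<frame_dir> <num_frames> <label>".
    """
    frame_dir, num_frames, label = line.strip().split(' ')
    return frame_dir, int(num_frames), int(label)


# Illustrative values only:
entry = "rawframes/ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01 121 0"
print(parse_rawframes_line(entry))
```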
**Parameter description**

@@ -49,8 +50,6 @@ python extract_rawframes.py ./videos/ ./rawframes/ --level 2 --ext avi

`--out_list_path`: where the generated path list files are stored.

(The description of the removed `--shuffle` flag, "shuffle the order of the files in the path list", is deleted along with the flag.)

# After the above steps are done, the files are organized as follows
dygraph/tsn/data/dataset/ucf101/build_ucf101_file_list.py

@@ -50,7 +50,6 @@ def build_split_list(split, frame_info, shuffle=False):

```
     rgb_list = list()
     for item in set_list:
         if item[0] not in frame_info:
-            # print("item:", item)
             continue
         elif frame_info[item[0]][1] > 0:
             rgb_cnt = frame_info[item[0]][1]
```
@@ -97,9 +96,6 @@ def parse_args():

```
         'frame_path',
         type=str,
         help='root directory for the frames')
     parser.add_argument('--rgb_prefix', type=str, default='img_')
     parser.add_argument('--num_split', type=int, default=3)
     parser.add_argument('--subset', type=str, default='train',
                         choices=['train', 'val', 'test'])
     parser.add_argument('--level', type=int, default=2, choices=[1, 2])
     parser.add_argument('--format',
```
@@ -145,29 +141,16 @@ def main():

```
     assert len(split_tp) == args.num_split
     out_path = args.out_list_path
-    if len(split_tp) > 1:
-        for i, split in enumerate(split_tp):
-            lists = build_split_list(split_tp[i], frame_info, shuffle=args.shuffle)
-            filename = 'ucf101_train_split_{}_{}.txt'.format(i + 1, args.format)
-            with open(os.path.join(out_path, filename), 'w') as f:
-                f.writelines(lists[0])
-            filename = 'ucf101_val_split_{}_{}.txt'.format(i + 1, args.format)
-            with open(os.path.join(out_path, filename), 'w') as f:
-                f.writelines(lists[1])
-    else:
-        lists = build_split_list(split_tp[0], frame_info, shuffle=args.shuffle)
-        filename = '{}_{}_list_{}.txt'.format(args.dataset, args.subset, args.format)
-        if args.subset == 'train':
-            ind = 0
-        elif args.subset == 'val':
-            ind = 1
-        elif args.subset == 'test':
-            ind = 2
-        with open(os.path.join(out_path, filename), 'w') as f:
-            f.writelines(lists[0][ind])
+    for i, split in enumerate(split_tp):
+        lists = build_split_list(split_tp[i], frame_info, shuffle=args.shuffle)
+        filename = 'ucf101_train_split_{}_{}.txt'.format(i + 1, args.format)
+        with open(os.path.join(out_path, filename), 'w') as f:
+            f.writelines(lists[0])
+        filename = 'ucf101_val_split_{}_{}.txt'.format(i + 1, args.format)
+        with open(os.path.join(out_path, filename), 'w') as f:
+            f.writelines(lists[1])


 if __name__ == "__main__":
```
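For the default three UCF101 annotation splits, the rewritten loop above writes six list files. A quick illustrative check of the naming it produces (assuming `--format rawframes`), not repository code:

```python
# Illustrative only: reproduce the file names written by the loop above
for i in range(3):                      # num_split defaults to 3
    print('ucf101_train_split_{}_{}.txt'.format(i + 1, 'rawframes'))
    print('ucf101_val_split_{}_{}.txt'.format(i + 1, 'rawframes'))
```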
dygraph/tsn/data/dataset/ucf101/extract_rawframes.py

@@ -23,9 +23,10 @@ def dump_frames(vid_item):

```
         if ret == False:
             continue
-        img = frame[:, :, ::-1]  # covert the BGR img into RGB img
+        img = frame[:, :, ::-1]  # covert the BGR img
+        img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
         if img is not None:
             # cv2.imwrite will write BGR into RGB images
             cv2.imwrite('{}/img_{:05d}.jpg'.format(out_full_path, i + 1), img)
         else:
             print('[Warning] length inconsistent!'
```
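For context on the color handling in this hunk: OpenCV delivers frames in BGR order and `cv2.imwrite` also expects BGR, so a frame that has been flipped to RGB must be converted back before writing, otherwise the saved JPEGs have red and blue swapped. A small self-contained illustration (not repository code):

```python
import cv2
import numpy as np

# A fake 2x2 "frame" as OpenCV would deliver it: BGR channel order.
bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

rgb = np.ascontiguousarray(bgr[:, :, ::-1])         # BGR -> RGB (channel flip)
back_to_bgr = cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)  # RGB -> BGR, ready for cv2.imwrite

# The round trip restores the original BGR frame exactly.
assert (back_to_bgr == bgr).all()
```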
@@ -37,27 +38,18 @@ def dump_frames(vid_item):

```
 def parse_args():
-    parser = argparse.ArgumentParser(description='extract optical flows')
+    parser = argparse.ArgumentParser(description='extract frames')
     parser.add_argument('src_dir', type=str)
     parser.add_argument('out_dir', type=str)
     parser.add_argument('--level', type=int, choices=[1, 2], default=2)
     parser.add_argument('--num_worker', type=int, default=8)
     parser.add_argument("--out_format", type=str, default='dir',
                         choices=['dir', 'zip'], help='output format')
     parser.add_argument("--ext", type=str, default='avi',
                         choices=['avi', 'mp4'], help='video file extensions')
     parser.add_argument("--new_width", type=int, default=0, help='resize image width')
     parser.add_argument("--new_height", type=int, default=0, help='resize image height')
     parser.add_argument("--resume", action='store_true',
```
dygraph/tsn/eval.py

@@ -37,7 +37,7 @@ def parse_args():

```
     parser.add_argument(
         '--config',
         type=str,
-        default='./tsn-test.yaml',
+        default='./tsn_test.yaml',
         help='path to config file of model')
     parser.add_argument(
         '--batch_size',
```

@@ -69,7 +69,6 @@ def test(args):

```
     video_model.set_dict(model_dict)

     test_reader = UCF101Reader(name="TSN", mode="test", cfg=test_config)
-    #test_reader = KineticsReader(mode='test', cfg=test_config)
     test_reader = test_reader.create_reader()

     video_model.eval()
```
dygraph/tsn/model.py

@@ -25,8 +25,6 @@ from paddle.fluid.dygraph.nn import Conv2D, Pool2D, BatchNorm, Linear, Dropout

```
 import math

 __all__ = ["ResNet18", "ResNet34", "ResNet50", "ResNet101", "ResNet152"]


 class ConvBNLayer(fluid.dygraph.Layer):
     def __init__(self,
```

@@ -119,9 +117,6 @@ class BottleneckBlock(fluid.dygraph.Layer):

```
         y = fluid.layers.elementwise_add(x=short, y=conv2)

         return fluid.layers.relu(y)
-        # layer_helper = LayerHelper(self.full_name(), act="relu")
-        # return layer_helper.append_activation(y)


 class BasicBlock(fluid.dygraph.Layer):
     def __init__(self,
```
dygraph/tsn/multi-gpus-run.sh → dygraph/tsn/multi_gpus_run.sh (file moved)
dygraph/tsn/reader/reader_utils.py

```
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
 #
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
 #
 #     http://www.apache.org/licenses/LICENSE-2.0
 #
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

 import pickle
 import cv2
```
dygraph/tsn/reader/ucf101_reader.py

```
-# copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
```

@@ -18,24 +18,9 @@ import cv2

```
 import math
 import random
 import functools
-try:
-    import cPickle as pickle
-    from cStringIO import StringIO
-except ImportError:
-    import pickle
-    from io import BytesIO
 import numpy as np
 import paddle
 import paddle.fluid as fluid
-try:
-    from nvidia.dali.pipeline import Pipeline
-    import nvidia.dali.ops as ops
-    import nvidia.dali.types as types
-    import tempfile
-    from nvidia.dali.plugin.paddle import DALIGenericIterator
-except:
-    Pipeline = object
-    print("DALI is not installed, you can improve performance if use DALI")
 from PIL import Image, ImageEnhance
 import logging
```
@@ -115,9 +100,6 @@ class UCF101Reader(DataReader):

```
         # set num_trainers and trainer_id when distributed training is implemented
         self.num_trainers = self.get_config_from_sec(mode, 'num_trainers', 1)
         self.trainer_id = self.get_config_from_sec(mode, 'trainer_id', 0)
-        self.use_dali = self.get_config_from_sec(mode, 'use_dali', False)
-        self.dali_mean = cfg.MODEL.image_mean * (self.seg_num * self.seglen)
-        self.dali_std = cfg.MODEL.image_std * (self.seg_num * self.seglen)

         if self.mode == 'infer':
             self.video_path = cfg[mode.upper()]['video_path']
```
@@ -129,9 +111,6 @@ class UCF101Reader(DataReader):

```
             self.num_reader_threads = 1

     def create_reader(self):
-        # if use_dali to improve performance
-        if self.use_dali:
-            return self.build_dali_reader()

         # if set video_path for inference mode, just load this single video
         if (self.mode == 'infer') and (self.video_path != ''):
```
@@ -237,42 +216,6 @@ class UCF101Reader(DataReader):

```
                     img_std,
                     name=self.name), label

-        def decode_pickle(sample, mode, seg_num, seglen, short_size, target_size,
-                          img_mean, img_std):
-            pickle_path = sample[0]
-            try:
-                if python_ver < (3, 0):
-                    data_loaded = pickle.load(open(pickle_path, 'rb'))
-                else:
-                    data_loaded = pickle.load(open(pickle_path, 'rb'), encoding='bytes')
-                vid, label, frames = data_loaded
-                if len(frames) < 1:
-                    logger.error('{} frame length {} less than 1.'.format(
-                        pickle_path, len(frames)))
-                    return None, None
-            except:
-                logger.info('Error when loading {}'.format(pickle_path))
-                return None, None
-
-            if mode == 'train' or mode == 'valid' or mode == 'test':
-                ret_label = label
-            elif mode == 'infer':
-                ret_label = vid
-
-            imgs = video_loader(frames, seg_num, seglen, mode)
-            return imgs_transform(imgs, mode, seg_num, seglen, short_size,
-                                  target_size, img_mean, img_std,
-                                  name=self.name), ret_label

         def decode_frames(sample, mode, seg_num, seglen, short_size, target_size,
                           img_mean, img_std):
             recode = VideoRecord(sample[0].split(' '))
```
@@ -334,11 +277,9 @@ class UCF101Reader(DataReader):

```
                     pickle_path = line.strip()
                     yield [pickle_path]

-        if format == 'pkl':
-            decode_func = decode_pickle
         if format == 'frames':
             decode_func = decode_frames
-        elif format == 'mp4' or 'avi':
+        elif format == 'videos':
             decode_func = decode_mp4
         else:
             raise "Not implemented format {}".format(format)
```
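A side note on the condition fixed here: in Python `format == 'mp4' or 'avi'` parses as `(format == 'mp4') or 'avi'`, and the non-empty string `'avi'` is always truthy, so the old branch matched any format string. A quick illustration (not repository code):

```python
fmt = 'frames'
print(fmt == 'mp4' or 'avi')         # -> 'avi' (truthy), so the old elif always fired
print(fmt == 'mp4' or fmt == 'avi')  # -> False, the comparison presumably intended
print(fmt in ('mp4', 'avi'))         # -> False, an idiomatic way to write it
```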
@@ -355,249 +296,6 @@ class UCF101Reader(DataReader):

```
         return fluid.io.xmap_readers(mapper, reader_, num_threads, buf_size)
```

Removed (the DALI-based `build_dali_reader` method and the `VideoPipe` / `VideoTestPipe` pipelines):

```
    def build_dali_reader(self):
        """
        build dali training reader
        """

        def reader_():
            with open(self.filelist) as flist:
                full_lines = [line for line in flist]
                if self.mode == 'train':
                    if (not hasattr(reader_, 'seed')):
                        reader_.seed = 0
                    random.Random(reader_.seed).shuffle(full_lines)
                    print("reader shuffle seed", reader_.seed)
                    if reader_.seed is not None:
                        reader_.seed += 1

                per_node_lines = int(
                    math.ceil(len(full_lines) * 1.0 / self.num_trainers))
                total_lines = per_node_lines * self.num_trainers

                # aligned full_lines so that it can evenly divisible
                full_lines += full_lines[:(total_lines - len(full_lines))]
                assert len(full_lines) == total_lines

                # trainer get own sample
                lines = full_lines[self.trainer_id:total_lines:self.num_trainers]
                assert len(lines) == per_node_lines

                logger.info("trainerid %d, trainer_count %d" %
                            (self.trainer_id, self.num_trainers))
                logger.info(
                    "read images from %d, length: %d, lines length: %d, total: %d"
                    % (self.trainer_id * per_node_lines, per_node_lines,
                       len(lines), len(full_lines)))

                video_files = ''
                for item in lines:
                    video_files += item
                tf = tempfile.NamedTemporaryFile()
                tf.write(str.encode(video_files))
                tf.flush()
                video_files = tf.name

                device_id = int(os.getenv('FLAGS_selected_gpus', 0))
                print('---------- device id -----------', device_id)

                if self.mode == 'train':
                    pipe = VideoPipe(
                        batch_size=self.batch_size,
                        num_threads=1,
                        device_id=device_id,
                        file_list=video_files,
                        sequence_length=self.seg_num * self.seglen,
                        seg_num=self.seg_num,
                        seg_length=self.seglen,
                        resize_shorter_scale=self.short_size,
                        crop_target_size=self.target_size,
                        is_training=(self.mode == 'train'),
                        dali_mean=self.dali_mean,
                        dali_std=self.dali_std)
                else:
                    pipe = VideoTestPipe(
                        batch_size=self.batch_size,
                        num_threads=1,
                        device_id=device_id,
                        file_list=video_files,
                        sequence_length=self.seg_num * self.seglen,
                        seg_num=self.seg_num,
                        seg_length=self.seglen,
                        resize_shorter_scale=self.short_size,
                        crop_target_size=self.target_size,
                        is_training=(self.mode == 'train'),
                        dali_mean=self.dali_mean,
                        dali_std=self.dali_std)

                logger.info(
                    'initializing dataset, it will take several minutes if it is too large .... '
                )
                video_loader = DALIGenericIterator(
                    [pipe], ['image', 'label'],
                    len(lines),
                    dynamic_shape=True,
                    auto_reset=True)

                return video_loader

        dali_reader = reader_()

        def ret_reader():
            for data in dali_reader:
                yield data[0]['image'], data[0]['label']

        return ret_reader


class VideoPipe(Pipeline):
    def __init__(self,
                 batch_size,
                 num_threads,
                 device_id,
                 file_list,
                 sequence_length,
                 seg_num,
                 seg_length,
                 resize_shorter_scale,
                 crop_target_size,
                 is_training=False,
                 initial_prefetch_size=10,
                 num_shards=1,
                 shard_id=0,
                 dali_mean=0.,
                 dali_std=1.0):
        super(VideoPipe, self).__init__(batch_size, num_threads, device_id)
        self.input = ops.VideoReader(
            device="gpu",
            file_list=file_list,
            sequence_length=sequence_length,
            seg_num=seg_num,
            seg_length=seg_length,
            is_training=is_training,
            num_shards=num_shards,
            shard_id=shard_id,
            random_shuffle=is_training,
            initial_fill=initial_prefetch_size)
        # the sequece data read by ops.VideoReader is of shape [F, H, W, C]
        # Because the ops.Resize does not support sequence data,
        # it will be transposed into [H, W, F, C],
        # then reshaped to [H, W, FC], and then resized like a 2-D image.
        self.transpose = ops.Transpose(device="gpu", perm=[1, 2, 0, 3])
        self.reshape = ops.Reshape(
            device="gpu", rel_shape=[1.0, 1.0, -1], layout='HWC')
        self.resize = ops.Resize(
            device="gpu", resize_shorter=resize_shorter_scale)
        # crops and mirror are applied by ops.CropMirrorNormalize.
        # Normalization will be implemented in paddle due to the difficulty of dimension broadcast,
        # It is not sure whether dimension broadcast can be implemented correctly by dali, just take the Paddle Op instead.
        self.pos_rng_x = ops.Uniform(range=(0.0, 1.0))
        self.pos_rng_y = ops.Uniform(range=(0.0, 1.0))
        self.mirror_generator = ops.Uniform(range=(0.0, 1.0))
        self.cast_mirror = ops.Cast(dtype=types.DALIDataType.INT32)
        self.crop_mirror_norm = ops.CropMirrorNormalize(
            device="gpu",
            crop=[crop_target_size, crop_target_size],
            mean=dali_mean,
            std=dali_std)
        self.reshape_back = ops.Reshape(
            device="gpu",
            shape=[seg_num, seg_length * 3, crop_target_size, crop_target_size],
            layout='FCHW')
        self.cast_label = ops.Cast(device="gpu", dtype=types.DALIDataType.INT64)

    def define_graph(self):
        output, label = self.input(name="Reader")
        output = self.transpose(output)
        output = self.reshape(output)
        output = self.resize(output)
        output = output / 255.
        pos_x = self.pos_rng_x()
        pos_y = self.pos_rng_y()
        mirror_flag = self.mirror_generator()
        mirror_flag = (mirror_flag > 0.5)
        mirror_flag = self.cast_mirror(mirror_flag)
        #output = self.crop(output, crop_pos_x=pos_x, crop_pos_y=pos_y)
        output = self.crop_mirror_norm(
            output, crop_pos_x=pos_x, crop_pos_y=pos_y, mirror=mirror_flag)
        output = self.reshape_back(output)
        label = self.cast_label(label)
        return output, label


class VideoTestPipe(Pipeline):
    def __init__(self,
                 batch_size,
                 num_threads,
                 device_id,
                 file_list,
                 sequence_length,
                 seg_num,
                 seg_length,
                 resize_shorter_scale,
                 crop_target_size,
                 is_training=False,
                 initial_prefetch_size=10,
                 num_shards=1,
                 shard_id=0,
                 dali_mean=0.,
                 dali_std=1.0):
        super(VideoTestPipe, self).__init__(batch_size, num_threads, device_id)
        self.input = ops.VideoReader(
            device="gpu",
            file_list=file_list,
            sequence_length=sequence_length,
            seg_num=seg_num,
            seg_length=seg_length,
            is_training=is_training,
            num_shards=num_shards,
            shard_id=shard_id,
            random_shuffle=is_training,
            initial_fill=initial_prefetch_size)
        # the sequece data read by ops.VideoReader is of shape [F, H, W, C]
        # Because the ops.Resize does not support sequence data,
        # it will be transposed into [H, W, F, C],
        # then reshaped to [H, W, FC], and then resized like a 2-D image.
        self.transpose = ops.Transpose(device="gpu", perm=[1, 2, 0, 3])
        self.reshape = ops.Reshape(
            device="gpu", rel_shape=[1.0, 1.0, -1], layout='HWC')
        self.resize = ops.Resize(
            device="gpu", resize_shorter=resize_shorter_scale)
        # crops and mirror are applied by ops.CropMirrorNormalize.
        # Normalization will be implemented in paddle due to the difficulty of dimension broadcast,
        # It is not sure whether dimension broadcast can be implemented correctly by dali, just take the Paddle Op instead.
        self.crop_mirror_norm = ops.CropMirrorNormalize(
            device="gpu",
            crop=[crop_target_size, crop_target_size],
            crop_pos_x=0.5,
            crop_pos_y=0.5,
            mirror=0,
            mean=dali_mean,
            std=dali_std)
        self.reshape_back = ops.Reshape(
            device="gpu",
            shape=[seg_num, seg_length * 3, crop_target_size, crop_target_size],
            layout='FCHW')
        self.cast_label = ops.Cast(device="gpu", dtype=types.DALIDataType.INT64)

    def define_graph(self):
        output, label = self.input(name="Reader")
        output = self.transpose(output)
        output = self.reshape(output)
        output = self.resize(output)
        output = output / 255.
        #output = self.crop(output, crop_pos_x=pos_x, crop_pos_y=pos_y)
        output = self.crop_mirror_norm(output)
        output = self.reshape_back(output)
        label = self.cast_label(label)
        return output, label
```

Unchanged context after the removed block:

```
def imgs_transform(imgs,
                   mode,
```
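The layout trick described in the comments of the removed pipelines can be checked with a short NumPy sketch (illustrative, not repository code): a `[F, H, W, C]` clip is transposed to `[H, W, F, C]` and flattened to `[H, W, F*C]` so that a 2-D image op can process it, then packed back.

```python
import numpy as np

F, H, W, C = 3, 4, 5, 3                     # a tiny fake clip: 3 RGB frames of 4x5
clip = np.arange(F * H * W * C).reshape(F, H, W, C)

hwfc = clip.transpose(1, 2, 0, 3)           # [F, H, W, C] -> [H, W, F, C]
flat = hwfc.reshape(H, W, F * C)            # -> [H, W, F*C]: frames stacked as channels

# A 2-D op (e.g. a resize) can now run on `flat`; undoing the packing
# recovers the original clip exactly.
restored = flat.reshape(H, W, F, C).transpose(2, 0, 1, 3)
assert (restored == clip).all()
```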
@@ -611,8 +309,7 @@ def imgs_transform(imgs,

```
     imgs = group_scale(imgs, short_size)

     if mode == 'train':
         if name == "TSM":
             imgs = group_multi_scale_crop(imgs, short_size)
         imgs = group_random_crop(imgs, target_size)
         imgs = group_random_flip(imgs)
     else:
```
@@ -777,47 +474,6 @@ def group_scale(imgs, target_size):

```
     return resized_imgs
```

Removed (the pickle-buffer image loader and the in-memory frame sampler):

```
def imageloader(buf):
    if isinstance(buf, str):
        img = Image.open(StringIO(buf))
    else:
        img = Image.open(BytesIO(buf))

    return img.convert('RGB')


def video_loader(frames, nsample, seglen, mode):
    videolen = len(frames)
    average_dur = int(videolen / nsample)

    imgs = []
    for i in range(nsample):
        idx = 0
        if mode == 'train':
            if average_dur >= seglen:
                idx = random.randint(0, average_dur - seglen)
                idx += i * average_dur
            elif average_dur >= 1:
                idx += i * average_dur
            else:
                idx = i
        else:
            if average_dur >= seglen:
                idx = (average_dur - seglen) // 2
                idx += i * average_dur
            elif average_dur >= 1:
                idx += i * average_dur
            else:
                idx = i

        for jj in range(idx, idx + seglen):
            imgbuf = frames[int(jj % videolen)]
            img = imageloader(imgbuf)
            imgs.append(img)

    return imgs
```

Unchanged context after the removed block:

```
def mp4_loader(filepath, nsample, seglen, mode):
    cap = cv2.VideoCapture(filepath)
    videolen = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
```
@@ -858,20 +514,9 @@ def mp4_loader(filepath, nsample, seglen, mode):

```
     return imgs


-# the additional function which used to load the frames
-# loading images by PIL
-# def load_image(directory, idx):
-#     return Image.open(os.path.join(
-#         directory, 'img_{:05d}.jpg'.format(idx))).convert('RGB')
-
-# loading images by opencv
 def load_image(directory, idx):
-    img = cv2.imread(os.path.join(directory, 'img_{:05d}.jpg'.format(idx)))
-    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-    return img
+    return Image.open(os.path.join(
+        directory, 'img_{:05d}.jpg'.format(idx))).convert('RGB')


 def frames_loader(recode, nsample, seglen, mode):
```
@@ -899,7 +544,5 @@ def frames_loader(recode, nsample, seglen, mode):

```
         for jj in range(idx, idx + seglen):
             img = load_image(imgpath, jj + 1)
-            img = Image.fromarray(img, mode='RGB')
-            # print("the readed image shape {}".format(img.shape))
             imgs.append(img)

     return imgs
```
dygraph/tsn/run-eval.sh → dygraph/tsn/run_eval.sh

```
-configs="tsn-test.yaml"
+configs="tsn_test.yaml"
 use_gpu=True
 use_data_parallel=False
```
dygraph/tsn/single-gpu-run.sh → dygraph/tsn/single_gpu_run.sh (file moved)
dygraph/tsn/train.py

@@ -84,6 +84,12 @@ def parse_args():

```
         default="./weights",
         help='path to save the final optimized model.'
         'default path is "./weights".')
+    parser.add_argument(
+        '--validate',
+        type=str,
+        default=False,
+        help='whether to validating in training phase.'
+        'default value is True.')
     args = parser.parse_args()
     return args
```
@@ -187,6 +193,7 @@ def val(epoch, model, cfg, args):

```
     print('Finish loss {} , acc1 {} , acc5 {}'.format(
         total_loss / total_sample, total_acc1 / total_sample,
         total_acc5 / total_sample))
+    return total_acc1 / total_sample


 def create_optimizer(cfg, params):
```
@@ -248,13 +255,7 @@ def train(args):

```
     else:
         gpus = gpus.split(",")
         num_gpus = len(gpus)
-        assert num_gpus == train_config.TRAIN.num_gpus, \
-            "num_gpus({}) set by CUDA_VISIBLE_DEVICES" \
-            "shoud be the same as that" \
-            "set in {}({})".format(
-                num_gpus, args.config, train_config.TRAIN.num_gpus)
-    bs_denominator = train_config.TRAIN.num_gpus
+    bs_denominator = num_gpus

     train_config.TRAIN.batch_size = int(train_config.TRAIN.batch_size /
                                         bs_denominator)
```
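For orientation, an illustrative calculation (not repository code) of what this division yields with the defaults in `tsn.yaml` (batch_size=128, num_gpus=4):

```python
# Illustrative numbers matching the tsn.yaml defaults
total_batch_size = 128
num_gpus = 4                       # bs_denominator now comes from CUDA_VISIBLE_DEVICES

per_gpu_batch_size = int(total_batch_size / num_gpus)
print(per_gpu_batch_size)          # 32 samples per GPU per iteration
```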
@@ -314,7 +315,7 @@ def train(args):

```
                 total_sample += 1
                 train_batch_cost = time.time() - batch_start
                 print(
                     'TRAIN Epoch: {}, iter: {}, batch_cost: {: .5f} s, reader_cost: {: .5f} s loss={: .6f}, acc1 {: .6f}, acc5 {: .6f} \t'.
                     format(epoch, batch_id, train_batch_cost, train_reader_cost,
                            avg_loss.numpy()[0], acc_top1.numpy()[0],
                            acc_top5.numpy()[0]))
```
@@ -339,14 +340,27 @@ def train(args):

```
             fluid.dygraph.save_dygraph(video_model.state_dict(), model_path)
             fluid.dygraph.save_dygraph(optimizer.state_dict(), model_path)

-        video_model.eval()
-        val(epoch, video_model, valid_config, args)
-
-    if fluid.dygraph.parallel.Env().local_rank == 0:
-        if not os.path.isdir(args.weights):
-            os.makedirs(args.weights)
-        fluid.dygraph.save_dygraph(video_model.state_dict(),
-                                   args.weights + "/final")
+        if args.validate:
+            video_model.eval()
+            val_acc = val(epoch, video_model, valid_config, args)
+            # save the best parameters in trainging stage
+            if epoch == 1:
+                best_acc = val_acc
+            else:
+                if val_acc > best_acc:
+                    best_acc = val_acc
+                    if fluid.dygraph.parallel.Env().local_rank == 0:
+                        if not os.path.isdir(args.weights):
+                            os.makedirs(args.weights)
+                        fluid.dygraph.save_dygraph(video_model.state_dict(),
+                                                   args.weights + "/final")
+        else:
+            if fluid.dygraph.parallel.Env().local_rank == 0:
+                if not os.path.isdir(args.weights):
+                    os.makedirs(args.weights)
+                fluid.dygraph.save_dygraph(video_model.state_dict(),
+                                           args.weights + "/final")

     logger.info('[TRAIN] training finished')
```
dygraph/tsn/tsn.yaml

```
 MODEL:
     name: "TSN"
-    format: "frames"
+    format: "frames" # support for "frames" or "videos"
     num_classes: 101
     seg_num: 3
     seglen: 1
```

@@ -15,9 +15,8 @@ TRAIN:

```
     target_size: 224
     num_reader_threads: 12
     buf_size: 1024
-    batch_size: 256
+    batch_size: 128
     use_gpu: True
     num_gpus: 4 #8
     filelist: "./data/dataset/ucf101/ucf101_train_split_1_rawframes.txt"
     learning_rate: 0.001
     learning_rate_decay: 0.1
```

@@ -40,4 +39,4 @@ TEST:

```
     num_reader_threads: 12
     buf_size: 1024
     batch_size: 64
-    filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
+    filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
\ No newline at end of file
```
dygraph/tsn/tsn-test.yaml → dygraph/tsn/tsn_test.yaml

@@ -14,14 +14,14 @@ VALID:

```
     short_size: 256
     target_size: 224
     num_reader_threads: 12
-    buf_size: 1
-    batch_size: 128
+    buf_size: 4
+    batch_size: 32
     filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"

 TEST:
     short_size: 256
     target_size: 224
     num_reader_threads: 12
-    buf_size: 1024
-    batch_size: 1
-    filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
+    buf_size: 4
+    batch_size: 32
+    filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
\ No newline at end of file
```