Unverified commit e3201306 authored by: chajchaj, committed by: GitHub

support data_parallel training and ucf101 dataset (#4819)

Parent: 2c8b76b1
@@ -6,44 +6,61 @@
## Contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)

## Introduction
The Temporal Shift Module (TSM), proposed by Ji Lin, Chuang Gan, and Song Han of MIT and the IBM Watson AI Lab, improves a network's video understanding ability by shifting features along the temporal dimension. For details, see the paper [Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383v1).
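The shift itself is cheap: before a 2D convolution, a fraction of each frame's channels is swapped with the previous/next frame, mixing temporal information at zero extra FLOPs. Below is a minimal NumPy sketch of the operation (illustrative only: `seg_num` and `fold_div=8` follow the paper's defaults, not necessarily this repo's exact implementation):

```python
import numpy as np

def temporal_shift(x, seg_num, fold_div=8):
    # x: features of shape [N * seg_num, C, H, W]
    nt, c, h, w = x.shape
    x = x.reshape([-1, seg_num, c, h, w])  # [N, T, C, H, W]
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # 1/8 of channels shifted backward in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # 1/8 of channels shifted forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels stay in place
    return out.reshape([nt, c, h, w])
```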
## Installation
1. Running the example code in this repo requires PaddlePaddle v2.0.0 or later. If the PaddlePaddle version in your environment is lower, update it following the [installation guide](http://www.paddlepaddle.org/documentation/docs/zh/1.6/beginners_guide/install/index_cn.html).
2. Download the model repo: git clone https://github.com/PaddlePaddle/models

### Other dependencies
- Python >= 3.7
- CUDA >= 8.0
- CUDNN >= 7.0
## Data Preparation
TSM is trained on the UCF101 action recognition dataset, which contains 101 action classes.
Set ucf101_root in ucf101_reader.py to the UCF101 dataset directory; its videos and rawframes subdirectories hold the video files (6.8 GB) and the extracted frames (56 GB), respectively.
Preparation steps:
1. Download the official UCF101 data (wget https://www.crcv.ucf.edu/data/UCF101/UCF101.rar) and extract it to $ucf101_root/videos
2. Extract the video frames (TODO) and place them in $ucf101_root/frames
3. Generate the video path list files (steps TODO) and place them in ./data/dataset/ucf101/
## Training
Once the data is ready, training can be launched as follows:
- Training from scratch:
    sh run_ucf101.sh
- Training from an ImageNet-pretrained ResNet backbone:
    1. Download the ResNet50 weights pretrained on ImageNet (wget https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz) and extract them.
    2. Launch training with --weights=./ResNet50_pretrained/: sh run_ucf101_imagenet.sh
- Finetuning from a Kinetics-400-pretrained model:
    1. Download the published static-graph model: wget https://paddlemodels.bj.bcebos.com/video_classification/TSM.pdparams
    2. mkdir k400_wei && mv TSM.pdparams k400_wei
    3. Launch training with --weights=k400_wei/TSM.pdparams: sh run_ucf101_k400.sh
## Evaluation
Accuracy on the UCF101 dataset:

|Top-1|Top-5|pretrain|
|:-:|:-:|:-:|
|84.37%|95.68%|ImageNet|
|94.54%|98.96%|Kinetics-400|

See also the static-graph implementation: [PaddleVideo](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo)
@@ -16,6 +16,7 @@ import os
import time
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, BatchNorm, Linear
import math
@@ -28,7 +29,8 @@ class ConvBNLayer(fluid.dygraph.Layer):
filter_size,
stride=1,
groups=1,
                 act=None,
                 name=None):
super(ConvBNLayer, self).__init__()
self._conv = Conv2D(
......@@ -39,14 +41,22 @@ class ConvBNLayer(fluid.dygraph.Layer):
padding=(filter_size - 1) // 2,
groups=None,
act=None,
            param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"),
bias_attr=False)
if name == "conv1":
bn_name = "bn_" + name
else:
bn_name = "bn" + name[3:]
self._batch_norm = BatchNorm(
num_filters,
act=act,
            param_attr=ParamAttr(name=bn_name + "_scale"),
            bias_attr=ParamAttr(name=bn_name + "_offset"),
            moving_mean_name=bn_name + "_mean",
            moving_variance_name=bn_name + "_variance")
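        # Note: the BN parameter naming above mirrors the static-graph ResNet
        # convention, e.g. name "conv1" -> prefix "bn_conv1", while
        # "res2a_branch2a" -> "bn2a_branch2a" (the "res" prefix is dropped),
        # so pretrained checkpoints can be matched purely by parameter name.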
def forward(self, inputs):
y = self._conv(inputs)
@@ -61,32 +71,36 @@ class BottleneckBlock(fluid.dygraph.Layer):
num_filters,
stride,
shortcut=True,
seg_num=8,
name=None):
super(BottleneckBlock, self).__init__()
self.conv0 = ConvBNLayer(
num_channels=num_channels,
num_filters=num_filters,
filter_size=1,
act='relu',
name=name + "_branch2a")
self.conv1 = ConvBNLayer(
num_channels=num_filters,
num_filters=num_filters,
filter_size=3,
stride=stride,
act='relu',
name=name + "_branch2b")
self.conv2 = ConvBNLayer(
num_channels=num_filters,
num_filters=num_filters * 4,
filter_size=1,
act=None,
name=name + "_branch2c")
if not shortcut:
self.short = ConvBNLayer(
num_channels=num_channels,
num_filters=num_filters * 4,
filter_size=1,
stride=stride,
name=name + "_branch1")
self.shortcut = shortcut
self.seg_num = seg_num
self._num_channels_out = int(num_filters * 4)
@@ -119,7 +133,12 @@ class TSM_ResNet(fluid.dygraph.Layer):
num_filters = [64, 128, 256, 512]
self.conv = ConvBNLayer(
num_channels=3,
num_filters=64,
filter_size=7,
stride=2,
act='relu',
name="conv1")
self.pool2d_max = Pool2D(
pool_size=3, pool_stride=2, pool_padding=1, pool_type='max')
@@ -129,14 +148,23 @@
for block in range(len(depth)):
shortcut = False
for i in range(depth[block]):
if self.layers in [101, 152] and block == 2:
if i == 0:
conv_name = "res" + str(block + 2) + "a"
else:
conv_name = "res" + str(block + 2) + "b" + str(i)
else:
conv_name = "res" + str(block + 2) + chr(97 + i)
bottleneck_block = self.add_sublayer(
conv_name,
BottleneckBlock(
num_channels=num_channels,
num_filters=num_filters[block],
stride=2 if i == 0 and block != 0 else 1,
shortcut=shortcut,
seg_num=self.seg_num,
name=conv_name))
num_channels = int(bottleneck_block._num_channels_out)
self.bottleneck_block_list.append(bottleneck_block)
shortcut = True
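                # Note: for ResNet-50 (depth [3, 4, 6, 3]) the naming above yields
                # res2a..res2c, res3a..res3d, res4a..res4f, res5a..res5c; for
                # ResNet-101/152 the third stage instead uses res4a, res4b1,
                # res4b2, ..., matching the released static-graph weight names.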
@@ -151,9 +179,12 @@
self.class_dim,
act="softmax",
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(-stdv, stdv),
name="fc_0.w_0"),
bias_attr=fluid.param_attr.ParamAttr(
learning_rate=2.0,
regularizer=fluid.regularizer.L2Decay(0.),
name="fc_0.b_0"))
def forward(self, inputs):
y = fluid.layers.reshape(
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import pickle
import cv2
import numpy as np
import random
class ReaderNotFoundError(Exception):
"Error: reader not found"
def __init__(self, reader_name, avail_readers):
super(ReaderNotFoundError, self).__init__()
self.reader_name = reader_name
self.avail_readers = avail_readers
def __str__(self):
msg = "Reader {} Not Found.\nAvailiable readers:\n".format(
self.reader_name)
for reader in self.avail_readers:
msg += " {}\n".format(reader)
return msg
class DataReader(object):
"""data reader for video input"""
def __init__(self, model_name, mode, cfg):
self.name = model_name
self.mode = mode
self.cfg = cfg
def create_reader(self):
"""Not implemented"""
pass
def get_config_from_sec(self, sec, item, default=None):
if sec.upper() not in self.cfg:
return default
return self.cfg[sec.upper()].get(item, default)
class ReaderZoo(object):
def __init__(self):
self.reader_zoo = {}
def regist(self, name, reader):
        assert reader.__base__ == DataReader, "Unknown reader type {}".format(
            type(reader))
self.reader_zoo[name] = reader
def get(self, name, mode, cfg):
for k, v in self.reader_zoo.items():
if k == name:
return v(name, mode, cfg)
raise ReaderNotFoundError(name, self.reader_zoo.keys())
# singleton reader_zoo
reader_zoo = ReaderZoo()
def regist_reader(name, reader):
reader_zoo.regist(name, reader)
def get_reader(name, mode, cfg):
reader_model = reader_zoo.get(name, mode, cfg)
return reader_model.create_reader()
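
# Usage sketch (added for illustration; ToyReader is hypothetical):
#
#     class ToyReader(DataReader):
#         def create_reader(self):
#             def _reader():
#                 yield from ()
#             return _reader
#
#     regist_reader("TOY", ToyReader)
#     toy_reader = get_reader("TOY", "train", {})
#
# Note that train.py below constructs UCF101Reader directly rather than going
# through the zoo; the registry is the generic lookup path.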
# run_ucf101.sh — train from scratch on 4 GPUs (script-to-command mapping assumed from the README)
CUDA_VISIBLE_DEVICES=0,1,2,3 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames train.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True

# run_ucf101_imagenet.sh — train from the ImageNet-pretrained ResNet50 backbone
CUDA_VISIBLE_DEVICES=0,1,2,3 python3.7 -m paddle.distributed.launch --started_port 18989 --log_dir ./mylog.ucf101.frames.imagenet train.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True --weights=./ResNet50_pretrained/

# run_ucf101_k400.sh — finetune from the Kinetics-400-pretrained model
CUDA_VISIBLE_DEVICES=4,5,6,7 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames.k400 train.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True --weights=k400_wei/TSM.pdparams

# run_ucf101_sing.sh — single-GPU finetune with data parallelism disabled
CUDA_VISIBLE_DEVICES=1 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames.k400.sing train.py --config=./tsm_ucf101_sing.yaml --use_gpu=True --use_data_parallel=False --weights=k400_wei/TSM.pdparams
@@ -24,6 +24,7 @@ from paddle.fluid.dygraph.base import to_variable
from model import TSM_ResNet
from config_utils import *
from reader import KineticsReader
from ucf101_reader import UCF101Reader
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
@@ -65,12 +66,39 @@ def parse_args():
type=int,
default=None,
help='epoch number, 0 for read from config file')
parser.add_argument(
'--use_data_parallel',
type=ast.literal_eval,
default=True,
        help='whether to train with data parallelism, default True.')
parser.add_argument(
'--model_save_dir',
type=str,
default='./output',
        help='directory for saving models, default ./output.')
parser.add_argument(
'--checkpoint',
type=str,
default=None,
help='path to resume training based on previous checkpoints. '
'None for not resuming any checkpoints.')
parser.add_argument(
'--model_path_pre',
type=str,
default='tsm',
        help='filename prefix for saved models, default tsm.')
parser.add_argument(
'--weights',
type=str,
default='./ResNet50_pretrained/',
        help='path of pretrained weights, default ./ResNet50_pretrained/.')
args = parser.parse_args()
return args
def val(epoch, model, cfg, args):
reader = UCF101Reader(name="TSM", mode="valid", cfg=cfg)
reader = reader.create_reader()
total_loss = 0.0
total_acc1 = 0.0
@@ -101,9 +129,9 @@ def val(epoch, model, cfg, args):
epoch, batch_id,
avg_loss.numpy()[0], acc_top1.numpy()[0], acc_top5.numpy()[0]))
    print('TEST Epoch {}, iter {}, Finish loss {} , acc1 {} , acc5 {}'.format(
        epoch, batch_id, total_loss / total_sample, total_acc1 / total_sample,
        total_acc5 / total_sample))
def create_optimizer(cfg, params):
@@ -132,26 +160,66 @@ def train(args):
valid_config = merge_configs(config, 'valid', vars(args))
print_configs(train_config, 'Train')
    local_rank = fluid.dygraph.parallel.Env().local_rank
    use_data_parallel = args.use_data_parallel
    trainer_count = fluid.dygraph.parallel.Env().nranks

    if not args.use_gpu:
        place = fluid.CPUPlace()
    elif not args.use_data_parallel:
        place = fluid.CUDAPlace(0)
    else:
        #(data_parallel step1/6)
        place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
#load pretrain
    assert os.path.exists(args.weights), \
        "Given directory {} does not exist.".format(args.weights)
pre_state_dict = fluid.load_program_state(args.weights)
with fluid.dygraph.guard(place):
#1. init model
video_model = TSM_ResNet("TSM", train_config)
#2. set weights
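        # (fc_0.w_0 / fc_0.b_0 are deliberately not loaded below: the pretrained
        # classifier head has a different class count than UCF101's 101 classes,
        # so the final fc layer keeps its fresh initialization)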
param_state_dict = {}
model_dict = video_model.state_dict()
for key in model_dict.keys():
weight_name = model_dict[key].name
if weight_name in pre_state_dict.keys(
) and weight_name != "fc_0.w_0" and weight_name != "fc_0.b_0":
                print('Loaded pretrained weight: {}, shape: {}'.format(
                    weight_name, pre_state_dict[weight_name].shape))
param_state_dict[key] = pre_state_dict[weight_name]
else:
                print('Weight not loaded (fresh init kept): {}'.format(weight_name))
param_state_dict[key] = model_dict[key]
video_model.set_dict(param_state_dict)
#3. init optim
optimizer = create_optimizer(train_config.TRAIN,
video_model.parameters())
if use_data_parallel:
#(data_parallel step2,3/6)
strategy = fluid.dygraph.parallel.prepare_context()
video_model = fluid.dygraph.parallel.DataParallel(video_model,
strategy)
# 4. load checkpoint
if args.checkpoint:
            assert os.path.exists(args.checkpoint + ".pdparams"), \
                "Checkpoint file {}.pdparams does not exist.".format(args.checkpoint)
            assert os.path.exists(args.checkpoint + ".pdopt"), \
                "Checkpoint file {}.pdopt does not exist.".format(args.checkpoint)
para_dict, opti_dict = fluid.dygraph.load_dygraph(args.checkpoint)
video_model.set_dict(para_dict)
optimizer.set_dict(opti_dict)
# 5. reader
bs_denominator = 1
if args.use_gpu:
# check number of GPUs
gpus = os.getenv("CUDA_VISIBLE_DEVICES", "")
if gpus == "":
pass
@@ -168,27 +236,36 @@
train_config.TRAIN.batch_size = int(train_config.TRAIN.batch_size /
bs_denominator)
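        # e.g. with batch_size: 64 in tsm_ucf101.yaml and 4 visible GPUs,
        # each card trains on a local batch of 64 / 4 = 16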
train_reader = UCF101Reader(name="TSM", mode="train", cfg=train_config)
train_reader = train_reader.create_reader()
if use_data_parallel:
#(data_parallel step4/6)
train_reader = fluid.contrib.reader.distributed_batch_reader(
train_reader)
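        # recap of the data-parallel recipe marked by the numbered step comments
        # in this file: 1. per-process CUDAPlace, 2-3. prepare_context +
        # DataParallel wrapper, 4. distributed_batch_reader, 5. scale_loss()
        # and apply_collective_grads() around backward()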
# 6. train loop
for epoch in range(train_config.TRAIN.epoch):
video_model.train()
total_loss = 0.0
total_acc1 = 0.0
total_acc5 = 0.0
total_sample = 0
t_last = time.time()
# 6.1 for each batch, call model() , backward(), and minimize()
for batch_id, data in enumerate(train_reader()):
t1 = time.time()
x_data = np.array([item[0] for item in data])
y_data = np.array([item[1] for item in data]).reshape([-1, 1])
imgs = to_variable(x_data)
labels = to_variable(y_data)
labels.stop_gradient = True
t2 = time.time()
outputs = video_model(imgs)
t3 = time.time()
loss = fluid.layers.cross_entropy(
input=outputs, label=labels, ignore_index=-1)
avg_loss = fluid.layers.mean(loss)
@@ -198,34 +275,62 @@
acc_top5 = fluid.layers.accuracy(
input=outputs, label=labels, k=5)
current_step_lr = optimizer.current_step_lr()
if use_data_parallel:
#(data_parallel step5/6)
avg_loss = video_model.scale_loss(avg_loss)
avg_loss.backward()
video_model.apply_collective_grads()
else:
avg_loss.backward()
t4 = time.time()
optimizer.minimize(avg_loss)
video_model.clear_gradients()
t5 = time.time()
total_loss += avg_loss.numpy()[0]
total_acc1 += acc_top1.numpy()[0]
total_acc5 += acc_top5.numpy()[0]
total_sample += 1
print(
'TRAIN Epoch: %d, iter: %d, loss: %.5f, acc1: %.5f, acc5: %.5f, lr: %.5f, forward_cost:%.5f s, backward_cost:%.5f s, minimize_cost:%.5f s, to_variable_cost: %.5f s, batch_cost: %.5f s, reader_cost: %.5f s'
% (epoch, batch_id, avg_loss.numpy()[0],
acc_top1.numpy()[0], acc_top5.numpy()[0],
current_step_lr, t3 - t2, t4 - t3, t5 - t4, t2 - t1,
t5 - t_last, t2 - t_last))
t_last = time.time()
            print(
                'TRAIN End, Epoch {}, avg_loss= {}, avg_acc1= {}, avg_acc5= {}, lr={}'.
                format(epoch, total_loss / total_sample,
                       total_acc1 / total_sample, total_acc5 / total_sample,
                       current_step_lr))
# 6.2 save checkpoint
if local_rank == 0:
if not os.path.isdir(args.model_save_dir):
os.makedirs(args.model_save_dir)
model_path = os.path.join(
args.model_save_dir,
args.model_path_pre + "_epoch{}".format(epoch))
fluid.dygraph.save_dygraph(video_model.state_dict(), model_path)
fluid.dygraph.save_dygraph(optimizer.state_dict(), model_path)
print('save_dygraph End, Epoch {}/{} '.format(
epoch, train_config.TRAIN.epoch))
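                # a checkpoint saved here can later be resumed with e.g.
                # --checkpoint=./output/tsm_epoch10 (illustrative path, formed
                # from --model_save_dir and --model_path_pre)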
# 6.3 validation
video_model.eval()
val(epoch, video_model, valid_config, args)
# 7. save final model
if local_rank == 0:
model_path = os.path.join(args.model_save_dir,
args.model_path_pre + "_final")
fluid.dygraph.save_dygraph(video_model.state_dict(), model_path)
fluid.dygraph.save_dygraph(optimizer.state_dict(), model_path)
logger.info('[TRAIN] training finished')
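# tsm_ucf101.yaml — 4-GPU training config (file name inferred from the run
# commands above); batch_size is the global batch, which train.py divides by
# the number of visible GPUs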
MODEL:
name: "TSM"
format: "frames"
num_classes: 101
seg_num: 8
seglen: 1
image_mean: [0.485, 0.456, 0.406]
image_std: [0.229, 0.224, 0.225]
num_layers: 50
topk: 5
TRAIN:
epoch: 80
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 64
use_gpu: True
num_gpus: 4
filelist: "./data/dataset/ucf101/ucf101_train_split_1_rawframes.txt"
learning_rate: 0.01
learning_rate_decay: 0.1
decay_epochs: [40, 60]
l2_weight_decay: 1e-4
momentum: 0.9
total_videos: 9537
fix_random_seed: False
VALID:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 32
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
TEST:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
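
# tsm_ucf101_sing.yaml — single-GPU variant (file name inferred from
# run_ucf101_sing.sh): num_gpus 1 and a smaller batch_size of 16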
MODEL:
name: "TSM"
format: "frames"
num_classes: 101
seg_num: 8
seglen: 1
image_mean: [0.485, 0.456, 0.406]
image_std: [0.229, 0.224, 0.225]
num_layers: 50
topk: 5
TRAIN:
epoch: 80
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
use_gpu: True
num_gpus: 1
filelist: "./data/dataset/ucf101/ucf101_train_split_1_rawframes.txt"
learning_rate: 0.01
learning_rate_decay: 0.1
decay_epochs: [40, 60]
l2_weight_decay: 1e-4
momentum: 0.9
total_videos: 9537
fix_random_seed: False
VALID:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 32
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
TEST:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"