Unverified commit e3201306, authored by chajchaj, committed by GitHub

support data_parallel training and ucf101 dataset (#4819)

Parent 2c8b76b1
@@ -6,44 +6,61 @@
## Contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Training](#training)

## Introduction
The Temporal Shift Module (TSM), proposed by Ji Lin, Chuang Gan, and Song Han of MIT and the IBM Watson AI Lab, improves a network's video-understanding ability by shifting features along the temporal dimension. For details, see the paper [Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383v1).
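To make the mechanism concrete, below is a minimal numpy sketch of the temporal shift operation (illustrative only, not the kernel used in this repo; the fold ratio of 1/8 of the channels per direction follows the paper's default):

```python
import numpy as np

def temporal_shift(x, seg_num, fold_div=8):
    # x: [N * seg_num, C, H, W]; shift a fraction of the channels
    # one step forward/backward along the time axis.
    nt, c, h, w = x.shape
    n = nt // seg_num
    x = x.reshape(n, seg_num, c, h, w)
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # take features from t+1
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # take features from t-1
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # remaining channels unchanged
    return out.reshape(nt, c, h, w)
```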
## Installation
1. Running the sample code in this repository requires PaddlePaddle v2.0.0 or later. If the PaddlePaddle in your environment is older, please update it following the [installation guide](http://www.paddlepaddle.org/documentation/docs/zh/1.6/beginners_guide/install/index_cn.html).
2. Download the models repo: git clone https://github.com/PaddlePaddle/models

### Other dependencies
- Python >= 3.7
- CUDA >= 8.0
- CUDNN >= 7.0

## Data Preparation
TSM is trained on the UCF101 action recognition dataset, which contains 101 action classes.
Set ucf101_root in ucf101_reader.py to your UCF101 dataset directory; its videos and rawframes subdirectories hold the video files (6.8 GB) and the extracted frames (56 GB), respectively.

Data preparation steps:
1. Download the official UCF101 data: wget https://www.crcv.ucf.edu/data/UCF101/UCF101.rar, then extract it into $ucf101_root/videos
2. Extract frames from the videos (TODO), storing them in $ucf101_root/frames
3. Generate the list files of video paths (steps TODO) under ./data/dataset/ucf101/ (see the sketch after this list)
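Since the list-generation step is still marked TODO, here is a hypothetical sketch of building such a list file. The "frame_dir num_frames label" line format and the helper name are assumptions for illustration, not necessarily what ucf101_reader.py expects:

```python
import os

def build_rawframes_list(frames_root, class_index, out_path):
    # class_index: dict mapping class folder name -> integer label.
    # Writes one "frame_dir num_frames label" line per video (assumed format).
    with open(out_path, 'w') as f:
        for cls_name, label in sorted(class_index.items()):
            cls_dir = os.path.join(frames_root, cls_name)
            for video in sorted(os.listdir(cls_dir)):
                video_dir = os.path.join(cls_dir, video)
                num_frames = len(os.listdir(video_dir))
                f.write('{} {} {}\n'.format(video_dir, num_frames, label))
```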
## Training
Once the data is ready, training can be launched as follows.

- Training from scratch:
  sh run_ucf101.sh
- Training from an ImageNet-pretrained ResNet backbone:
  1. Download the ResNet50 weights trained on ImageNet as initialization: wget https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz, then extract the archive
  2. Launch training with --weights=./ResNet50_pretrained/: sh run_ucf101_imagenet.sh
- Finetuning from a Kinetics-400-pretrained model:
  1. Download the published static-graph model: wget https://paddlemodels.bj.bcebos.com/video_classification/TSM.pdparams
  2. mkdir k400_wei && mv TSM.pdparams k400_wei
  3. Launch training with --weights=k400_wei/TSM.pdparams: sh run_ucf101_k400.sh

Results on the UCF101 dataset:

|Top-1|Top-5|pretrain|
|:-:|:-:|:-:|
|84.37%|95.68%|ImageNet|
|94.54%|98.96%|Kinetics-400|

See also the [static-graph implementation](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/PaddleVideo).
@@ -16,6 +16,7 @@ import os
import time
import sys
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.dygraph.nn import Conv2D, Pool2D, BatchNorm, Linear
import math
@@ -28,7 +29,8 @@ class ConvBNLayer(fluid.dygraph.Layer):
                 filter_size,
                 stride=1,
                 groups=1,
                 act=None,
                 name=None):
        super(ConvBNLayer, self).__init__()

        self._conv = Conv2D(
@@ -39,14 +41,22 @@ class ConvBNLayer(fluid.dygraph.Layer):
            padding=(filter_size - 1) // 2,
            groups=None,
            act=None,
            param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"),
            bias_attr=False)

        if name == "conv1":
            bn_name = "bn_" + name
        else:
            bn_name = "bn" + name[3:]
        self._batch_norm = BatchNorm(
            num_filters,
            act=act,
            param_attr=ParamAttr(name=bn_name + "_scale"),
            bias_attr=ParamAttr(name=bn_name + "_offset"),
            moving_mean_name=bn_name + "_mean",
            moving_variance_name=bn_name + "_variance")

    def forward(self, inputs):
        y = self._conv(inputs)
@@ -61,32 +71,36 @@ class BottleneckBlock(fluid.dygraph.Layer):
                 num_filters,
                 stride,
                 shortcut=True,
                 seg_num=8,
                 name=None):
        super(BottleneckBlock, self).__init__()

        self.conv0 = ConvBNLayer(
            num_channels=num_channels,
            num_filters=num_filters,
            filter_size=1,
            act='relu',
            name=name + "_branch2a")
        self.conv1 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters,
            filter_size=3,
            stride=stride,
            act='relu',
            name=name + "_branch2b")
        self.conv2 = ConvBNLayer(
            num_channels=num_filters,
            num_filters=num_filters * 4,
            filter_size=1,
            act=None,
            name=name + "_branch2c")

        if not shortcut:
            self.short = ConvBNLayer(
                num_channels=num_channels,
                num_filters=num_filters * 4,
                filter_size=1,
                stride=stride,
                name=name + "_branch1")
        self.shortcut = shortcut
        self.seg_num = seg_num
        self._num_channels_out = int(num_filters * 4)
@@ -119,7 +133,12 @@ class TSM_ResNet(fluid.dygraph.Layer):
        num_filters = [64, 128, 256, 512]

        self.conv = ConvBNLayer(
            num_channels=3,
            num_filters=64,
            filter_size=7,
            stride=2,
            act='relu',
            name="conv1")
        self.pool2d_max = Pool2D(
            pool_size=3, pool_stride=2, pool_padding=1, pool_type='max')
@@ -129,14 +148,23 @@ class TSM_ResNet(fluid.dygraph.Layer):
        for block in range(len(depth)):
            shortcut = False
            for i in range(depth[block]):
                if self.layers in [101, 152] and block == 2:
                    if i == 0:
                        conv_name = "res" + str(block + 2) + "a"
                    else:
                        conv_name = "res" + str(block + 2) + "b" + str(i)
                else:
                    conv_name = "res" + str(block + 2) + chr(97 + i)
                bottleneck_block = self.add_sublayer(
                    conv_name,
                    BottleneckBlock(
                        num_channels=num_channels,
                        num_filters=num_filters[block],
                        stride=2 if i == 0 and block != 0 else 1,
                        shortcut=shortcut,
                        seg_num=self.seg_num,
                        name=conv_name))
                num_channels = int(bottleneck_block._num_channels_out)
                self.bottleneck_block_list.append(bottleneck_block)
                shortcut = True
@@ -151,9 +179,12 @@ class TSM_ResNet(fluid.dygraph.Layer):
            self.class_dim,
            act="softmax",
            param_attr=fluid.param_attr.ParamAttr(
                initializer=fluid.initializer.Uniform(-stdv, stdv),
                name="fc_0.w_0"),
            bias_attr=fluid.param_attr.ParamAttr(
                learning_rate=2.0,
                regularizer=fluid.regularizer.L2Decay(0.),
                name="fc_0.b_0"))

    def forward(self, inputs):
        y = fluid.layers.reshape(
......
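A note on the naming changes above: the explicit parameter names let the dygraph model load published static-graph ResNet weights by name (see the weight-loading loop in train.py below). The following sketch merely restates the BatchNorm naming rule for clarity; the example weight names are assumptions based on the usual PaddlePaddle ResNet naming:

```python
# Restatement of the BatchNorm naming rule from ConvBNLayer above.
def bn_name_for(conv_name):
    return "bn_" + conv_name if conv_name == "conv1" else "bn" + conv_name[3:]

# Assumed pairing: "res2a_branch2a" convolution weights go with
# "bn2a_branch2a_scale" / "bn2a_branch2a_offset" BatchNorm parameters.
assert bn_name_for("conv1") == "bn_conv1"
assert bn_name_for("res2a_branch2a") == "bn2a_branch2a"
```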
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import pickle
import cv2
import numpy as np
import random


class ReaderNotFoundError(Exception):
    "Error: reader not found"

    def __init__(self, reader_name, avail_readers):
        super(ReaderNotFoundError, self).__init__()
        self.reader_name = reader_name
        self.avail_readers = avail_readers

    def __str__(self):
        msg = "Reader {} Not Found.\nAvailable readers:\n".format(
            self.reader_name)
        for reader in self.avail_readers:
            msg += "  {}\n".format(reader)
        return msg


class DataReader(object):
    """data reader for video input"""

    def __init__(self, model_name, mode, cfg):
        self.name = model_name
        self.mode = mode
        self.cfg = cfg

    def create_reader(self):
        """Not implemented"""
        pass

    def get_config_from_sec(self, sec, item, default=None):
        if sec.upper() not in self.cfg:
            return default
        return self.cfg[sec.upper()].get(item, default)


class ReaderZoo(object):
    def __init__(self):
        self.reader_zoo = {}

    def regist(self, name, reader):
        assert reader.__base__ == DataReader, "Unknown reader type {}".format(
            type(reader))
        self.reader_zoo[name] = reader

    def get(self, name, mode, cfg):
        for k, v in self.reader_zoo.items():
            if k == name:
                return v(name, mode, cfg)
        raise ReaderNotFoundError(name, self.reader_zoo.keys())


# singleton reader_zoo
reader_zoo = ReaderZoo()


def regist_reader(name, reader):
    reader_zoo.regist(name, reader)


def get_reader(name, mode, cfg):
    reader_model = reader_zoo.get(name, mode, cfg)
    return reader_model.create_reader()
# Training from scratch (presumably run_ucf101.sh):
CUDA_VISIBLE_DEVICES=0,1,2,3 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames tsm.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True

# Training from the ImageNet-pretrained backbone (presumably run_ucf101_imagenet.sh):
CUDA_VISIBLE_DEVICES=0,1,2,3 python3.7 -m paddle.distributed.launch --started_port 18989 --log_dir ./mylog.ucf101.frames.imagenet train.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True --weights=./ResNet50_pretrained/

# Finetuning from the Kinetics-400-pretrained model (presumably run_ucf101_k400.sh):
CUDA_VISIBLE_DEVICES=4,5,6,7 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames.k400 train.py --config=./tsm_ucf101.yaml --use_gpu=True --use_data_parallel=True --weights=k400_wei/TSM.pdparams

# Single-GPU finetuning from the Kinetics-400-pretrained model, single-card config:
CUDA_VISIBLE_DEVICES=1 python3.7 -m paddle.distributed.launch --started_port 38989 --log_dir ./mylog.ucf101.frames.k400.sing train.py --config=./tsm_ucf101_sing.yaml --use_gpu=True --use_data_parallel=False --weights=k400_wei/TSM.pdparams
@@ -24,6 +24,7 @@ from paddle.fluid.dygraph.base import to_variable
from model import TSM_ResNet
from config_utils import *
from reader import KineticsReader
from ucf101_reader import UCF101Reader

logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
@@ -65,12 +66,39 @@ def parse_args():
        type=int,
        default=None,
        help='epoch number, 0 for read from config file')
    parser.add_argument(
        '--use_data_parallel',
        type=ast.literal_eval,
        default=True,
        help='whether to use data parallel training (default: True).')
    parser.add_argument(
        '--model_save_dir',
        type=str,
        default='./output',
        help='directory for saving models (default: ./output).')
    parser.add_argument(
        '--checkpoint',
        type=str,
        default=None,
        help='path to resume training based on previous checkpoints. '
        'None for not resuming any checkpoints.')
    parser.add_argument(
        '--model_path_pre',
        type=str,
        default='tsm',
        help='prefix for saved model files (default: tsm).')
    parser.add_argument(
        '--weights',
        type=str,
        default='./ResNet50_pretrained/',
        help='path to pretrained weights (default: ./ResNet50_pretrained/).')
    args = parser.parse_args()
    return args


def val(epoch, model, cfg, args):
    reader = UCF101Reader(name="TSM", mode="valid", cfg=cfg)
    reader = reader.create_reader()
    total_loss = 0.0
    total_acc1 = 0.0
@@ -101,9 +129,9 @@ def val(epoch, model, cfg, args):
                epoch, batch_id,
                avg_loss.numpy()[0], acc_top1.numpy()[0], acc_top5.numpy()[0]))

    print('TEST Epoch {}, iter {}, Finish loss {}, acc1 {}, acc5 {}'.format(
        epoch, batch_id, total_loss / total_sample, total_acc1 / total_sample,
        total_acc5 / total_sample))


def create_optimizer(cfg, params):
@@ -132,26 +160,66 @@ def train(args):
    valid_config = merge_configs(config, 'valid', vars(args))
    print_configs(train_config, 'Train')

    local_rank = fluid.dygraph.parallel.Env().local_rank
    use_data_parallel = args.use_data_parallel
    trainer_count = fluid.dygraph.parallel.Env().nranks
    if not args.use_gpu:
        place = fluid.CPUPlace()
    elif not args.use_data_parallel:
        place = fluid.CUDAPlace(0)
    else:
        # (data_parallel step 1/6): pin each trainer to its own device
        place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)

    # load pretrained weights
    assert os.path.exists(args.weights), \
        "Given weights dir {} does not exist.".format(args.weights)
    pre_state_dict = fluid.load_program_state(args.weights)

    with fluid.dygraph.guard(place):
        # 1. init model
        video_model = TSM_ResNet("TSM", train_config)

        # 2. set weights: copy every pretrained parameter whose name matches,
        #    except the final FC layer (its class count differs from pretrain)
        param_state_dict = {}
        model_dict = video_model.state_dict()
        for key in model_dict.keys():
            weight_name = model_dict[key].name
            if (weight_name in pre_state_dict
                    and weight_name not in ("fc_0.w_0", "fc_0.b_0")):
                print('Loaded weight: {}, shape: {}'.format(
                    weight_name, pre_state_dict[weight_name].shape))
                param_state_dict[key] = pre_state_dict[weight_name]
            else:
                print('Skipped weight: {} (kept random init)'.format(
                    weight_name))
                param_state_dict[key] = model_dict[key]
        video_model.set_dict(param_state_dict)

        # 3. init optimizer
        optimizer = create_optimizer(train_config.TRAIN,
                                     video_model.parameters())
        if use_data_parallel:
            # (data_parallel steps 2,3/6): prepare context, wrap the model
            strategy = fluid.dygraph.parallel.prepare_context()
            video_model = fluid.dygraph.parallel.DataParallel(video_model,
                                                              strategy)

        # 4. load checkpoint to resume training, if given
        if args.checkpoint:
            assert os.path.exists(args.checkpoint + ".pdparams"), \
                "Checkpoint file {}.pdparams does not exist.".format(args.checkpoint)
            assert os.path.exists(args.checkpoint + ".pdopt"), \
                "Checkpoint file {}.pdopt does not exist.".format(args.checkpoint)
            para_dict, opti_dict = fluid.dygraph.load_dygraph(args.checkpoint)
            video_model.set_dict(para_dict)
            optimizer.set_dict(opti_dict)

        # 5. reader
        bs_denominator = 1
        if args.use_gpu:
            # check number of GPUs
            gpus = os.getenv("CUDA_VISIBLE_DEVICES", "")
            if gpus == "":
                pass
@@ -168,27 +236,36 @@ def train(args):
        train_config.TRAIN.batch_size = int(train_config.TRAIN.batch_size /
                                            bs_denominator)
        train_reader = UCF101Reader(name="TSM", mode="train", cfg=train_config)
        train_reader = train_reader.create_reader()
        if use_data_parallel:
            # (data_parallel step 4/6): shard batches across trainers
            train_reader = fluid.contrib.reader.distributed_batch_reader(
                train_reader)

        # 6. train loop
        for epoch in range(train_config.TRAIN.epoch):
            video_model.train()
            total_loss = 0.0
            total_acc1 = 0.0
            total_acc5 = 0.0
            total_sample = 0
            t_last = time.time()
            # 6.1 for each batch: forward, backward, minimize
            for batch_id, data in enumerate(train_reader()):
                t1 = time.time()
                x_data = np.array([item[0] for item in data])
                y_data = np.array([item[1] for item in data]).reshape([-1, 1])

                imgs = to_variable(x_data)
                labels = to_variable(y_data)
                labels.stop_gradient = True
                t2 = time.time()

                outputs = video_model(imgs)
                t3 = time.time()

                loss = fluid.layers.cross_entropy(
                    input=outputs, label=labels, ignore_index=-1)
                avg_loss = fluid.layers.mean(loss)
@@ -198,34 +275,62 @@ def train(args):
                acc_top5 = fluid.layers.accuracy(
                    input=outputs, label=labels, k=5)

                current_step_lr = optimizer.current_step_lr()
                if use_data_parallel:
                    # (data_parallel step 5/6): scale the loss and
                    # all-reduce gradients across trainers
                    avg_loss = video_model.scale_loss(avg_loss)
                    avg_loss.backward()
                    video_model.apply_collective_grads()
                else:
                    avg_loss.backward()
                t4 = time.time()

                optimizer.minimize(avg_loss)
                video_model.clear_gradients()
                t5 = time.time()

                total_loss += avg_loss.numpy()[0]
                total_acc1 += acc_top1.numpy()[0]
                total_acc5 += acc_top5.numpy()[0]
                total_sample += 1

                print(
                    'TRAIN Epoch: %d, iter: %d, loss: %.5f, acc1: %.5f, acc5: %.5f, lr: %.5f, forward_cost: %.5f s, backward_cost: %.5f s, minimize_cost: %.5f s, to_variable_cost: %.5f s, batch_cost: %.5f s, reader_cost: %.5f s'
                    % (epoch, batch_id, avg_loss.numpy()[0],
                       acc_top1.numpy()[0], acc_top5.numpy()[0],
                       current_step_lr, t3 - t2, t4 - t3, t5 - t4, t2 - t1,
                       t5 - t_last, t2 - t_last))
                t_last = time.time()

            print(
                'TRAIN End, Epoch {}, avg_loss= {}, avg_acc1= {}, avg_acc5= {}, lr={}'.
                format(epoch, total_loss / total_sample, total_acc1 /
                       total_sample, total_acc5 / total_sample,
                       current_step_lr))

            # 6.2 save checkpoint (only on the first trainer)
            if local_rank == 0:
                if not os.path.isdir(args.model_save_dir):
                    os.makedirs(args.model_save_dir)
                model_path = os.path.join(
                    args.model_save_dir,
                    args.model_path_pre + "_epoch{}".format(epoch))
                fluid.dygraph.save_dygraph(video_model.state_dict(), model_path)
                fluid.dygraph.save_dygraph(optimizer.state_dict(), model_path)
                print('save_dygraph End, Epoch {}/{} '.format(
                    epoch, train_config.TRAIN.epoch))
            # 6.3 validation
            video_model.eval()
            val(epoch, video_model, valid_config, args)

        # 7. save final model (only on the first trainer)
        if local_rank == 0:
            model_path = os.path.join(args.model_save_dir,
                                      args.model_path_pre + "_final")
            fluid.dygraph.save_dygraph(video_model.state_dict(), model_path)
            fluid.dygraph.save_dygraph(optimizer.state_dict(), model_path)
    logger.info('[TRAIN] training finished')
......
# 4-GPU training config (presumably tsm_ucf101.yaml, per the launch commands above)
MODEL:
name: "TSM"
format: "frames"
num_classes: 101
seg_num: 8
seglen: 1
image_mean: [0.485, 0.456, 0.406]
image_std: [0.229, 0.224, 0.225]
num_layers: 50
topk: 5
TRAIN:
epoch: 80
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 64
use_gpu: True
num_gpus: 4
filelist: "./data/dataset/ucf101/ucf101_train_split_1_rawframes.txt"
learning_rate: 0.01
learning_rate_decay: 0.1
decay_epochs: [40, 60]
l2_weight_decay: 1e-4
momentum: 0.9
total_videos: 9537
fix_random_seed: False
VALID:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 32
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
TEST:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
# Single-GPU training config (presumably tsm_ucf101_sing.yaml, per the launch commands above)
MODEL:
name: "TSM"
format: "frames"
num_classes: 101
seg_num: 8
seglen: 1
image_mean: [0.485, 0.456, 0.406]
image_std: [0.229, 0.224, 0.225]
num_layers: 50
topk: 5
TRAIN:
epoch: 80
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
use_gpu: True
num_gpus: 1
filelist: "./data/dataset/ucf101/ucf101_train_split_1_rawframes.txt"
learning_rate: 0.01
learning_rate_decay: 0.1
decay_epochs: [40, 60]
l2_weight_decay: 1e-4
momentum: 0.9
total_videos: 9537
fix_random_seed: False
VALID:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 32
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
TEST:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
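The two configs differ mainly in batch_size (64 vs 16) and num_gpus (4 vs 1). Since train.py divides the configured batch size by the number of visible GPUs (the bs_denominator logic above), both settings appear to yield the same per-device batch; a quick sanity check:

```python
# Mirrors the bs_denominator division in train.py:
# per-device batch = TRAIN.batch_size / number of visible GPUs.
def per_device_batch_size(global_batch_size, num_gpus):
    return global_batch_size // max(num_gpus, 1)

# tsm_ucf101.yaml:      64 over 4 GPUs -> 16 per device
# tsm_ucf101_sing.yaml: 16 over 1 GPU  -> 16 per device
assert per_device_batch_size(64, 4) == per_device_batch_size(16, 1) == 16
```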
This diff is collapsed.