Commit 9e963ac2 authored by LiuChaoXD

add tsn 2020-08-31

Parent 5df67047
......@@ -57,20 +57,20 @@ TSN is trained on the UCF101 action recognition dataset. For data download and preparation, see
1. Multi-GPU training
```bash
bash multi_gpus_run.sh ./configs/tsn.yaml
bash multi_gpus_run.sh
```
The GPUs used for multi-GPU training can be set as follows:
- Modify `export CUDA_VISIBLE_DEVICES=0,1,2,3` in `multi_gpus_run.sh` (the default 0,1,2,3 means GPUs 0, 1, 2, and 3 are used for training)
- Note: if you change the batch size, adjust the learning rate by the same factor: larger batch size, larger lr. For example, the default batchsize=128 uses lr=0.001; for batchsize=64, use lr=0.0005
- Note: the config file for multi-GPU training is `multi_tsn.yaml`. If you change the batch size, adjust the learning rate by the same factor: larger batch size, larger lr. For example, the default four-GPU batchsize=128 uses lr=0.001; for batchsize=64, use lr=0.0005.
2. Single-GPU training
```bash
bash single_gpu_run.sh ./configs/tsn.yaml
bash single_gpu_run.sh
```
The GPU used for single-GPU training can be set as follows:
- Modify `export CUDA_VISIBLE_DEVICES=0` in `run.sh` (meaning GPU 0 is used for training)
- Note: if you change the batch size, adjust the learning rate by the same factor. For example, the default batchsize=128 uses lr=0.001; for batchsize=64, use lr=0.0005
- Modify `export CUDA_VISIBLE_DEVICES=0` in `single_gpu_run.sh` (meaning GPU 0 is used for training)
- Note: the config file for single-GPU training is `single_tsn.yaml`. If you change the batch size, adjust the learning rate by the same factor. The default single-GPU batchsize=64 uses lr=0.0005; for batchsize=32, use lr=0.00025, as in the sketch below.
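To make the linear-scaling rule from the notes above concrete, here is a minimal sketch (the helper name is hypothetical; the values are the defaults quoted in this README):

```python
def scale_lr(base_lr, base_batch, new_batch):
    # linear scaling rule: the lr changes by the same factor as the batch size
    return base_lr * new_batch / base_batch

print(scale_lr(0.001, 128, 64))   # 0.0005, the multi-GPU example above
print(scale_lr(0.0005, 64, 32))   # 0.00025, the single-GPU example above
```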
## Model evaluation
The model can be evaluated as follows:
......
configs="./tsn.yaml"
configs="./multi_tsn.yaml"
pretrain="" # set pretrain model path if needed
resume="" # set checkpoints model path if u want to resume training
save_dir=""
......
......@@ -15,7 +15,7 @@ TRAIN:
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 256
batch_size: 128
use_gpu: True
filelist: "./data/dataset/ucf101/ucf101_train_split_1_rawframes.txt"
learning_rate: 0.001
......@@ -23,7 +23,7 @@ TRAIN:
decay_epochs: [30, 60]
l2_weight_decay: 1e-4
momentum: 0.9
total_videos: 9738 #239781
total_videos: 9738
VALID:
short_size: 256
......
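As a sanity check, `total_videos` together with `batch_size` determines how many iterations one epoch takes; a quick computation in plain Python (this assumes the final partial batch is kept, which depends on the reader settings):

```python
import math

total_videos = 9738  # UCF101 split-1 training list, from the config above
batch_size = 128     # multi-GPU default after this change
print(math.ceil(total_videos / batch_size))  # 77 iterations per epoch
```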
configs="tsn.yaml"
configs="single_tsn.yaml"
pretrain="" # set pretrain model path if needed
resume="" # set checkpoints model path if u want to resume training
save_dir=""
......
MODEL:
name: "TSN"
format: "frames" # support for "frames" or "videos"
num_classes: 101
seg_num: 3
seglen: 1
image_mean: [0.485, 0.456, 0.406]
image_std: [0.229, 0.224, 0.225]
num_layers: 50
topk: 5
TRAIN:
epoch: 80
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 64
use_gpu: True
filelist: "./data/dataset/ucf101/ucf101_train_split_1_rawframes.txt"
learning_rate: 0.0005
learning_rate_decay: 0.1
decay_epochs: [30, 60]
l2_weight_decay: 1e-4
momentum: 0.9
total_videos: 9738
VALID:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 128
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
TEST:
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 64
filelist: "./data/dataset/ucf101/ucf101_val_split_1_rawframes.txt"
\ No newline at end of file
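For reference, a minimal sketch of reading this config outside the repo's own utilities (assumes PyYAML is available; the path is hypothetical and depends on your checkout):

```python
import yaml

with open("single_tsn.yaml") as f:  # adjust the path to your checkout
    cfg = yaml.safe_load(f)

print(cfg["MODEL"]["seg_num"], cfg["MODEL"]["seglen"])            # 3 1
print(cfg["TRAIN"]["batch_size"], cfg["TRAIN"]["learning_rate"])  # 64 0.0005
```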
......@@ -181,14 +181,17 @@ def val(epoch, model, cfg, args):
acc_top1 = fluid.layers.accuracy(input=outputs, label=labels, k=1)
acc_top5 = fluid.layers.accuracy(input=outputs, label=labels, k=5)
total_loss += avg_loss.numpy()[0]
dy_out = avg_loss.numpy()[0]
total_loss += dy_out
total_acc1 += acc_top1.numpy()[0]
total_acc5 += acc_top5.numpy()[0]
total_sample += 1
print('TEST Epoch {}, iter {}, loss = {}, acc1 {}, acc5 {}'.format(
epoch, batch_id,
avg_loss.numpy()[0], acc_top1.numpy()[0], acc_top5.numpy()[0]))
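# log running averages every 5 iterations instead of per-batch values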
if batch_id % 5 == 0:
print(
"TEST Epoch {}, iter {}, loss={:.5f}, acc1 {:.5f}, acc5 {:.5f}".
format(epoch, batch_id, total_loss / total_sample, total_acc1 /
total_sample, total_acc5 / total_sample))
print('Finish loss {} , acc1 {} , acc5 {}'.format(
total_loss / total_sample, total_acc1 / total_sample, total_acc5 /
......@@ -297,6 +300,7 @@ def train(args):
input=outputs, label=labels, k=1)
acc_top5 = fluid.layers.accuracy(
input=outputs, label=labels, k=5)
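# copy the loss to host once and reuse it below, instead of calling .numpy() repeatedly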
dy_out = avg_loss.numpy()[0]
if use_data_parallel:
# (data_parallel step5/6)
......@@ -309,16 +313,16 @@ def train(args):
optimizer.minimize(avg_loss)
video_model.clear_gradients()
total_loss += avg_loss.numpy()[0]
total_loss += dy_out
total_acc1 += acc_top1.numpy()[0]
total_acc5 += acc_top5.numpy()[0]
total_sample += 1
train_batch_cost = time.time() - batch_start
print(
'TRAIN Epoch: {}, iter: {}, batch_cost: {: .5f} s, reader_cost: {: .5f} s loss={: .6f}, acc1 {: .6f}, acc5 {: .6f} \t'.
'TRAIN Epoch: {}, iter: {}, batch_cost: {:.5f} s, reader_cost: {:.5f} s, loss={:.6f}, acc1 {:.6f}, acc5 {:.6f} '.
format(epoch, batch_id, train_batch_cost, train_reader_cost,
avg_loss.numpy()[0],
acc_top1.numpy()[0], acc_top5.numpy()[0]))
total_loss / total_sample, total_acc1 / total_sample,
total_acc5 / total_sample))
batch_start = time.time()
print(
......@@ -349,11 +353,11 @@ def train(args):
else:
if val_acc > best_acc:
best_acc = val_acc
if fluid.dygraph.parallel.Env().local_rank == 0:
if not os.path.isdir(args.weights):
os.makedirs(args.weights)
fluid.dygraph.save_dygraph(video_model.state_dict(),
args.weights + "/final")
if fluid.dygraph.parallel.Env().local_rank == 0:
if not os.path.isdir(args.weights):
os.makedirs(args.weights)
fluid.dygraph.save_dygraph(video_model.state_dict(),
args.weights + "/final")
else:
if fluid.dygraph.parallel.Env().local_rank == 0:
if not os.path.isdir(args.weights):
......
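The hunk above saves the best checkpoint only from rank 0, so data-parallel workers do not race on the same files. In isolation the pattern looks like this (a sketch against the Paddle 1.x dygraph API already used in this diff; `video_model` and `weights_dir` are stand-ins for the objects in `train()`):

```python
import os
import paddle.fluid as fluid

def save_best_checkpoint(video_model, weights_dir):
    # under data parallelism, only the rank-0 worker writes to disk
    if fluid.dygraph.parallel.Env().local_rank == 0:
        if not os.path.isdir(weights_dir):
            os.makedirs(weights_dir)
        # writes weights_dir/final.pdparams
        fluid.dygraph.save_dygraph(video_model.state_dict(),
                                   weights_dir + "/final")
```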