Unverified commit c44bfba7, authored by H huangjun12, committed by GitHub

Update video tag (#4916)

* update videotag, add fine-tune code and doc

* refine datalist

* refine eval.py
Parent f24158de
# Model Fine-tuning Guide
---
## Contents
Following this guide, you can fine-tune the VideoTag pre-trained models on your own training data to train your own models.
It covers:
- [Principles](#principles)
- [Fine-tuning the AttentionLSTM model](#fine-tuning-the-attentionlstm-model)
- [Fine-tuning the TSN model](#fine-tuning-the-tsn-model)
- [Extensions](#extensions)
- [References](#references)
## Principles
VideoTag uses two-stage modeling and consists of two models: TSN + AttentionLSTM.
Temporal Segment Network (TSN) is a classic 2D-CNN-based video classification model. By sampling video frames sparsely, it captures a video's temporal information while keeping computation low. For details, see the paper [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859).
AttentionLSTM takes video feature vectors as input, encodes the frame features with a bidirectional LSTM, and adds an attention layer that combines each time step's hidden state with adaptive weights into the final classification vector. For details, see the paper [AttentionCluster](https://arxiv.org/abs/1711.09550).
VideoTag is trained in two stages: the first stage uses a modest number of video samples (on the order of 100K) to train the large-scale video feature extractor (TSN); the second stage uses tens of millions of samples to train the predictor (AttentionLSTM).
Prediction also runs in two stages: the first stage takes a video file as input and produces a feature vector from a TSN network whose fully connected and loss layers have been removed; the second stage feeds that feature vector into AttentionLSTM to obtain the final classification result.
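As a minimal illustration of this two-stage flow (the callables below are stand-ins for running the exported TSN and AttentionLSTM programs, not functions from this repo):
```
import numpy as np

def videotag_predict(video_path, extract_features, classify, topk=20):
    # Stage 1: TSN with FC/loss layers removed -> frame features, e.g. shape (300, 2048)
    features = extract_features(video_path)
    # Stage 2: AttentionLSTM maps frame features to class probabilities, e.g. shape (3396,)
    probs = classify(features)
    return np.argsort(-probs)[:topk]  # indices of the top-k classes
```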
Starting from our pre-trained models, you can fine-tune with your own training data:
- [Fine-tuning the AttentionLSTM model](#fine-tuning-the-attentionlstm-model)
- [Fine-tuning the TSN model](#fine-tuning-the-tsn-model)
## Fine-tuning the AttentionLSTM model
AttentionLSTM takes video features as input, uses little GPU memory, and trains faster than TSN, so we recommend fine-tuning it first. Input videos are first passed through the TSN pre-trained model to extract feature vectors, which then serve as the training input for fine-tuning AttentionLSTM.
### Extracting feature vectors with the TSN pre-trained model
#### Data preparation
- Download the pre-trained weights: see [Guide to Running the Sample Code - Data preparation - Downloading the pre-trained weights](./Run.md).
- Prepare the training data: gather the videos to train on and list their file paths in the video\_tag/data/TsnExtractor.list file, in the following format:
```
my_video_path/my_video_file1.mp4
my_video_path/my_video_file2.mp4
...
```
#### Feature extraction
Run feature extraction as follows:
```
python tsn_extractor.py --model_name=TSN --config=./configs/tsn.yaml --weights=./weights/tsn.pdparams
```
- --weights sets the path of the TSN weights; the default is video\_tag/weights/tsn.pdparams
- --save\_dir sets where the feature vectors are saved; the default is video\_tag/data/tsn\_features. Each input video's features are saved in a separate npy file, laid out as follows:
```
video_tag
├──data
    ├──tsn_features
        ├── my_feature_file1.npy
        ├── my_feature_file2.npy
        ...
```
- The features extracted by TSN have shape ```num_frames * feature_dim```, 300 * 2048 by default; see the quick check below.
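You can sanity-check an extracted file with numpy (the file name below is illustrative):
```
import numpy as np

feat = np.load('data/tsn_features/my_feature_file1.npy', allow_pickle=True)
print(feat.shape)  # expected: (300, 2048), i.e. num_frames x feature_dim
```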
### Fine-tuning AttentionLSTM
#### Data preparation
The AttentionLSTM in VideoTag takes the feature vectors extracted by TSN as its input. List the training files and their labels in the video\_tag/data/dataset/attention\_lstm/train.list file, in the following format:
```
my_feature_path/my_feature_file1.npy label1 label2
my_feature_path/my_feature_file2.npy label1
...
```
- An input video may have multiple labels. Label indices are integers; the file name and the labels, as well as multiple labels, are separated by single spaces (a small list-writing sketch follows this list);
- The mapping between label indices and label names is given by a list file; you can build one following the label_3396.txt file used by VideoTag, where the row index corresponds to the label index;
- Validation, test, and inference sets are constructed like the training set; just list the file paths/labels in the corresponding list files under the video\_tag/data/attention\_lstm/ directory.
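A minimal sketch for writing such a list file (the paths and label indices are placeholders):
```
# {feature npy path: list of integer label indices}
samples = {
    'my_feature_path/my_feature_file1.npy': [0, 3],
    'my_feature_path/my_feature_file2.npy': [1],
}
with open('data/dataset/attention_lstm/train.list', 'w') as f:
    for path, labels in samples.items():
        f.write(path + ' ' + ' '.join(str(i) for i in labels) + '\n')
```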
#### Training
To fine-tune from the VideoTag AttentionLSTM pre-trained model, run:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=AttentionLSTM --config=./configs/attention_lstm.yaml --pretrain=./weights/attention_lstm
```
- By default AttentionLSTM trains on 8 GPUs with a total batch size of 1024. For single-GPU training, change the environment variable and use the single-GPU config, which scales the batch size down to 128 and the learning rate linearly from 0.001 to 0.000125:
```
export CUDA_VISIBLE_DEVICES=0
python train.py --model_name=AttentionLSTM --config=./configs/attention_lstm-single.yaml --pretrain=./weights/attention_lstm
```
- Make sure the number of training samples is larger than the batch size;
- --pretrain sets the path of the AttentionLSTM pre-trained model; the default is ./weights/attention\_lstm;
- The model configuration lives in video_tag/configs/attention\_lstm.yaml, where the hyperparameters can be adjusted conveniently;
- --save_dir sets where the trained parameters are saved; the default is ./data/checkpoints.
#### Evaluation
Evaluate a model as follows:
```
python eval.py --model_name=AttentionLSTM --config=./configs/attention_lstm.yaml --weights=./data/checkpoints/AttentionLSTM_epoch9.pdparams
```
- --weights sets the weights to evaluate; the default is ./data/checkpoints/AttentionLSTM_epoch9.pdparams;
- Evaluation prints GAP, Hit@1, and other metrics directly to the log (a simplified Hit@1 sketch follows).
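For reference, Hit@1 is the fraction of samples whose top-scoring class is among the ground-truth labels; a simplified sketch (not the repo's exact metric code, which lives in metrics/youtube8m):
```
import numpy as np

def hit_at_one(preds, labels):
    # preds: [batch, num_classes] scores; labels: [batch, num_classes] multi-hot
    top1 = np.argmax(preds, axis=1)
    return np.mean(labels[np.arange(len(top1)), top1])
```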
#### Inference
Run inference as follows:
```
python predict.py --model_name=AttentionLSTM --config=./configs/attention_lstm.yaml --weights=./data/checkpoints/AttentionLSTM_epoch9.pdparams
```
- --weights sets the weights used for inference; the default is ./data/checkpoints/AttentionLSTM_epoch9.pdparams;
- --label_file sets the label file; adjust it to your own data; the default is ./label_3396.txt;
- Predictions are printed as logs and also saved to json files; --save_dir sets where they are saved, defaulting to ./data/predict_results/attention_lstm.
## Fine-tuning the TSN model
The TSN model used in VideoTag takes mp4 files as input, with ResNet101 as the backbone.
### Data preparation
After preparing the training videos, list the file paths and labels in the video\_tag/data/dataset/tsn/train.list file, in the following format:
```
my_video_path/my_video_file1.mp4 label1
my_video_path/my_video_file2.mp4 label2
...
```
- An input video has exactly one label. The label index is an integer, separated from the file name by a single space;
- Validation, test, and inference sets are constructed like the training set; just list the file paths/labels in the corresponding list files under the video\_tag/data/dataset/tsn directory.
#### Training
To fine-tune from the VideoTag TSN pre-trained model, run:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=TSN --config=./configs/tsn.yaml --pretrain=./weights/tsn
```
- By default TSN trains on 8 GPUs with a total batch size of 256. For single-GPU training, change the environment variable:
```
export CUDA_VISIBLE_DEVICES=0
python train.py --model_name=TSN --config=./configs/tsn-single.yaml --pretrain=./weights/tsn
```
- --pretrain sets the path of the TSN pre-trained model, here ./weights/tsn;
- The model configuration lives in video_tag/configs/tsn.yaml, where the hyperparameters can be adjusted conveniently;
- --save_dir sets where the trained parameters are saved; the default is ./data/checkpoints.
#### Evaluation
Evaluate a model as follows:
```
python eval.py --model_name=TSN --config=./configs/tsn.yaml --weights=./data/checkpoints/TSN_epoch44.pdparams
```
- --weights sets the weights to evaluate, here ./data/checkpoints/TSN_epoch44.pdparams;
- Evaluation prints TOP1_ACC, TOP5_ACC, and other metrics directly to the log.
#### Inference
Run inference as follows:
```
python predict.py --model_name=TSN --config=./configs/tsn.yaml --weights=./data/checkpoints/TSN_epoch44.pdparams --save_dir=./data/predict_results/tsn/
```
- --weights sets the weights used for inference, here ./data/checkpoints/TSN_epoch44.pdparams;
- --label_file sets the label file; adjust it to your own data; the default is ./label_3396.txt;
- Predictions are printed as logs and also saved to json files; --save_dir sets where they are saved, here ./data/predict_results/tsn.
### Speeding up training
TSN takes mp4 video files as input by default, so during training each video must be decoded first and the decoded data fed to the network; for large video files this is slow.
To speed up training, you can decode the videos into frame images ahead of time and save them; during training, frames are then read directly by index, as sketched after the list below.
- Data preparation: first decode the videos and save them as frame images, then generate a file list of the frame paths. See [ucf-101 data preparation](../../../../dygraph/tsn/data/dataset/ucf101/README.md) for a reference implementation.
- Modify the config: in ./configs/tsn.yaml, set the MODEL.format value to "frames" and point the filelist of each mode to the corresponding frame-image list.
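A minimal decoding sketch with OpenCV: it writes the img_00001.jpg-style frames that frames_loader in reader/kinetics_reader.py expects and returns the frame count, which the frame list file needs (its lines take the form `frames_dir num_frames label`, cf. the VideoRecord class):
```
import os
import cv2

def video_to_frames(mp4_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(mp4_path)
    count = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        count += 1
        cv2.imwrite(os.path.join(out_dir, 'img_{:05d}.jpg'.format(count)), frame)
    cap.release()
    return count
```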
## Extensions
- For more on the TSN model, see the PaddleCV video library's [TSN video classification model](../../models/tsn/README.md)
- For more on the AttentionLSTM model, see the PaddleCV video library's [AttentionLSTM video classification model](../../models/attention_lstm/README.md)
## References
- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909) Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
...@@ -4,11 +4,7 @@
## Contents
- [Model overview](#model-overview)
- [Usage](#usage)

## Model overview
...@@ -16,7 +12,7 @@
VideoTag, PaddlePaddle's large-scale video classification model, is built on tens of millions of videos from Baidu's short-video products. It supports 3000 practical tags drawn from industrial practice, generalizes well, and is a good fit for large-scale (tens of millions to billions of videos) short-video classification in production. VideoTag uses two-stage modeling: image modeling and sequence learning. The first stage trains a large-scale video feature extractor (Extractor) on a modest number of video samples (on the order of 100K); the second stage trains the predictor (Predictor) on tens of millions of samples, enabling industrial application on very large short-video corpora, as illustrated below.
<p align="center">
<img src="images.png" height=220 width=800 hspace='10'/> <br />
Temporal shift module
</p>
...@@ -29,87 +25,7 @@ Temporal shift module
- Prediction: fuse the results of multiple models for video classification, further improving accuracy.

## Usage
- [1. How to run the sample code](./Run.md)
- [2. How to test on your own data](./Test.md)
- [3. How to fine-tune the models](./FineTune.md)
# Guide to Running the Sample Code
---
## Contents
Following this guide, you can quickly get familiar with how VideoTag is used and inspect the pre-trained models' predictions on the sample videos.
It covers:
- [Installation](#installation)
- [Data preparation](#data-preparation)
- [Model inference](#model-inference)
## Installation
### Environment dependencies:
```
CUDA >= 9.0
cudnn >= 7.5
```
### Installing the dependencies:
- 1.7.0 <= PaddlePaddle version <= 2.0.0: pip install paddlepaddle-gpu==1.8.4.post97 -i https://mirror.baidu.com/pypi/simple
- opencv version >= 4.1.0: pip install opencv-python==4.2.0.34
## Data preparation
### Downloading the pre-trained weights
We provide [TSN](https://videotag.bj.bcebos.com/video_tag_tsn.tar) and [AttentionLSTM](https://videotag.bj.bcebos.com/video_tag_lstm.tar) pre-trained weights. Create a weights directory under video\_tag, then download and extract the parameter files into it:
```
mkdir weights
cd weights
wget https://videotag.bj.bcebos.com/video_tag_tsn.tar
wget https://videotag.bj.bcebos.com/video_tag_lstm.tar
tar -zxvf video_tag_tsn.tar
tar -zxvf video_tag_lstm.tar
rm video_tag_tsn.tar -rf
rm video_tag_lstm.tar -rf
mv video_tag_tsn/* .
mv attention_lstm/* .
rm video_tag_tsn/ -rf
rm attention_lstm -rf
```
The resulting layout is:
```
video_tag
├──weights
    ├── attention_lstm.pdmodel
    ├── attention_lstm.pdopt
    ├── attention_lstm.pdparams
    ├── tsn.pdmodel
    ├── tsn.pdopt
    └── tsn.pdparams
```
### Downloading the sample videos
We provide [sample videos](https://videotag.bj.bcebos.com/mp4.tar) for testing. Download and extract them, and put the video files under the video\_tag/data/mp4 directory:
```
cd data/
wget https://videotag.bj.bcebos.com/mp4.tar
tar -zxvf mp4.tar
rm mp4.tar -rf
```
The resulting layout is:
```
video_tag
├──data
    ├── mp4
        ├── 1.mp4
        ├── 2.mp4
        └── ...
```
## Model inference
Start inference as follows:
```
python videotag_test.py
```
- Predictions are printed as logs, for example:
```
[========video_id [ data/mp4/1.mp4 ] , topk(20) preds: ========]
class_id: 3110, class_name: 训练 , probability: 0.97730666399
class_id: 2159, class_name: 蹲 , probability: 0.945082366467
...
[========video_id [ data/mp4/2.mp4 ] , topk(20) preds: ========]
class_id: 2773, class_name: 舞蹈 , probability: 0.850423932076
class_id: 1128, class_name: 表演艺术 , probability: 0.0446354188025
...
```
- --save\_dir sets where predictions are saved, defaulting to video\_tag/data/VideoTag\_results. Each input video's predictions are saved in a separate json file, with the following content format (a loading example follows):
```
[file_path,
{"class_name": class_name1, "probability": probability1, "class_id": class_id1},
{"class_name": class_name2, "probability": probability2, "class_id": class_id2},
...
]
```
# Guide to Testing the Pre-trained Models on Your Own Data
## Contents
Following this guide, you can quickly test how the VideoTag pre-trained models perform on your own business data.
It covers:
- [Data preparation](#data-preparation)
- [Model inference](#model-inference)
## Data preparation
Prepare your test data and list the files to run inference on in the video\_tag/data/VideoTag\_test.list file, in the following format:
```
my_video_path/my_video_file1.mp4
my_video_path/my_video_file2.mp4
...
```
## Model inference
Start inference as follows:
```
python videotag_test.py
```
- The supported input video formats are mp4, mkv, and webm;
- The model *uniformly samples 300 frames* from each input video for prediction. For long videos, we recommend clipping out the relevant part first to speed up prediction (a simplified sampling sketch follows this list);
- --use\_gpu sets whether to run inference on GPU; GPU is used by default. For a short video of about 10s, GPU inference takes about 4s;
- --filelist sets the input list file; the default is video\_tag/data/VideoTag\_test.list.
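The uniform sampling follows the segment-based indexing in reader/kinetics_reader.py (mp4_loader with seg_num=300, seglen=1): the video is split into 300 equal segments and the middle frame of each is taken. A simplified sketch:
```
def uniform_indices(num_frames, nsample=300):
    average_dur = num_frames // nsample
    indices = []
    for i in range(nsample):
        if average_dur >= 1:
            idx = (average_dur - 1) // 2 + i * average_dur
        else:
            idx = i  # short videos: indices wrap around via the modulo below
        indices.append(idx % num_frames)
    return indices
```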
MODEL:
    name: "AttentionLSTM"
    dataset: "YouTube-8M" # default; modifying it is not recommended
    bone_nework: None
    drop_rate: 0.5
    feature_names: ['rgb'] # rgb only, without audio
    feature_dims: [2048]
    embedding_size: 1024
    lstm_size: 512
    num_classes: 3396
    topk: 20
TRAIN:
    epoch: 10
    learning_rate: 0.000125
    decay_epochs: [5]
    decay_gamma: 0.1
    weight_decay: 0.0008
    num_samples: 5000000 # modify according to the number of samples in your dataset
    pretrain_base: None
    batch_size: 128
    use_gpu: True
    num_gpus: 1
    filelist: "data/dataset/attention_lstm/train.list"
VALID:
    batch_size: 128
    filelist: "data/dataset/attention_lstm/val.list"
TEST:
    batch_size: 128
    filelist: "data/dataset/attention_lstm/test.list"
INFER:
    batch_size: 1
    filelist: "data/dataset/attention_lstm/infer.list"
MODEL:
    name: "AttentionLSTM"
    dataset: "YouTube-8M" # default; modifying it is not recommended
    bone_nework: None
    drop_rate: 0.5
    feature_names: ['rgb'] # rgb only, without audio
    feature_dims: [2048]
    embedding_size: 1024
    lstm_size: 512
    num_classes: 3396
    topk: 20
TRAIN:
    epoch: 10
    learning_rate: 0.001
    decay_epochs: [5]
    decay_gamma: 0.1
    weight_decay: 0.0008
    num_samples: 5000000 # modify according to the number of samples in your dataset
    pretrain_base: None
    batch_size: 1024
    use_gpu: True
    num_gpus: 8
    filelist: "data/dataset/attention_lstm/train.list"
VALID:
    batch_size: 1024
    filelist: "data/dataset/attention_lstm/val.list"
TEST:
    batch_size: 128
    filelist: "data/dataset/attention_lstm/test.list"
INFER:
    batch_size: 1
    filelist: "data/dataset/attention_lstm/infer.list"
MODEL:
    name: "TSN"
    format: "video" # ["video", "frames"]
    num_classes: 400
    seglen: 1
    image_mean: [0.485, 0.456, 0.406]
    image_std: [0.229, 0.224, 0.225]
    num_layers: 101
    topk: 5
TRAIN:
    seg_num: 3 # training with 3 segments
    epoch: 45
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 32
    use_gpu: True
    num_gpus: 1
    filelist: "./data/dataset/tsn/train.list"
    learning_rate: 0.00125
    learning_rate_decay: 0.1
    l2_weight_decay: 1e-4
    momentum: 0.9
    total_videos: 224684 # modify according to the number of samples in your dataset
VALID:
    seg_num: 3
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 32
    filelist: "./data/dataset/tsn/val.list"
TEST:
    seg_num: 7
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 16
    filelist: "./data/dataset/tsn/test.list"
INFER:
    seg_num: 300 # infer using 300 segments
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 1
    filelist: "./data/dataset/tsn/infer.list"
MODEL:
    name: "TSN"
    format: "video" # ["video", "frames"]
    num_classes: 400
    seglen: 1
    image_mean: [0.485, 0.456, 0.406]
    image_std: [0.229, 0.224, 0.225]
    num_layers: 101
    topk: 5
TRAIN:
    seg_num: 3 # training with 3 segments
    epoch: 45
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 256
    use_gpu: True
    num_gpus: 8
    filelist: "./data/dataset/tsn/train.list"
    learning_rate: 0.01
    learning_rate_decay: 0.1
    l2_weight_decay: 1e-4
    momentum: 0.9
    total_videos: 224684 # modify according to the number of samples in your dataset
VALID:
    seg_num: 3
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 256
    filelist: "./data/dataset/tsn/val.list"
TEST:
    seg_num: 7
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 16
    filelist: "./data/dataset/tsn/test.list"
INFER:
    seg_num: 300 # infer using 300 segments
    short_size: 256
    target_size: 224
    num_reader_threads: 12
    buf_size: 1024
    batch_size: 1
    filelist: "./data/dataset/tsn/infer.list"
data/mp4/1.mp4
data/mp4/2.mp4
data/mp4/3.mp4
data/mp4/4.mp4
data/mp4/5.mp4
data/mp4/1.mp4
data/mp4/2.mp4
data/mp4/3.mp4
data/mp4/4.mp4
data/mp4/5.mp4
data/tsn_features/1.npy
data/tsn_features/2.npy
data/tsn_features/3.npy
data/tsn_features/4.npy
data/tsn_features/5.npy
data/tsn_features/1.npy 0 3 4
data/tsn_features/2.npy 1
data/tsn_features/3.npy 2
data/tsn_features/4.npy 3 4
data/tsn_features/5.npy 4
data/tsn_features/1.npy 0 3 4
data/tsn_features/2.npy 1
data/tsn_features/3.npy 2
data/tsn_features/4.npy 3 4
data/tsn_features/5.npy 4
data/tsn_features/1.npy 0 3 4
data/tsn_features/2.npy 1
data/tsn_features/3.npy 2
data/tsn_features/4.npy 3 4
data/tsn_features/5.npy 4
data/mp4/1.mp4
data/mp4/2.mp4
data/mp4/3.mp4
data/mp4/4.mp4
data/mp4/5.mp4
data/mp4/1.mp4 0
data/mp4/2.mp4 1
data/mp4/3.mp4 2
data/mp4/4.mp4 3
data/mp4/5.mp4 4
data/mp4/1.mp4 0
data/mp4/2.mp4 1
data/mp4/3.mp4 2
data/mp4/4.mp4 3
data/mp4/5.mp4 4
data/mp4/1.mp4 0
data/mp4/2.mp4 1
data/mp4/3.mp4 2
data/mp4/4.mp4 3
data/mp4/5.mp4 4
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import paddle.fluid as fluid
from utils.config_utils import *
import models
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--batch_size',
type=int,
default=None,
help='test batch size. None to use config file setting.')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
help='default use gpu.')
parser.add_argument(
'--weights',
type=str,
default='./data/checkpoints/AttentionLSTM_epoch9.pdparams',
help='weight path.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'evaluate_results'),
help='output dir path, default to use ./data/evaluate_results')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
args = parser.parse_args()
return args
def test(args):
# parse config
config = parse_config(args.config)
test_config = merge_configs(config, 'test', vars(args))
print_configs(test_config, "Test")
use_dali = test_config['TEST'].get('use_dali', False)
# build model
test_model = models.get_model(args.model_name, test_config, mode='test')
test_model.build_input(use_dataloader=False)
test_model.build_model()
test_feeds = test_model.feeds()
test_fetch_list = test_model.fetches()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
weights = args.weights or test_model.get_weights()
logger.info('load test weights from {}'.format(weights))
test_model.load_test_weights(exe, weights, fluid.default_main_program())
# get reader and metrics
test_reader = get_reader(args.model_name.upper(), 'test', test_config)
test_metrics = get_metrics(args.model_name.upper(), 'test', test_config)
test_feeder = fluid.DataFeeder(place=place, feed_list=test_feeds)
epoch_period = []
for test_iter, data in enumerate(test_reader()):
cur_time = time.time()
test_outs = exe.run(fetch_list=test_fetch_list,
feed=test_feeder.feed(data))
period = time.time() - cur_time
epoch_period.append(period)
test_metrics.accumulate(test_outs)
# metric here
if args.log_interval > 0 and test_iter % args.log_interval == 0:
info_str = '[EVAL] Batch {}'.format(test_iter)
test_metrics.calculate_and_log_out(test_outs, info_str)
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
test_metrics.finalize_and_log_out("[EVAL] eval finished. ", args.save_dir)
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
test(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
import numpy as np
import datetime
import logging
logger = logging.getLogger(__name__)
class MetricsCalculator():
def __init__(self, name, mode):
self.name = name
self.mode = mode # 'train', 'val', 'test'
self.reset()
def reset(self):
logger.info('Resetting {} metrics...'.format(self.mode))
self.aggr_acc1 = 0.0
self.aggr_acc5 = 0.0
self.aggr_loss = 0.0
self.aggr_batch_size = 0
def finalize_metrics(self):
self.avg_acc1 = self.aggr_acc1 / self.aggr_batch_size
self.avg_acc5 = self.aggr_acc5 / self.aggr_batch_size
self.avg_loss = self.aggr_loss / self.aggr_batch_size
def get_computed_metrics(self):
json_stats = {}
json_stats['avg_loss'] = self.avg_loss
json_stats['avg_acc1'] = self.avg_acc1
json_stats['avg_acc5'] = self.avg_acc5
return json_stats
def calculate_metrics(self, loss, softmax, labels):
accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100.
accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100.
return accuracy1, accuracy5
def accumulate(self, loss, softmax, labels):
cur_batch_size = softmax.shape[0]
# if returned loss is None for e.g. test, just set loss to be 0.
if loss is None:
cur_loss = 0.
else:
            cur_loss = np.mean(np.array(loss))
self.aggr_batch_size += cur_batch_size
self.aggr_loss += cur_loss * cur_batch_size
accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100.
accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100.
self.aggr_acc1 += accuracy1 * cur_batch_size
self.aggr_acc5 += accuracy5 * cur_batch_size
return
# ----------------------------------------------
# other utils
# ----------------------------------------------
def compute_topk_correct_hits(top_k, preds, labels):
    '''Compute the number of correct hits'''
batch_size = preds.shape[0]
top_k_preds = np.zeros((batch_size, top_k), dtype=np.float32)
for i in range(batch_size):
top_k_preds[i, :] = np.argsort(-preds[i, :])[:top_k]
correctness = np.zeros(batch_size, dtype=np.int32)
for i in range(batch_size):
if labels[i] in top_k_preds[i, :].astype(np.int32).tolist():
correctness[i] = 1
correct_hits = sum(correctness)
return correct_hits
def compute_topk_accuracy(softmax, labels, top_k):
computed_metrics = {}
assert labels.shape[0] == softmax.shape[0], "Batch size mismatch."
aggr_batch_size = labels.shape[0]
aggr_top_k_correct_hits = compute_topk_correct_hits(top_k, softmax, labels)
# normalize results
computed_metrics = \
float(aggr_top_k_correct_hits) / aggr_batch_size
return computed_metrics
...@@ -17,13 +17,14 @@ from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division

import logging
import os
import io
import numpy as np
import json
from metrics.youtube8m import eval_util as youtube8m_metrics
from metrics.kinetics import accuracy_metrics as kinetics_metrics

logger = logging.getLogger(__name__)
...@@ -95,9 +96,8 @@ class Youtube8mMetrics(Metrics):
        if self.mode == 'infer':
            for index, item in enumerate(self.infer_results):
                video_id = item[0]
                print('[========video_id [ {} ] , topk({}) preds: ========]\n'.
                      format(video_id, self.topk))
                f = io.open(label_file, "r", encoding="utf-8")
                fl = f.readlines()
...@@ -122,7 +122,7 @@ class Youtube8mMetrics(Metrics):
                    os.path.join(savedir, 'result' + str(index) + '.json'),
                    'w',
                    encoding='utf-8') as f:
                f.write(json.dumps(res_list, ensure_ascii=False))
        else:
            epoch_info_dict = self.calculator.get()
            logger.info(info + '\tavg_hit_at_one: {0},\tavg_perr: {1},\tavg_loss :{2},\taps: {3},\tgap:{4}'\
...@@ -135,6 +135,101 @@ class Youtube8mMetrics(Metrics):
        self.infer_results = []
class Kinetics400Metrics(Metrics):
def __init__(self, name, mode, metrics_args):
self.name = name
self.mode = mode
self.topk = metrics_args['MODEL']['topk']
self.calculator = kinetics_metrics.MetricsCalculator(name, mode.lower())
if self.mode == 'infer':
self.infer_results = []
def calculate_and_log_out(self, fetch_list, info=''):
if len(fetch_list) == 3:
loss = fetch_list[0]
loss = np.mean(np.array(loss))
pred = np.array(fetch_list[1])
label = np.array(fetch_list[2])
else:
loss = 0.
pred = np.array(fetch_list[0])
label = np.array(fetch_list[1])
acc1, acc5 = self.calculator.calculate_metrics(loss, pred, label)
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
return loss
def accumulate(self, fetch_list, info=''):
if self.mode == 'infer':
predictions = np.array(fetch_list[0])
video_id = fetch_list[1]
for i in range(len(predictions)):
topk_inds = predictions[i].argsort()[0 - self.topk:]
topk_inds = topk_inds[::-1]
preds = predictions[i][topk_inds]
self.infer_results.append(
(video_id[i], topk_inds.tolist(), preds.tolist()))
else:
if len(fetch_list) == 3:
loss = fetch_list[0]
loss = np.mean(np.array(loss))
pred = np.array(fetch_list[1])
label = np.array(fetch_list[2])
else:
loss = 0.
pred = np.array(fetch_list[0])
label = np.array(fetch_list[1])
self.calculator.accumulate(loss, pred, label)
def finalize_and_log_out(self,
info='',
savedir='./data/results',
label_file='./label_3396.txt'):
if self.mode == 'infer':
for index, item in enumerate(self.infer_results):
video_id = item[0]
print('[========video_id [ {} ] , topk({}) preds: ========]\n'.
format(video_id, self.topk))
f = io.open(label_file, "r", encoding="utf-8")
fl = f.readlines()
res_list = []
res_list.append(video_id)
for i in range(len(item[1])):
class_id = item[1][i]
class_prob = item[2][i]
class_name = fl[class_id].split('\n')[0]
print('class_id: {},'.format(class_id), 'class_name:',
class_name,
', probability: {} \n'.format(class_prob))
                save_dict = {
                    "class_id": class_id,
                    "class_name": class_name,
                    "probability": class_prob
                }
res_list.append(save_dict)
# save infer result into output dir
with io.open(
os.path.join(savedir, 'result' + str(index) + '.json'),
'w',
encoding='utf-8') as f:
f.write(json.dumps(res_list, ensure_ascii=False))
else:
self.calculator.finalize_metrics()
metrics_dict = self.calculator.get_computed_metrics()
loss = metrics_dict['avg_loss']
acc1 = metrics_dict['avg_acc1']
acc5 = metrics_dict['avg_acc5']
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
def reset(self):
self.calculator.reset()
if self.mode == 'infer':
self.infer_results = []
class MetricsZoo(object):
    def __init__(self):
        self.metrics_zoo = {}
...@@ -164,6 +259,5 @@ def get_metrics(name, mode, cfg):
# sort by alphabet
regist_metrics("ATTENTIONLSTM", Youtube8mMetrics)
regist_metrics("TSN", Kinetics400Metrics)
...@@ -12,8 +12,6 @@
#See the License for the specific language governing permissions and
#limitations under the License.

import paddle.fluid as fluid
from paddle.fluid import ParamAttr
...@@ -27,13 +25,13 @@ __all__ = ["AttentionLSTM"]

class AttentionLSTM(ModelBase):
    def __init__(self, name, cfg, mode='train', is_videotag=False):
        super(AttentionLSTM, self).__init__(name, cfg, mode)
        self.is_videotag = is_videotag
        self.get_config()

    def get_config(self):
        # get model configs
        self.feature_names = self.cfg.MODEL.feature_names
        self.feature_dims = self.cfg.MODEL.feature_dims
        self.num_classes = self.cfg.MODEL.num_classes
...@@ -45,18 +43,34 @@ class AttentionLSTM(ModelBase):
        self.batch_size = self.get_config_from_sec(self.mode, 'batch_size', 1)
        self.num_gpus = self.get_config_from_sec(self.mode, 'num_gpus', 1)

        if self.mode == 'train':
            self.learning_rate = self.get_config_from_sec('train',
                                                          'learning_rate', 1e-3)
            self.weight_decay = self.get_config_from_sec('train',
                                                         'weight_decay', 8e-4)
            self.num_samples = self.get_config_from_sec('train', 'num_samples',
                                                        5000000)
            self.decay_epochs = self.get_config_from_sec('train',
                                                         'decay_epochs', [5])
            self.decay_gamma = self.get_config_from_sec('train', 'decay_gamma',
                                                        0.1)

    def build_input(self, use_dataloader):
        self.feature_input = []
        for name, dim in zip(self.feature_names, self.feature_dims):
            self.feature_input.append(
                fluid.data(
                    shape=[None, dim], lod_level=1, dtype='float32', name=name))
        if self.mode != 'infer':
            self.label_input = fluid.data(
                shape=[None, self.num_classes], dtype='float32', name='label')
        else:
            self.label_input = None
        if use_dataloader:
            assert self.mode != 'infer', \
                'dataloader is not recommended for infer, please set use_dataloader to be false.'
            self.dataloader = fluid.io.DataLoader.from_generator(
                feed_list=self.feature_input + [self.label_input],
                capacity=8,
                iterable=True)
...@@ -71,7 +85,7 @@
        if len(att_outs) > 1:
            out = fluid.layers.concat(att_outs, axis=1)
        else:
            out = att_outs[0]  # video only, without audio in videoTag

        fc1 = fluid.layers.fc(
            input=out,
...@@ -92,8 +106,7 @@
        self.logit = fluid.layers.fc(input=fc2, size=self.num_classes, act=None, \
                bias_attr=ParamAttr(regularizer=fluid.regularizer.L2Decay(0.0),
                initializer=fluid.initializer.NormalInitializer(scale=0.0)), name='output')

        self.output = fluid.layers.sigmoid(self.logit)
...@@ -125,16 +138,29 @@
        return [self.output, self.logit]

    def feeds(self):
        return self.feature_input if self.mode == 'infer' else self.feature_input + [
            self.label_input
        ]

    def fetches(self):
        if self.mode == 'train' or self.mode == 'valid':
            losses = self.loss()
            fetch_list = [losses, self.output, self.label_input]
        elif self.mode == 'test':
            losses = self.loss()
            fetch_list = [losses, self.output, self.label_input]
        elif self.mode == 'infer':
            fetch_list = [self.output]
        else:
            raise NotImplementedError('mode {} not implemented'.format(
                self.mode))
        return fetch_list

    def weights_info(self):
        return None, None

    def load_pretrain_params(self, exe, pretrain, prog):
        logger.info("Load pretrain weights from {}, exclude fc layer.".format(
            pretrain))
...
...@@ -148,7 +148,7 @@ class ModelBase(object):
            download(url, path)
        return path

    def load_pretrain_params(self, exe, pretrain, prog):
        logger.info("Load pretrain weights from {}".format(pretrain))
        state_dict = fluid.load_program_state(pretrain)
        fluid.set_program_state(prog, state_dict)
...@@ -172,10 +172,10 @@ class ModelZoo(object):
                type(model))
        self.model_zoo[name] = model

    def get(self, name, cfg, mode='train', is_videotag=False):
        for k, v in self.model_zoo.items():
            if k.upper() == name.upper():
                return v(name, cfg, mode, is_videotag)
        raise ModelNotFoundError(name, self.model_zoo.keys())
...@@ -187,5 +187,5 @@ def regist_model(name, model):
    model_zoo.regist(name, model)

def get_model(name, cfg, mode='train', is_videotag=False):
    return model_zoo.get(name, cfg, mode, is_videotag)
...@@ -12,8 +12,6 @@
#See the License for the specific language governing permissions and
#limitations under the License.

import paddle.fluid as fluid
from paddle.fluid import ParamAttr
...@@ -27,8 +25,9 @@ __all__ = ["TSN"]

class TSN(ModelBase):
    def __init__(self, name, cfg, mode='train', is_videotag=False):
        super(TSN, self).__init__(name, cfg, mode=mode)
        self.is_videotag = is_videotag
        self.get_config()

    def get_config(self):
...@@ -87,11 +86,11 @@
        videomodel = TSN_ResNet(
            layers=cfg['layers'],
            seg_num=cfg['seg_num'],
            is_training=(self.mode == 'train'),
            is_extractor=self.is_videotag)
        out = videomodel.net(input=self.feature_input[0],
                             class_dim=cfg['class_dim'])
        self.network_outputs = [out]

    def optimizer(self):
        assert self.mode == 'train', "optimizer is only available in train mode"
...@@ -133,9 +132,9 @@
            fetch_list = [losses, self.network_outputs[0], self.label_input]
        elif self.mode == 'test':
            losses = self.loss()
            fetch_list = [losses, self.network_outputs[0], self.label_input]
        elif self.mode == 'infer':
            fetch_list = self.network_outputs
        else:
            raise NotImplementedError('mode {} not implemented'.format(
                self.mode))
...@@ -143,27 +142,22 @@
        return fetch_list

    def pretrain_info(self):
        return None, None

    def weights_info(self):
        return None

    def load_pretrain_params(self, exe, pretrain, prog):
        def is_parameter(var):
            return isinstance(var, fluid.framework.Parameter)

        logger.info("Load pretrain weights from {}, exclude fc layer.".format(
            pretrain))
        state_dict = fluid.load_program_state(pretrain)
        dict_keys = list(state_dict.keys())
        # remove the final fc layer when loading pretrain weights, because the
        # number of classes in the final fc may not match
        for name in dict_keys:
            if "fc_0" in name:
                del state_dict[name]
...
...@@ -20,10 +20,15 @@ import math

class TSN_ResNet():
    def __init__(self,
                 layers=50,
                 seg_num=7,
                 is_training=True,
                 is_extractor=False):
        self.layers = layers
        self.seg_num = seg_num
        self.is_training = is_training
        self.is_extractor = is_extractor

    def conv_bn_layer(self,
                      input,
...@@ -144,7 +149,18 @@
        pool = fluid.layers.pool2d(
            input=conv, pool_size=7, pool_type='avg', global_pooling=True)

        feature = fluid.layers.reshape(
            x=pool, shape=[-1, seg_num, pool.shape[1]])
        if self.is_extractor:
            # video_tag only needs the extracted feature
            out = feature
        else:
            out = fluid.layers.reduce_mean(feature, dim=1)
            stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
            out = fluid.layers.fc(input=out,
                                  size=class_dim,
                                  act='softmax',
                                  param_attr=fluid.param_attr.ParamAttr(
                                      initializer=fluid.initializer.Uniform(
                                          -stdv, stdv)))
        return out
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import numpy as np
try:
import cPickle as pickle
except:
import pickle
import paddle.fluid as fluid
from utils.config_utils import *
import models
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
help='default use gpu.')
parser.add_argument(
'--weights',
type=str,
default='./data/checkpoints/AttentionLSTM_epoch9.pdparams',
help='weight path.')
parser.add_argument(
'--batch_size',
type=int,
default=1,
help='sample number in a batch for inference.')
    parser.add_argument(
        '--filelist',
        type=str,
        default=None,
        help='path to the inference data file list.')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
    parser.add_argument(
        '--infer_topk',
        type=int,
        default=20,
        help='topk predictions to store.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'predict_results', 'attention_lstm'),
help='directory to store results')
    parser.add_argument(
        '--video_path',
        type=str,
        default=None,
        help='path of a single video file to infer, overriding the file list.')
parser.add_argument(
'--label_file',
type=str,
default='label_3396.txt',
help='chinese label file path')
args = parser.parse_args()
return args
def infer(args):
# parse config
config = parse_config(args.config)
infer_config = merge_configs(config, 'infer', vars(args))
print_configs(infer_config, "Infer")
infer_model = models.get_model(args.model_name, infer_config, mode='infer')
infer_model.build_input(use_dataloader=False)
infer_model.build_model()
infer_feeds = infer_model.feeds()
infer_outputs = infer_model.outputs()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
filelist = args.filelist or infer_config.INFER.filelist
filepath = args.video_path or infer_config.INFER.get('filepath', '')
if filepath != '':
assert os.path.exists(filepath), "{} not exist.".format(filepath)
else:
assert os.path.exists(filelist), "{} not exist.".format(filelist)
# get infer reader
infer_reader = get_reader(args.model_name.upper(), 'infer', infer_config)
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
# if no weight files specified, download weights from paddle
weights = args.weights or infer_model.get_weights()
infer_model.load_test_weights(exe, weights, fluid.default_main_program())
infer_feeder = fluid.DataFeeder(place=place, feed_list=infer_feeds)
fetch_list = infer_model.fetches()
infer_metrics = get_metrics(args.model_name.upper(), 'infer', infer_config)
infer_metrics.reset()
periods = []
cur_time = time.time()
for infer_iter, data in enumerate(infer_reader()):
data_feed_in = [items[:-1] for items in data]
video_id = [items[-1] for items in data]
infer_outs = exe.run(fetch_list=fetch_list,
feed=infer_feeder.feed(data_feed_in))
infer_result_list = [item for item in infer_outs] + [video_id]
prev_time = cur_time
cur_time = time.time()
period = cur_time - prev_time
periods.append(period)
infer_metrics.accumulate(infer_result_list)
if args.log_interval > 0 and infer_iter % args.log_interval == 0:
logger.info('Processed {} samples'.format((infer_iter + 1) * len(
video_id)))
logger.info('[INFER] infer finished. average time: {}'.format(
np.mean(periods)))
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
infer_metrics.finalize_and_log_out(
savedir=args.save_dir, label_file=args.label_file)
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
infer(args)
from .reader_utils import regist_reader, get_reader
from .feature_reader import FeatureReader
from .kinetics_reader import KineticsReader

# regist reader, sort by alphabet
regist_reader("ATTENTIONLSTM", FeatureReader)
regist_reader("TSN", KineticsReader)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import sys
from .reader_utils import DataReader
try:
import cPickle as pickle
from cStringIO import StringIO
except ImportError:
import pickle
from io import BytesIO
import numpy as np
import random
python_ver = sys.version_info
class FeatureReader(DataReader):
    """
    Data reader for a YouTube-8M-style dataset, stored as features extracted by a prior network.
    Used here by the AttentionLSTM model.
    dataset cfg: num_classes
                 batch_size
                 list
    """
def __init__(self, name, mode, cfg):
self.name = name
self.mode = mode
self.num_classes = cfg.MODEL.num_classes
# set batch size and file list
self.batch_size = cfg[mode.upper()]['batch_size']
self.filelist = cfg[mode.upper()]['filelist']
self.seg_num = cfg.MODEL.get('seg_num', None)
def create_reader(self):
fl = open(self.filelist).readlines()
fl = [line.strip() for line in fl if line.strip() != '']
if self.mode == 'train':
random.shuffle(fl)
def reader():
batch_out = []
for item in fl:
fileinfo = item.split(' ')
filepath = fileinfo[0]
rgb = np.load(filepath, allow_pickle=True)
nframes = rgb.shape[0]
label = [int(i) for i in fileinfo[1:]]
one_hot_label = make_one_hot(label, self.num_classes)
if self.mode != 'infer':
batch_out.append((rgb, one_hot_label))
else:
batch_out.append((rgb, filepath.split('/')[-1]))
if len(batch_out) == self.batch_size:
yield batch_out
batch_out = []
return reader
def make_one_hot(label, dim=3862):
one_hot_label = np.zeros(dim)
one_hot_label = one_hot_label.astype(float)
for ind in label:
one_hot_label[int(ind)] = 1
return one_hot_label
...@@ -18,7 +18,6 @@ import cv2
import math
import random
import functools
try:
    import cPickle as pickle
    from cStringIO import StringIO
...@@ -26,7 +25,9 @@ except ImportError:
    import pickle
    from io import BytesIO
import numpy as np
import paddle
import paddle.fluid as fluid

from PIL import Image, ImageEnhance
import logging
...@@ -36,6 +37,30 @@ logger = logging.getLogger(__name__)
python_ver = sys.version_info
class VideoRecord(object):
    '''
    Describes one video's frames in the file list:
    1. self._data[0] is the path of the video's frame directory
    2. self._data[1] is the number of frames
    3. self._data[2] is the label of the video
    '''
def __init__(self, row):
self._data = row
@property
def path(self):
return self._data[0]
@property
def num_frames(self):
return int(self._data[1])
@property
def label(self):
return int(self._data[2])
class KineticsReader(DataReader):
    """
    Data reader for kinetics dataset of two format mp4 and pkl.
...@@ -77,6 +102,7 @@ class KineticsReader(DataReader):
        # set batch size and file list
        self.batch_size = cfg[mode.upper()]['batch_size']
        self.filelist = cfg[mode.upper()]['filelist']

        if self.fix_random_seed:
            random.seed(0)
            np.random.seed(0)
...@@ -84,13 +110,13 @@
    def create_reader(self):
        assert os.path.exists(self.filelist), \
            '{} not exist, please check the data list'.format(self.filelist)

        _reader = self._reader_creator(self.filelist, self.mode, seg_num=self.seg_num, seglen = self.seglen, \
                        short_size = self.short_size, target_size = self.target_size, \
                        img_mean = self.img_mean, img_std = self.img_std, \
                        shuffle = (self.mode == 'train'), \
                        num_threads = self.num_reader_threads, \
                        buf_size = self.buf_size, format = self.format)

        def _batch_reader():
            batch_out = []
...@@ -105,7 +131,7 @@ class KineticsReader(DataReader):
        return _batch_reader

    def _reader_creator(self,
                        file_list,
                        mode,
                        seg_num,
                        seglen,
...@@ -116,15 +142,17 @@ class KineticsReader(DataReader):
                        shuffle=False,
                        num_threads=1,
                        buf_size=1024,
                        format='frames'):
        def decode_mp4(sample, mode, seg_num, seglen, short_size, target_size,
                       img_mean, img_std):
            sample = sample[0].split(' ')
            mp4_path = sample[0]
if mode == "infer":
label = mp4_path.split('/')[-1]
else:
label = int(sample[1])
            try:
                imgs = mp4_loader(mp4_path, seg_num, seglen, mode)
                if len(imgs) < 1:
                    logger.error('{} frame length {} less than 1.'.format(
                        mp4_path, len(imgs)))
...@@ -133,8 +161,29 @@ class KineticsReader(DataReader):
                logger.error('Error when loading {}'.format(mp4_path))
                return None, None

            return imgs_transform(imgs, mode, seg_num, seglen, \
                        short_size, target_size, img_mean, img_std, name = self.name), label
def decode_frames(sample, mode, seg_num, seglen, short_size,
target_size, img_mean, img_std):
recode = VideoRecord(sample[0].split(' '))
frames_dir_path = recode.path
if mode == "infer":
label = frames_dir_path
else:
label = recode.label
try:
imgs = frames_loader(recode, seg_num, seglen, mode)
if len(imgs) < 1:
logger.error('{} frame length {} less than 1.'.format(
frames_dir_path, len(imgs)))
return None, None
except:
logger.error('Error when loading {}'.format(frames_dir_path))
return None, None
return imgs_transform(
                imgs,
                mode,
                seg_num,
...@@ -143,21 +192,26 @@
                target_size,
                img_mean,
                img_std,
                name=self.name), label

        def reader_():
            with open(file_list) as flist:
                lines = [line.strip() for line in flist]
                if shuffle:
                    random.shuffle(lines)
                for line in lines:
                    file_path = line.strip()
                    yield [file_path]
if format == 'frames':
decode_func = decode_frames
elif format == 'video':
decode_func = decode_mp4
else:
            raise NotImplementedError("Not implemented format {}".format(format))
        mapper = functools.partial(
            decode_func,
            mode=mode,
            seg_num=seg_num,
            seglen=seglen,
...@@ -166,7 +220,8 @@
            img_mean=img_mean,
            img_std=img_std)

        return fluid.io.xmap_readers(
            mapper, reader_, num_threads, buf_size, order=True)
def imgs_transform(imgs,
...@@ -181,7 +236,13 @@
    imgs = group_scale(imgs, short_size)
    np_imgs = np.array([np.array(img).astype('float32') for img in imgs])  #dhwc

    if mode == 'train':
        # random crop + random flip as training augmentation
        np_imgs = group_crop(np_imgs, target_size, is_center=False)
        np_imgs = group_random_flip(np_imgs)
    else:
        np_imgs = group_crop(np_imgs, target_size, is_center=True)

    np_imgs = np_imgs.transpose(0, 3, 1, 2) / 255  #dchw
    np_imgs -= img_mean
    np_imgs /= img_std
...@@ -189,20 +250,33 @@
    return np_imgs
def group_crop(np_imgs, target_size, is_center=True):
    d, h, w, c = np_imgs.shape
    th, tw = target_size, target_size
    assert (w >= target_size) and (h >= target_size), \
        "image width({}) and height({}) should be larger than crop size".format(w, h, target_size)

    if is_center:
        h_off = int(round((h - th) / 2.))
        w_off = int(round((w - tw) / 2.))
    else:
        w_off = random.randint(0, w - tw)
        h_off = random.randint(0, h - th)

    img_crop = np_imgs[:, h_off:h_off + target_size, w_off:w_off +
                       target_size, :]
    return img_crop
def group_random_flip(np_imgs):
prob = random.random()
if prob < 0.5:
ret = np_imgs[:, :, ::-1, :]
return ret
else:
return np_imgs
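# np_imgs is laid out as d, h, w, c here, so reversing axis 2 above flips the
# frames horizontally (along the width axis).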
def group_scale(imgs, target_size):
    resized_imgs = []
    for i in range(len(imgs)):
        ...


def mp4_loader(filepath, nsample, seglen, mode):
    ...
    imgs = []
    for i in range(nsample):
        idx = 0
        if mode == 'train':
            if average_dur >= seglen:
                idx = random.randint(0, average_dur - seglen)
                idx += i * average_dur
            elif average_dur >= 1:
                idx += i * average_dur
            else:
                idx = i
        else:
            if average_dur >= seglen:
                idx = (average_dur - 1) // 2
                idx += i * average_dur
            elif average_dur >= 1:
                idx += i * average_dur
            else:
                idx = i
        for jj in range(idx, idx + seglen):
            imgbuf = sampledFrames[int(jj % len(sampledFrames))]
            ...
            imgs.append(img)
    return imgs
def frames_loader(recode, nsample, seglen, mode):
imgpath, num_frames = recode.path, recode.num_frames
average_dur = int(num_frames / nsample)
imgs = []
for i in range(nsample):
idx = 0
if mode == 'train':
if average_dur >= seglen:
idx = random.randint(0, average_dur - seglen)
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
else:
if average_dur >= seglen:
idx = (average_dur - 1) // 2
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
for jj in range(idx, idx + seglen):
img = Image.open(
os.path.join(imgpath, 'img_{:05d}.jpg'.format(jj + 1))).convert(
'RGB')
imgs.append(img)
return imgs
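# Sampling sketch: with num_frames=300, nsample=3, seglen=1, average_dur is 100;
# in eval mode idx = (100 - 1) // 2 + i * 100, i.e. frames img_00050.jpg,
# img_00150.jpg and img_00250.jpg are read (jj + 1 gives the 1-based file index).
# In train mode the start of each of the 3 segments is drawn uniformly at random.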
export CUDA_VISIBLE_DEVICES=0
# TSN + AttentionLSTM
python videotag_main.py
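# Optional flags (values shown are this repo's defaults; see parse_args in
# videotag_main.py):
#   python videotag_main.py --filelist=./data/VideoTag_test.list --save_dir=data/VideoTag_results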
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import argparse
import ast
import logging
import paddle.fluid as fluid
from utils.train_utils import train_with_dataloader
import models
from utils.config_utils import *
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser("Paddle Video train script")
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--batch_size',
type=int,
default=None,
help='training batch size. None to use config file setting.')
parser.add_argument(
'--learning_rate',
type=float,
default=None,
        help='learning rate used for training. None to use config file setting.')
parser.add_argument(
'--pretrain', type=str, default=None, help='path to pretrain weights.')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
        help='whether to use GPU.')
parser.add_argument(
'--no_memory_optimize',
action='store_true',
default=False,
help='whether to use memory optimize in train')
parser.add_argument(
'--epoch',
type=int,
default=None,
        help='number of epochs; None or 0 to read from the config file.')
parser.add_argument(
'--valid_interval',
type=int,
default=1,
help='validation epoch interval, 0 for no validation.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'checkpoints'),
        help='directory name to save train snapshots')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--fix_random_seed',
type=ast.literal_eval,
default=False,
help='If set True, enable continuous evaluation job.')
args = parser.parse_args()
return args
def train(args):
# parse config
config = parse_config(args.config)
train_config = merge_configs(config, 'train', vars(args))
valid_config = merge_configs(config, 'valid', vars(args))
print_configs(train_config, 'Train')
train_model = models.get_model(args.model_name, train_config, mode='train')
valid_model = models.get_model(args.model_name, valid_config, mode='valid')
# build model
startup = fluid.Program()
train_prog = fluid.Program()
if args.fix_random_seed:
startup.random_seed = 1000
train_prog.random_seed = 1000
with fluid.program_guard(train_prog, startup):
with fluid.unique_name.guard():
train_model.build_input(use_dataloader=True)
train_model.build_model()
            # the input has the form [data1, data2, ..., label], so train_feeds[-1] is the label
train_feeds = train_model.feeds()
train_fetch_list = train_model.fetches()
train_loss = train_fetch_list[0]
optimizer = train_model.optimizer()
optimizer.minimize(train_loss)
train_dataloader = train_model.dataloader()
valid_prog = fluid.Program()
with fluid.program_guard(valid_prog, startup):
with fluid.unique_name.guard():
valid_model.build_input(use_dataloader=True)
valid_model.build_model()
valid_feeds = valid_model.feeds()
valid_fetch_list = valid_model.fetches()
valid_dataloader = valid_model.dataloader()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup)
if args.pretrain:
train_model.load_pretrain_params(exe, args.pretrain, train_prog)
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = True
exec_strategy = fluid.ExecutionStrategy()
compiled_train_prog = fluid.compiler.CompiledProgram(
train_prog).with_data_parallel(
loss_name=train_loss.name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
compiled_valid_prog = fluid.compiler.CompiledProgram(
valid_prog).with_data_parallel(
share_vars_from=compiled_train_prog,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
# get reader
bs_denominator = 1
if args.use_gpu:
# check number of GPUs
gpus = os.getenv("CUDA_VISIBLE_DEVICES", "")
if gpus == "":
pass
else:
gpus = gpus.split(",")
num_gpus = len(gpus)
assert num_gpus == train_config.TRAIN.num_gpus, \
"num_gpus({}) set by CUDA_VISIBLE_DEVICES " \
"shoud be the same as that " \
"set in {}({})".format(
num_gpus, args.config, train_config.TRAIN.num_gpus)
bs_denominator = train_config.TRAIN.num_gpus
train_config.TRAIN.batch_size = int(train_config.TRAIN.batch_size /
bs_denominator)
valid_config.VALID.batch_size = int(valid_config.VALID.batch_size /
bs_denominator)
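    # e.g. with the default 8-card setup and a configured total batch_size of 1024,
    # each card processes 1024 // 8 = 128 samples per step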
train_reader = get_reader(args.model_name.upper(), 'train', train_config)
valid_reader = get_reader(args.model_name.upper(), 'valid', valid_config)
# get metrics
train_metrics = get_metrics(args.model_name.upper(), 'train', train_config)
valid_metrics = get_metrics(args.model_name.upper(), 'valid', valid_config)
epochs = args.epoch or train_model.epoch_num()
exe_places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
train_dataloader.set_sample_list_generator(train_reader, places=exe_places)
valid_dataloader.set_sample_list_generator(valid_reader, places=exe_places)
train_with_dataloader(
exe,
train_prog,
compiled_train_prog,
train_dataloader,
train_fetch_list,
train_metrics,
epochs=epochs,
log_interval=args.log_interval,
valid_interval=args.valid_interval,
save_dir=args.save_dir,
save_model_name=args.model_name,
fix_random_seed=args.fix_random_seed,
compiled_test_prog=compiled_valid_prog,
test_dataloader=valid_dataloader,
test_fetch_list=valid_fetch_list,
test_metrics=valid_metrics)
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
if not os.path.exists(args.save_dir):
os.makedirs(args.save_dir)
train(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import numpy as np
try:
import cPickle as pickle
except ImportError:
import pickle
import paddle.fluid as fluid
from utils.config_utils import *
import models
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
help='default use gpu.')
parser.add_argument(
'--weights',
type=str,
default=None,
help='weight path, None to automatically download weights provided by Paddle.'
)
parser.add_argument(
'--batch_size',
type=int,
default=1,
help='sample number in a batch for inference.')
parser.add_argument(
'--filelist',
type=str,
default='./data/TsnExtractor.list',
        help='path to the inference data file list.')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--infer_topk',
type=int,
default=20,
        help='number of top-k predictions to keep.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'tsn_features'),
help='directory to store tsn feature results')
parser.add_argument(
'--video_path',
type=str,
default=None,
        help='path of a single video file to infer; overrides --filelist.')
args = parser.parse_args()
return args
def infer(args):
# parse config
config = parse_config(args.config)
infer_config = merge_configs(config, 'infer', vars(args))
print_configs(infer_config, "Infer")
infer_model = models.get_model(
args.model_name, infer_config, mode='infer', is_videotag=True)
infer_model.build_input(use_dataloader=False)
infer_model.build_model()
infer_feeds = infer_model.feeds()
infer_outputs = infer_model.outputs()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
filelist = args.filelist or infer_config.INFER.filelist
filepath = args.video_path or infer_config.INFER.get('filepath', '')
if filepath != '':
assert os.path.exists(filepath), "{} not exist.".format(filepath)
else:
assert os.path.exists(filelist), "{} not exist.".format(filelist)
# get infer reader
infer_reader = get_reader(args.model_name.upper(), 'infer', infer_config)
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
# if no weight files specified, download weights from paddle
weights = args.weights or infer_model.get_weights()
infer_model.load_test_weights(exe, weights, fluid.default_main_program())
infer_feeder = fluid.DataFeeder(place=place, feed_list=infer_feeds)
fetch_list = infer_model.fetches()
infer_metrics = get_metrics(args.model_name.upper(), 'infer', infer_config)
infer_metrics.reset()
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
for infer_iter, data in enumerate(infer_reader()):
data_feed_in = [items[:-1] for items in data]
video_id = [items[-1] for items in data]
bs = len(video_id)
feature_outs = exe.run(fetch_list=fetch_list,
feed=infer_feeder.feed(data_feed_in))
for i in range(bs):
filename = video_id[i].split('/')[-1][:-4]
np.save(
os.path.join(args.save_dir, filename + '.npy'),
feature_outs[0][i]) #shape: seg_num*feature_dim
logger.info("Feature extraction End~")
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
infer(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid import profiler
import logging
import shutil
logger = logging.getLogger(__name__)
def log_lr_and_step():
try:
# In optimizers, if learning_rate is set as constant, lr_var
# name is 'learning_rate_0', and iteration counter is not
# recorded. If learning_rate is set as decayed values from
# learning_rate_scheduler, lr_var name is 'learning_rate',
# and iteration counter is recorded with name '@LR_DECAY_COUNTER@',
        # a better implementation is required here
lr_var = fluid.global_scope().find_var("learning_rate")
if not lr_var:
lr_var = fluid.global_scope().find_var("learning_rate_0")
lr = np.array(lr_var.get_tensor())
lr_count = '[-]'
lr_count_var = fluid.global_scope().find_var("@LR_DECAY_COUNTER@")
if lr_count_var:
lr_count = np.array(lr_count_var.get_tensor())
logger.info("------- learning rate {}, learning rate counter {} -----"
.format(np.array(lr), np.array(lr_count)))
    except Exception:
        logger.warning("Unable to get learning_rate and LR_DECAY_COUNTER.")
def test_with_dataloader(exe,
compiled_test_prog,
test_dataloader,
test_fetch_list,
test_metrics,
log_interval=0,
save_model_name=''):
if not test_dataloader:
logger.error("[TEST] get dataloader failed.")
test_metrics.reset()
test_iter = 0
for data in test_dataloader():
test_outs = exe.run(compiled_test_prog,
fetch_list=test_fetch_list,
feed=data)
test_metrics.accumulate(test_outs)
if log_interval > 0 and test_iter % log_interval == 0:
test_metrics.calculate_and_log_out(test_outs, \
info = '[TEST] test_iter {} '.format(test_iter))
test_iter += 1
test_metrics.finalize_and_log_out("[TEST] Finish")
def train_with_dataloader(exe, train_prog, compiled_train_prog, train_dataloader, \
train_fetch_list, train_metrics, epochs = 10, \
log_interval = 0, valid_interval = 0, save_dir = './', \
num_trainers = 1, trainer_id = 0, \
save_model_name = 'model', fix_random_seed = False, \
compiled_test_prog = None, test_dataloader = None, \
test_fetch_list = None, test_metrics = None, \
is_profiler = None, profiler_path = None):
if not train_dataloader:
logger.error("[TRAIN] get dataloader failed.")
epoch_periods = []
train_loss = 0
for epoch in range(epochs):
log_lr_and_step()
train_iter = 0
epoch_periods = []
cur_time = time.time()
for data in train_dataloader():
train_outs = exe.run(compiled_train_prog,
fetch_list=train_fetch_list,
feed=data)
period = time.time() - cur_time
epoch_periods.append(period)
timeStamp = time.time()
localTime = time.localtime(timeStamp)
strTime = time.strftime("%Y-%m-%d %H:%M:%S", localTime)
if log_interval > 0 and (train_iter % log_interval == 0):
train_metrics.calculate_and_log_out(train_outs, \
info = '[TRAIN {}] Epoch {}, iter {}, time {}, '.format(strTime, epoch, train_iter, period))
train_iter += 1
cur_time = time.time()
# NOTE: profiler tools, used for benchmark
if is_profiler and epoch == 0 and train_iter == log_interval:
profiler.start_profiler("All")
elif is_profiler and epoch == 0 and train_iter == log_interval + 5:
profiler.stop_profiler("total", profiler_path)
return
if len(epoch_periods) < 1:
logger.info(
'No iteration was executed, please check the data reader')
sys.exit(1)
logger.info('[TRAIN] Epoch {} training finished, average time: {}'.
format(epoch, np.mean(epoch_periods[1:])))
if trainer_id == 0:
save_model(exe, train_prog, save_dir, save_model_name,
"_epoch{}".format(epoch))
if compiled_test_prog and valid_interval > 0 and (
epoch + 1) % valid_interval == 0:
test_with_dataloader(exe, compiled_test_prog, test_dataloader,
test_fetch_list, test_metrics, log_interval,
save_model_name)
if trainer_id == 0:
save_model(exe, train_prog, save_dir, save_model_name)
    # when fixing the random seed for debugging
if fix_random_seed:
cards = os.environ.get('CUDA_VISIBLE_DEVICES')
gpu_num = len(cards.split(","))
print("kpis\ttrain_cost_card{}\t{}".format(gpu_num, train_loss))
print("kpis\ttrain_speed_card{}\t{}".format(gpu_num,
np.mean(epoch_periods)))
def save_model(exe, program, save_dir, model_name, postfix=''):
"""save paramters and optimizer related varaibles"""
if not os.path.isdir(save_dir):
os.makedirs(save_dir)
saved_model_name = model_name + postfix
fluid.save(program, os.path.join(save_dir, saved_model_name))
return
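# A minimal sketch (an assumption, not part of this file) of restoring a snapshot
# written by save_model, using the counterpart fluid.load API:
#   fluid.load(program, os.path.join(save_dir, model_name + postfix), exe)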
    ...
    parser.add_argument(
        '--filelist',
        type=str,
        default='./data/VideoTag_test.list',
        help='path of video data, multiple videos')
    parser.add_argument(
        '--save_dir',
        type=str,
        default='data/VideoTag_results',
        help='output file path')
    parser.add_argument(
        '--label_file',
        type=str,
        ...


def main():
    ...
    with fluid.unique_name.guard():
        # build model
        extractor_model = models.get_model(
            args.extractor_name,
            extractor_infer_config,
            mode='infer',
            is_videotag=True)
        extractor_model.build_input(use_dataloader=False)
        extractor_model.build_model()
        extractor_feeds = extractor_model.feeds()
        ...
        logger.info('load extractor weights from {}'.format(
            args.extractor_weights))

        extractor_model.load_pretrain_params(
            exe, args.extractor_weights, extractor_main_prog)

    # get reader and metrics
    extractor_reader = get_reader(args.extractor_name, 'infer',
        ...


if __name__ == '__main__':
    start_time = time.time()
    args = parse_args()
    print(args)
    ...