Unverified commit c44bfba7 authored by: H huangjun12, committed by: GitHub

Update video tag (#4916)

* update videotag, add fine-tune code and doc

* refine datalist

* refine eval.py
Parent f24158de
# Model Fine-tuning Guide
---
## Contents
Following this document, you can fine-tune the VideoTag pre-trained models on your own training data and produce your own model.
It covers:
- [How It Works](#how-it-works)
- [Fine-tuning the AttentionLSTM Model](#fine-tuning-the-attentionlstm-model)
- [Fine-tuning the TSN Model](#fine-tuning-the-tsn-model)
- [Further Reading](#further-reading)
- [Reference Papers](#reference-papers)
## How It Works
VideoTag uses a two-stage modeling approach built from two models: TSN + AttentionLSTM.
Temporal Segment Network (TSN) is a classic 2D-CNN-based video classification model. By sampling video frames sparsely, it captures temporal information while keeping computation low. For details, see the paper [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859)
AttentionLSTM takes video feature vectors as input, encodes all frame features with a bidirectional LSTM, and adds an attention layer that combines the hidden-state output of each time step with adaptive weights to produce the final classification vector. For details, see the paper [AttentionCluster](https://arxiv.org/abs/1711.09550)
VideoTag is trained in two stages: the first stage trains a large-scale video feature extractor (TSN) on a relatively small set of videos (hundreds of thousands); the second stage trains the predictor (AttentionLSTM) on tens of millions of samples.
Prediction likewise runs in two stages: the first stage takes a video file as input and produces a feature vector from a TSN whose fully connected and loss layers have been removed; the second stage feeds that feature vector to AttentionLSTM to obtain the final classification result, as the sketch below illustrates.
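To make the two-stage pipeline concrete, here is a minimal runnable sketch with NumPy stand-ins (the function names and all shapes except the default 300 frames * 2048 dims are illustrative, not the real API):
```python
import numpy as np

def tsn_extract_features(frames):
    # Stage 1 stand-in: a real TSN (with FC and loss layers removed) maps the
    # 300 sampled frames of one video to a (300, 2048) feature matrix.
    return np.random.rand(300, 2048).astype("float32")

def attention_lstm_predict(features, num_classes=3396):
    # Stage 2 stand-in: a real AttentionLSTM encodes the frame features and
    # returns one sigmoid-style score per label (multi-label classification).
    return np.random.rand(num_classes)

frames = np.zeros((300, 224, 224, 3), dtype="float32")  # decoded video frames
features = tsn_extract_features(frames)                 # stage 1: TSN extractor
scores = attention_lstm_predict(features)               # stage 2: AttentionLSTM
print("top-5 class ids:", scores.argsort()[-5:][::-1])
```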
Starting from our pre-trained models, you can fine-tune on your own training data:
- [Fine-tuning the AttentionLSTM Model](#fine-tuning-the-attentionlstm-model)
- [Fine-tuning the TSN Model](#fine-tuning-the-tsn-model)
## Fine-tuning the AttentionLSTM Model
AttentionLSTM takes video features as input, uses little GPU memory, and trains faster than TSN, so we recommend fine-tuning it first. Input videos are first passed through the TSN pre-trained model to extract feature vectors, which then serve as the training input for fine-tuning the AttentionLSTM model.
### Extracting Feature Vectors with the Pre-trained TSN
#### Data Preparation
- Download the pre-trained weights: see [Running the Sample Code - Data Preparation - Downloading Pre-trained Weights](./Run.md)
- Prepare the training data: collect the videos to be trained on and list their paths in the video\_tag/data/TsnExtractor.list file, one per line (a generation sketch follows the example):
```
my_video_path/my_video_file1.mp4
my_video_path/my_video_file2.mp4
...
```
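If you have many videos, the list file can be generated rather than written by hand; a minimal sketch, assuming your videos live under a placeholder directory my_video_path:
```python
import glob

# Write one video path per line, the format expected by TsnExtractor.list.
with open("data/TsnExtractor.list", "w") as f:
    for path in sorted(glob.glob("my_video_path/*.mp4")):
        f.write(path + "\n")
```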
#### Feature Extraction
Run feature extraction with:
```
python tsn_extractor.py --model_name=TSN --config=./configs/tsn.yaml --weights=./weights/tsn.pdparams
```
- --weights sets the path of the TSN weights; the default is video\_tag/weights/tsn.pdparams
- --save\_dir sets where the feature vectors are saved; the default is video\_tag/data/tsn\_features. Features of different input videos are saved in separate npy files, laid out as:
```
video_tag
├──data
├──tsn_features
├── my_feature_file1.npy
├── my_feature_file2.npy
...
```
- Each feature extracted by TSN has shape ```frames * feature dimension```, 300 * 2048 by default; a quick sanity check is sketched below.
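To sanity-check an extracted file, load it with NumPy and confirm the shape (the file name is a placeholder):
```python
import numpy as np

feature = np.load("data/tsn_features/my_feature_file1.npy")
print(feature.shape)  # expect (num_frames, feature_dim) = (300, 2048) by default
```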
### Fine-tuning AttentionLSTM
#### Data Preparation
The AttentionLSTM in VideoTag takes the feature vectors extracted by TSN as input. List the training files and their labels in the video\_tag/data/dataset/attention\_lstm/train.list file, formatted as:
```
my_feature_path/my_feature_file1.npy label1 label2
my_feature_path/my_feature_file2.npy label1
...
```
- One input video may carry multiple labels. Label indices are integers; the file name and the labels, and the labels themselves, are separated by single spaces;
- The mapping from label indices to label names is given by a list file; you can model it on the label_3396.txt file used by VideoTag, where the row index is the label index;
- Validation, test, and inference sets are built the same way as the training set: simply list the file paths/labels in the corresponding list files under video\_tag/data/dataset/attention\_lstm/ (a small helper is sketched after this list).
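The helper below is a convenience sketch for writing such a list from a Python dict; the paths and label indices are placeholders:
```python
# Hypothetical mapping from extracted feature files to integer label indices.
samples = {
    "my_feature_path/my_feature_file1.npy": [0, 3],
    "my_feature_path/my_feature_file2.npy": [1],
}

with open("data/dataset/attention_lstm/train.list", "w") as f:
    for path, labels in samples.items():
        # One line per sample: path, then space-separated label indices.
        f.write(path + " " + " ".join(str(l) for l in labels) + "\n")
```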
#### Training
Fine-tune from the VideoTag AttentionLSTM pre-trained model with:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=AttentionLSTM --config=./configs/attention_lstm.yaml --pretrain=./weights/attention_lstm
```
- AttentionLSTM trains on 8 GPUs by default with a total batch size of 1024. For single-GPU training, change the environment variable as follows:
```
export CUDA_VISIBLE_DEVICES=0
python train.py --model_name=AttentionLSTM --config=./configs/attention_lstm-single.yaml --pretrain=./weights/attention_lstm
```
- Make sure the number of training samples is larger than the batch size
- --pretrain sets the path of the AttentionLSTM pre-trained model; the default is ./weights/attention\_lstm;
- The model configuration lives in the video_tag/configs/attention\_lstm.yaml file, where hyperparameters can be adjusted conveniently;
- --save_dir sets where the trained parameters are saved; the default is ./data/checkpoints;
#### Evaluation
Evaluate the model with:
```
python eval.py --model_name=AttentionLSTM --config=./configs/attention_lstm.yaml --weights=./data/checkpoints/AttentionLSTM_epoch9.pdparams
```
- --weights sets the weights used for evaluation; the default is ./data/checkpoints/AttentionLSTM_epoch9.pdparams;
- Evaluation prints metrics such as GAP and Hit@1 directly to the log.
#### Inference
Run inference with:
```
python predict.py --model_name=AttentionLSTM --config=./configs/attention_lstm.yaml --weights=./data/checkpoints/AttentionLSTM_epoch9.pdparams
```
- --weights sets the weights used for inference; the default is ./data/checkpoints/AttentionLSTM_epoch9.pdparams;
- --label_file sets the label file; adjust it to your own data; the default is ./label_3396.txt;
- Predictions are printed to the log and also saved as JSON files; --save_dir sets where they are stored, ./data/predict_results/attention_lstm by default (a loading sketch follows this list).
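The saved files are plain JSON, so they can be inspected with the standard library; a small sketch assuming the default save path and the result0.json naming produced by the metrics code:
```python
import json

with open("data/predict_results/attention_lstm/result0.json", encoding="utf-8") as f:
    result = json.load(f)

video_path, predictions = result[0], result[1:]
for pred in predictions:
    print(pred["class_id"], pred["class_name"], pred["probability"])
```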
## Fine-tuning the TSN Model
The TSN model used in VideoTag takes mp4 files as input and uses a ResNet101 backbone.
### Data Preparation
After preparing the training videos, list the file paths and the corresponding labels in the video\_tag/data/dataset/tsn/train.list file, formatted as:
```
my_video_path/my_video_file1.mp4 label1
my_video_path/my_video_file2.mp4 label2
...
```
- Each input video has exactly one label. Label indices are integers, separated from the file name by a single space;
- Validation, test, and inference sets are built the same way as the training set: simply list the file paths/labels in the corresponding list files under the video\_tag/data/dataset/tsn directory.
#### Training
Fine-tune from the VideoTag TSN pre-trained model with:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=TSN --config=./configs/tsn.yaml --pretrain=./weights/tsn
```
- TSN trains on 8 GPUs by default with a total batch size of 256. For single-GPU training, change the environment variable as follows:
```
export CUDA_VISIBLE_DEVICES=0
python train.py --model_name=TSN --config=./configs/tsn-single.yaml --pretrain=./weights/tsn
```
- --pretrain sets the path of the TSN pre-trained model, ./weights/tsn in the example;
- The model configuration lives in the video_tag/configs/tsn.yaml file, where hyperparameters can be adjusted conveniently;
- --save_dir sets where the trained parameters are saved; the default is ./data/checkpoints;
#### Evaluation
Evaluate the model with:
```
python eval.py --model_name=TSN --config=./configs/tsn.yaml --weights=./data/checkpoints/TSN_epoch44.pdparams
```
- --weights sets the weights used for evaluation, ./data/checkpoints/TSN_epoch44.pdparams in the example;
- Evaluation prints metrics such as TOP1_ACC and TOP5_ACC directly to the log.
#### Inference
Run inference with:
```
python predict.py --model_name=TSN --config=./configs/tsn.yaml --weights=./data/checkpoints/TSN_epoch44.pdparams --save_dir=./data/predict_results/tsn/
```
- --weights sets the weights used for inference, ./data/checkpoints/TSN_epoch44.pdparams in the example;
- --label_file sets the label file; adjust it to your own data; the default is ./label_3396.txt;
- Predictions are printed to the log and also saved as JSON files; --save_dir sets where they are stored, ./data/predict_results/tsn in the example.
### Speeding up Training
By default the TSN model takes mp4 video files as input; during training each video must first be decoded before the decoded data is fed to the network, which can be very slow for large files.
To speed up training, decode the videos into images ahead of time and save them; during training the frame images are then read directly by index, which accelerates the pipeline.
- Data preparation: first decode the videos into frame images, then generate a file list of the frames. See [ucf-101 data preparation](../../../../dygraph/tsn/data/dataset/ucf101/README.md) for a reference implementation; a minimal decoding sketch follows this list.
- Config changes: in ./config/tsn.yaml, set the MODEL.format value to "frames" and point the filelist value of each mode at the corresponding frame-image list.
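A minimal decoding sketch with OpenCV (already an environment dependency); it names frames img_00001.jpg, img_00002.jpg, ... to match what the frames reader loads, and returns the frame count needed for a "path num_frames label" list entry. Paths are placeholders:
```python
import os
import cv2

def video_to_frames(video_path, out_dir):
    # Decode every frame of one video into JPEGs named img_{:05d}.jpg and
    # return the frame count for building the frames file list.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        count += 1
        cv2.imwrite(os.path.join(out_dir, "img_{:05d}.jpg".format(count)), frame)
    cap.release()
    return count

n = video_to_frames("my_video_path/my_video_file1.mp4", "frames/my_video_file1")
print("decoded {} frames".format(n))
```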
## Further Reading
- For more on the TSN model, see the PaddleCV video library [TSN video classification model](../../models/tsn/README.md)
- For more on the AttentionLSTM model, see the PaddleCV video library [AttentionLSTM video classification model](../../models/attention_lstm/README.md)
## Reference Papers
- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909) Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
......@@ -4,11 +4,7 @@
## Contents
- [Model Overview](#model-overview)
- [Installation](#installation)
- [How to Use](#how-to-use)
## Model Overview
......@@ -16,7 +12,7 @@
VideoTag, PaddlePaddle's large-scale video classification model, is built on tens of millions of samples from Baidu's short-video business. It supports 3,000 practical tags drawn from industrial practice, generalizes well, and suits large-scale (tens of millions to billions of videos) short-video classification scenarios in China. VideoTag uses two-stage modeling: image modeling and sequence learning. The first stage trains a large-scale video feature extractor on a relatively small set of videos (hundreds of thousands); the second stage trains the predictor on tens of millions of samples, enabling industrial application to extremely large short-video collections. The pipeline is illustrated below.
<p align="center">
<img src="video_tag.png" height=220 width=800 hspace='10'/> <br />
<img src="images.png" height=220 width=800 hspace='10'/> <br />
VideoTag two-stage modeling
</p>
......@@ -29,87 +25,7 @@ Temporal shift module
- Prediction: the results of multiple models are fused to classify the video, further improving accuracy.
## Installation
Running the sample code requires PaddlePaddle >= 1.7.0; see the [installation guide](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.7/install/index_cn.html) to install PaddlePaddle.
## How to Use
- [1. Running the sample code](./Run.md)
- [2. Testing on your own data](./Test.md)
- [3. Fine-tuning the model](./FineTune.md)
# Running the Sample Code
---
## Contents
Following this document, you can quickly get familiar with VideoTag and observe the predictions of the VideoTag pre-trained models on the sample videos.
It covers:
- [Installation](#installation)
- [Data Preparation](#data-preparation)
- [Model Inference](#model-inference)
## Installation
### Environment requirements:
```
CUDA >= 9.0
cudnn >= 7.5
```
### Installing dependencies:
- 1.7.0 <= PaddlePaddle version <= 2.0.0: pip install paddlepaddle-gpu==1.8.4.post97 -i https://mirror.baidu.com/pypi/simple
- opencv version >= 4.1.0: pip install opencv-python==4.2.0.34
## Data Preparation
### Downloading Pre-trained Weights
We provide pre-trained [TSN](https://videotag.bj.bcebos.com/video_tag_tsn.tar) and [AttentionLSTM](https://videotag.bj.bcebos.com/video_tag_lstm.tar) weights. Create a weights directory under video\_tag, download and extract the archives, and place the parameter files in it:
```
mkdir weights
cd weights
wget https://videotag.bj.bcebos.com/video_tag_tsn.tar
wget https://videotag.bj.bcebos.com/video_tag_lstm.tar
tar -zxvf video_tag_tsn.tar
tar -zxvf video_tag_lstm.tar
rm video_tag_tsn.tar -rf
rm video_tag_lstm.tar -rf
mv video_tag_tsn/* .
mv attention_lstm/* .
rm video_tag_tsn/ -rf
rm attention_lstm -rf
```
The resulting directory structure is:
```
video_tag
├──weights
├── attention_lstm.pdmodel
├── attention_lstm.pdopt
├── attention_lstm.pdparams
├── tsn.pdmodel
├── tsn.pdopt
└── tsn.pdparams
```
### Downloading Sample Videos
We provide [sample videos](https://videotag.bj.bcebos.com/mp4.tar) for testing. Download and extract the archive, and place the video files under the video\_tag/data/mp4 directory:
```
cd data/
wget https://videotag.bj.bcebos.com/mp4.tar
tar -zxvf mp4.tar
rm mp4.tar -rf
```
The resulting directory structure is:
```
video_tag
├──data
├── mp4
├── 1.mp4
├── 2.mp4
└── ...
```
## Model Inference
Start model inference with:
python videotag_test.py
- Predictions are printed as logs, for example:
```
[========video_id [ data/mp4/1.mp4 ] , topk(20) preds: ========]
class_id: 3110, class_name: 训练 , probability: 0.97730666399
class_id: 2159, class_name: 蹲 , probability: 0.945082366467
...
[========video_id [ data/mp4/2.mp4 ] , topk(20) preds: ========]
class_id: 2773, class_name: 舞蹈 , probability: 0.850423932076
class_id: 1128, class_name: 表演艺术 , probability: 0.0446354188025
...
```
- --save\_dir sets where predictions are stored, video\_tag/data/VideoTag\_results by default. Predictions of different input videos are saved in separate JSON files (a loading sketch follows the example), with the format:
```
[file_path,
{"class_name": class_name1, "probability": probability1, "class_id": class_id1},
{"class_name": class_name2, "probability": probability2, "class_id": class_id2},
...
]
```
# Testing the Pre-trained Models on Your Own Data
## Contents
Following this document, you can quickly test the predictions of the VideoTag pre-trained models on your own business data.
It covers:
- [Data Preparation](#data-preparation)
- [Model Inference](#model-inference)
## Data Preparation
Prepare your own test data and list the files to be inferred in the video\_tag/data/VideoTag\_test.list file, formatted as:
```
my_video_path/my_video_file1.mp4
my_video_path/my_video_file2.mp4
...
```
## Model Inference
Start model inference with:
python videotag_test.py
- Supported input video formats: mp4, mkv, and webm;
- The model *uniformly samples 300 frames* from each input video for prediction (see the sketch after this list). For long videos, consider trimming to the relevant part first to speed up prediction;
- --use\_gpu toggles GPU inference; GPU is used by default. For short videos of about 10s, GPU inference takes about 4s;
- --filelist sets the path of the input list file; the default is video\_tag/data/VideoTag\_test.list.
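For intuition, the uniform 300-frame sampling can be expressed in a few lines; a simplified sketch of the inference-time center sampling (the real reader, see mp4_loader, also handles segment length and very short videos):
```python
def uniform_sample_indices(num_frames, num_samples=300):
    # Split the video into num_samples equal segments and take the middle
    # frame of each, wrapping around when the video is shorter than that.
    average_dur = num_frames // num_samples
    indices = []
    for i in range(num_samples):
        if average_dur >= 1:
            indices.append(i * average_dur + (average_dur - 1) // 2)
        else:
            indices.append(i % num_frames)
    return indices

print(uniform_sample_indices(900)[:5])  # [1, 4, 7, 10, 13]
```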
MODEL:
name: "AttentionLSTM"
dataset: "YouTube-8M" #Default, don't recommand to modify it
bone_nework: None
drop_rate: 0.5
    feature_names: ['rgb'] # rgb only, without audio
feature_dims: [2048]
embedding_size: 1024
lstm_size: 512
num_classes: 3396
topk: 20
TRAIN:
epoch: 10
learning_rate: 0.000125
decay_epochs: [5]
decay_gamma: 0.1
weight_decay: 0.0008
    num_samples: 5000000 # modify according to the number of samples in your dataset
pretrain_base: None
batch_size: 128
use_gpu: True
num_gpus: 1
filelist: "data/dataset/attention_lstm/train.list"
VALID:
batch_size: 128
filelist: "data/dataset/attention_lstm/val.list"
TEST:
batch_size: 128
filelist: "data/dataset/attention_lstm/test.list"
INFER:
batch_size: 1
filelist: "data/dataset/attention_lstm/infer.list"
MODEL:
name: "AttentionLSTM"
    dataset: "YouTube-8M" # default; not recommended to modify
    bone_nework: None
    drop_rate: 0.5
    feature_names: ['rgb'] # rgb only, without audio
    feature_dims: [2048]
    embedding_size: 1024
    lstm_size: 512
num_classes: 3396
topk: 20
TRAIN:
epoch: 10
learning_rate: 0.001
decay_epochs: [5]
decay_gamma: 0.1
weight_decay: 0.0008
    num_samples: 5000000 # modify according to the number of samples in your dataset
pretrain_base: None
batch_size: 1024
use_gpu: True
num_gpus: 8
filelist: "data/dataset/attention_lstm/train.list"
VALID:
batch_size: 1024
filelist: "data/dataset/attention_lstm/val.list"
TEST:
batch_size: 128
filelist: "data/dataset/attention_lstm/test.list"
INFER:
batch_size: 1
filelist: "data/dataset/attention_lstm/infer.list"
MODEL:
name: "TSN"
format: "video" # ["video", "frames"]
num_classes: 400
seglen: 1
image_mean: [0.485, 0.456, 0.406]
image_std: [0.229, 0.224, 0.225]
num_layers: 101
topk: 5
TRAIN:
seg_num: 3 # training with 3 segments
epoch: 45
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 32
use_gpu: True
num_gpus: 1
filelist: "./data/dataset/tsn/train.list"
learning_rate: 0.00125
learning_rate_decay: 0.1
l2_weight_decay: 1e-4
momentum: 0.9
    total_videos: 224684 # modify according to the number of samples in your dataset
VALID:
seg_num: 3
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 32
filelist: "./data/dataset/tsn/val.list"
TEST:
seg_num: 7
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
filelist: "./data/dataset/tsn/test.list"
INFER:
seg_num: 300 # infer using 300 segments
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 1
filelist: "./data/dataset/tsn/infer.list"
MODEL:
name: "TSN"
format: "mp4"
format: "video" # ["video", "frames"]
num_classes: 400
seglen: 1
image_mean: [0.485, 0.456, 0.406]
image_std: [0.229, 0.224, 0.225]
    num_layers: 101
topk: 5
TRAIN:
seg_num: 3 # training with 3 segments
epoch: 45
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 256
use_gpu: True
num_gpus: 8
filelist: "./data/dataset/tsn/train.list"
learning_rate: 0.01
learning_rate_decay: 0.1
l2_weight_decay: 1e-4
momentum: 0.9
    total_videos: 224684 # modify according to the number of samples in your dataset
VALID:
seg_num: 3
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 256
filelist: "./data/dataset/tsn/val.list"
TEST:
seg_num: 7
short_size: 256
target_size: 224
num_reader_threads: 12
buf_size: 1024
batch_size: 16
filelist: "./data/dataset/tsn/test.list"
INFER:
    seg_num: 300 # infer using 300 segments
short_size: 256
target_size: 224
    num_reader_threads: 12
buf_size: 1024
batch_size: 1
    filelist: "./data/dataset/tsn/infer.list"
data/mp4/1.mp4
data/mp4/2.mp4
data/mp4/3.mp4
data/mp4/4.mp4
data/mp4/5.mp4
data/mp4/1.mp4
data/mp4/2.mp4
data/mp4/3.mp4
data/mp4/4.mp4
data/mp4/5.mp4
data/tsn_features/1.npy
data/tsn_features/2.npy
data/tsn_features/3.npy
data/tsn_features/4.npy
data/tsn_features/5.npy
data/tsn_features/1.npy 0 3 4
data/tsn_features/2.npy 1
data/tsn_features/3.npy 2
data/tsn_features/4.npy 3 4
data/tsn_features/5.npy 4
data/tsn_features/1.npy 0 3 4
data/tsn_features/2.npy 1
data/tsn_features/3.npy 2
data/tsn_features/4.npy 3 4
data/tsn_features/5.npy 4
data/tsn_features/1.npy 0 3 4
data/tsn_features/2.npy 1
data/tsn_features/3.npy 2
data/tsn_features/4.npy 3 4
data/tsn_features/5.npy 4
data/mp4/1.mp4
data/mp4/2.mp4
data/mp4/3.mp4
data/mp4/4.mp4
data/mp4/5.mp4
data/mp4/1.mp4 0
data/mp4/2.mp4 1
data/mp4/3.mp4 2
data/mp4/4.mp4 3
data/mp4/5.mp4 4
data/mp4/1.mp4 0
data/mp4/2.mp4 1
data/mp4/3.mp4 2
data/mp4/4.mp4 3
data/mp4/5.mp4 4
data/mp4/1.mp4 0
data/mp4/2.mp4 1
data/mp4/3.mp4 2
data/mp4/4.mp4 3
data/mp4/5.mp4 4
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import paddle.fluid as fluid
from utils.config_utils import *
import models
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
        help='name of model to evaluate.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--batch_size',
type=int,
default=None,
help='test batch size. None to use config file setting.')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
        help='whether to use gpu. Default: True.')
parser.add_argument(
'--weights',
type=str,
default='./data/checkpoints/AttentionLSTM_epoch9.pdparams',
help='weight path.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'evaluate_results'),
help='output dir path, default to use ./data/evaluate_results')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
args = parser.parse_args()
return args
def test(args):
# parse config
config = parse_config(args.config)
test_config = merge_configs(config, 'test', vars(args))
print_configs(test_config, "Test")
use_dali = test_config['TEST'].get('use_dali', False)
# build model
test_model = models.get_model(args.model_name, test_config, mode='test')
test_model.build_input(use_dataloader=False)
test_model.build_model()
test_feeds = test_model.feeds()
test_fetch_list = test_model.fetches()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
weights = args.weights or test_model.get_weights()
logger.info('load test weights from {}'.format(weights))
test_model.load_test_weights(exe, weights, fluid.default_main_program())
# get reader and metrics
test_reader = get_reader(args.model_name.upper(), 'test', test_config)
test_metrics = get_metrics(args.model_name.upper(), 'test', test_config)
test_feeder = fluid.DataFeeder(place=place, feed_list=test_feeds)
epoch_period = []
for test_iter, data in enumerate(test_reader()):
cur_time = time.time()
test_outs = exe.run(fetch_list=test_fetch_list,
feed=test_feeder.feed(data))
period = time.time() - cur_time
epoch_period.append(period)
test_metrics.accumulate(test_outs)
# metric here
if args.log_interval > 0 and test_iter % args.log_interval == 0:
info_str = '[EVAL] Batch {}'.format(test_iter)
test_metrics.calculate_and_log_out(test_outs, info_str)
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
test_metrics.finalize_and_log_out("[EVAL] eval finished. ", args.save_dir)
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
test(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
import numpy as np
import datetime
import logging
logger = logging.getLogger(__name__)
class MetricsCalculator():
def __init__(self, name, mode):
self.name = name
self.mode = mode # 'train', 'val', 'test'
self.reset()
def reset(self):
logger.info('Resetting {} metrics...'.format(self.mode))
self.aggr_acc1 = 0.0
self.aggr_acc5 = 0.0
self.aggr_loss = 0.0
self.aggr_batch_size = 0
def finalize_metrics(self):
self.avg_acc1 = self.aggr_acc1 / self.aggr_batch_size
self.avg_acc5 = self.aggr_acc5 / self.aggr_batch_size
self.avg_loss = self.aggr_loss / self.aggr_batch_size
def get_computed_metrics(self):
json_stats = {}
json_stats['avg_loss'] = self.avg_loss
json_stats['avg_acc1'] = self.avg_acc1
json_stats['avg_acc5'] = self.avg_acc5
return json_stats
def calculate_metrics(self, loss, softmax, labels):
accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100.
accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100.
return accuracy1, accuracy5
def accumulate(self, loss, softmax, labels):
cur_batch_size = softmax.shape[0]
# if returned loss is None for e.g. test, just set loss to be 0.
if loss is None:
cur_loss = 0.
else:
cur_loss = np.mean(np.array(loss)) #
self.aggr_batch_size += cur_batch_size
self.aggr_loss += cur_loss * cur_batch_size
accuracy1 = compute_topk_accuracy(softmax, labels, top_k=1) * 100.
accuracy5 = compute_topk_accuracy(softmax, labels, top_k=5) * 100.
self.aggr_acc1 += accuracy1 * cur_batch_size
self.aggr_acc5 += accuracy5 * cur_batch_size
return
# ----------------------------------------------
# other utils
# ----------------------------------------------
def compute_topk_correct_hits(top_k, preds, labels):
    '''Compute the number of correct hits'''
batch_size = preds.shape[0]
top_k_preds = np.zeros((batch_size, top_k), dtype=np.float32)
for i in range(batch_size):
top_k_preds[i, :] = np.argsort(-preds[i, :])[:top_k]
correctness = np.zeros(batch_size, dtype=np.int32)
for i in range(batch_size):
if labels[i] in top_k_preds[i, :].astype(np.int32).tolist():
correctness[i] = 1
correct_hits = sum(correctness)
return correct_hits
def compute_topk_accuracy(softmax, labels, top_k):
computed_metrics = {}
assert labels.shape[0] == softmax.shape[0], "Batch size mismatch."
aggr_batch_size = labels.shape[0]
aggr_top_k_correct_hits = compute_topk_correct_hits(top_k, softmax, labels)
# normalize results
computed_metrics = \
float(aggr_top_k_correct_hits) / aggr_batch_size
return computed_metrics
......@@ -17,13 +17,14 @@ from __future__ import unicode_literals
from __future__ import print_function
from __future__ import division
import os
import io
import logging
import os
import io
import numpy as np
import json
from metrics.youtube8m import eval_util as youtube8m_metrics
from metrics.kinetics import accuracy_metrics as kinetics_metrics
logger = logging.getLogger(__name__)
......@@ -95,9 +96,8 @@ class Youtube8mMetrics(Metrics):
if self.mode == 'infer':
for index, item in enumerate(self.infer_results):
video_id = item[0]
print('[========video_id [ {} ] , topk({}) preds: ========]\n'.
format(video_id, self.topk))
f = io.open(label_file, "r", encoding="utf-8")
fl = f.readlines()
......@@ -122,7 +122,7 @@ class Youtube8mMetrics(Metrics):
os.path.join(savedir, 'result' + str(index) + '.json'),
'w',
encoding='utf-8') as f:
f.write(json.dumps(res_list, ensure_ascii=False))
else:
epoch_info_dict = self.calculator.get()
logger.info(info + '\tavg_hit_at_one: {0},\tavg_perr: {1},\tavg_loss :{2},\taps: {3},\tgap:{4}'\
......@@ -135,6 +135,101 @@ class Youtube8mMetrics(Metrics):
self.infer_results = []
class Kinetics400Metrics(Metrics):
def __init__(self, name, mode, metrics_args):
self.name = name
self.mode = mode
self.topk = metrics_args['MODEL']['topk']
self.calculator = kinetics_metrics.MetricsCalculator(name, mode.lower())
if self.mode == 'infer':
self.infer_results = []
def calculate_and_log_out(self, fetch_list, info=''):
if len(fetch_list) == 3:
loss = fetch_list[0]
loss = np.mean(np.array(loss))
pred = np.array(fetch_list[1])
label = np.array(fetch_list[2])
else:
loss = 0.
pred = np.array(fetch_list[0])
label = np.array(fetch_list[1])
acc1, acc5 = self.calculator.calculate_metrics(loss, pred, label)
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
return loss
def accumulate(self, fetch_list, info=''):
if self.mode == 'infer':
predictions = np.array(fetch_list[0])
video_id = fetch_list[1]
for i in range(len(predictions)):
topk_inds = predictions[i].argsort()[0 - self.topk:]
topk_inds = topk_inds[::-1]
preds = predictions[i][topk_inds]
self.infer_results.append(
(video_id[i], topk_inds.tolist(), preds.tolist()))
else:
if len(fetch_list) == 3:
loss = fetch_list[0]
loss = np.mean(np.array(loss))
pred = np.array(fetch_list[1])
label = np.array(fetch_list[2])
else:
loss = 0.
pred = np.array(fetch_list[0])
label = np.array(fetch_list[1])
self.calculator.accumulate(loss, pred, label)
def finalize_and_log_out(self,
info='',
savedir='./data/results',
label_file='./label_3396.txt'):
if self.mode == 'infer':
for index, item in enumerate(self.infer_results):
video_id = item[0]
print('[========video_id [ {} ] , topk({}) preds: ========]\n'.
format(video_id, self.topk))
f = io.open(label_file, "r", encoding="utf-8")
fl = f.readlines()
res_list = []
res_list.append(video_id)
for i in range(len(item[1])):
class_id = item[1][i]
class_prob = item[2][i]
class_name = fl[class_id].split('\n')[0]
print('class_id: {},'.format(class_id), 'class_name:',
class_name,
', probability: {} \n'.format(class_prob))
save_dict = {
"'class_id": class_id,
"class_name": class_name,
"probability": class_prob
}
res_list.append(save_dict)
# save infer result into output dir
with io.open(
os.path.join(savedir, 'result' + str(index) + '.json'),
'w',
encoding='utf-8') as f:
f.write(json.dumps(res_list, ensure_ascii=False))
else:
self.calculator.finalize_metrics()
metrics_dict = self.calculator.get_computed_metrics()
loss = metrics_dict['avg_loss']
acc1 = metrics_dict['avg_acc1']
acc5 = metrics_dict['avg_acc5']
logger.info(info + '\tLoss: {},\ttop1_acc: {}, \ttop5_acc: {}'.format('%.6f' % loss, \
'%.2f' % acc1, '%.2f' % acc5))
def reset(self):
self.calculator.reset()
if self.mode == 'infer':
self.infer_results = []
class MetricsZoo(object):
def __init__(self):
self.metrics_zoo = {}
......@@ -164,6 +259,5 @@ def get_metrics(name, mode, cfg):
# sort by alphabet
regist_metrics("ATTENTIONCLUSTER", Youtube8mMetrics)
regist_metrics("ATTENTIONLSTM", Youtube8mMetrics)
regist_metrics("NEXTVLAD", Youtube8mMetrics)
regist_metrics("TSN", Kinetics400Metrics)
......@@ -12,8 +12,6 @@
#See the License for the specific language governing permissions and
#limitations under the License.
import numpy as np
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
......@@ -27,13 +25,13 @@ __all__ = ["AttentionLSTM"]
class AttentionLSTM(ModelBase):
def __init__(self, name, cfg, mode='train', is_videotag=False):
super(AttentionLSTM, self).__init__(name, cfg, mode)
self.is_videotag = is_videotag
self.get_config()
def get_config(self):
# get model configs
self.feature_names = self.cfg.MODEL.feature_names
self.feature_dims = self.cfg.MODEL.feature_dims
self.num_classes = self.cfg.MODEL.num_classes
......@@ -45,18 +43,34 @@ class AttentionLSTM(ModelBase):
self.batch_size = self.get_config_from_sec(self.mode, 'batch_size', 1)
self.num_gpus = self.get_config_from_sec(self.mode, 'num_gpus', 1)
if self.mode == 'train':
self.learning_rate = self.get_config_from_sec('train',
'learning_rate', 1e-3)
self.weight_decay = self.get_config_from_sec('train',
'weight_decay', 8e-4)
self.num_samples = self.get_config_from_sec('train', 'num_samples',
5000000)
self.decay_epochs = self.get_config_from_sec('train',
'decay_epochs', [5])
self.decay_gamma = self.get_config_from_sec('train', 'decay_gamma',
0.1)
def build_input(self, use_dataloader):
self.feature_input = []
for name, dim in zip(self.feature_names, self.feature_dims):
self.feature_input.append(
fluid.data(
shape=[None, dim], lod_level=1, dtype='float32', name=name))
        # in VideoTag, infer mode has no label input
if self.mode != 'infer':
self.label_input = fluid.data(
shape=[None, self.num_classes], dtype='float32', name='label')
else:
self.label_input = None
if use_dataloader:
assert self.mode != 'infer', \
                'dataloader is not recommended for infer; please set use_dataloader to False.'
self.dataloader = fluid.io.DataLoader.from_generator(
feed_list=self.feature_input + [self.label_input],
capacity=8,
iterable=True)
......@@ -71,7 +85,7 @@ class AttentionLSTM(ModelBase):
if len(att_outs) > 1:
out = fluid.layers.concat(att_outs, axis=1)
else:
out = att_outs[0] # video only, without audio in videoTag
fc1 = fluid.layers.fc(
input=out,
......@@ -92,8 +106,7 @@ class AttentionLSTM(ModelBase):
self.logit = fluid.layers.fc(input=fc2, size=self.num_classes, act=None, \
bias_attr=ParamAttr(regularizer=fluid.regularizer.L2Decay(0.0),
initializer=fluid.initializer.NormalInitializer(scale=0.0)), name='output')
self.output = fluid.layers.sigmoid(self.logit)
......@@ -125,16 +138,29 @@ class AttentionLSTM(ModelBase):
return [self.output, self.logit]
def feeds(self):
return self.feature_input if self.mode == 'infer' else self.feature_input + [
self.label_input
]
def fetches(self):
if self.mode == 'train' or self.mode == 'valid':
losses = self.loss()
fetch_list = [losses, self.output, self.label_input]
elif self.mode == 'test':
losses = self.loss()
fetch_list = [losses, self.output, self.label_input]
elif self.mode == 'infer':
fetch_list = [self.output]
else:
raise NotImplementedError('mode {} not implemented'.format(
self.mode))
return fetch_list
def weights_info(self):
return None, None
def load_pretrain_params(self, exe, pretrain, prog):
logger.info("Load pretrain weights from {}, exclude fc layer.".format(
pretrain))
......
......@@ -148,7 +148,7 @@ class ModelBase(object):
download(url, path)
return path
def load_pretrain_params(self, exe, pretrain, prog):
logger.info("Load pretrain weights from {}".format(pretrain))
state_dict = fluid.load_program_state(pretrain)
fluid.set_program_state(prog, state_dict)
......@@ -172,10 +172,10 @@ class ModelZoo(object):
type(model))
self.model_zoo[name] = model
def get(self, name, cfg, mode='train', is_videotag=False):
for k, v in self.model_zoo.items():
if k.upper() == name.upper():
return v(name, cfg, mode, is_videotag)
raise ModelNotFoundError(name, self.model_zoo.keys())
......@@ -187,5 +187,5 @@ def regist_model(name, model):
model_zoo.regist(name, model)
def get_model(name, cfg, mode='train', is_videotag=False):
return model_zoo.get(name, cfg, mode, is_videotag)
......@@ -12,8 +12,6 @@
#See the License for the specific language governing permissions and
#limitations under the License.
import numpy as np
import paddle.fluid as fluid
from paddle.fluid import ParamAttr
......@@ -27,8 +25,9 @@ __all__ = ["TSN"]
class TSN(ModelBase):
def __init__(self, name, cfg, mode='train', is_videotag=False):
super(TSN, self).__init__(name, cfg, mode=mode)
self.is_videotag = is_videotag
self.get_config()
def get_config(self):
......@@ -87,11 +86,11 @@ class TSN(ModelBase):
videomodel = TSN_ResNet(
layers=cfg['layers'],
seg_num=cfg['seg_num'],
is_training=(self.mode == 'train'),
is_extractor=self.is_videotag)
out = videomodel.net(input=self.feature_input[0],
class_dim=cfg['class_dim'])
self.network_outputs = [out]
def optimizer(self):
assert self.mode == 'train', "optimizer only can be get in train mode"
......@@ -133,9 +132,9 @@ class TSN(ModelBase):
fetch_list = [losses, self.network_outputs[0], self.label_input]
elif self.mode == 'test':
losses = self.loss()
fetch_list = [losses, self.network_outputs[0], self.label_input]
elif self.mode == 'infer':
fetch_list = self.network_outputs
else:
raise NotImplementedError('mode {} not implemented'.format(
self.mode))
......@@ -143,27 +142,22 @@ class TSN(ModelBase):
return fetch_list
def pretrain_info(self):
return None, None
def weights_info(self):
return None
def load_pretrain_params(self, exe, pretrain, prog):
def is_parameter(var):
return isinstance(var, fluid.framework.Parameter)
logger.info("Load pretrain weights from {}, exclude fc layer.".format(
pretrain))
print("===pretrain===", pretrain)
state_dict = fluid.load_program_state(pretrain)
dict_keys = list(state_dict.keys())
# remove fc layer when pretrain, because the number of classes in final fc may not match
for name in dict_keys:
if "fc_0" in name:
del state_dict[name]
......
......@@ -20,10 +20,15 @@ import math
class TSN_ResNet():
def __init__(self,
layers=50,
seg_num=7,
is_training=True,
is_extractor=False):
self.layers = layers
self.seg_num = seg_num
self.is_training = is_training
self.is_extractor = is_extractor
def conv_bn_layer(self,
input,
......@@ -144,7 +149,18 @@ class TSN_ResNet():
pool = fluid.layers.pool2d(
input=conv, pool_size=7, pool_type='avg', global_pooling=True)
        # VideoTag only needs the pooled frame features when used as an extractor
feature = fluid.layers.reshape(
x=pool, shape=[-1, seg_num, pool.shape[1]])
if self.is_extractor:
out = feature
else:
out = fluid.layers.reduce_mean(feature, dim=1)
stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
out = fluid.layers.fc(input=out,
size=class_dim,
act='softmax',
param_attr=fluid.param_attr.ParamAttr(
initializer=fluid.initializer.Uniform(
-stdv, stdv)))
return out
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import numpy as np
try:
import cPickle as pickle
except:
import pickle
import paddle.fluid as fluid
from utils.config_utils import *
import models
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
        help='name of model to infer.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
        help='whether to use gpu. Default: True.')
parser.add_argument(
'--weights',
type=str,
default='./data/checkpoints/AttentionLSTM_epoch9.pdparams',
help='weight path.')
parser.add_argument(
'--batch_size',
type=int,
default=1,
help='sample number in a batch for inference.')
parser.add_argument(
'--filelist',
type=str,
default=None,
        help='path to inference data file list.')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--infer_topk',
type=int,
default=20,
        help='topk predictions to save.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'predict_results', 'attention_lstm'),
help='directory to store results')
parser.add_argument(
'--video_path',
type=str,
default=None,
        help='path of the video file to infer.')
parser.add_argument(
'--label_file',
type=str,
default='label_3396.txt',
help='chinese label file path')
args = parser.parse_args()
return args
def infer(args):
# parse config
config = parse_config(args.config)
infer_config = merge_configs(config, 'infer', vars(args))
print_configs(infer_config, "Infer")
infer_model = models.get_model(args.model_name, infer_config, mode='infer')
infer_model.build_input(use_dataloader=False)
infer_model.build_model()
infer_feeds = infer_model.feeds()
infer_outputs = infer_model.outputs()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
filelist = args.filelist or infer_config.INFER.filelist
filepath = args.video_path or infer_config.INFER.get('filepath', '')
if filepath != '':
assert os.path.exists(filepath), "{} not exist.".format(filepath)
else:
assert os.path.exists(filelist), "{} not exist.".format(filelist)
# get infer reader
infer_reader = get_reader(args.model_name.upper(), 'infer', infer_config)
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
# if no weight files specified, download weights from paddle
weights = args.weights or infer_model.get_weights()
infer_model.load_test_weights(exe, weights, fluid.default_main_program())
infer_feeder = fluid.DataFeeder(place=place, feed_list=infer_feeds)
fetch_list = infer_model.fetches()
infer_metrics = get_metrics(args.model_name.upper(), 'infer', infer_config)
infer_metrics.reset()
periods = []
cur_time = time.time()
for infer_iter, data in enumerate(infer_reader()):
data_feed_in = [items[:-1] for items in data]
video_id = [items[-1] for items in data]
infer_outs = exe.run(fetch_list=fetch_list,
feed=infer_feeder.feed(data_feed_in))
infer_result_list = [item for item in infer_outs] + [video_id]
prev_time = cur_time
cur_time = time.time()
period = cur_time - prev_time
periods.append(period)
infer_metrics.accumulate(infer_result_list)
if args.log_interval > 0 and infer_iter % args.log_interval == 0:
logger.info('Processed {} samples'.format((infer_iter + 1) * len(
video_id)))
logger.info('[INFER] infer finished. average time: {}'.format(
np.mean(periods)))
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
infer_metrics.finalize_and_log_out(
savedir=args.save_dir, label_file=args.label_file)
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
infer(args)
from .reader_utils import regist_reader, get_reader
from .feature_reader import FeatureReader
from .kinetics_reader import KineticsReader
# regist reader, sort by alphabet
regist_reader("ATTENTIONLSTM", FeatureReader)
regist_reader("TSN", KineticsReader)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import sys
from .reader_utils import DataReader
try:
import cPickle as pickle
from cStringIO import StringIO
except ImportError:
import pickle
from io import BytesIO
import numpy as np
import random
python_ver = sys.version_info
class FeatureReader(DataReader):
"""
    Data reader for the YouTube-8M dataset, which is stored as features
    extracted by prior networks. Used here by the AttentionLSTM model.
    dataset cfg: num_classes
                 batch_size
                 filelist
"""
def __init__(self, name, mode, cfg):
self.name = name
self.mode = mode
self.num_classes = cfg.MODEL.num_classes
# set batch size and file list
self.batch_size = cfg[mode.upper()]['batch_size']
self.filelist = cfg[mode.upper()]['filelist']
self.seg_num = cfg.MODEL.get('seg_num', None)
def create_reader(self):
fl = open(self.filelist).readlines()
fl = [line.strip() for line in fl if line.strip() != '']
if self.mode == 'train':
random.shuffle(fl)
def reader():
batch_out = []
for item in fl:
fileinfo = item.split(' ')
filepath = fileinfo[0]
rgb = np.load(filepath, allow_pickle=True)
nframes = rgb.shape[0]
label = [int(i) for i in fileinfo[1:]]
one_hot_label = make_one_hot(label, self.num_classes)
if self.mode != 'infer':
batch_out.append((rgb, one_hot_label))
else:
batch_out.append((rgb, filepath.split('/')[-1]))
if len(batch_out) == self.batch_size:
yield batch_out
batch_out = []
return reader
def make_one_hot(label, dim=3862):
one_hot_label = np.zeros(dim)
one_hot_label = one_hot_label.astype(float)
for ind in label:
one_hot_label[int(ind)] = 1
return one_hot_label
......@@ -18,7 +18,6 @@ import cv2
import math
import random
import functools
try:
import cPickle as pickle
from cStringIO import StringIO
......@@ -26,7 +25,9 @@ except ImportError:
import pickle
from io import BytesIO
import numpy as np
import paddle
import paddle.fluid as fluid
from PIL import Image, ImageEnhance
import logging
......@@ -36,6 +37,30 @@ logger = logging.getLogger(__name__)
python_ver = sys.version_info
class VideoRecord(object):
'''
    Describes the frame information of one video:
    1. self._data[0] is the path of the frame images
    2. self._data[1] is the number of frames
    3. self._data[2] is the label of the video
'''
def __init__(self, row):
self._data = row
@property
def path(self):
return self._data[0]
@property
def num_frames(self):
return int(self._data[1])
@property
def label(self):
return int(self._data[2])
class KineticsReader(DataReader):
"""
Data reader for kinetics dataset of two format mp4 and pkl.
......@@ -77,6 +102,7 @@ class KineticsReader(DataReader):
# set batch size and file list
self.batch_size = cfg[mode.upper()]['batch_size']
self.filelist = cfg[mode.upper()]['filelist']
if self.fix_random_seed:
random.seed(0)
np.random.seed(0)
......@@ -84,13 +110,13 @@ class KineticsReader(DataReader):
def create_reader(self):
assert os.path.exists(self.filelist), \
'{} not exist, please check the data list'.format(self.filelist)
_reader = self._reader_creator(self.filelist, self.mode, seg_num=self.seg_num, seglen = self.seglen, \
short_size = self.short_size, target_size = self.target_size, \
img_mean = self.img_mean, img_std = self.img_std, \
shuffle = (self.mode == 'train'), \
num_threads = self.num_reader_threads, \
buf_size = self.buf_size, format = self.format)
def _batch_reader():
batch_out = []
......@@ -105,7 +131,7 @@ class KineticsReader(DataReader):
return _batch_reader
def _reader_creator(self,
file_list,
mode,
seg_num,
seglen,
......@@ -116,15 +142,17 @@ class KineticsReader(DataReader):
shuffle=False,
num_threads=1,
buf_size=1024,
format='frames'):
def decode_mp4(sample, mode, seg_num, seglen, short_size, target_size,
img_mean, img_std):
sample = sample[0].split(' ')
mp4_path = sample[0]
if mode == "infer":
label = mp4_path.split('/')[-1]
else:
label = int(sample[1])
try:
imgs = mp4_loader(mp4_path, seg_num, seglen, mode)
if len(imgs) < 1:
logger.error('{} frame length {} less than 1.'.format(
mp4_path, len(imgs)))
......@@ -133,8 +161,29 @@ class KineticsReader(DataReader):
logger.error('Error when loading {}'.format(mp4_path))
return None, None
return imgs_transform(imgs, mode, seg_num, seglen, \
short_size, target_size, img_mean, img_std, name = self.name), label
def decode_frames(sample, mode, seg_num, seglen, short_size,
target_size, img_mean, img_std):
recode = VideoRecord(sample[0].split(' '))
frames_dir_path = recode.path
if mode == "infer":
label = frames_dir_path
else:
label = recode.label
try:
imgs = frames_loader(recode, seg_num, seglen, mode)
if len(imgs) < 1:
logger.error('{} frame length {} less than 1.'.format(
frames_dir_path, len(imgs)))
return None, None
except:
logger.error('Error when loading {}'.format(frames_dir_path))
return None, None
return imgs_transform(
imgs,
mode,
seg_num,
......@@ -143,21 +192,26 @@ class KineticsReader(DataReader):
target_size,
img_mean,
img_std,
name=self.name), label
def reader_():
with open(file_list) as flist:
lines = [line.strip() for line in flist]
if shuffle:
random.shuffle(lines)
for line in lines:
file_path = line.strip()
yield [file_path]
if format == 'frames':
decode_func = decode_frames
elif format == 'video':
decode_func = decode_mp4
else:
raise ("Not implemented format {}".format(format))
mapper = functools.partial(
decode_mp4,
decode_func,
mode=mode,
seg_num=seg_num,
seglen=seglen,
......@@ -166,7 +220,8 @@ class KineticsReader(DataReader):
img_mean=img_mean,
img_std=img_std)
return fluid.io.xmap_readers(
mapper, reader_, num_threads, buf_size, order=True)
def imgs_transform(imgs,
......@@ -181,7 +236,13 @@ def imgs_transform(imgs,
imgs = group_scale(imgs, short_size)
np_imgs = np.array([np.array(img).astype('float32') for img in imgs]) #dhwc
if mode == 'train':
        np_imgs = group_crop(np_imgs, target_size, is_center=False)  # random crop for training
np_imgs = group_random_flip(np_imgs)
else:
np_imgs = group_crop(np_imgs, target_size, is_center=True)
np_imgs = np_imgs.transpose(0, 3, 1, 2) / 255 #dchw
np_imgs -= img_mean
np_imgs /= img_std
......@@ -189,20 +250,33 @@ def imgs_transform(imgs,
return np_imgs
def group_center_crop(np_imgs, target_size):
def group_crop(np_imgs, target_size, is_center=True):
d, h, w, c = np_imgs.shape
th, tw = target_size, target_size
assert (w >= target_size) and (h >= target_size), \
"image width({}) and height({}) should be larger than crop size".format(w, h, target_size)
"image width({}) and height({}) should be larger than crop size".format(w, h, target_size)
if is_center:
h_off = int(round((h - th) / 2.))
w_off = int(round((w - tw) / 2.))
else:
w_off = random.randint(0, w - tw)
h_off = random.randint(0, h - th)
img_crop = np_imgs[:, h_off:h_off + target_size, w_off:w_off +
target_size, :]
return img_crop
def group_random_flip(np_imgs):
prob = random.random()
if prob < 0.5:
ret = np_imgs[:, :, ::-1, :]
return ret
else:
return np_imgs
def group_scale(imgs, target_size):
resized_imgs = []
for i in range(len(imgs)):
......@@ -239,13 +313,22 @@ def mp4_loader(filepath, nsample, seglen, mode):
imgs = []
for i in range(nsample):
        idx = 0
        if mode == 'train':
            if average_dur >= seglen:
                idx = random.randint(0, average_dur - seglen)
                idx += i * average_dur
            elif average_dur >= 1:
                idx += i * average_dur
            else:
                idx = i
        else:
            if average_dur >= seglen:
                idx = (average_dur - 1) // 2
                idx += i * average_dur
            elif average_dur >= 1:
                idx += i * average_dur
            else:
                idx = i
for jj in range(idx, idx + seglen):
imgbuf = sampledFrames[int(jj % len(sampledFrames))]
......@@ -253,3 +336,34 @@ def mp4_loader(filepath, nsample, seglen, mode):
imgs.append(img)
return imgs
def frames_loader(recode, nsample, seglen, mode):
imgpath, num_frames = recode.path, recode.num_frames
average_dur = int(num_frames / nsample)
imgs = []
for i in range(nsample):
idx = 0
if mode == 'train':
if average_dur >= seglen:
idx = random.randint(0, average_dur - seglen)
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
else:
if average_dur >= seglen:
idx = (average_dur - 1) // 2
idx += i * average_dur
elif average_dur >= 1:
idx += i * average_dur
else:
idx = i
for jj in range(idx, idx + seglen):
img = Image.open(
os.path.join(imgpath, 'img_{:05d}.jpg'.format(jj + 1))).convert(
'RGB')
imgs.append(img)
return imgs
export CUDA_VISIBLE_DEVICES=0
# TSN + AttentionLSTM
python videotag_main.py
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
import os
import sys
import argparse
import ast
import logging
import paddle.fluid as fluid
from utils.train_utils import train_with_dataloader
import models
from utils.config_utils import *
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser("Paddle Video train script")
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--batch_size',
type=int,
default=None,
help='training batch size. None to use config file setting.')
parser.add_argument(
'--learning_rate',
type=float,
default=None,
        help='learning rate used for training. None to use config file setting.')
parser.add_argument(
'--pretrain', type=str, default=None, help='path to pretrain weights.')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
        help='whether to use gpu. Default: True.')
parser.add_argument(
'--no_memory_optimize',
action='store_true',
default=False,
help='whether to use memory optimize in train')
parser.add_argument(
'--epoch',
type=int,
default=None,
help='epoch number, 0 for read from config file')
parser.add_argument(
'--valid_interval',
type=int,
default=1,
help='validation epoch interval, 0 for no validation.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'checkpoints'),
        help='directory name to save train snapshots')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--fix_random_seed',
type=ast.literal_eval,
default=False,
        help='If True, fix the random seed for reproducible (continuous evaluation) runs.')
args = parser.parse_args()
return args
def train(args):
# parse config
config = parse_config(args.config)
train_config = merge_configs(config, 'train', vars(args))
valid_config = merge_configs(config, 'valid', vars(args))
print_configs(train_config, 'Train')
train_model = models.get_model(args.model_name, train_config, mode='train')
valid_model = models.get_model(args.model_name, valid_config, mode='valid')
# build model
startup = fluid.Program()
train_prog = fluid.Program()
if args.fix_random_seed:
startup.random_seed = 1000
train_prog.random_seed = 1000
with fluid.program_guard(train_prog, startup):
with fluid.unique_name.guard():
train_model.build_input(use_dataloader=True)
train_model.build_model()
            # the input has the form [data1, data2, ..., label], so train_feeds[-1] is the label
train_feeds = train_model.feeds()
train_fetch_list = train_model.fetches()
train_loss = train_fetch_list[0]
optimizer = train_model.optimizer()
optimizer.minimize(train_loss)
train_dataloader = train_model.dataloader()
valid_prog = fluid.Program()
with fluid.program_guard(valid_prog, startup):
with fluid.unique_name.guard():
valid_model.build_input(use_dataloader=True)
valid_model.build_model()
valid_feeds = valid_model.feeds()
valid_fetch_list = valid_model.fetches()
valid_dataloader = valid_model.dataloader()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup)
if args.pretrain:
train_model.load_pretrain_params(exe, args.pretrain, train_prog)
build_strategy = fluid.BuildStrategy()
build_strategy.enable_inplace = True
exec_strategy = fluid.ExecutionStrategy()
compiled_train_prog = fluid.compiler.CompiledProgram(
train_prog).with_data_parallel(
loss_name=train_loss.name,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
compiled_valid_prog = fluid.compiler.CompiledProgram(
valid_prog).with_data_parallel(
share_vars_from=compiled_train_prog,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
# get reader
bs_denominator = 1
if args.use_gpu:
# check number of GPUs
gpus = os.getenv("CUDA_VISIBLE_DEVICES", "")
if gpus == "":
pass
else:
gpus = gpus.split(",")
num_gpus = len(gpus)
assert num_gpus == train_config.TRAIN.num_gpus, \
"num_gpus({}) set by CUDA_VISIBLE_DEVICES " \
"shoud be the same as that " \
"set in {}({})".format(
num_gpus, args.config, train_config.TRAIN.num_gpus)
bs_denominator = train_config.TRAIN.num_gpus
train_config.TRAIN.batch_size = int(train_config.TRAIN.batch_size /
bs_denominator)
valid_config.VALID.batch_size = int(valid_config.VALID.batch_size /
bs_denominator)
train_reader = get_reader(args.model_name.upper(), 'train', train_config)
valid_reader = get_reader(args.model_name.upper(), 'valid', valid_config)
# get metrics
train_metrics = get_metrics(args.model_name.upper(), 'train', train_config)
valid_metrics = get_metrics(args.model_name.upper(), 'valid', valid_config)
epochs = args.epoch or train_model.epoch_num()
exe_places = fluid.cuda_places() if args.use_gpu else fluid.cpu_places()
train_dataloader.set_sample_list_generator(train_reader, places=exe_places)
valid_dataloader.set_sample_list_generator(valid_reader, places=exe_places)
train_with_dataloader(
exe,
train_prog,
compiled_train_prog,
train_dataloader,
train_fetch_list,
train_metrics,
epochs=epochs,
log_interval=args.log_interval,
valid_interval=args.valid_interval,
save_dir=args.save_dir,
save_model_name=args.model_name,
fix_random_seed=args.fix_random_seed,
compiled_test_prog=compiled_valid_prog,
test_dataloader=valid_dataloader,
test_fetch_list=valid_fetch_list,
test_metrics=valid_metrics)
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
if not os.path.exists(args.save_dir):
os.makedirs(args.save_dir)
train(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import time
import logging
import argparse
import ast
import numpy as np
try:
import cPickle as pickle
except ImportError:
import pickle
import paddle.fluid as fluid
from utils.config_utils import *
import models
from reader import get_reader
from metrics import get_metrics
from utils.utility import check_cuda
from utils.utility import check_version
logging.root.handlers = []
FORMAT = '[%(levelname)s: %(filename)s: %(lineno)4d]: %(message)s'
logging.basicConfig(level=logging.DEBUG, format=FORMAT, stream=sys.stdout)
logger = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument(
'--model_name',
type=str,
default='AttentionCluster',
help='name of model to train.')
parser.add_argument(
'--config',
type=str,
default='configs/attention_cluster.txt',
help='path to config file of model')
parser.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=True,
help='default use gpu.')
parser.add_argument(
'--weights',
type=str,
default=None,
help='weight path, None to automatically download weights provided by Paddle.'
)
parser.add_argument(
'--batch_size',
type=int,
default=1,
help='sample number in a batch for inference.')
parser.add_argument(
'--filelist',
type=str,
default='./data/TsnExtractor.list',
        help='path to the inference data file list.')
parser.add_argument(
'--log_interval',
type=int,
default=1,
help='mini-batch interval to log.')
parser.add_argument(
'--infer_topk',
type=int,
default=20,
help='topk predictions to restore.')
parser.add_argument(
'--save_dir',
type=str,
default=os.path.join('data', 'tsn_features'),
help='directory to store tsn feature results')
parser.add_argument(
'--video_path',
type=str,
default=None,
        help='path to a single video file; if set, it overrides --filelist.')
args = parser.parse_args()
return args
def infer(args):
# parse config
config = parse_config(args.config)
infer_config = merge_configs(config, 'infer', vars(args))
print_configs(infer_config, "Infer")
infer_model = models.get_model(
args.model_name, infer_config, mode='infer', is_videotag=True)
infer_model.build_input(use_dataloader=False)
infer_model.build_model()
infer_feeds = infer_model.feeds()
infer_outputs = infer_model.outputs()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
filelist = args.filelist or infer_config.INFER.filelist
filepath = args.video_path or infer_config.INFER.get('filepath', '')
if filepath != '':
assert os.path.exists(filepath), "{} not exist.".format(filepath)
else:
assert os.path.exists(filelist), "{} not exist.".format(filelist)
# get infer reader
infer_reader = get_reader(args.model_name.upper(), 'infer', infer_config)
if args.weights:
assert os.path.exists(
args.weights), "Given weight dir {} not exist.".format(args.weights)
# if no weight files specified, download weights from paddle
weights = args.weights or infer_model.get_weights()
infer_model.load_test_weights(exe, weights, fluid.default_main_program())
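    # DataFeeder converts the reader's numpy batches into the tensors expected by exe.run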
infer_feeder = fluid.DataFeeder(place=place, feed_list=infer_feeds)
fetch_list = infer_model.fetches()
infer_metrics = get_metrics(args.model_name.upper(), 'infer', infer_config)
infer_metrics.reset()
if not os.path.isdir(args.save_dir):
os.makedirs(args.save_dir)
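    # each reader item has the form [data..., video_id]; run the extractor on the data
    # part and save one .npy feature file per input video, named after the video file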
for infer_iter, data in enumerate(infer_reader()):
data_feed_in = [items[:-1] for items in data]
video_id = [items[-1] for items in data]
bs = len(video_id)
feature_outs = exe.run(fetch_list=fetch_list,
feed=infer_feeder.feed(data_feed_in))
for i in range(bs):
filename = video_id[i].split('/')[-1][:-4]
np.save(
os.path.join(args.save_dir, filename + '.npy'),
feature_outs[0][i]) #shape: seg_num*feature_dim
logger.info("Feature extraction End~")
if __name__ == "__main__":
args = parse_args()
# check whether the installed paddle is compiled with GPU
check_cuda(args.use_gpu)
check_version()
logger.info(args)
infer(args)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle.fluid import profiler
import logging
import shutil
logger = logging.getLogger(__name__)
def log_lr_and_step():
try:
# In optimizers, if learning_rate is set as constant, lr_var
# name is 'learning_rate_0', and iteration counter is not
# recorded. If learning_rate is set as decayed values from
# learning_rate_scheduler, lr_var name is 'learning_rate',
# and iteration counter is recorded with name '@LR_DECAY_COUNTER@',
        # a better implementation is needed here
lr_var = fluid.global_scope().find_var("learning_rate")
if not lr_var:
lr_var = fluid.global_scope().find_var("learning_rate_0")
lr = np.array(lr_var.get_tensor())
lr_count = '[-]'
lr_count_var = fluid.global_scope().find_var("@LR_DECAY_COUNTER@")
if lr_count_var:
lr_count = np.array(lr_count_var.get_tensor())
logger.info("------- learning rate {}, learning rate counter {} -----"
.format(np.array(lr), np.array(lr_count)))
    except Exception:
        logger.warning("Unable to get learning_rate and LR_DECAY_COUNTER.")
def test_with_dataloader(exe,
compiled_test_prog,
test_dataloader,
test_fetch_list,
test_metrics,
log_interval=0,
save_model_name=''):
if not test_dataloader:
logger.error("[TEST] get dataloader failed.")
test_metrics.reset()
test_iter = 0
for data in test_dataloader():
test_outs = exe.run(compiled_test_prog,
fetch_list=test_fetch_list,
feed=data)
test_metrics.accumulate(test_outs)
if log_interval > 0 and test_iter % log_interval == 0:
test_metrics.calculate_and_log_out(test_outs, \
info = '[TEST] test_iter {} '.format(test_iter))
test_iter += 1
test_metrics.finalize_and_log_out("[TEST] Finish")
def train_with_dataloader(exe, train_prog, compiled_train_prog, train_dataloader, \
train_fetch_list, train_metrics, epochs = 10, \
log_interval = 0, valid_interval = 0, save_dir = './', \
num_trainers = 1, trainer_id = 0, \
save_model_name = 'model', fix_random_seed = False, \
compiled_test_prog = None, test_dataloader = None, \
test_fetch_list = None, test_metrics = None, \
is_profiler = None, profiler_path = None):
if not train_dataloader:
logger.error("[TRAIN] get dataloader failed.")
epoch_periods = []
train_loss = 0
for epoch in range(epochs):
log_lr_and_step()
train_iter = 0
epoch_periods = []
cur_time = time.time()
for data in train_dataloader():
train_outs = exe.run(compiled_train_prog,
fetch_list=train_fetch_list,
feed=data)
period = time.time() - cur_time
epoch_periods.append(period)
            local_time = time.localtime(time.time())
            str_time = time.strftime("%Y-%m-%d %H:%M:%S", local_time)
            if log_interval > 0 and (train_iter % log_interval == 0):
                train_metrics.calculate_and_log_out(train_outs, \
                    info = '[TRAIN {}] Epoch {}, iter {}, time {}, '.format(str_time, epoch, train_iter, period))
train_iter += 1
cur_time = time.time()
# NOTE: profiler tools, used for benchmark
if is_profiler and epoch == 0 and train_iter == log_interval:
profiler.start_profiler("All")
elif is_profiler and epoch == 0 and train_iter == log_interval + 5:
profiler.stop_profiler("total", profiler_path)
return
        if len(epoch_periods) < 1:
            logger.error(
                'No iteration was executed, please check the data reader')
            sys.exit(1)
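        # average iteration time over the epoch, excluding the first (warm-up) iteration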
logger.info('[TRAIN] Epoch {} training finished, average time: {}'.
format(epoch, np.mean(epoch_periods[1:])))
if trainer_id == 0:
save_model(exe, train_prog, save_dir, save_model_name,
"_epoch{}".format(epoch))
if compiled_test_prog and valid_interval > 0 and (
epoch + 1) % valid_interval == 0:
test_with_dataloader(exe, compiled_test_prog, test_dataloader,
test_fetch_list, test_metrics, log_interval,
save_model_name)
if trainer_id == 0:
save_model(exe, train_prog, save_dir, save_model_name)
    # when fixing the random seed for debugging, print the CE kpis
if fix_random_seed:
        cards = os.environ.get('CUDA_VISIBLE_DEVICES', '')
gpu_num = len(cards.split(","))
print("kpis\ttrain_cost_card{}\t{}".format(gpu_num, train_loss))
print("kpis\ttrain_speed_card{}\t{}".format(gpu_num,
np.mean(epoch_periods)))
def save_model(exe, program, save_dir, model_name, postfix=''):
"""save paramters and optimizer related varaibles"""
if not os.path.isdir(save_dir):
os.makedirs(save_dir)
saved_model_name = model_name + postfix
fluid.save(program, os.path.join(save_dir, saved_model_name))
return
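# Illustrative counterpart (a sketch, not part of the original file): parameters
# saved above with fluid.save can be restored into a rebuilt program of the same
# topology, e.g.
#     fluid.load(program, os.path.join(save_dir, model_name + postfix), exe)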
@@ -78,10 +78,13 @@ def parse_args():
     parser.add_argument(
         '--filelist',
         type=str,
-        default=None,
+        default='./data/VideoTag_test.list',
         help='path of video data, multiple video')
     parser.add_argument(
-        '--save_dir', type=str, default='data/results', help='output file path')
+        '--save_dir',
+        type=str,
+        default='data/VideoTag_results',
+        help='output file path')
     parser.add_argument(
         '--label_file',
         type=str,
@@ -116,7 +119,10 @@ def main():
         with fluid.unique_name.guard():
             # build model
             extractor_model = models.get_model(
-                args.extractor_name, extractor_infer_config, mode='infer')
+                args.extractor_name,
+                extractor_infer_config,
+                mode='infer',
+                is_videotag=True)
             extractor_model.build_input(use_dataloader=False)
             extractor_model.build_model()
             extractor_feeds = extractor_model.feeds()
@@ -129,8 +135,9 @@ def main():
     logger.info('load extractor weights from {}'.format(
         args.extractor_weights))
-    extractor_model.load_test_weights(exe, args.extractor_weights,
-                                      extractor_main_prog)
+    extractor_model.load_pretrain_params(
+        exe, args.extractor_weights, extractor_main_prog)

     # get reader and metrics
     extractor_reader = get_reader(args.extractor_name, 'infer',
@@ -224,8 +231,6 @@ def main():
 if __name__ == '__main__':
-    import paddle
-    paddle.enable_static()
     start_time = time.time()
     args = parse_args()
     print(args)