未验证 提交 2346fd72 编写于 作者: K Kaipeng Deng 提交者: GitHub

Merge pull request #1817 from heavengate/video_doc

Video doc
......@@ -6,11 +6,17 @@
| 模型 | 类别 | 描述 |
| :--------------- | :--------: | :------------: |
| [Attention Cluster](./models/attention_cluster/README.md) [[论文](https://arxiv.org/abs/1711.09550)] | 视频分类| CVPR'18提出的视频多模态特征注意力聚簇融合方法 |
| [Attention LSTM](./models/attention_lstm/README.md) [[论文](https://arxiv.org/abs/1503.08909)] | 视频分类| 常用模型,速度快精度高 |
| [NeXtVLAD](./models/nextvlad/README.md) [[论文](https://arxiv.org/abs/1811.05014)] | 视频分类| 2nd-Youtube-8M最优单模型 |
| [StNet](./models/stnet/README.md) [[论文](https://arxiv.org/abs/1811.01549)] | 视频分类| AAAI'19提出的视频联合时空建模方法 |
| [TSN](./models/tsn/README.md) [[论文](https://arxiv.org/abs/1608.00859)] | 视频分类| ECCV'16提出的基于2D-CNN经典解决方案 |
| [Attention Cluster](./models/attention_cluster/README.md) | 视频分类| CVPR'18提出的视频多模态特征注意力聚簇融合方法 |
| [Attention LSTM](./models/attention_lstm/README.md) | 视频分类| 常用模型,速度快精度高 |
| [NeXtVLAD](./models/nextvlad/README.md) | 视频分类| 2nd-Youtube-8M最优单模型 |
| [StNet](./models/stnet/README.md) | 视频分类| AAAI'19提出的视频联合时空建模方法 |
| [TSN](./models/tsn/README.md) | 视频分类| ECCV'16提出的基于2D-CNN经典解决方案 |
### 主要特点
- 包含视频分类方向的多个主流领先模型,其中Attention LSTM,Attention Cluster和NeXtVLAD是比较流行的特征序列模型,TSN和StNet是两个End-to-End的视频分类模型。Attention LSTM模型速度快精度高,NeXtVLAD是2nd-Youtube-8M比赛中最好的单模型, TSN是基于2D-CNN的经典解决方案。Attention Cluster和StNet是百度自研模型,分别发表于CVPR2018和AAAI2019,是Kinetics600比赛第一名中使用到的模型。
- 提供了适合视频分类任务的通用骨架代码,用户可一键式高效配置模型完成训练和评测。
## 安装
......@@ -52,6 +58,47 @@ bash scripts/train/train_stnet.sh
- 请根据`CUDA_VISIBLE_DEVICES`指定卡数修改`config`文件中的`num_gpus``batch_size`配置。
## 模型库结构
### 代码结构
```
configs/
stnet.txt
tsn.txt
...
dataset/
youtube/
kinetics/
datareader/
feature_readeer.py
kinetics_reader.py
...
metrics/
kinetics/
youtube8m/
...
models/
stnet/
tsn/
...
scripts/
train/
test/
train.py
test.py
infer.py
```
- `configs`: 各模型配置文件模板
- `datareader`: 提供Youtube-8M,Kinetics数据集reader
- `metrics`: Youtube-8,Kinetics数据集评估脚本
- `models`: 各模型网络结构构建脚本
- `scripts`: 各模型快速训练评估脚本
- `train.py`: 一键式训练脚本,可通过指定模型名,配置文件等一键式启动训练
- `test.py`: 一键式评估脚本,可通过指定模型名,配置文件,模型权重等一键式启动评估
- `infer.py`: 一键式推断脚本,可通过指定模型名,配置文件,模型权重,待推断文件列表等一键式启动推断
## Model Zoo
- 基于Youtube-8M数据集模型:
......@@ -69,6 +116,14 @@ bash scripts/train/train_stnet.sh
| StNet | 128 | 8卡P40 | 5.1 | 0.69 | [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) |
| TSN | 256 | 8卡P40 | 7.1 | 0.67 | [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) |
## 参考文献
- [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550), Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen
- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909) Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
- [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014), Rongcheng Lin, Jing Xiao, Jianping Fan
- [StNet:Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen
- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
## 版本更新
- 3/2019: 新增模型库,发布Attention Cluster,Attention LSTM,NeXtVLAD,StNet,TSN五个视频分类模型。
......
......@@ -105,5 +105,5 @@ StNet的训练数据采用由DeepMind公布的Kinetics-400动作识别数据集
## 参考论文
[StNet:Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen
- [StNet:Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen
......@@ -81,5 +81,5 @@ TSN的训练数据采用由DeepMind公布的Kinetics-400动作识别数据集。
## 参考论文
- [StNet:Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册