Unverified commit 9afa98c9, authored by SunGaofeng, committed by GitHub

change video dirname to PaddleVideo (#2600)

Parent: 9b426b2c
## Introduction
This repository aims to give developers convenient, efficient PaddlePaddle-based deep-learning models for video understanding, video editing, video generation, and related tasks. It currently covers video classification and action localization models, and will keep expanding to more scenarios.
The current video classification and action localization models are:
| Model | Category | Description |
| :--------------- | :--------: | :------------: |
| [Attention Cluster](./models/attention_cluster/README.md) | Video classification | Attention-based clustering and fusion of multimodal video features, proposed at CVPR'18 |
| [Attention LSTM](./models/attention_lstm/README.md) | Video classification | Widely used model, fast with high accuracy |
| [NeXtVLAD](./models/nextvlad/README.md) | Video classification | Best single model in the 2nd YouTube-8M challenge |
| [StNet](./models/stnet/README.md) | Video classification | Joint spatial-temporal video modeling, proposed at AAAI'19 |
| [TSM](./models/tsm/README.md) | Video classification | Simple, efficient spatial-temporal video modeling based on temporal shift |
| [TSN](./models/tsn/README.md) | Video classification | Classic 2D-CNN-based solution, proposed at ECCV'16 |
| [Non-local](./models/nonlocal_model/README.md) | Video classification | Non-local relation modeling for video |
| [C-TCN](./models/ctcn/README.md) | Action localization | Winning solution of the 2018 ActivityNet challenge |
### Highlights
- Includes multiple leading models for video classification and action localization. Attention LSTM, Attention Cluster, and NeXtVLAD are popular feature-sequence models, while Non-local, TSN, TSM, and StNet are end-to-end video classification models. Attention LSTM is fast with high accuracy; NeXtVLAD was the best single model in the 2nd YouTube-8M challenge; TSN is the classic 2D-CNN-based solution; TSM is a simple, efficient temporal-shift approach to spatial-temporal modeling; and Non-local introduces non-local relation modeling for video. Attention Cluster and StNet were developed at Baidu, published at CVPR 2018 and AAAI 2019 respectively, and were used in the first-place entry of the Kinetics-600 challenge. C-TCN, also developed at Baidu, was the winning solution of the 2018 ActivityNet challenge.
- Provides a common skeleton suited to video classification and action localization tasks, so users can configure a model and run training and evaluation with a single command.
## Installation
Running the sample code in this model zoo requires PaddlePaddle Fluid v1.5.0 or later. If the PaddlePaddle in your environment is older than this, update it following the [installation guide](http://www.paddlepaddle.org/documentation/docs/zh/1.5/beginners_guide/install/index_cn.html).
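On a GPU machine the upgrade can typically be done with pip; a minimal sketch (check the installation guide for the wheel matching your CUDA/cuDNN setup):
``` bash
# Upgrade to the GPU build of PaddlePaddle Fluid 1.5.0;
# CPU-only machines use the plain "paddlepaddle" package instead.
pip install -U paddlepaddle-gpu==1.5.0
```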
## Data preparation
The video model zoo uses the YouTube-8M and Kinetics datasets; see the [data guide](./dataset/README.md) for usage details.
## Quick start
The video model zoo provides a common train/test/infer framework. Training and prediction can be launched with a single command by passing the model name and config parameters to `train.py/test.py/infer.py`.
Taking the StNet model as an example:
Single-GPU training:
``` bash
export CUDA_VISIBLE_DEVICES=0
python train.py --model_name=STNET \
        --config=./configs/stnet.txt \
        --save_dir=checkpoints
```
Multi-GPU training:
``` bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python train.py --model_name=STNET \
        --config=./configs/stnet.txt \
        --save_dir=checkpoints
```
The model zoo also ships quick-start training scripts under `scripts/train`; training can be launched with:
``` bash
bash scripts/train/train_stnet.sh
```
- Adjust the `num_gpus` and `batch_size` settings in the `config` file to match the number of cards given in `CUDA_VISIBLE_DEVICES`, as sketched below.
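For example, to train on 4 cards instead of the default 8, the relevant entries in `configs/stnet.txt` would be edited along these lines (an illustrative sketch: the `[TRAIN]` section name and the halved values are assumptions, not verified defaults):
```
[TRAIN]
num_gpus = 4
batch_size = 64
```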
### Note
Users on a Windows GPU environment need to replace [fluid.ParallelExecutor](http://paddlepaddle.org/documentation/docs/zh/1.4/api_cn/fluid_cn.html#parallelexecutor) in the sample code with [fluid.Executor](http://paddlepaddle.org/documentation/docs/zh/1.4/api_cn/fluid_cn.html#executor).
## Model zoo layout
### Code structure
```
configs/
stnet.txt
tsn.txt
...
dataset/
youtube/
kinetics/
datareader/
  feature_reader.py
kinetics_reader.py
...
metrics/
kinetics/
youtube8m/
...
models/
stnet/
tsn/
...
scripts/
train/
test/
train.py
test.py
infer.py
```
- `configs`: config file templates for each model
- `datareader`: data readers for the YouTube-8M and Kinetics datasets
- `metrics`: evaluation scripts for the YouTube-8M and Kinetics datasets
- `models`: network definitions for each model
- `scripts`: quick-start training and evaluation scripts for each model
- `train.py`: one-command training script; start training by specifying the model name, config file, and so on
- `test.py`: one-command evaluation script; start evaluation by specifying the model name, config file, model weights, and so on
- `infer.py`: one-command inference script; start inference by specifying the model name, config file, model weights, a file list of inputs, and so on
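Evaluation and inference follow the same pattern as training; a hedged sketch (the `--weights` and `--filelist` flags and the placeholder values are assumptions inferred from the script descriptions above):
``` bash
# Evaluate a trained StNet model against the test set.
python test.py --model_name=STNET \
        --config=./configs/stnet.txt \
        --weights=$PATH_TO_WEIGHTS

# Run inference on the videos named in a file list.
python infer.py --model_name=STNET \
        --config=./configs/stnet.txt \
        --weights=$PATH_TO_WEIGHTS \
        --filelist=$FILELIST
```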
## Model Zoo
- Models trained on the YouTube-8M dataset:
| Model | Batch Size | Hardware | cuDNN version | GAP | Download |
| :-------: | :---: | :---------: | :-----: | :----: | :----------: |
| Attention Cluster | 2048 | 8 × P40 | 7.1 | 0.84 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_cluster_youtube8m.tar.gz) |
| Attention LSTM | 1024 | 8 × P40 | 7.1 | 0.86 | [model](https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz) |
| NeXtVLAD | 160 | 4 × P40 | 7.1 | 0.87 | [model](https://paddlemodels.bj.bcebos.com/video_classification/nextvlad_youtube8m.tar.gz) |
- Models trained on the Kinetics dataset:
| Model | Batch Size | Hardware | cuDNN version | Top-1 | Download |
| :-------: | :---: | :---------: | :----: | :----: | :----------: |
| StNet | 128 | 8 × P40 | 7.1 | 0.69 | [model](https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz) |
| TSN | 256 | 8 × P40 | 7.1 | 0.67 | [model](https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz) |
| TSM | 128 | 8 × P40 | 7.1 | 0.70 | [model](https://paddlemodels.bj.bcebos.com/video_classification/tsm_kinetics.tar.gz) |
| Non-local | 64 | 8 × P40 | 7.1 | 0.74 | [model](https://paddlemodels.bj.bcebos.com/video_classification/nonlocal_kinetics.tar.gz) |
- Action localization models on the ActivityNet dataset:
| Model | Batch Size | Hardware | cuDNN version | MAP | Download |
| :-------: | :---: | :---------: | :----: | :----: | :----------: |
| C-TCN | 16 | 8 × P40 | 7.1 | 0.31 | [model](https://paddlemodels.bj.bcebos.com/video_detection/ctcn.tar.gz) |
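Each download link points to a tar.gz archive of saved parameters; a minimal sketch of fetching and unpacking one (the archive's internal layout is not documented here):
``` bash
# Download and unpack the released TSN Kinetics weights.
wget https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz
tar -xzf tsn_kinetics.tar.gz
```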
## References
- [Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification](https://arxiv.org/abs/1711.09550), Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen
- [Beyond Short Snippets: Deep Networks for Video Classification](https://arxiv.org/abs/1503.08909), Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
- [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014), Rongcheng Lin, Jing Xiao, Jianping Fan
- [StNet: Local and Global Spatial-Temporal Modeling for Human Action Recognition](https://arxiv.org/abs/1811.01549), Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen
- [Temporal Segment Networks: Towards Good Practices for Deep Action Recognition](https://arxiv.org/abs/1608.00859), Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool
- [Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383v1), Ji Lin, Chuang Gan, Song Han
- [Non-local Neural Networks](https://arxiv.org/abs/1711.07971v1), Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
## Changelog
- 3/2019: Added the model zoo, releasing five video classification models: Attention Cluster, Attention LSTM, NeXtVLAD, StNet, and TSN.
- 4/2019: Released the Non-local and TSM video classification models.
- 6/2019: Released the C-TCN action localization model; added C2D ResNet101 and I3D ResNet50 backbones to Non-local; optimized speed and memory usage of NeXtVLAD and TSM.
```diff
@@ -5,11 +5,26 @@ import sys
 sys.path.append(os.environ['ceroot'])
 from kpi import CostKpi, DurationKpi

-train_cost_card1_kpi = CostKpi('train_cost_card1', 0.08, 0, actived=True, desc='train cost')
-train_speed_card1_kpi = DurationKpi('train_speed_card1', 0.08, 0, actived=True, desc='train speed in one GPU card')
-train_cost_card4_kpi = CostKpi('train_cost_card4', 0.08, 0, actived=True, desc='train cost')
-train_speed_card4_kpi = DurationKpi('train_speed_card4', 0.3, 0, actived=True, desc='train speed in four GPU card')
-tracking_kpis = [train_cost_card1_kpi, train_speed_card1_kpi, train_cost_card4_kpi, train_speed_card4_kpi]
+train_cost_card1_kpi = CostKpi(
+    'train_cost_card1', 0.08, 0, actived=True, desc='train cost')
+train_speed_card1_kpi = DurationKpi(
+    'train_speed_card1',
+    0.08,
+    0,
+    actived=True,
+    desc='train speed in one GPU card')
+train_cost_card4_kpi = CostKpi(
+    'train_cost_card4', 0.08, 0, actived=True, desc='train cost')
+train_speed_card4_kpi = DurationKpi(
+    'train_speed_card4',
+    0.3,
+    0,
+    actived=True,
+    desc='train speed in four GPU card')
+tracking_kpis = [
+    train_cost_card1_kpi, train_speed_card1_kpi, train_cost_card4_kpi,
+    train_speed_card4_kpi
+]


 def parse_log(log):
```
```diff
@@ -62,7 +62,8 @@ def merge_configs(cfg, sec, args_dict):

 def print_configs(cfg, mode):
-    logger.info("---------------- {:>5} Arguments ----------------".format(mode))
+    logger.info("---------------- {:>5} Arguments ----------------".format(
+        mode))
     for sec, sec_items in cfg.items():
         logger.info("{}:".format(sec))
         for k, v in sec_items.items():
```
```diff
@@ -64,7 +64,8 @@ class KineticsReader(DataReader):
         self.seg_num = self.get_config_from_sec(mode, 'seg_num', self.seg_num)
         self.short_size = self.get_config_from_sec(mode, 'short_size')
         self.target_size = self.get_config_from_sec(mode, 'target_size')
-        self.num_reader_threads = self.get_config_from_sec(mode, 'num_reader_threads')
+        self.num_reader_threads = self.get_config_from_sec(mode,
+                                                           'num_reader_threads')
         self.buf_size = self.get_config_from_sec(mode, 'buf_size')
         self.enable_ce = self.get_config_from_sec(mode, 'enable_ce')
@@ -99,7 +100,6 @@ class KineticsReader(DataReader):
         return _batch_reader

-
     def _reader_creator(self,
                         pickle_list,
                         mode,
@@ -113,8 +113,8 @@ class KineticsReader(DataReader):
                         num_threads=1,
                         buf_size=1024,
                         format='pkl'):
-        def decode_mp4(sample, mode, seg_num, seglen, short_size, target_size, img_mean,
-                       img_std):
+        def decode_mp4(sample, mode, seg_num, seglen, short_size, target_size,
+                       img_mean, img_std):
             sample = sample[0].split(' ')
             mp4_path = sample[0]
             # when infer, we store vid as label
@@ -122,8 +122,8 @@ class KineticsReader(DataReader):
             try:
                 imgs = mp4_loader(mp4_path, seg_num, seglen, mode)
                 if len(imgs) < 1:
-                    logger.error('{} frame length {} less than 1.'.format(mp4_path,
-                                                                          len(imgs)))
+                    logger.error('{} frame length {} less than 1.'.format(
+                        mp4_path, len(imgs)))
                     return None, None
             except:
                 logger.error('Error when loading {}'.format(mp4_path))
@@ -132,20 +132,20 @@ class KineticsReader(DataReader):
             return imgs_transform(imgs, label, mode, seg_num, seglen, \
                             short_size, target_size, img_mean, img_std)

-        def decode_pickle(sample, mode, seg_num, seglen, short_size, target_size,
-                          img_mean, img_std):
+        def decode_pickle(sample, mode, seg_num, seglen, short_size,
+                          target_size, img_mean, img_std):
             pickle_path = sample[0]
             try:
                 if python_ver < (3, 0):
                     data_loaded = pickle.load(open(pickle_path, 'rb'))
                 else:
-                    data_loaded = pickle.load(open(pickle_path, 'rb'), encoding='bytes')
+                    data_loaded = pickle.load(
+                        open(pickle_path, 'rb'), encoding='bytes')
                 vid, label, frames = data_loaded
                 if len(frames) < 1:
-                    logger.error('{} frame length {} less than 1.'.format(pickle_path,
-                                                                          len(frames)))
+                    logger.error('{} frame length {} less than 1.'.format(
+                        pickle_path, len(frames)))
                     return None, None
             except:
                 logger.info('Error when loading {}'.format(pickle_path))
@@ -160,9 +160,8 @@ class KineticsReader(DataReader):
             return imgs_transform(imgs, ret_label, mode, seg_num, seglen, \
                             short_size, target_size, img_mean, img_std)

-        def imgs_transform(imgs, label, mode, seg_num, seglen, short_size, target_size,
-                           img_mean, img_std):
+        def imgs_transform(imgs, label, mode, seg_num, seglen, short_size,
+                           target_size, img_mean, img_std):
             imgs = group_scale(imgs, short_size)

             if mode == 'train':
@@ -182,11 +181,11 @@ class KineticsReader(DataReader):
                 imgs = np_imgs
             imgs -= img_mean
             imgs /= img_std
-            imgs = np.reshape(imgs, (seg_num, seglen * 3, target_size, target_size))
+            imgs = np.reshape(imgs,
+                              (seg_num, seglen * 3, target_size, target_size))

             return imgs, label

-
         def reader():
             with open(pickle_list) as flist:
                 lines = [line.strip() for line in flist]
@@ -229,8 +228,14 @@ def group_multi_scale_crop(img_group, target_size, scales=None, \
     base_size = min(image_w, image_h)
     crop_sizes = [int(base_size * x) for x in scales]

-    crop_h = [input_size[1] if abs(x - input_size[1]) < 3 else x for x in crop_sizes]
-    crop_w = [input_size[0] if abs(x - input_size[0]) < 3 else x for x in crop_sizes]
+    crop_h = [
+        input_size[1] if abs(x - input_size[1]) < 3 else x
+        for x in crop_sizes
+    ]
+    crop_w = [
+        input_size[0] if abs(x - input_size[0]) < 3 else x
+        for x in crop_sizes
+    ]

     pairs = []
     for i, h in enumerate(crop_h):
@@ -273,8 +278,14 @@ def group_multi_scale_crop(img_group, target_size, scales=None, \
         return crop_pair[0], crop_pair[1], w_offset, h_offset

     crop_w, crop_h, offset_w, offset_h = _sample_crop_size(im_size)
-    crop_img_group = [img.crop((offset_w, offset_h, offset_w + crop_w, offset_h + crop_h)) for img in img_group]
-    ret_img_group = [img.resize((input_size[0], input_size[1]), Image.BILINEAR) for img in crop_img_group]
+    crop_img_group = [
+        img.crop((offset_w, offset_h, offset_w + crop_w, offset_h + crop_h))
+        for img in img_group
+    ]
+    ret_img_group = [
+        img.resize((input_size[0], input_size[1]), Image.BILINEAR)
+        for img in crop_img_group
+    ]

     return ret_img_group
```
```diff
@@ -52,7 +52,6 @@ class DataReader(object):
         return self.cfg[sec.upper()].get(item, default)

-
 class ReaderZoo(object):
     def __init__(self):
         self.reader_zoo = {}
```
```diff
 # C-TCN model data usage notes
-The C-TCN model uses the ActivityNet 1.3 dataset; see the official [download instructions](http://activity-net.org/index.html) for how to obtain it. To train this model, features must first be extracted from the source mp4 files with a trained TSN model, extracting RGB and Optical Flow features separately and storing them as pickle files. We will provide a download link for the converted data. The directory structure of the converted data is:
+The C-TCN model uses the ActivityNet 1.3 dataset; see the official [download instructions](http://activity-net.org/index.html) for how to obtain it. To train this model, RGB and Flow features must first be extracted from the source mp4 files, after which a trained TSN model is used to extract abstract feature data, stored as pickle files. We will provide a download link for the converted data. The directory structure of the converted data is:
```
```
data
...
```
```diff
@@ -18,10 +18,8 @@ import paddle.fluid as fluid

 class LogisticModel(object):
     """Logistic model."""

-    def build_model(self,
-                    model_input,
-                    vocab_size,
-                    **unused_params):
+    def build_model(self, model_input, vocab_size, **unused_params):
         """Creates a logistic model.

         Args:
```
```diff
@@ -147,5 +147,7 @@ class AttentionLSTM(ModelBase):
         ]

     def weights_info(self):
-        return ('attention_lstm_youtube8m',
-                'https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz')
+        return (
+            'attention_lstm_youtube8m',
+            'https://paddlemodels.bj.bcebos.com/video_classification/attention_lstm_youtube8m.tar.gz'
+        )
```
```diff
@@ -62,7 +62,9 @@ class LSTMAttentionModel(object):
             input=[lstm_forward, lstm_backward], axis=1)

         lstm_dropout = fluid.layers.dropout(
-            x=lstm_concat, dropout_prob=self.drop_rate, is_test=(not is_training))
+            x=lstm_concat,
+            dropout_prob=self.drop_rate,
+            is_test=(not is_training))

         lstm_weight = fluid.layers.fc(
             input=lstm_dropout,
```
```diff
@@ -61,8 +61,8 @@
 When the parameters below are used, evaluation accuracy on the ActivityNet 1.3 dataset is:

-| score\_thresh | nms\_thresh | soft\_sigma | soft\_thresh | Top-1 |
-| :-----------: | :---------: | :---------: | :----------: | :----: |
+| score\_thresh | nms\_thresh | soft\_sigma | soft\_thresh | MAP |
+| :-----------: | :---------: | :---------: | :----------: | :---: |
 | 0.001 | 0.8 | 0.9 | 0.004 | 31% |
```
```diff
@@ -79,4 +79,3 @@
 ## References
 - [NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification](https://arxiv.org/abs/1811.05014), Rongcheng Lin, Jing Xiao, Jianping Fan
-
```
```diff
@@ -46,7 +46,8 @@ class STNET(ModelBase):
                                                     'l2_weight_decay')
         self.momentum = self.get_config_from_sec('train', 'momentum')

-        self.seg_num = self.get_config_from_sec(self.mode, 'seg_num', self.seg_num)
+        self.seg_num = self.get_config_from_sec(self.mode, 'seg_num',
+                                                self.seg_num)
         self.target_size = self.get_config_from_sec(self.mode, 'target_size')
         self.batch_size = self.get_config_from_sec(self.mode, 'batch_size')
@@ -127,11 +128,16 @@ class STNET(ModelBase):
         ]

     def pretrain_info(self):
-        return ('ResNet50_pretrained', 'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz')
+        return (
+            'ResNet50_pretrained',
+            'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz'
+        )

     def weights_info(self):
-        return ('stnet_kinetics',
-                'https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz')
+        return (
+            'stnet_kinetics',
+            'https://paddlemodels.bj.bcebos.com/video_classification/stnet_kinetics.tar.gz'
+        )

     def load_pretrain_params(self, exe, pretrain, prog, place):
         def is_parameter(var):
@@ -139,7 +145,9 @@ class STNET(ModelBase):
             return isinstance(var, fluid.framework.Parameter) and (not ("fc_0" in var.name)) \
                         and (not ("batch_norm" in var.name)) and (not ("xception" in var.name)) and (not ("conv3d" in var.name))

-        logger.info("Load pretrain weights from {}, exclude fc, batch_norm, xception, conv3d layers.".format(pretrain))
+        logger.info(
+            "Load pretrain weights from {}, exclude fc, batch_norm, xception, conv3d layers.".
+            format(pretrain))
         vars = filter(is_parameter, prog.list_vars())
         fluid.io.load_vars(exe, pretrain, vars=vars, main_program=prog)
```
```diff
@@ -122,17 +122,23 @@ class TSM(ModelBase):
         ]

     def pretrain_info(self):
-        return ('ResNet50_pretrained', 'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz')
+        return (
+            'ResNet50_pretrained',
+            'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz'
+        )

     def weights_info(self):
-        return ('tsm_kinetics',
-                'https://paddlemodels.bj.bcebos.com/video_classification/tsm_kinetics.tar.gz')
+        return (
+            'tsm_kinetics',
+            'https://paddlemodels.bj.bcebos.com/video_classification/tsm_kinetics.tar.gz'
+        )

     def load_pretrain_params(self, exe, pretrain, prog, place):
         def is_parameter(var):
-            return isinstance(var, fluid.framework.Parameter) and (not ("fc_0" in var.name))
+            return isinstance(var, fluid.framework.Parameter) and (
+                not ("fc_0" in var.name))

-        logger.info("Load pretrain weights from {}, exclude fc layer.".format(pretrain))
+        logger.info("Load pretrain weights from {}, exclude fc layer.".format(
+            pretrain))
         vars = filter(is_parameter, prog.list_vars())
         fluid.io.load_vars(exe, pretrain, vars=vars, main_program=prog)
```
```diff
@@ -45,19 +45,21 @@ class TSM_ResNet():
             padding=(filter_size - 1) // 2,
             groups=groups,
             act=None,
-            param_attr=fluid.param_attr.ParamAttr(name=name+"_weights"),
+            param_attr=fluid.param_attr.ParamAttr(name=name + "_weights"),
             bias_attr=False)
         if name == "conv1":
             bn_name = "bn_" + name
         else:
             bn_name = "bn" + name[3:]
-        return fluid.layers.batch_norm(input=conv, act=act,
-                                       is_test=(not self.is_training),
-                                       param_attr=fluid.param_attr.ParamAttr(name=bn_name+"_scale"),
-                                       bias_attr=fluid.param_attr.ParamAttr(bn_name+'_offset'),
-                                       moving_mean_name=bn_name+"_mean",
-                                       moving_variance_name=bn_name+'_variance')
+        return fluid.layers.batch_norm(
+            input=conv,
+            act=act,
+            is_test=(not self.is_training),
+            param_attr=fluid.param_attr.ParamAttr(name=bn_name + "_scale"),
+            bias_attr=fluid.param_attr.ParamAttr(bn_name + '_offset'),
+            moving_mean_name=bn_name + "_mean",
+            moving_variance_name=bn_name + '_variance')

     def shortcut(self, input, ch_out, stride, name):
         ch_in = input.shape[1]
@@ -70,18 +72,27 @@ class TSM_ResNet():
         shifted = self.shift_module(input)

         conv0 = self.conv_bn_layer(
-            input=shifted, num_filters=num_filters, filter_size=1, act='relu',
-            name=name+"_branch2a")
+            input=shifted,
+            num_filters=num_filters,
+            filter_size=1,
+            act='relu',
+            name=name + "_branch2a")
         conv1 = self.conv_bn_layer(
             input=conv0,
             num_filters=num_filters,
             filter_size=3,
             stride=stride,
-            act='relu', name=name+"_branch2b")
+            act='relu',
+            name=name + "_branch2b")
         conv2 = self.conv_bn_layer(
-            input=conv1, num_filters=num_filters * 4, filter_size=1, act=None, name=name+"_branch2c")
+            input=conv1,
+            num_filters=num_filters * 4,
+            filter_size=1,
+            act=None,
+            name=name + "_branch2c")

-        short = self.shortcut(input, num_filters * 4, stride, name=name+"_branch1")
+        short = self.shortcut(
+            input, num_filters * 4, stride, name=name + "_branch1")

         return fluid.layers.elementwise_add(x=short, y=conv2, act='relu')
@@ -109,7 +120,12 @@ class TSM_ResNet():
         num_filters = [64, 128, 256, 512]

         conv = self.conv_bn_layer(
-            input=input, num_filters=64, filter_size=7, stride=2, act='relu', name='conv1')
+            input=input,
+            num_filters=64,
+            filter_size=7,
+            stride=2,
+            act='relu',
+            name='conv1')
         conv = fluid.layers.pool2d(
             input=conv,
             pool_size=3,
@@ -121,11 +137,11 @@ class TSM_ResNet():
             for i in range(depth[block]):
                 if layers in [101, 152] and block == 2:
                     if i == 0:
-                        conv_name = "res" + str(block+2) + "a"
+                        conv_name = "res" + str(block + 2) + "a"
                     else:
-                        conv_name = "res" + str(block+2) + "b" + str(i)
+                        conv_name = "res" + str(block + 2) + "b" + str(i)
                 else:
-                    conv_name = "res" + str(block+2) + chr(97+i)
+                    conv_name = "res" + str(block + 2) + chr(97 + i)

                 conv = self.bottleneck_block(
                     input=conv,
@@ -136,7 +152,8 @@ class TSM_ResNet():
         pool = fluid.layers.pool2d(
             input=conv, pool_size=7, pool_type='avg', global_pooling=True)

-        dropout = fluid.layers.dropout(x=pool, dropout_prob=0.5, is_test=(not self.is_training))
+        dropout = fluid.layers.dropout(
+            x=pool, dropout_prob=0.5, is_test=(not self.is_training))

         feature = fluid.layers.reshape(
             x=dropout, shape=[-1, seg_num, pool.shape[1]])
@@ -149,6 +166,7 @@ class TSM_ResNet():
             param_attr=fluid.param_attr.ParamAttr(
                 initializer=fluid.initializer.Uniform(-stdv,
                                                       stdv)),
-            bias_attr=fluid.param_attr.ParamAttr(learning_rate=2.0,
-                                                 regularizer=fluid.regularizer.L2Decay(0.)))
+            bias_attr=fluid.param_attr.ParamAttr(
+                learning_rate=2.0,
+                regularizer=fluid.regularizer.L2Decay(0.)))
         return out
```
```diff
@@ -47,7 +47,8 @@ class TSN(ModelBase):
                                                     'l2_weight_decay')
         self.momentum = self.get_config_from_sec('train', 'momentum')

-        self.seg_num = self.get_config_from_sec(self.mode, 'seg_num', self.seg_num)
+        self.seg_num = self.get_config_from_sec(self.mode, 'seg_num',
+                                                self.seg_num)
         self.target_size = self.get_config_from_sec(self.mode, 'target_size')
         self.batch_size = self.get_config_from_sec(self.mode, 'batch_size')
@@ -131,17 +132,23 @@ class TSN(ModelBase):
         ]

     def pretrain_info(self):
-        return ('ResNet50_pretrained', 'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz')
+        return (
+            'ResNet50_pretrained',
+            'https://paddlemodels.bj.bcebos.com/video_classification/ResNet50_pretrained.tar.gz'
+        )

     def weights_info(self):
-        return ('tsn_kinetics',
-                'https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz')
+        return (
+            'tsn_kinetics',
+            'https://paddlemodels.bj.bcebos.com/video_classification/tsn_kinetics.tar.gz'
+        )

     def load_pretrain_params(self, exe, pretrain, prog, place):
         def is_parameter(var):
-            return isinstance(var, fluid.framework.Parameter) and (not ("fc_0" in var.name))
+            return isinstance(var, fluid.framework.Parameter) and (
+                not ("fc_0" in var.name))

-        logger.info("Load pretrain weights from {}, exclude fc layer.".format(pretrain))
+        logger.info("Load pretrain weights from {}, exclude fc layer.".format(
+            pretrain))
         vars = filter(is_parameter, prog.list_vars())
         fluid.io.load_vars(exe, pretrain, vars=vars, main_program=prog)
```
Eight additional file diffs in this commit are collapsed and not shown.