Commit 29653aea authored by 0YuanZhang0

Merge branch 'master' of https://github.com/PaddlePaddle/hapi into sequence_tagging

@@ -29,7 +29,6 @@ BMN Overview
 ├── train.py        # training script
 ├── eval.py         # evaluation script
 ├── predict.py      # prediction script for arbitrary inputs
-├── bmn_model.py    # network architecture and loss definition
 ├── bmn_metric.py   # accuracy metric definition
 ├── reader.py       # data reader, builds the Dataset and DataLoader
 ├── bmn_utils.py    # model utility code
@@ -41,7 +40,7 @@ BMN Overview
 ## Data Preparation

-BMN is trained on the dataset provided by ActivityNet1.3. We provide pre-extracted video features: download [bmn\_feat](https://paddlemodels.bj.bcebos.com/video_detection/bmn_feat.tar.gz), extract it, and update the feature path feat\_path in bmn.yaml accordingly. For the labels, download [label](https://paddlemodels.bj.bcebos.com/video_detection/activitynet_1.3_annotations.json) and update the annotation path anno\_file in bmn.yaml.
+BMN is trained on the dataset provided by ActivityNet1.3. We provide pre-extracted video features and the corresponding label file: download the features [bmn\_feat](https://paddlemodels.bj.bcebos.com/video_detection/bmn_feat.tar.gz) and the labels [label](https://paddlemodels.bj.bcebos.com/video_detection/activitynet_1.3_annotations.json), then update the feature path feat\_path and the annotation path anno\_file in the config file bmn.yaml accordingly.
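As a quick sanity check before training, the sketch below loads bmn.yaml and verifies the two configured paths. It assumes PyYAML is available and that feat\_path and anno\_file live under the MODEL section; adjust the keys if the actual layout differs.

```python
# Sanity-check the data paths configured in bmn.yaml (keys assumed to sit
# under MODEL; adjust if your bmn.yaml nests them differently).
import os
import yaml

with open("bmn.yaml") as f:
    cfg = yaml.safe_load(f)

feat_path = cfg["MODEL"]["feat_path"]
anno_file = cfg["MODEL"]["anno_file"]

assert os.path.isdir(feat_path), "feature directory not found: %s" % feat_path
assert os.path.isfile(anno_file), "annotation file not found: %s" % anno_file
print("data paths look good")
```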
 ## Model Training
@@ -52,22 +51,17 @@ BMN is trained on the dataset provided by ActivityNet1.3
     bash run.sh

-To train on a single GPU, launch as follows:
+To train on a single GPU, change batch\_size in the config file bmn.yaml to 16 and launch as follows:

-    export CUDA_VISIBLE_DEVICES=0
     python train.py

+Static-graph training is used by default; to train with the dynamic graph instead, just add the `-d` flag to the command, e.g.:
+
+    python train.py -d
+
 - Running the code requires pandas to be installed first
 - Training starts from scratch with the command line or script above; no pretrained model is needed
-- For single-GPU training, change batch\_size in the config file to 16
-
-**Training strategy:**
-
-* Adam optimizer with initial learning\_rate=0.001
-* Weight decay coefficient of 1e-4
-* The learning rate is decayed once, by a factor of 0.1, when the iteration count reaches 4200
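For reference, the `-d` switch is a plain argparse store_true flag wired to fluid.enable_dygraph, as the train.py hunk further down shows. A minimal, self-contained sketch of that toggle (paddle.fluid 1.x API, single GPU assumed):

```python
# Minimal sketch of the static/dygraph toggle behind `-d`; mirrors the
# pattern in train.py below, not the actual script.
import argparse
import paddle.fluid as fluid

parser = argparse.ArgumentParser("BMN training")
parser.add_argument("-d", "--dynamic", action="store_true",
                    help="enable dygraph mode (static graph by default)")
args = parser.parse_args()

place = fluid.CUDAPlace(0)       # assumes one visible GPU
if args.dynamic:
    fluid.enable_dygraph(place)  # imperative (dygraph) execution from here on
# without -d, the program stays in the default static-graph mode
```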
 ## Model Evaluation
@@ -76,9 +70,9 @@ BMN is trained on the dataset provided by ActivityNet1.3
     python eval.py --weights=$PATH_TO_WEIGHTS

-- When evaluating, the `weights` argument on the command line selects the weights to evaluate; if it is not set, the default parameter file checkpoint/final.pdparams is used
+- When evaluating, the `weights` argument on the command line selects the weights to evaluate; if it is not set, the script downloads the released model [model](https://paddlemodels.bj.bcebos.com/hapi/bmn.pdparams) and evaluates it

-- The program saves its intermediate output under output/EVAL/BMN\_results, and the test results in evaluate\_results/bmn\_results\_validation.json
+- The program saves its intermediate output under the folder given by `--output_path` (default output/EVAL/BMN\_results), and the test results under the folder given by `--result_path` (default evaluate\_results)

 - Note: the loss may come out as nan during evaluation. Evaluation runs on single samples, and a sample may contain no instance with iou>0.6, which makes the loss nan; this does not affect the final evaluation result.
...@@ -87,9 +81,9 @@ BMN的训练数据采用ActivityNet1.3提供的数据集,我们提供了处理 ...@@ -87,9 +81,9 @@ BMN的训练数据采用ActivityNet1.3提供的数据集,我们提供了处理
- ActivityNet数据集的具体使用说明可以参考其[官方网站](http://activity-net.org) - ActivityNet数据集的具体使用说明可以参考其[官方网站](http://activity-net.org)
- 下载指标评估代码,请从[ActivityNet Gitub repository](https://github.com/activitynet/ActivityNet.git)下载,将Evaluation文件夹拷贝至models/dygraph/bmn目录下。(注:由于第三方评估代码不支持python3,此处建议使用python2进行评估;若使用python3,print函数需要添加括号,请对Evaluation目录下的.py文件做相应修改。) - 下载指标评估代码,请从[ActivityNet Gitub repository](https://github.com/activitynet/ActivityNet.git)下载,将Evaluation文件夹拷贝至hapi/examples/bmn目录下。(注:由于第三方评估代码不支持python3,此处建议使用python2进行评估;若使用python3,print函数需要添加括号,请对Evaluation目录下的.py文件做相应修改。)
- 请下载[activity\_net\_1\_3\_new.json](https://paddlemodels.bj.bcebos.com/video_detection/activity_net_1_3_new.json)文件,并将其放置在models/dygraph/bmn/Evaluation/data目录下,相较于原始的activity\_net.v1-3.min.json文件,我们过滤了其中一些失效的视频条目。 - 请下载[activity\_net\_1\_3\_new.json](https://paddlemodels.bj.bcebos.com/video_detection/activity_net_1_3_new.json)文件,并将其放置在hapi/examples/bmn/Evaluation/data目录下,相较于原始的activity\_net.v1-3.min.json文件,我们过滤了其中一些失效的视频条目。
- 计算精度指标 - 计算精度指标
@@ -100,7 +94,7 @@ BMN is trained on the dataset provided by ActivityNet1.3
 | AR@1  | AR@5  | AR@10 | AR@100 | AUC    |
 | :---: | :---: | :---: | :---:  | :---:  |
-| 33.46 | 49.25 | 56.25 | 75.40  | 67.16% |
+| 33.10 | 49.18 | 56.54 | 75.12  | 67.16% |

 ## Model Inference
@@ -110,9 +104,9 @@ BMN is trained on the dataset provided by ActivityNet1.3
     python predict.py --weights=$PATH_TO_WEIGHTS \
                       --filelist=$FILELIST

-- When launching from the python command line, `--filelist` specifies the list of files to run inference on; if it is not set, it defaults to ./infer.list. `--weights` points to the trained weights; if it is not set, the default parameter file checkpoint/final.pdparams is used
+- When launching from the python command line, `--filelist` specifies the list of files to run inference on; if it is not set, it defaults to ./infer.list. `--weights` points to the trained weights; if it is not set, the script downloads the released model [model](https://paddlemodels.bj.bcebos.com/hapi/bmn.pdparams) and uses it for prediction

-- The program saves its intermediate output under output/INFER/BMN\_results, and the test results in predict\_results/bmn\_results\_test.json
+- The program saves its intermediate output under the folder given by `--output_path` (default output/INFER/BMN\_results), and the test results under the folder given by `--result_path` (default predict\_results)
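As a usage note, the proposals written under result\_path can be inspected directly. The sketch below assumes the file name from the previous README (predict\_results/bmn\_results\_test.json) and an ActivityNet-style `{"results": {video: [{"score": ..., "segment": [start, end]}, ...]}}` layout; the actual schema may differ.

```python
# Peek at the generated proposals. File name and JSON layout are assumptions
# based on the old README wording and the ActivityNet proposal format.
import json

with open("predict_results/bmn_results_test.json") as f:
    results = json.load(f)["results"]

for video_name, proposals in list(results.items())[:3]:
    best = max(proposals, key=lambda p: p["score"])
    print(video_name, "best segment:", best["segment"], "score:", best["score"])
```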
 ## References
......
@@ -12,11 +12,10 @@ MODEL:
 TRAIN:
     subset: "train"
     epoch: 9
     batch_size: 4
     num_workers: 4
     use_shuffle: True
     device: "gpu"
-    num_gpus: 4
     learning_rate: 0.001
     learning_rate_decay: 0.1
     lr_decay_iter: 4200
@@ -29,10 +28,6 @@ TEST:
     subset: "validation"
     batch_size: 1
     num_workers: 1
-    use_buffer: False
-    snms_alpha: 0.001
-    snms_t1: 0.5
-    snms_t2: 0.9
     output_path: "output/EVAL/BMN_results"
     result_path: "evaluate_results"
@@ -40,10 +35,6 @@ INFER:
     subset: "test"
     batch_size: 1
     num_workers: 1
-    use_buffer: False
-    snms_alpha: 0.4
-    snms_t1: 0.5
-    snms_t2: 0.9
     filelist: './infer.list'
     output_path: "output/INFER/BMN_results"
     result_path: "predict_results"
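The training, evaluation, and prediction scripts read this file through parse_config/merge_configs (defined in the example's config utilities, which are not part of this diff). A rough, PyYAML-based stand-in for the attribute-style access they rely on, shown only to illustrate how keys like cfg.TRAIN.batch_size resolve; the real helpers may differ:

```python
# Rough stand-in for parse_config: nested attribute-style access over bmn.yaml.
import yaml

class AttrDict(dict):
    def __getattr__(self, key):
        value = self[key]
        return AttrDict(value) if isinstance(value, dict) else value

with open("bmn.yaml") as f:
    cfg = AttrDict(yaml.safe_load(f))

print(cfg.TRAIN.batch_size)    # 4
print(cfg.TEST.output_path)    # output/EVAL/BMN_results
print(cfg.INFER.result_path)   # predict_results
```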
......
@@ -36,8 +36,21 @@ class BmnMetric(Metric):
         #get video_dict and video_list
         if self.mode == 'test':
             self.get_test_dataset_dict()
+            if not os.path.isdir(self.cfg.TEST.output_path):
+                os.makedirs(self.cfg.TEST.output_path)
+            if not os.path.isdir(self.cfg.TEST.result_path):
+                os.makedirs(self.cfg.TEST.result_path)
         elif self.mode == 'infer':
             self.get_infer_dataset_dict()
+            if not os.path.isdir(self.cfg.INFER.output_path):
+                os.makedirs(self.cfg.INFER.output_path)
+            if not os.path.isdir(self.cfg.INFER.result_path):
+                os.makedirs(self.cfg.INFER.result_path)

+    def add_metric_op(self, preds, label):
+        pred_bm, pred_start, pred_en = preds
+        video_index = label[-1]
+        return [pred_bm, pred_start, pred_en, video_index]  #return list

     def update(self, pred_bm, pred_start, pred_end, fid):
         # generate proposals
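The new add_metric_op hook reduces the model outputs and labels to exactly what the metric consumes; judging by the signatures above, the returned list is then unpacked positionally into update(pred_bm, pred_start, pred_end, fid). A toy sketch of that correspondence with placeholder arrays (the shapes and the unpacking step are assumptions):

```python
# Toy illustration of how add_metric_op feeds update(); arrays are placeholders.
import numpy as np

preds = [np.zeros((1, 2, 100, 100)),   # pred_bm
         np.zeros((1, 100)),           # pred_start
         np.zeros((1, 100))]           # pred_end
label = [np.zeros((1, 100, 100)),      # gt_iou_map
         np.zeros((1, 100)),           # gt_start
         np.zeros((1, 100)),           # gt_end
         np.array([7])]                # video_index

outputs = preds + [label[-1]]          # what add_metric_op returns
# the framework is then expected to call:
#   metric.update(*outputs)  ->  update(pred_bm, pred_start, pred_end, fid)
```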
......
@@ -37,7 +37,6 @@ def parse_args():
     parser.add_argument(
         "-d",
         "--dynamic",
-        default=True,
         action='store_true',
         help="enable dygraph mode, only support dynamic mode at present time")
     parser.add_argument(
@@ -56,6 +55,17 @@ def parse_args():
         default=None,
         help='weight path, None to automatically download weights provided by Paddle.'
     )
+    parser.add_argument(
+        '--output_path',
+        type=str,
+        default="output/EVAL/BMN_results",
+        help='output dir path, default to use output/EVAL/BMN_results')
+    parser.add_argument(
+        '--result_path',
+        type=str,
+        default="evaluate_results/",
+        help='output dir path after post processing, default to use ./evaluate_results/'
+    )
     parser.add_argument(
         '--log_interval',
         type=int,
@@ -67,17 +77,21 @@ def parse_args():
 # Performance Evaluation
 def test_bmn(args):
-    # only support dynamic mode at present time
     device = set_device(args.device)
     fluid.enable_dygraph(device) if args.dynamic else None

+    #config setting
     config = parse_config(args.config_file)
     eval_cfg = merge_configs(config, 'test', vars(args))

-    if not os.path.isdir(config.TEST.output_path):
-        os.makedirs(config.TEST.output_path)
-    if not os.path.isdir(config.TEST.result_path):
-        os.makedirs(config.TEST.result_path)
+    feat_dim = config.MODEL.feat_dim
+    tscale = config.MODEL.tscale
+    dscale = config.MODEL.dscale
+    prop_boundary_ratio = config.MODEL.prop_boundary_ratio
+    num_sample = config.MODEL.num_sample
+    num_sample_perbin = config.MODEL.num_sample_perbin

+    #input and video index
     inputs = [
         Input(
             [None, config.MODEL.feat_dim, config.MODEL.tscale],
@@ -97,9 +111,14 @@ def test_bmn(args):
     eval_dataset = BmnDataset(eval_cfg, 'test')

     #model
-    model = bmn(config, pretrained=args.weights is None)
+    model = bmn(tscale,
+                dscale,
+                prop_boundary_ratio,
+                num_sample,
+                num_sample_perbin,
+                pretrained=args.weights is None)
     model.prepare(
-        loss_function=BmnLoss(config),
+        loss_function=BmnLoss(tscale, dscale),
         metrics=BmnMetric(
             config, mode='test'),
         inputs=inputs,
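Read together, the eval.py hunks amount to the flow sketched below. All helpers (set_device, parse_config, merge_configs, BmnDataset, bmn, BmnLoss, BmnMetric) come from the example's own modules as shown above; the trailing load/evaluate calls are assumptions, since that part of the file is collapsed in this diff.

```python
# Condensed sketch of test_bmn(); imports from the example's modules omitted.
device = set_device(args.device)
if args.dynamic:
    fluid.enable_dygraph(device)

config = parse_config(args.config_file)
eval_cfg = merge_configs(config, 'test', vars(args))
eval_dataset = BmnDataset(eval_cfg, 'test')

model = bmn(config.MODEL.tscale, config.MODEL.dscale,
            config.MODEL.prop_boundary_ratio, config.MODEL.num_sample,
            config.MODEL.num_sample_perbin,
            pretrained=args.weights is None)   # no --weights -> released model
model.prepare(loss_function=BmnLoss(config.MODEL.tscale, config.MODEL.dscale),
              metrics=BmnMetric(config, mode='test'),
              inputs=inputs, labels=labels,    # the Input specs declared above
              device=device)
if args.weights is not None:
    model.load(args.weights)                                         # assumed
model.evaluate(eval_dataset, batch_size=config.TEST.batch_size)      # assumed
```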
......
@@ -37,7 +37,6 @@ def parse_args():
     parser.add_argument(
         "-d",
         "--dynamic",
-        default=True,
         action='store_true',
         help="enable dygraph mode, only support dynamic mode at present time")
     parser.add_argument(
@@ -54,10 +53,21 @@ def parse_args():
         help='weight path, None to automatically download weights provided by Paddle.'
     )
     parser.add_argument(
-        '--save_dir',
+        '--filelist',
+        type=str,
+        default="infer.list",
+        help='infer file list, default to use ./infer.list')
+    parser.add_argument(
+        '--output_path',
+        type=str,
+        default="output/INFER/BMN_results",
+        help='output dir path, default to use output/INFER/BMN_results')
+    parser.add_argument(
+        '--result_path',
         type=str,
         default="predict_results/",
-        help='output dir path, default to use ./predict_results/')
+        help='output dir path after post processing, default to use ./predict_results/'
+    )
     parser.add_argument(
         '--log_interval',
         type=int,
@@ -69,18 +79,21 @@ def parse_args():
 # Prediction
 def infer_bmn(args):
-    # only support dynamic mode at present time
     device = set_device(args.device)
     fluid.enable_dygraph(device) if args.dynamic else None

+    #config setting
     config = parse_config(args.config_file)
     infer_cfg = merge_configs(config, 'infer', vars(args))

-    if not os.path.isdir(config.INFER.output_path):
-        os.makedirs(config.INFER.output_path)
-    if not os.path.isdir(config.INFER.result_path):
-        os.makedirs(config.INFER.result_path)
+    feat_dim = config.MODEL.feat_dim
+    tscale = config.MODEL.tscale
+    dscale = config.MODEL.dscale
+    prop_boundary_ratio = config.MODEL.prop_boundary_ratio
+    num_sample = config.MODEL.num_sample
+    num_sample_perbin = config.MODEL.num_sample_perbin

+    #input and video index
     inputs = [
         Input(
             [None, config.MODEL.feat_dim, config.MODEL.tscale],
@@ -92,7 +105,13 @@ def infer_bmn(args):
     #data
     infer_dataset = BmnDataset(infer_cfg, 'infer')

-    model = bmn(config, pretrained=args.weights is None)
+    #model
+    model = bmn(tscale,
+                dscale,
+                prop_boundary_ratio,
+                num_sample,
+                num_sample_perbin,
+                pretrained=args.weights is None)
     model.prepare(
         metrics=BmnMetric(
             config, mode='infer'),
......
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch train.py
@@ -34,11 +34,7 @@ logger = logging.getLogger(__name__)
 def parse_args():
     parser = argparse.ArgumentParser("Paddle high level api of BMN.")
     parser.add_argument(
-        "-d",
-        "--dynamic",
-        default=True,
-        action='store_true',
-        help="enable dygraph mode")
+        "-d", "--dynamic", action='store_true', help="enable dygraph mode")
     parser.add_argument(
         '--config_file',
         type=str,
@@ -48,7 +44,7 @@ def parse_args():
         '--batch_size',
         type=int,
         default=None,
-        help='training batch size. None to use config file setting.')
+        help='training batch size. None for read from config file.')
     parser.add_argument(
         '--learning_rate',
         type=float,
@@ -68,8 +64,8 @@ def parse_args():
     parser.add_argument(
         '--epoch',
         type=int,
-        default=9,
-        help='epoch number, 0 for read from config file')
+        default=None,
+        help='epoch number, None for read from config file')
     parser.add_argument(
         '--valid_interval',
         type=int,
@@ -113,22 +109,23 @@ def train_bmn(args):
     if not os.path.isdir(args.save_dir):
         os.makedirs(args.save_dir)

+    #config setting
     config = parse_config(args.config_file)
     train_cfg = merge_configs(config, 'train', vars(args))
     val_cfg = merge_configs(config, 'valid', vars(args))

-    inputs = [
-        Input(
-            [None, config.MODEL.feat_dim, config.MODEL.tscale],
-            'float32',
-            name='feat_input')
-    ]
-    gt_iou_map = Input(
-        [None, config.MODEL.dscale, config.MODEL.tscale],
-        'float32',
-        name='gt_iou_map')
-    gt_start = Input([None, config.MODEL.tscale], 'float32', name='gt_start')
-    gt_end = Input([None, config.MODEL.tscale], 'float32', name='gt_end')
+    feat_dim = config.MODEL.feat_dim
+    tscale = config.MODEL.tscale
+    dscale = config.MODEL.dscale
+    prop_boundary_ratio = config.MODEL.prop_boundary_ratio
+    num_sample = config.MODEL.num_sample
+    num_sample_perbin = config.MODEL.num_sample_perbin
+
+    # input and label list
+    inputs = [Input([None, feat_dim, tscale], 'float32', name='feat_input')]
+    gt_iou_map = Input([None, dscale, tscale], 'float32', name='gt_iou_map')
+    gt_start = Input([None, tscale], 'float32', name='gt_start')
+    gt_end = Input([None, tscale], 'float32', name='gt_end')
     labels = [gt_iou_map, gt_start, gt_end]

     # data
@@ -136,11 +133,16 @@ def train_bmn(args):
     val_dataset = BmnDataset(val_cfg, 'valid')

     # model
-    model = bmn(config, pretrained=False)
+    model = bmn(tscale,
+                dscale,
+                prop_boundary_ratio,
+                num_sample,
+                num_sample_perbin,
+                pretrained=False)
     optim = optimizer(config, parameter_list=model.parameters())
     model.prepare(
         optimizer=optim,
-        loss_function=BmnLoss(config),
+        loss_function=BmnLoss(tscale, dscale),
         inputs=inputs,
         labels=labels,
         device=device)
@@ -148,11 +150,10 @@ def train_bmn(args):
     # if resume weights is given, load resume weights directly
     if args.resume is not None:
         model.load(args.resume)

     model.fit(train_data=train_dataset,
               eval_data=val_dataset,
               batch_size=train_cfg.TRAIN.batch_size,
-              epochs=args.epoch,
+              epochs=train_cfg.TRAIN.epoch,
               eval_freq=args.valid_interval,
               log_freq=args.log_interval,
               save_dir=args.save_dir,
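The optimizer() helper called above is not part of this diff. Based on the config values (learning_rate 0.001, learning_rate_decay 0.1 at lr_decay_iter 4200) and the 1e-4 weight decay mentioned in the README lines removed earlier, it plausibly looks like the sketch below; this is an assumption, not the repository's actual implementation.

```python
# Plausible shape of the optimizer() helper (assumption): Adam with one
# piecewise LR drop and L2 weight decay. Dygraph mode would typically use
# fluid.dygraph.PiecewiseDecay instead of fluid.layers.piecewise_decay.
import paddle.fluid as fluid

def optimizer(config, parameter_list):
    base_lr = config.TRAIN.learning_rate                             # 0.001
    boundaries = [config.TRAIN.lr_decay_iter]                        # [4200]
    values = [base_lr, base_lr * config.TRAIN.learning_rate_decay]   # drop by 0.1
    return fluid.optimizer.Adam(
        fluid.layers.piecewise_decay(boundaries=boundaries, values=values),
        parameter_list=parameter_list,
        regularization=fluid.regularizer.L2Decay(regularization_coeff=1e-4))
```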
......
@@ -14,7 +14,6 @@
 import paddle.fluid as fluid
 from paddle.fluid import ParamAttr
-from paddle.fluid.framework import in_dygraph_mode
 import numpy as np
 import math
@@ -27,7 +26,7 @@ DATATYPE = 'float32'
 pretrain_infos = {
     'bmn': ('https://paddlemodels.bj.bcebos.com/hapi/bmn.pdparams',
             '9286c821acc4cad46d6613b931ba468c')
 }
@@ -131,17 +130,23 @@ class BMN(Model):
     `"BMN: Boundary-Matching Network for Temporal Action Proposal Generation" <https://arxiv.org/abs/1907.09702>`_

     Args:
-        cfg (AttrDict): configs for BMN model
+        tscale (int): sequence length, default 100.
+        dscale (int): max duration length, default 100.
+        prop_boundary_ratio (float): ratio of expanded temporal region in proposal boundary, default 0.5.
+        num_sample (int): number of samples between starting boundary and ending boundary of each proposal, default 32.
+        num_sample_perbin (int): number of selected points in each sample, default 3.
     """

-    def __init__(self, cfg):
+    def __init__(self, tscale, dscale, prop_boundary_ratio, num_sample,
+                 num_sample_perbin):
         super(BMN, self).__init__()

         #init config
-        self.tscale = cfg.MODEL.tscale
-        self.dscale = cfg.MODEL.dscale
-        self.prop_boundary_ratio = cfg.MODEL.prop_boundary_ratio
-        self.num_sample = cfg.MODEL.num_sample
-        self.num_sample_perbin = cfg.MODEL.num_sample_perbin
+        self.tscale = tscale
+        self.dscale = dscale
+        self.prop_boundary_ratio = prop_boundary_ratio
+        self.num_sample = num_sample
+        self.num_sample_perbin = num_sample_perbin

         self.hidden_dim_1d = 256
         self.hidden_dim_2d = 128
@@ -192,23 +197,17 @@ class BMN(Model):
             padding=1,
             act="relu")

-        # init to speed up
+        # get sample mask
         sample_mask_array = get_interp1d_mask(
             self.tscale, self.dscale, self.prop_boundary_ratio,
             self.num_sample, self.num_sample_perbin)
-        if in_dygraph_mode():
-            self.sample_mask = fluid.dygraph.base.to_variable(
-                sample_mask_array)
-        else:  # static
-            self.sample_mask = fluid.layers.create_parameter(
-                shape=[
-                    self.tscale, self.num_sample * self.dscale * self.tscale
-                ],
-                dtype=DATATYPE,
-                attr=fluid.ParamAttr(
-                    name="sample_mask", trainable=False),
-                default_initializer=fluid.initializer.NumpyArrayInitializer(
-                    sample_mask_array))
+        self.sample_mask = fluid.layers.create_parameter(
+            shape=[self.tscale, self.num_sample * self.dscale * self.tscale],
+            dtype=DATATYPE,
+            attr=fluid.ParamAttr(
+                name="sample_mask", trainable=False),
+            default_initializer=fluid.initializer.NumpyArrayInitializer(
+                sample_mask_array))
         self.sample_mask.stop_gradient = True
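The dygraph/static branch collapses into a single create_parameter call that behaves the same in both modes: the interpolation mask is baked in as a frozen, non-trainable parameter. A small toy example of that pattern (toy shapes, not the real interpolation mask):

```python
# Toy example of freezing a precomputed numpy array as a non-trainable
# parameter, the same pattern used for sample_mask above (fluid 1.x API).
import numpy as np
import paddle.fluid as fluid

mask = np.tril(np.ones((4, 4), dtype="float32"))   # stand-in for get_interp1d_mask()

with fluid.dygraph.guard():
    frozen = fluid.layers.create_parameter(
        shape=list(mask.shape),
        dtype="float32",
        attr=fluid.ParamAttr(name="toy_mask", trainable=False),
        default_initializer=fluid.initializer.NumpyArrayInitializer(mask))
    frozen.stop_gradient = True
    print(frozen.numpy())
```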
@@ -292,23 +291,27 @@ class BmnLoss(Loss):
     """Loss for BMN model

     Args:
-        cfg (AttrDict): configs for BMN model
+        tscale (int): sequence length, default 100.
+        dscale (int): max duration length, default 100.
     """

-    def __init__(self, cfg):
+    def __init__(self, tscale, dscale):
         super(BmnLoss, self).__init__()
-        self.cfg = cfg
+        self.tscale = tscale
+        self.dscale = dscale

     def _get_mask(self):
-        dscale = self.cfg.MODEL.dscale
-        tscale = self.cfg.MODEL.tscale
         bm_mask = []
-        for idx in range(dscale):
-            mask_vector = [1 for i in range(tscale - idx)
+        for idx in range(self.dscale):
+            mask_vector = [1 for i in range(self.tscale - idx)
                            ] + [0 for i in range(idx)]
             bm_mask.append(mask_vector)
         bm_mask = np.array(bm_mask, dtype=np.float32)
         self_bm_mask = fluid.layers.create_global_var(
-            shape=[dscale, tscale], value=0, dtype=DATATYPE, persistable=True)
+            shape=[self.dscale, self.tscale],
+            value=0,
+            dtype=DATATYPE,
+            persistable=True)
         fluid.layers.assign(bm_mask, self_bm_mask)
         self_bm_mask.stop_gradient = True
         return self_bm_mask
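The refactored _get_mask builds the same validity mask as before, just from the constructor arguments instead of the config. In plain NumPy the matrix it materializes looks like this (toy dscale = tscale = 5):

```python
# NumPy-only illustration of the mask built by _get_mask(): row d keeps the
# first (tscale - d) positions and zeroes the rest.
import numpy as np

def bm_mask(dscale, tscale):
    rows = [[1] * (tscale - d) + [0] * d for d in range(dscale)]
    return np.array(rows, dtype=np.float32)

print(bm_mask(5, 5))
# [[1. 1. 1. 1. 1.]
#  [1. 1. 1. 1. 0.]
#  [1. 1. 1. 0. 0.]
#  [1. 1. 0. 0. 0.]
#  [1. 0. 0. 0. 0.]]
```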
@@ -437,15 +440,24 @@ class BmnLoss(Loss):
         return loss


-def bmn(cfg, pretrained=True):
+def bmn(tscale,
+        dscale,
+        prop_boundary_ratio,
+        num_sample,
+        num_sample_perbin,
+        pretrained=True):
     """BMN model

     Args:
-        cfg (AttrDict): configs for BMN model
-        pretrained (bool): If True, returns a model with pre-trained model
-            on COCO, default True
+        tscale (int): sequence length, default 100.
+        dscale (int): max duration length, default 100.
+        prop_boundary_ratio (float): ratio of expanded temporal region in proposal boundary, default 0.5.
+        num_sample (int): number of samples between starting boundary and ending boundary of each proposal, default 32.
+        num_sample_perbin (int): number of selected points in each sample, default 3.
+        pretrained (bool): If True, returns a model with pre-trained model, default True.
     """
-    model = BMN(cfg)
+    model = BMN(tscale, dscale, prop_boundary_ratio, num_sample,
+                num_sample_perbin)
     if pretrained:
         weight_path = get_weights_path(*(pretrain_infos['bmn']))
         assert weight_path.endswith('.pdparams'), \
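With the config object gone from its signature, the factory is now called with plain numbers. A usage sketch with the defaults listed in the docstring; in the example scripts these values come from the MODEL section of bmn.yaml:

```python
# Usage sketch for the refactored factory; the import of bmn from the
# example's model module is omitted (its file name is not shown in this diff).
model = bmn(tscale=100,
            dscale=100,
            prop_boundary_ratio=0.5,
            num_sample=32,
            num_sample_perbin=3,
            pretrained=True)   # downloads bmn.pdparams via pretrain_infos
```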
......