PaddlePaddle / models — commit cd1a1963

Author: shippingwang
Authored: September 28, 2019
Parent: 554d8864

    add tall models

Showing 14 changed files with 1331 additions and 1 deletion (+1331 −1)
PaddleCV/PaddleVideo/README.md                              +11   −1
PaddleCV/PaddleVideo/configs/tall.yaml                      +35   −0
PaddleCV/PaddleVideo/data/dataset/README.md                 +4    −0
PaddleCV/PaddleVideo/metrics/metrics_util.py                +23   −0
PaddleCV/PaddleVideo/metrics/tall/__init__.py               +0    −0
PaddleCV/PaddleVideo/metrics/tall/accuracy_metrics.py       +152  −0
PaddleCV/PaddleVideo/models/__init__.py                     +2    −0
PaddleCV/PaddleVideo/models/tall/README.md                  +6    −0
PaddleCV/PaddleVideo/models/tall/__init__.py                +1    −0
PaddleCV/PaddleVideo/models/tall/tall.py                    +625  −0
PaddleCV/PaddleVideo/models/tall/tall_model.py              +97   −0
PaddleCV/PaddleVideo/reader/__init__.py                     +2    −0
PaddleCV/PaddleVideo/reader/tacos_reader.py                 +67   −0
PaddleCV/PaddleVideo/reader/tall_reader.py                  +306  −0
PaddleCV/PaddleVideo/README.md

@@ -16,10 +16,12 @@
| [C-TCN](./models/ctcn/README.md) | Video action localization | Winning solution of the 2018 ActivityNet challenge |
| [BSN](./models/bsn/README.md) | Video action localization | Provides an efficient proposal-generation method for temporal action localization |
| [BMN](./models/bmn/README.md) | Video action localization | Winning solution of the 2019 ActivityNet challenge |
| [TALL](./models/tall/README.md) | Video retrieval | |

### Key Features

- Covers multiple mainstream, leading models for video classification and action localization. Attention LSTM, Attention Cluster, and NeXtVLAD are popular feature-sequence models; Non-local, TSN, TSM, and StNet are end-to-end video classification models. Attention LSTM is fast and accurate; NeXtVLAD was the best single model in the 2nd YouTube-8M challenge; TSN is the classic 2D-CNN-based solution; TSM is a simple, efficient temporal-shift method for video spatio-temporal modeling; the Non-local model introduced non-local relation modeling for video. Attention Cluster and StNet are Baidu's own models, published at CVPR 2018 and AAAI 2019 respectively, and were used in the first-place Kinetics-600 entry. The C-TCN action-localization model, also developed by Baidu, won the 2018 ActivityNet challenge. BSN generates proposals bottom-up, providing an efficient proposal-generation solution for temporal action localization. BMN, another Baidu model, is the winning solution of ActivityNet 2019. The TALL model
- Provides a general skeleton codebase suited to video classification and action-localization tasks; users can configure a model and run training and evaluation with one-click scripts.
@@ -178,6 +180,13 @@ run.sh
| BSN | 16 | 1× K40 | 7.0 | 66.64% (AUC) | [model-tem](https://paddlemodels.bj.bcebos.com/video_detection/BsnTem_final.pdparams), [model-pem](https://paddlemodels.bj.bcebos.com/video_detection/BsnPem_final.pdparams) |
| BMN | 16 | 4× K40 | 7.0 | 67.19% (AUC) | [model](https://paddlemodels.bj.bcebos.com/video_detection/BMN_final.pdparams) |

Video retrieval model trained on the TACoS dataset:

| Model | Batch Size | Environment | cuDNN Version | Accuracy | Download |
| :-: | :-: | :-: | :-: | :-: | :-: |
| TALL | 56 | 1× K40 | 7.2 | | |
## References

@@ -190,6 +199,7 @@ run.sh
- [Non-local Neural Networks](https://arxiv.org/abs/1711.07971v1), Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He
- [BSN: Boundary Sensitive Network for Temporal Action Proposal Generation](http://arxiv.org/abs/1806.02964), Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang
- [BMN: Boundary-Matching Network for Temporal Action Proposal Generation](https://arxiv.org/abs/1907.09702), Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen
- [TALL: Temporal Activity Localization via Language Query](https://arxiv.org/abs/1705.02101), Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia
## Version Updates
PaddleCV/PaddleVideo/configs/tall.yaml (new file, mode 100644)

MODEL:
    name: "TALL"
    visual_feature_dim: 12288
    sentence_embedding_size: 4800
    semantic_size: 1024
    hidden_size: 1000
    output_size: 3

TRAIN:
    epoch: 21
    use_gpu: True
    num_gpus: 1
    batch_size: 56
    feats_dimen: 4096
    off_size: 2
    context_num: 1
    context_size: 128
    visual_feature_dim: 12288
    sent_vec_dim: 4800
    sliding_clip_path: "data/dataset/tacos/Interval64_128_256_512_overlap0.8_c3d_fc6/"
    clip_sentvec: "data/dataset/tacos/train_clip-sentvec.pkl"
    movie_length_info: "data/dataset/tacos/video_allframes_info.pkl"
    dataset: TACoS
    model: TALL

VALID:
    batch_size: 1
    context_num: 1
    context_size: 128
    feats_dimen: 4096
    visual_feature_dim: 12288
    sent_vec_dim: 4800
    semantic_size: 4800
    sliding_clip_path: "data/dataset/tacos/Interval128_256_overlap0.8_c3d_fc6/"
    clip_sentvec: "data/dataset/tacos/test_clip-sentvec.pkl"
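This config is consumed by the PaddleVideo run scripts. As a rough illustration only, here is a minimal sketch of loading such a file with PyYAML and reading the TRAIN section; the AttrDict helper is hypothetical, not the repo's own config utility:

# Minimal sketch: load tall.yaml and read a few fields.
# AttrDict is a hypothetical helper; PaddleVideo ships its own config parser.
import yaml


class AttrDict(dict):
    """dict whose keys are also readable as attributes (cfg.TRAIN.batch_size)."""
    def __getattr__(self, key):
        value = self[key]
        return AttrDict(value) if isinstance(value, dict) else value


with open("configs/tall.yaml") as f:
    cfg = AttrDict(yaml.safe_load(f))

print(cfg.MODEL.name)        # TALL
print(cfg.TRAIN.batch_size)  # 56
print(cfg.TRAIN.sliding_clip_path)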
PaddleCV/PaddleVideo/data/dataset/README.md

@@ -162,3 +162,7 @@ Non-local also uses the Kinetics dataset, but its data processing differs from other

## C-TCN

The C-TCN model uses the ActivityNet 1.3 dataset; see the [C-TCN data notes](./ctcn/README.md) for usage details.

## TALL

The TALL model uses the TACoS dataset; see the [TALL data notes](./tall/README.md) for usage details.
PaddleCV/PaddleVideo/metrics/metrics_util.py

@@ -28,6 +28,10 @@ from metrics.detections import detection_metrics as detection_metrics

from metrics.bmn_metrics import bmn_proposal_metrics as bmn_proposal_metrics
from metrics.bsn_metrics import bsn_tem_metrics as bsn_tem_metrics
from metrics.bsn_metrics import bsn_pem_metrics as bsn_pem_metrics
from metrics.tall import accuracy_metrics as tall_metrics

logger = logging.getLogger(__name__)

@@ -420,6 +424,24 @@ class BsnPemMetrics(Metrics):

    def reset(self):
        self.calculator.reset()


class TallMetrics(Metrics):
    def __init__(self, name, mode, cfg):  # second parameter was "model" in the diff, but the body uses "mode"
        self.name = name
        self.mode = mode
        self.calculator = tall_metrics.MetricsCalculator(
            cfg=cfg, name=self.name, mode=self.mode)

    def calculate_and_log_out(self, fetch_list, info=""):  # was misspelled "calculator_and_log_out"
        loss = np.array(fetch_list[0])
        logger.info(info + '\tLoss = {}'.format('%.6f' % np.mean(loss)))

    def accumulate(self, fetch_list, info=""):  # was a bare, invalid "def accumalate()"; signature follows the other Metrics classes
        pass

    def finalize_and_log_out(self, info="", savedir="/"):  # body was empty in the diff
        pass

    def reset(self):
        self.calculator.reset()  # was self.calculator.clear(); MetricsCalculator defines reset(), not clear()


class MetricsZoo(object):
    def __init__(self):

@@ -461,3 +483,4 @@ regist_metrics("CTCN", DetectionMetrics)

regist_metrics("BMN", BmnMetrics)
regist_metrics("BSNTEM", BsnTemMetrics)
regist_metrics("BSNPEM", BsnPemMetrics)
regist_metrics("TALL", TallMetrics)  # was "redist_metrics", a typo that would raise NameError
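The registration calls above feed a simple name-to-class registry (MetricsZoo). As a minimal sketch of that pattern, with illustrative names rather than the exact PaddleVideo API:

# Sketch of the registry pattern used above; regist_metrics/get_metrics
# names are illustrative, not the repo's exact signatures.
metrics_zoo = {}

def regist_metrics(name, metrics_class):
    metrics_zoo[name.upper()] = metrics_class

def get_metrics(name, mode, cfg):
    # resolve "TALL" -> TallMetrics and instantiate it for the given mode
    return metrics_zoo[name.upper()](name, mode, cfg)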
PaddleCV/PaddleVideo/metrics/tall/__init__.py (new file, mode 100644, empty)
PaddleCV/PaddleVideo/metrics/tall/accuracy_metrics.py (new file, mode 100644)

# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import numpy as np
from six.moves import xrange
import time
import pickle
import operator
import logging  # missing in the diff; logger was used but never defined

logger = logging.getLogger(__name__)


class MetricsCalculator():
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode  # 'train', 'valid', 'test', or 'infer'
        self.reset()

    def reset(self):
        logger.info("Resetting {} metrics...".format(self.mode))
        return

    def finalize_metrics(self):
        return

    def calculate_metrics(self):  # was "def calculate_metrics(self,):"
        return

    def accumulate(self):  # was misspelled "accumalate"
        return


def calculate_reward_batch_withstop(Previou_IoU, current_IoU, t):
    batch_size = len(Previou_IoU)
    # was torch.zeros; torch is neither imported nor a dependency here
    reward = np.zeros(batch_size, dtype=np.float32)
    for i in range(batch_size):
        if current_IoU[i] > Previou_IoU[i] and Previou_IoU[i] >= 0:
            reward[i] = 1 - 0.001 * t
        elif current_IoU[i] <= Previou_IoU[i] and current_IoU[i] >= 0:
            reward[i] = -0.001 * t
        else:
            reward[i] = -1 - 0.001 * t
    return reward


def calculate_reward(Previou_IoU, current_IoU, t):
    if current_IoU > Previou_IoU and Previou_IoU >= 0:
        reward = 1 - 0.001 * t
    elif current_IoU <= Previou_IoU and current_IoU >= 0:
        reward = -0.001 * t
    else:
        reward = -1 - 0.001 * t
    return reward


def calculate_RL_IoU_batch(i0, i1):
    # calculate temporal intersection over union
    batch_size = len(i0)
    iou_batch = np.zeros(batch_size, dtype=np.float32)  # was torch.zeros
    for i in range(len(i0)):
        union = (min(i0[i][0], i1[i][0]), max(i0[i][1], i1[i][1]))
        inter = (max(i0[i][0], i1[i][0]), min(i0[i][1], i1[i][1]))
        # if inter[1] < inter[0]:
        #     iou = 0
        # else:
        iou = 1.0 * (inter[1] - inter[0]) / (union[1] - union[0])
        iou_batch[i] = iou
    return iou_batch


def calculate_IoU(i0, i1):
    # calculate temporal intersection over union
    union = (min(i0[0], i1[0]), max(i0[1], i1[1]))
    inter = (max(i0[0], i1[0]), min(i0[1], i1[1]))
    iou = 1.0 * (inter[1] - inter[0]) / (union[1] - union[0])
    return iou


def nms_temporal(x1, x2, sim, overlap):
    pick = []
    assert len(x1) == len(sim)
    assert len(x2) == len(sim)
    if len(x1) == 0:
        return pick

    # segment lengths; wrapped in list() because map() is lazy on Python 3
    union = list(map(operator.sub, x2, x1))  # union = x2 - x1
    # indices sorted by similarity score, ascending
    I = [i[0] for i in sorted(enumerate(sim), key=lambda x: x[1])]
    while len(I) > 0:
        i = I[-1]  # highest-scoring remaining segment
        pick.append(i)
        xx1 = [max(x1[i], x1[j]) for j in I[:-1]]
        xx2 = [min(x2[i], x2[j]) for j in I[:-1]]
        inter = [max(0.0, k2 - k1) for k1, k2 in zip(xx1, xx2)]
        o = [inter[u] / (union[i] + union[I[u]] - inter[u])
             for u in range(len(I) - 1)]
        I_new = []
        for j in range(len(o)):
            if o[j] <= overlap:
                I_new.append(I[j])
        I = I_new
    return pick


def compute_IoU_recall_top_n_forreg_rl(top_n, iou_thresh,
                                       sentence_image_reg_mat, sclips):
    correct_num = 0.0
    for k in range(sentence_image_reg_mat.shape[0]):
        gt = sclips[k]
        gt_start = float(gt.split("_")[1])
        gt_end = float(gt.split("_")[2])
        pred_start = sentence_image_reg_mat[k, 0]
        pred_end = sentence_image_reg_mat[k, 1]
        iou = calculate_IoU((gt_start, gt_end), (pred_start, pred_end))
        if iou >= iou_thresh:
            correct_num += 1
    return correct_num


def compute_IoU_recall_top_n_forreg(top_n, iou_thresh, sentence_image_mat,
                                    sentence_image_reg_mat, sclips, iclips):
    correct_num = 0.0
    for k in range(sentence_image_mat.shape[0]):
        gt = sclips[k]
        gt_start = float(gt.split("_")[1])
        gt_end = float(gt.split("_")[2])
        sim_v = [v for v in sentence_image_mat[k]]
        starts = [s for s in sentence_image_reg_mat[k, :, 0]]
        ends = [e for e in sentence_image_reg_mat[k, :, 1]]
        picks = nms_temporal(starts, ends, sim_v, iou_thresh - 0.05)
        # sim_argsort = np.argsort(sim_v)[::-1][0:top_n]
        if top_n < len(picks):
            picks = picks[0:top_n]
        for index in picks:
            pred_start = sentence_image_reg_mat[k, index, 0]
            pred_end = sentence_image_reg_mat[k, index, 1]
            iou = calculate_IoU((gt_start, gt_end), (pred_start, pred_end))
            if iou >= iou_thresh:
                correct_num += 1
                break
    return correct_num
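As a quick sanity check of the helpers above, here is a small worked example with hypothetical segment boundaries:

# Three candidate segments [x1, x2] with similarity scores, NMS threshold 0.5.
x1 = [0, 1, 2]
x2 = [4, 5, 10]
sim = [0.9, 0.8, 0.7]

print(calculate_IoU((0, 4), (2, 10)))  # inter = 2, union = 10 -> 0.2
print(nms_temporal(x1, x2, sim, 0.5))  # [0, 2]: segment 1 overlaps segment 0
                                       # with ratio 3/5 = 0.6 > 0.5, so it is suppressed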
PaddleCV/PaddleVideo/models/__init__.py

@@ -10,6 +10,7 @@ from .ctcn import CTCN

from .bmn import BMN
from .bsn import BsnTem
from .bsn import BsnPem
from .tall import TALL

# regist models, sort by alphabet
regist_model("AttentionCluster", AttentionCluster)

@@ -23,3 +24,4 @@ regist_model("CTCN", CTCN)

regist_model("BMN", BMN)
regist_model("BsnTem", BsnTem)
regist_model("BsnPem", BsnPem)
regist_model("TALL", TALL)
PaddleCV/PaddleVideo/models/tall/README.md (new file, mode 100644)

# TALL Video Model

---

## Contents

- [Model Introduction]
PaddleCV/PaddleVideo/models/tall/__init__.py (new file, mode 100644)

from .tall import *
PaddleCV/PaddleVideo/models/tall/tall.py (new file, mode 100644)

This 625-line diff is collapsed in the source view; its contents are not shown.
PaddleCV/PaddleVideo/models/tall/tall_model.py (new file, mode 100644)

# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import time
import sys
import paddle.fluid as fluid


class TALL(object):
    def __init__(self, mode, cfg):
        self.mode = mode  # never assigned in the diff, but read below
        self.images = cfg["images"]
        self.sentences = cfg["sentences"]
        if self.mode == "train":
            self.offsets = cfg["offsets"]  # was cfg[offsets], an undefined name
        self.semantic_size = cfg["semantic_size"]
        self.hidden_size = cfg["hidden_size"]
        self.output_size = cfg["output_size"]
        # assumed key: the diff references an undefined train_batch_size
        # inside _cross_modal_comb, so the batch size must come from cfg
        self.batch_size = cfg["batch_size"]

    def _cross_modal_comb(self, visual_feat, sentence_embed):
        # tile visual and sentence features against each other to build the
        # batch_size x batch_size cross-modal combination tensor
        visual_feat = fluid.layers.reshape(
            visual_feat, [1, -1, self.semantic_size])
        vv_feature = fluid.layers.expand(visual_feat, [self.batch_size, 1, 1])
        sentence_embed = fluid.layers.reshape(
            sentence_embed, [-1, 1, self.semantic_size])
        ss_feature = fluid.layers.expand(sentence_embed, [1, self.batch_size, 1])

        concat_feature = fluid.layers.concat(
            [vv_feature, ss_feature], axis=2)  # B,B,2048
        mul_feature = vv_feature * ss_feature  # B,B,1024
        add_feature = vv_feature + ss_feature  # B,B,1024

        comb_feature = fluid.layers.concat(
            [mul_feature, add_feature, concat_feature], axis=2)
        return comb_feature

    def net(self):
        # visual2semantic
        transformed_clip_train = fluid.layers.fc(
            input=self.images,
            size=self.semantic_size,
            act=None,
            name='v2s_lt',
            param_attr=fluid.ParamAttr(
                name='v2s_lt_weights',
                initializer=fluid.initializer.NormalInitializer(
                    loc=0.0, scale=1.0, seed=0)),
            bias_attr=False)
        # l2_normalize
        transformed_clip_train = fluid.layers.l2_normalize(
            x=transformed_clip_train, axis=1)
        # sentence2semantic
        transformed_sentence_train = fluid.layers.fc(
            input=self.sentences,
            size=self.semantic_size,
            act=None,
            name='s2s_lt',
            param_attr=fluid.ParamAttr(
                name='s2s_lt_weights',
                initializer=fluid.initializer.NormalInitializer(
                    loc=0.0, scale=1.0, seed=0)),
            bias_attr=False)
        # l2_normalize
        transformed_sentence_train = fluid.layers.l2_normalize(
            x=transformed_sentence_train, axis=1)

        cross_modal_vec_train = self._cross_modal_comb(
            transformed_clip_train, transformed_sentence_train)
        cross_modal_vec_train = fluid.layers.unsqueeze(
            input=cross_modal_vec_train, axes=[0])
        cross_modal_vec_train = fluid.layers.transpose(
            cross_modal_vec_train, perm=[0, 3, 1, 2])

        mid_output = fluid.layers.conv2d(
            input=cross_modal_vec_train,
            num_filters=self.hidden_size,
            filter_size=1,
            stride=1,
            act="relu",
            param_attr=fluid.param_attr.ParamAttr(name="mid_out_weights"),
            bias_attr=False)

        sim_score_mat_train = fluid.layers.conv2d(
            input=mid_output,
            num_filters=self.output_size,
            filter_size=1,
            stride=1,
            act=None,
            param_attr=fluid.param_attr.ParamAttr(name="sim_mat_weights"),
            bias_attr=False)

        self.sim_score_mat_train = fluid.layers.squeeze(
            input=sim_score_mat_train, axes=[0])
        return self.sim_score_mat_train, self.offsets
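The _cross_modal_comb step pairs every clip feature with every sentence feature in the batch. A framework-free numpy sketch of the same tiling, with illustrative shapes (batch B, semantic size d), producing the (B, B, 4d) combination tensor that feeds the 1x1 convolutions:

# Numpy illustration of the cross-modal combination: for B clips and B
# sentences with d-dim embeddings, build a (B, B, 4d) tensor holding the
# elementwise product, sum, and concatenation for every clip-sentence pair.
import numpy as np

B, d = 4, 1024
visual = np.random.rand(B, d).astype(np.float32)
sentence = np.random.rand(B, d).astype(np.float32)

vv = np.broadcast_to(visual[None, :, :], (B, B, d))    # row i: all clip features
ss = np.broadcast_to(sentence[:, None, :], (B, B, d))  # row i: sentence i, repeated
comb = np.concatenate([vv * ss, vv + ss,
                       np.concatenate([vv, ss], axis=2)], axis=2)
print(comb.shape)  # (4, 4, 4096) == (B, B, 4d)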
PaddleCV/PaddleVideo/reader/__init__.py

@@ -6,6 +6,7 @@ from .ctcn_reader import CTCNReader

from .bmn_reader import BMNReader
from .bsn_reader import BSNVideoReader
from .bsn_reader import BSNProposalReader
from .tall_reader import TALLReader

# regist reader, sort by alphabet
regist_reader("ATTENTIONCLUSTER", FeatureReader)

@@ -19,3 +20,4 @@ regist_reader("CTCN", CTCNReader)

regist_reader("BMN", BMNReader)
regist_reader("BSNTEM", BSNVideoReader)
regist_reader("BSNPEM", BSNProposalReader)
regist_reader("TALL", TALLReader)
PaddleCV/PaddleVideo/reader/tacos_reader.py (new file, mode 100644)

# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import sys
import cv2
import math
import random
import functools
try:
    import cPickle as pickle
    from cStringIO import StringIO
except ImportError:
    import pickle
    from io import BytesIO
import numpy as np
import paddle
from PIL import Image, ImageEnhance
import logging

from .reader_utils import DataReader

logger = logging.getLogger(__name__)  # missing in the diff; logger is used below


class TacosReader(DataReader):
    def __init__(self, name, mode, cfg):
        self.name = name
        self.mode = mode
        self.cfg = cfg

    def create_reader(self):
        cfg = self.cfg
        mode = self.mode
        num_reader_threads = cfg[mode.upper()]['num_reader_threads']
        assert num_reader_threads >= 1, \
            "number of reader threads({}) should be a positive integer".format(
                num_reader_threads)
        if num_reader_threads == 1:
            reader_func = make_reader
        else:
            reader_func = make_multi_reader  # not defined in this file yet

        filelist = cfg[mode.upper()]['']  # config key missing in the source diff

        if self.mode == 'train':
            return reader_func(cfg)  # was reader_func(), dropping the required cfg argument
        elif self.mode == 'valid':
            return reader_func(cfg)
        else:
            logger.info("Not implemented")
            raise NotImplementedError


def make_reader(cfg):
    def reader():
        cs = pickle.load(open(cfg.TRAIN.train_clip_sentvec))  # was cPickle.load; aliased to pickle above
        movie_length_info = pickle.load(open(cfg.TRAIN.movie_length_info))
        #put train() in here
        # (the file ends here in the diff; the function is left unfinished)
PaddleCV/PaddleVideo/reader/tall_reader.py (new file, mode 100644)

# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import sys
try:
    import cPickle as pickle  # was a bare "import cPickle", which fails on Python 3
except ImportError:
    import pickle
import random
import numpy as np
import math
import paddle
import paddle.fluid as fluid
import functools
import logging

from .reader_utils import DataReader  # missing in the diff; the reader subclasses DataReader

logger = logging.getLogger(__name__)

random.seed(0)

THREAD = 8
BUF_SIZE = 1024


class TALLReader(DataReader):  # was "TallReader"; reader/__init__.py imports TALLReader
    def __init__(self, name, mode, cfg):
        self.name = name
        self.mode = mode
        self.cfg = cfg

    def create_reader(self):
        cfg = self.cfg
        mode = self.mode
        if self.mode == 'train':
            train_batch_size = cfg.TRAIN.batch_size
            return paddle.batch(
                train(cfg), batch_size=train_batch_size, drop_last=True)
        elif self.mode == 'valid':
            return test(cfg)
        else:
            logger.info("Not implemented")
            raise NotImplementedError


def calculate_IoU(i0, i1):
    '''
    calculate temporal intersection over union
    '''
    union = (min(i0[0], i1[0]), max(i0[1], i1[1]))
    inter = (max(i0[0], i1[0]), min(i0[1], i1[1]))
    iou = 1.0 * (inter[1] - inter[0]) / (union[1] - union[0])
    return iou


def calculate_nIoL(base, sliding_clip):
    '''
    calculate the non-intersection part over the clip length;
    make sure the input IoU is larger than 0
    '''
    # [(x1_max - x1_min) - overlap] / (x1_max - x1_min)
    inter = (max(base[0], sliding_clip[0]), min(base[1], sliding_clip[1]))
    inter_l = inter[1] - inter[0]
    length = sliding_clip[1] - sliding_clip[0]
    nIoL = 1.0 * (length - inter_l) / length
    return nIoL


def get_context_window(sliding_clip_path, clip_name, win_length, context_size,
                       feats_dimen):
    # compute left (pre) and right (post) context features based on
    # read_unit_level_feats().
    movie_name = clip_name.split("_")[0]
    start = int(clip_name.split("_")[1])
    end = int(clip_name.split("_")[2].split(".")[0])
    clip_length = context_size
    left_context_feats = np.zeros([win_length, feats_dimen], dtype=np.float32)
    right_context_feats = np.zeros([win_length, feats_dimen], dtype=np.float32)
    last_left_feat = np.load(sliding_clip_path + clip_name)
    last_right_feat = np.load(sliding_clip_path + clip_name)
    for k in range(win_length):
        left_context_start = start - clip_length * (k + 1)
        left_context_end = start - clip_length * k
        right_context_start = end + clip_length * k
        right_context_end = end + clip_length * (k + 1)
        left_context_name = movie_name + "_" + str(left_context_start) + \
            "_" + str(left_context_end) + ".npy"
        right_context_name = movie_name + "_" + str(right_context_start) + \
            "_" + str(right_context_end) + ".npy"
        if os.path.exists(sliding_clip_path + left_context_name):
            left_context_feat = np.load(sliding_clip_path + left_context_name)
            last_left_feat = left_context_feat
        else:
            left_context_feat = last_left_feat
        if os.path.exists(sliding_clip_path + right_context_name):
            right_context_feat = np.load(sliding_clip_path + right_context_name)
            last_right_feat = right_context_feat
        else:
            right_context_feat = last_right_feat
        left_context_feats[k] = left_context_feat
        right_context_feats[k] = right_context_feat
    return np.mean(left_context_feats, axis=0), \
        np.mean(right_context_feats, axis=0)


def process_data(sample, is_train):
    clip_sentence_pair, sliding_clip_path, context_num, context_size, \
        feats_dimen, sent_vec_dim = sample
    if is_train:
        offset = np.zeros(2, dtype=np.float32)
        clip_name = clip_sentence_pair[0]
        feat_path = sliding_clip_path + clip_sentence_pair[2]
        featmap = np.load(feat_path)
        left_context_feat, right_context_feat = get_context_window(
            sliding_clip_path, clip_sentence_pair[2], context_num,
            context_size, feats_dimen)
        image = np.hstack((left_context_feat, featmap, right_context_feat))
        sentence = clip_sentence_pair[1][:sent_vec_dim]
        p_offset = clip_sentence_pair[3]
        l_offset = clip_sentence_pair[4]
        offset[0] = p_offset
        offset[1] = l_offset
        return image, sentence, offset
    else:
        pass


def make_train_reader(cfg, clip_sentence_pairs_iou, shuffle=False,
                      is_train=True):
    sliding_clip_path = cfg.TRAIN.sliding_clip_path
    context_num = cfg.TRAIN.context_num
    context_size = cfg.TRAIN.context_size
    feats_dimen = cfg.TRAIN.feats_dimen
    sent_vec_dim = cfg.TRAIN.sent_vec_dim

    def reader():
        if shuffle:
            random.shuffle(clip_sentence_pairs_iou)
        for clip_sentence_pair in clip_sentence_pairs_iou:
            yield [clip_sentence_pair, sliding_clip_path, context_num,
                   context_size, feats_dimen, sent_vec_dim]

    mapper = functools.partial(process_data, is_train=is_train)
    return paddle.reader.xmap_readers(mapper, reader, THREAD, BUF_SIZE)


def train(cfg):
    feats_dimen = cfg.TRAIN.feats_dimen
    context_num = cfg.TRAIN.context_num
    context_size = cfg.TRAIN.context_size
    visual_feature_dim = cfg.TRAIN.visual_feature_dim
    sent_vec_dim = cfg.TRAIN.sent_vec_dim
    sliding_clip_path = cfg.TRAIN.sliding_clip_path
    # was cPickle.load(open(...)); binary mode is required for pickles on Python 3
    cs = pickle.load(open(cfg.TRAIN.train_clip_sentvec, 'rb'))
    movie_length_info = pickle.load(open(cfg.TRAIN.movie_length_info, 'rb'))

    clip_sentence_pairs = []
    for l in cs:
        clip_name = l[0]
        sent_vecs = l[1]
        for sent_vec in sent_vecs:
            clip_sentence_pairs.append((clip_name, sent_vec))  # 10146 pairs
    # print statements below were Python 2 syntax; "readed" fixed to "read"
    print("TRAIN: " + str(len(clip_sentence_pairs)) +
          " clip-sentence pairs are read")

    movie_names_set = set()
    movie_clip_names = {}
    # read groundtruth sentence-clip pairs
    for k in range(len(clip_sentence_pairs)):
        clip_name = clip_sentence_pairs[k][0]
        movie_name = clip_name.split("_")[0]
        if not movie_name in movie_names_set:
            movie_names_set.add(movie_name)
            movie_clip_names[movie_name] = []
        movie_clip_names[movie_name].append(k)
    movie_names = list(movie_names_set)
    num_samples = len(clip_sentence_pairs)
    print("TRAIN: " + str(len(movie_names)) + " movies.")

    # read sliding windows, and match them with the groundtruths to make
    # training samples
    sliding_clips_tmp = os.listdir(sliding_clip_path)  # 161396 files
    clip_sentence_pairs_iou = []
    for clip_name in sliding_clips_tmp:
        if clip_name.split(".")[2] == "npy":
            movie_name = clip_name.split("_")[0]
            for clip_sentence in clip_sentence_pairs:
                original_clip_name = clip_sentence[0]
                original_movie_name = original_clip_name.split("_")[0]
                if original_movie_name == movie_name:
                    start = int(clip_name.split("_")[1])
                    end = int(clip_name.split("_")[2].split(".")[0])
                    o_start = int(original_clip_name.split("_")[1])
                    o_end = int(original_clip_name.split("_")[2].split(".")[0])
                    iou = calculate_IoU((start, end), (o_start, o_end))
                    if iou > 0.5:
                        nIoL = calculate_nIoL((o_start, o_end), (start, end))
                        if nIoL < 0.15:
                            movie_length = movie_length_info[
                                movie_name.split(".")[0]]
                            start_offset = o_start - start
                            end_offset = o_end - end
                            clip_sentence_pairs_iou.append(
                                (clip_sentence[0], clip_sentence[1],
                                 clip_name, start_offset, end_offset))
    num_samples_iou = len(clip_sentence_pairs_iou)
    print("TRAIN: " + str(len(clip_sentence_pairs_iou)) +
          " iou clip-sentence pairs are read")

    return make_train_reader(
        cfg, clip_sentence_pairs_iou, shuffle=True, is_train=True)


class test(object):  # was "class test(cfg):", which would try to inherit from the cfg object
    def __init__(self, cfg):
        self.context_num = cfg.TEST.context_num
        self.visual_feature_dim = cfg.TEST.visual_feature_dim
        self.feats_dimen = cfg.TEST.feats_dimen
        self.context_size = cfg.TEST.context_size
        self.semantic_size = cfg.TEST.semantic_size
        self.sliding_clip_path = cfg.TEST.sliding_clip_path
        self.sent_vec_dim = cfg.TEST.sent_vec_dim
        self.cs = pickle.load(open(cfg.TEST.test_clip_sentvec, 'rb'))

        self.clip_sentence_pairs = []
        for l in self.cs:
            clip_name = l[0]
            sent_vecs = l[1]
            for sent_vec in sent_vecs:
                self.clip_sentence_pairs.append((clip_name, sent_vec))
        print("TEST: " + str(len(self.clip_sentence_pairs)) +
              " pairs are read")

        movie_names_set = set()
        self.movie_clip_names = {}
        for k in range(len(self.clip_sentence_pairs)):
            clip_name = self.clip_sentence_pairs[k][0]
            movie_name = clip_name.split("_")[0]
            if not movie_name in movie_names_set:
                movie_names_set.add(movie_name)
                self.movie_clip_names[movie_name] = []
            self.movie_clip_names[movie_name].append(k)
        self.movie_names = list(movie_names_set)
        print("TEST: " + str(len(self.movie_names)) + " movies.")

        self.clip_num_per_movie_max = 0
        for movie_name in self.movie_clip_names:
            if len(self.movie_clip_names[movie_name]) > \
                    self.clip_num_per_movie_max:
                self.clip_num_per_movie_max = len(
                    self.movie_clip_names[movie_name])
        print("TEST: " + "Max number of clips in a movie is " +
              str(self.clip_num_per_movie_max))

        sliding_clips_tmp = os.listdir(self.sliding_clip_path)  # 62741 files
        self.sliding_clip_names = []
        for clip_name in sliding_clips_tmp:
            if clip_name.split(".")[2] == "npy":
                movie_name = clip_name.split("_")[0]
                if movie_name in self.movie_clip_names:
                    self.sliding_clip_names.append(
                        clip_name.split(".")[0] + "." +
                        clip_name.split(".")[1])
        self.num_samples = len(self.clip_sentence_pairs)
        print("TEST: " + "sliding clips number: " +
              str(len(self.sliding_clip_names)))

    def get_test_context_window(self, clip_name, win_length):
        # compute left (pre) and right (post) context features based on
        # read_unit_level_feats().
        movie_name = clip_name.split("_")[0]
        start = int(clip_name.split("_")[1])
        end = int(clip_name.split("_")[2].split(".")[0])
        clip_length = self.context_size  # 128
        left_context_feats = np.zeros(
            [win_length, self.feats_dimen], dtype=np.float32)  # (1, 4096)
        right_context_feats = np.zeros(
            [win_length, self.feats_dimen], dtype=np.float32)  # (1, 4096)
        last_left_feat = np.load(self.sliding_clip_path + clip_name)
        last_right_feat = np.load(self.sliding_clip_path + clip_name)
        for k in range(win_length):
            left_context_start = start - clip_length * (k + 1)
            left_context_end = start - clip_length * k
            right_context_start = end + clip_length * k
            right_context_end = end + clip_length * (k + 1)
            left_context_name = movie_name + "_" + str(left_context_start) + \
                "_" + str(left_context_end) + ".npy"
            right_context_name = movie_name + "_" + str(right_context_start) + \
                "_" + str(right_context_end) + ".npy"
            if os.path.exists(self.sliding_clip_path + left_context_name):
                left_context_feat = np.load(
                    self.sliding_clip_path + left_context_name)
                last_left_feat = left_context_feat
            else:
                left_context_feat = last_left_feat
            if os.path.exists(self.sliding_clip_path + right_context_name):
                right_context_feat = np.load(
                    self.sliding_clip_path + right_context_name)
                last_right_feat = right_context_feat
            else:
                right_context_feat = last_right_feat
            left_context_feats[k] = left_context_feat
            right_context_feats[k] = right_context_feat
        return np.mean(left_context_feats, axis=0), \
            np.mean(right_context_feats, axis=0)

    def load_movie_slidingclip(self, movie_name, sample_num):
        # load unit level feats and sentence vector
        movie_clip_sentences = []
        movie_clip_featmap = []
        clip_set = set()
        for k in range(len(self.clip_sentence_pairs)):
            if movie_name in self.clip_sentence_pairs[k][0]:
                movie_clip_sentences.append(
                    (self.clip_sentence_pairs[k][0],
                     self.clip_sentence_pairs[k][1][:self.semantic_size]))
        for k in range(len(self.sliding_clip_names)):
            if movie_name in self.sliding_clip_names[k]:
                visual_feature_path = self.sliding_clip_path + \
                    self.sliding_clip_names[k] + ".npy"
                left_context_feat, right_context_feat = \
                    self.get_test_context_window(
                        self.sliding_clip_names[k] + ".npy", 1)
                feature_data = np.load(visual_feature_path)
                comb_feat = np.hstack(
                    (left_context_feat, feature_data, right_context_feat))
                movie_clip_featmap.append(
                    (self.sliding_clip_names[k], comb_feat))
        return movie_clip_featmap, movie_clip_sentences
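To make the pair-selection rule in train() concrete: a sliding window becomes a training sample only when it overlaps a ground-truth clip with IoU > 0.5 and non-intersection-over-length nIoL < 0.15. A small worked example with hypothetical clip boundaries (the frame numbers below are illustrative, not from TACoS):

# Ground-truth clip (o_start, o_end) vs. a sliding window (start, end);
# clip names encode movie, start frame, and end frame.
gt = (100, 228)     # o_start, o_end
window = (96, 224)  # start, end

iou = calculate_IoU(window, gt)    # inter = 124, union = 132 -> ~0.94 > 0.5
niol = calculate_nIoL(gt, window)  # (128 - 124) / 128 = 0.03125 < 0.15
# Both thresholds pass, so the pair is kept along with its regression targets:
# start_offset = 100 - 96 = 4, end_offset = 228 - 224 = 4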