BaiXuePrincess / PaddleRec (forked from PaddlePaddle / PaddleRec)
Commit 8404c7af
Authored June 12, 2020 by xujiaqi01; committed by GitHub on June 12, 2020

fix recall py3 bug, and dnn (#71)

* fix recall py3 bug

Co-authored-by: tangwei12 <tangwei12@baidu.com>

Parent: 5cd75923

Showing 17 changed files with 31 additions and 26 deletions
models/rank/dataset/Criteo_data/get_slot_data.py (+1 -1)
models/rank/dcn/data/get_slot_data.py (+1 -1)
models/rank/deepfm/data/get_slot_data.py (+1 -1)
models/rank/dnn/README.md (+2 -2)
models/rank/dnn/data/get_slot_data.py (+1 -1)
models/rank/logistic_regression/data/get_slot_data.py (+1 -1)
models/rank/nfm/data/get_slot_data.py (+1 -1)
models/rank/wide_deep/data/get_slot_data.py (+1 -0)
models/rank/xdeepfm/data/get_slot_data.py (+1 -1)
models/recall/gnn/evaluate_reader.py (+2 -1)
models/recall/gnn/reader.py (+2 -1)
models/recall/gru4rec/rsc15_reader.py (+1 -1)
models/recall/ncf/movielens_infer_reader.py (+2 -2)
models/recall/ncf/movielens_reader.py (+3 -2)
models/recall/ssr/ssr_infer_reader.py (+4 -4)
models/recall/ssr/ssr_reader.py (+1 -1)
models/recall/youtube_dnn/random_reader.py (+6 -5)

models/rank/dataset/Criteo_data/get_slot_data.py

    @@ -87,7 +87,7 @@ class Reader(dg.MultiSlotDataGenerator):
                         v = i[1]
                         for j in v:
                             s += " " + k + ":" + str(j)
    -                print s.strip()
    +                print(s.strip())
                     yield None

                 return data_iter

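The Python 2 statement form `print s.strip()` is a syntax error under Python 3, while a parenthesised single argument parses under both. A minimal sketch of the same slot-formatting loop follows; `format_slots` and the sample pairs are hypothetical and not taken from the repository.

```python
def format_slots(pairs):
    """Join (slot_name, values) pairs into one "name:value ..." line."""
    s = ""
    for k, v in pairs:
        for j in v:
            s += " " + k + ":" + str(j)
    return s.strip()


line = format_slots([("click", [1]), ("feat_idx", [3, 7])])
# A parenthesised single argument is valid in both Python 2 and Python 3.
print(line)
```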
models/rank/dcn/data/get_slot_data.py

    @@ -92,7 +92,7 @@ class Reader(dg.MultiSlotDataGenerator):
                         v = i[1]
                         for j in v:
                             s += " " + k + ":" + str(j)
    -                print s.strip()
    +                print(s.strip())
                     yield None

                 return data_iter

models/rank/deepfm/data/get_slot_data.py

    @@ -79,7 +79,7 @@ class Reader(dg.MultiSlotDataGenerator):
                         v = i[1]
                         for j in v:
                             s += " " + k + ":" + str(j)
    -                print s.strip()
    +                print(s.strip())
                     yield None

                 return data_iter

models/rank/dnn/README.md

    @@ -185,7 +185,7 @@ inputs = [dense_input] + sparse_input_ids + [label]
     ### CTR-DNN model network

    -The CTR-DNN network is fairly straightforward: at its core it is a binary classification task. See `network_conf.py` for the code. The model consists mainly of one `Embedding` layer, three `FC` layers, and the loss and AUC computations for the classification task.
    +The CTR-DNN network is fairly straightforward: at its core it is a binary classification task. See `model.py` for the code. The model consists mainly of one `Embedding` layer, three `FC` layers, and the loss and AUC computations for the classification task.

     #### Embedding layer

     First, how the Embedding layer is built: the input to the `Embedding` layer is `sparse_input`, and its shape is defined by the hyperparameters `sparse_feature_dim` and `embedding_size`. The `is_sparse` parameter deserves special mention: when `is_sprase=True` (sic) is specified, the computation graph treats the parameter as sparse, so backward updates and distributed communication are both performed sparsely, which greatly improves runtime efficiency while keeping model quality unchanged.

    @@ -235,7 +235,7 @@ fc3 = fluid.layers.fc(
     )
     ```
     #### Loss and AUC computation
    -- The prediction is produced by an FC layer with output shape 2, whose activation function 时 softmax (a typo in the Chinese source: 时 written for 是, "is"), giving the probability that each sample is positive or negative.
    +- The prediction is produced by an FC layer with output shape 2, whose activation function is softmax, giving the probability that each sample is positive or negative.
     - The per-sample loss is given by cross entropy. The cross-entropy input has shape [batch_size, 2] and dtype float; the label input has shape [batch_size, 1] and dtype int.
     - The batch loss `avg_cost` is the sum of the per-sample losses.
     - We also compute the prediction AUC, given by `fluid.layers.auc()`. This layer returns three values: the global AUC `auc_var`, the current-batch AUC `batch_auc_var`, and `auc_states`, which contains `batch_stat_pos, batch_stat_neg, stat_pos, stat_neg`. `batch_auc` is averaged over the most recent 20 batches, set by `slide_steps=20`; the number of thresholds used to discretize the ROC curve is 4096, set by `num_thresholds=2**12`.

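The README text in this hunk mentions discretizing the ROC curve with `num_thresholds=2**12`. The idea can be sketched in plain Python as below. This only illustrates bucketed AUC under the assumption that scores lie in [0, 1]; `bucketed_auc` is a hypothetical helper and is not a description of how `fluid.layers.auc()` is implemented internally.

```python
def bucketed_auc(scores, labels, num_thresholds=2**12):
    """Approximate ROC AUC by histogramming scores into fixed buckets."""
    pos = [0] * (num_thresholds + 1)
    neg = [0] * (num_thresholds + 1)
    for score, label in zip(scores, labels):
        bucket = int(score * num_thresholds)  # assumes 0 <= score <= 1
        if label == 1:
            pos[bucket] += 1
        else:
            neg[bucket] += 1

    total_pos, total_neg = sum(pos), sum(neg)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # AUC is undefined without both classes; 0.0 is a placeholder

    # Sweep from the highest bucket down, summing trapezoids under the ROC curve.
    area = 0.0
    seen_pos = seen_neg = 0
    for bucket in range(num_thresholds, -1, -1):
        new_pos = seen_pos + pos[bucket]
        new_neg = seen_neg + neg[bucket]
        area += (new_neg - seen_neg) * (seen_pos + new_pos) / 2.0
        seen_pos, seen_neg = new_pos, new_neg
    return area / (total_pos * total_neg)


auc = bucketed_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Samples that fall into the same bucket are treated as ties and receive half credit, which is the price of replacing a full sort with a fixed-size histogram.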
models/rank/dnn/data/get_slot_data.py

    @@ -61,7 +61,7 @@ class CriteoDataset(dg.MultiSlotDataGenerator):
                         s += " dense_feature:" + str(i)
                     for i in range(1, 1 + len(categorical_range_)):
                         s += " " + str(i) + ":" + str(sparse_feature[i - 1][0])
    -                print s.strip()
    +                print(s.strip())
                     yield None

                 return reader

models/rank/logistic_regression/data/get_slot_data.py

    @@ -88,7 +88,7 @@ class Reader(dg.MultiSlotDataGenerator):
                         v = i[1]
                         for j in v:
                             s += " " + k + ":" + str(j)
    -                print s.strip()
    +                print(s.strip())
                     yield None

                 return data_iter

models/rank/nfm/data/get_slot_data.py

    @@ -87,7 +87,7 @@ class Reader(dg.MultiSlotDataGenerator):
                         v = i[1]
                         for j in v:
                             s += " " + k + ":" + str(j)
    -                print s.strip()
    +                print(s.strip())
                     yield None

                 return data_iter

models/rank/wide_deep/data/get_slot_data.py

    @@ -50,6 +50,7 @@ class Reader(dg.MultiSlotDataGenerator):
                         v = i[1]
                         for j in v:
                             s += " " + k + ":" + str(j)
    +                print(s.strip())
                     yield None

                 return data_iter

models/rank/xdeepfm/data/get_slot_data.py

    @@ -49,7 +49,7 @@ class Reader(dg.MultiSlotDataGenerator):
                         v = i[1]
                         for j in v:
                             s += " " + k + ":" + str(j)
    -                print s.strip()
    +                print(s.strip())
                     yield None

                 return data_iter

models/recall/gnn/evaluate_reader.py

    @@ -95,7 +95,8 @@ class Reader(ReaderBase):
                 (batch_size, max_uniq_len, max_uniq_len))
             mask = np.array(mask).astype("float32").reshape((batch_size, -1, 1))
             label = np.array(label).astype("int64").reshape((batch_size, 1))
    -        return zip(items, seq_index, last_index, adj_in, adj_out, mask, label)
    +        return list(
    +            zip(items, seq_index, last_index, adj_in, adj_out, mask, label))

         def batch_reader(self, batch_size, batch_group_size, train=True):
             def _reader():

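Under Python 3, `zip()` returns a one-shot iterator rather than a list, which is presumably why the return value is wrapped in `list(...)` in this commit. A small sketch of the difference, using made-up batch fields rather than the real GNN inputs:

```python
items = [[1, 2], [3, 4]]
labels = [[0], [1]]

lazy = zip(items, labels)
first_pass = [pair for pair in lazy]
# On Python 3 the iterator is now exhausted, so a second pass sees nothing.
second_pass = [pair for pair in lazy]

# Materialising the pairs gives a list that supports len(), indexing
# and repeated iteration, matching what Python 2's zip() returned.
batch = list(zip(items, labels))
batch_len = len(batch)
```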
models/recall/gnn/reader.py

    @@ -94,7 +94,8 @@ class Reader(ReaderBase):
                 (batch_size, max_uniq_len, max_uniq_len))
             mask = np.array(mask).astype("float32").reshape((batch_size, -1, 1))
             label = np.array(label).astype("int64").reshape((batch_size, 1))
    -        return zip(items, seq_index, last_index, adj_in, adj_out, mask, label)
    +        return list(
    +            zip(items, seq_index, last_index, adj_in, adj_out, mask, label))

         def batch_reader(self, batch_size, batch_group_size, train=True):
             def _reader():

models/recall/gru4rec/rsc15_reader.py

    @@ -37,6 +37,6 @@ class Reader(ReaderBase):
                 trg_seq = l[1:]
                 trg_seq = [int(e) for e in trg_seq]
                 feature_name = ["src_wordseq", "dst_wordseq"]
    -            yield zip(feature_name, [src_seq] + [trg_seq])
    +            yield list(zip(feature_name, [src_seq] + [trg_seq]))

             return reader

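The recall readers in this commit all follow the same shape: an inner generator yields one sample as a list of `(feature_name, values)` pairs. A self-contained sketch of that pattern is given here. The free function `generate_sample` stands in for the class method, and deriving `src_seq` as everything but the last ID is an assumption, since that line falls outside the hunk.

```python
def generate_sample(line):
    """Return a reader that yields one sample as (feature_name, values) pairs."""

    def reader():
        ids = [int(e) for e in line.strip().split()]
        src_seq = ids[:-1]  # assumption: source is the sequence minus its last ID
        trg_seq = ids[1:]   # target is the sequence shifted by one, as in the hunk
        feature_name = ["src_wordseq", "dst_wordseq"]
        # list(...) so that consumers can index the sample or iterate it twice
        yield list(zip(feature_name, [src_seq] + [trg_seq]))

    return reader


sample = next(generate_sample("5 8 13 21")())
```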
models/recall/ncf/movielens_infer_reader.py

    @@ -35,7 +35,7 @@ class Reader(ReaderBase):
                 features = line.strip().split(',')
                 feature_name = ["user_input", "item_input"]
    -            yield zip(feature_name,
    -                      [[int(features[0])]] + [[int(features[1])]])
    +            yield list(
    +                zip(feature_name, [[int(features[0])]] + [[int(features[1])]]))

             return reader

models/recall/ncf/movielens_reader.py

    @@ -35,7 +35,8 @@ class Reader(ReaderBase):
                 features = line.strip().split(',')
                 feature_name = ["user_input", "item_input", "label"]
    -            yield zip(feature_name, [[int(features[0])]] +
    -                      [[int(features[1])]] + [[int(features[2])]])
    +            yield list(
    +                zip(feature_name, [[int(features[0])]] + [[int(features[1])]] +
    +                    [[int(features[2])]]))

             return reader

models/recall/ssr/ssr_infer_reader.py

    @@ -40,9 +40,9 @@ class Reader(ReaderBase):
                 src = conv_ids[:boundary]
                 pos_tgt = [conv_ids[boundary]]
                 feature_name = ["user", "all_item", "p_item"]
    -            yield zip(
    -                feature_name,
    -                [src] + [np.arange(self.vocab_size).astype("int64").tolist()] +
    -                [pos_tgt])
    +            yield list(
    +                zip(feature_name, [src] + [
    +                    np.arange(self.vocab_size).astype("int64").tolist()
    +                ] + [pos_tgt]))

             return reader

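The inference reader pairs each user sequence with every item ID in the vocabulary as candidates. A hedged sketch using the standard library in place of NumPy follows; `build_infer_sample` is a hypothetical name, and treating the last ID as the positive target is an assumption, because the line that sets `boundary` is not part of the hunk.

```python
def build_infer_sample(conv_ids, vocab_size):
    """Build one inference sample: user history, all candidate items, target."""
    boundary = len(conv_ids) - 1  # assumption: the last ID is the positive item
    src = conv_ids[:boundary]
    pos_tgt = [conv_ids[boundary]]
    feature_name = ["user", "all_item", "p_item"]
    # list(range(n)) holds the same IDs as np.arange(n).astype("int64").tolist()
    all_items = list(range(vocab_size))
    return list(zip(feature_name, [src] + [all_items] + [pos_tgt]))


sample = build_infer_sample([4, 2, 7], vocab_size=5)
```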
models/recall/ssr/ssr_reader.py

    @@ -42,6 +42,6 @@ class Reader(ReaderBase):
                 pos_tgt = [conv_ids[boundary]]
                 neg_tgt = [self.sample_neg_from_seq(src)]
                 feature_name = ["user", "p_item", "n_item"]
    -            yield zip(feature_name, [src] + [pos_tgt] + [neg_tgt])
    +            yield list(zip(feature_name, [src] + [pos_tgt] + [neg_tgt]))

             return reader

models/recall/youtube_dnn/random_reader.py

    @@ -41,10 +41,11 @@ class Reader(ReaderBase):
                 """
                 feature_name = ["watch_vec", "search_vec", "other_feat", "label"]
    -            yield zip(feature_name,
    -                      [np.random.rand(self.watch_vec_size).tolist()] +
    -                      [np.random.rand(self.search_vec_size).tolist()] +
    -                      [np.random.rand(self.other_feat_size).tolist()] +
    -                      [[np.random.randint(self.output_size)]])
    +            yield list(
    +                zip(feature_name, [
    +                    np.random.rand(self.watch_vec_size).tolist()
    +                ] + [np.random.rand(self.search_vec_size).tolist()] + [
    +                    np.random.rand(self.other_feat_size).tolist()
    +                ] + [[np.random.randint(self.output_size)]]))

             return reader

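For readers without NumPy at hand, the random sample built in this hunk can be approximated with the standard library. This is a sketch under stated assumptions: `random_sample` is a hypothetical free function, the sizes are passed as arguments instead of being read from `self`, and `random.random()` / `random.randrange()` stand in for `np.random.rand()` / `np.random.randint()`.

```python
import random


def random_sample(watch_vec_size, search_vec_size, other_feat_size, output_size):
    """Build one random sample as a list of (feature_name, values) pairs."""
    feature_name = ["watch_vec", "search_vec", "other_feat", "label"]
    values = [
        [random.random() for _ in range(watch_vec_size)],
        [random.random() for _ in range(search_vec_size)],
        [random.random() for _ in range(other_feat_size)],
        [random.randrange(output_size)],  # single class index in [0, output_size)
    ]
    return list(zip(feature_name, values))


sample = random_sample(4, 3, 2, 10)
```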