BaiXuePrincess / PaddleRec (forked from PaddlePaddle / PaddleRec; in sync with the fork source)
Commit d190118a (unverified)
Authored Jun 01, 2020 by wuzhihua; committed Jun 01, 2020 by GitHub

Merge pull request #21 from yaoxuefeng6/mod_yaml
update rank yaml, model, and add infer option in rank models

Parents: 5a6abdf6, 2ff6b226
Showing 14 changed files with 507 additions and 402 deletions
Changed files:
models/rank/dcn/config.yaml  +60 -37
models/rank/dcn/data/sample_data/infer/infer_sample_data  +10 -0
models/rank/dcn/model.py  +39 -43
models/rank/deepfm/config.yaml  +59 -33
models/rank/deepfm/model.py  +29 -44
models/rank/din/config.yaml  +53 -33
models/rank/din/model.py  +70 -77
models/rank/din/reader.py  +4 -3
models/rank/readme.md  +25 -3
models/rank/wide_deep/config.yaml  +51 -29
models/rank/wide_deep/model.py  +14 -19
models/rank/xdeepfm/config.yaml  +53 -33
models/rank/xdeepfm/model.py  +38 -47
setup.py  +2 -1
models/rank/dcn/config.yaml (view file @ d190118a)
@@ -12,43 +12,66 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-train:
-  trainer:
-    # for cluster training
-    strategy: "async"
-
-  epochs: 10
-  workspace: "paddlerec.models.rank.dcn"
-
-  reader:
-    batch_size: 2
-    train_data_path: "{workspace}/data/sample_data/train"
-    feat_dict_name: "{workspace}/data/vocab"
-    sparse_slots: "label C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26"
-    dense_slots: "I1:1 I2:1 I3:1 I4:1 I5:1 I6:1 I7:1 I8:1 I9:1 I10:1 I11:1 I12:1 I13:1"
-
-  model:
-    models: "{workspace}/model.py"
-    hyper_parameters:
-      cross_num: 2
-      dnn_hidden_units: [128, 128]
-      l2_reg_cross: 0.00005
-      dnn_use_bn: False
-      clip_by_norm: 100.0
-      cat_feat_num: "{workspace}/data/sample_data/cat_feature_num.txt"
-      is_sparse: False
-      is_test: False
-      num_field: 39
-      learning_rate: 0.0001
-      act: "relu"
-      optimizer: adam
-
-  save:
-    increment:
-      dirname: "increment"
-      epoch_interval: 2
-      save_last: True
-    inference:
-      dirname: "inference"
-      epoch_interval: 4
-      save_last: True
+# global settings
+debug: false
+workspace: "paddlerec.models.rank.dcn"
+
+dataset:
+  - name: train_sample
+    type: QueueDataset
+    batch_size: 5
+    data_path: "{workspace}/data/sample_data/train"
+    sparse_slots: "label C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26"
+    dense_slots: "I1:1 I2:1 I3:1 I4:1 I5:1 I6:1 I7:1 I8:1 I9:1 I10:1 I11:1 I12:1 I13:1"
+  - name: infer_sample
+    type: QueueDataset
+    batch_size: 5
+    data_path: "{workspace}/data/sample_data/infer"
+    sparse_slots: "label C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26"
+    dense_slots: "I1:1 I2:1 I3:1 I4:1 I5:1 I6:1 I7:1 I8:1 I9:1 I10:1 I11:1 I12:1 I13:1"
+
+hyper_parameters:
+  optimizer:
+    class: Adam
+    learning_rate: 0.0001
+  # user-defined configuration
+  cross_num: 2
+  dnn_hidden_units: [128, 128]
+  l2_reg_cross: 0.00005
+  dnn_use_bn: False
+  clip_by_norm: 100.0
+  cat_feat_num: "{workspace}/data/sample_data/cat_feature_num.txt"
+  is_sparse: False
+
+mode: train_runner
+# if infer, change mode to "infer_runner" and change phase to "infer_phase"
+
+runner:
+  - name: train_runner
+    trainer_class: single_train
+    epochs: 1
+    device: cpu
+    init_model_path: ""
+    save_checkpoint_interval: 1
+    save_inference_interval: 1
+    save_checkpoint_path: "increment"
+    save_inference_path: "inference"
+    print_interval: 1
+  - name: infer_runner
+    trainer_class: single_infer
+    epochs: 1
+    device: cpu
+    init_model_path: "increment/0"
+    print_interval: 1
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: train_sample
+  thread_num: 1
+#- name: infer_phase
+#  model: "{workspace}/model.py"
+#  dataset_name: infer_sample
+#  thread_num: 1
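The model code in this commit reads settings through dotted keys such as "hyper_parameters.cross_num". A minimal sketch of how such a key could resolve against the flattened config above, assuming a plain nested dict; `get_by_dotted_key` is a hypothetical helper for illustration and is not PaddleRec's `envs.get_global_env`:

```python
def get_by_dotted_key(config, dotted_key, default=None):
    """Walk a nested dict with a dotted key like 'hyper_parameters.cross_num'.

    Hypothetical helper: returns `default` when any path segment is missing.
    """
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# A small excerpt of the new dcn config, written as a Python dict.
config = {
    "workspace": "paddlerec.models.rank.dcn",
    "hyper_parameters": {
        "optimizer": {"class": "Adam", "learning_rate": 0.0001},
        "cross_num": 2,
        "dnn_hidden_units": [128, 128],
    },
}

cross_num = get_by_dotted_key(config, "hyper_parameters.cross_num")
optimizer_class = get_by_dotted_key(config, "hyper_parameters.optimizer.class")
missing = get_by_dotted_key(config, "hyper_parameters.is_test", default=False)
```

The lookup for `is_test` falls back to the default because that key was dropped from the new config.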
models/rank/dcn/data/sample_data/infer/infer_sample_data (new file, mode 0 → 100644; view file @ d190118a)
label:0 I1:0.69314718056 I2:1.60943791243 I3:1.79175946923 I4:0.0 I5:7.23201033166 I6:1.60943791243 I7:2.77258872224 I8:1.09861228867 I9:5.20400668708 I10:0.69314718056 I11:1.09861228867 I12:0 I13:1.09861228867 C1:95 C2:398 C3:0 C4:0 C5:53 C6:1 C7:73 C8:71 C9:3 C10:1974 C11:832 C12:0 C13:875 C14:8 C15:1764 C16:0 C17:5 C18:390 C19:226 C20:1 C21:0 C22:0 C23:8 C24:1759 C25:1 C26:862
label:0 I1:1.09861228867 I2:1.38629436112 I3:3.80666248977 I4:0.69314718056 I5:4.63472898823 I6:2.19722457734 I7:1.09861228867 I8:1.09861228867 I9:1.60943791243 I10:0.69314718056 I11:0.69314718056 I12:0 I13:1.60943791243 C1:95 C2:200 C3:1184 C4:1929 C5:53 C6:4 C7:1477 C8:2 C9:3 C10:1283 C11:1567 C12:1048 C13:271 C14:6 C15:1551 C16:899 C17:1 C18:162 C19:226 C20:2 C21:575 C22:0 C23:8 C24:1615 C25:1 C26:659
label:0 I1:1.09861228867 I2:1.38629436112 I3:0.69314718056 I4:2.7080502011 I5:6.64378973315 I6:4.49980967033 I7:1.60943791243 I8:1.09861228867 I9:5.50533153593 I10:0.69314718056 I11:1.38629436112 I12:1.38629436112 I13:3.82864139649 C1:123 C2:378 C3:991 C4:197 C5:53 C6:1 C7:689 C8:2 C9:3 C10:245 C11:623 C12:1482 C13:887 C14:21 C15:106 C16:720 C17:3 C18:768 C19:0 C20:0 C21:1010 C22:1 C23:8 C24:720 C25:0 C26:0
label:0 I1:0 I2:6.79905586206 I3:0 I4:0 I5:8.38776764398 I6:0 I7:0.0 I8:0.0 I9:0.0 I10:0 I11:0.0 I12:0 I13:0 C1:95 C2:227 C3:0 C4:219 C5:53 C6:4 C7:3174 C8:2 C9:3 C10:569 C11:1963 C12:0 C13:1150 C14:21 C15:1656 C16:0 C17:6 C18:584 C19:0 C20:0 C21:0 C22:0 C23:8 C24:954 C25:0 C26:0
label:0 I1:1.38629436112 I2:1.09861228867 I3:0 I4:0.0 I5:1.09861228867 I6:0.0 I7:1.38629436112 I8:0.0 I9:0.0 I10:0.69314718056 I11:0.69314718056 I12:0 I13:0.0 C1:121 C2:147 C3:0 C4:1356 C5:53 C6:7 C7:2120 C8:2 C9:3 C10:703 C11:1678 C12:1210 C13:1455 C14:8 C15:538 C16:1276 C17:6 C18:346 C19:0 C20:0 C21:944 C22:0 C23:10 C24:355 C25:0 C26:0
label:0 I1:0 I2:1.09861228867 I3:0 I4:0 I5:9.45915167004 I6:0 I7:0.0 I8:0.0 I9:1.94591014906 I10:0 I11:0.0 I12:0 I13:0 C1:14 C2:75 C3:993 C4:480 C5:50 C6:6 C7:1188 C8:2 C9:3 C10:245 C11:1037 C12:1365 C13:1421 C14:21 C15:786 C16:5 C17:2 C18:555 C19:0 C20:0 C21:1408 C22:6 C23:7 C24:753 C25:0 C26:0
label:0 I1:0 I2:1.60943791243 I3:1.09861228867 I4:0 I5:8.06117135969 I6:0 I7:0.0 I8:0.69314718056 I9:1.09861228867 I10:0 I11:0.0 I12:0 I13:0 C1:139 C2:343 C3:553 C4:828 C5:50 C6:4 C7:0 C8:2 C9:3 C10:245 C11:2081 C12:260 C13:455 C14:21 C15:122 C16:1159 C17:2 C18:612 C19:0 C20:0 C21:1137 C22:0 C23:1 C24:1583 C25:0 C26:0
label:1 I1:0.69314718056 I2:2.07944154168 I3:1.09861228867 I4:0.0 I5:0.0 I6:0.0 I7:0.69314718056 I8:0.0 I9:0.0 I10:0.69314718056 I11:0.69314718056 I12:0 I13:0.0 C1:95 C2:227 C3:0 C4:1567 C5:21 C6:7 C7:2496 C8:71 C9:3 C10:1913 C11:2212 C12:0 C13:673 C14:21 C15:1656 C16:0 C17:5 C18:584 C19:0 C20:0 C21:0 C22:0 C23:10 C24:954 C25:0 C26:0
label:0 I1:0 I2:3.87120101091 I3:1.60943791243 I4:2.19722457734 I5:9.85277303799 I6:5.52146091786 I7:3.36729582999 I8:3.4657359028 I9:4.9558270576 I10:0 I11:0.69314718056 I12:0 I13:2.19722457734 C1:14 C2:14 C3:454 C4:197 C5:53 C6:1 C7:1386 C8:2 C9:3 C10:0 C11:1979 C12:205 C13:214 C14:6 C15:1837 C16:638 C17:5 C18:6 C19:0 C20:0 C21:70 C22:0 C23:10 C24:720 C25:0 C26:0
label:0 I1:0 I2:3.66356164613 I3:0 I4:0.69314718056 I5:10.4263800775 I6:3.09104245336 I7:0.69314718056 I8:1.09861228867 I9:1.38629436112 I10:0 I11:0.69314718056 I12:0 I13:0.69314718056 C1:14 C2:179 C3:120 C4:746 C5:53 C6:0 C7:1312 C8:2 C9:3 C10:1337 C11:1963 C12:905 C13:1150 C14:21 C15:1820 C16:328 C17:9 C18:77 C19:0 C20:0 C21:311 C22:0 C23:10 C24:89 C25:0 C26:0
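Each sample line above is a space-separated list of `slot:value` tokens: one `label`, dense slots `I1`..`I13`, and sparse slots `C1`..`C26`. A minimal sketch of splitting one line into those three groups, assuming the naming convention seen in the sample data; `parse_slot_line` is a hypothetical name and this is not the reader PaddleRec generates from `sparse_slots` and `dense_slots`:

```python
def parse_slot_line(line):
    """Split a 'slot:value' line into (label, dense dict, sparse dict).

    Assumes dense slots start with 'I' and sparse slots start with 'C',
    as in the sample file; other tokens are ignored.
    """
    label = None
    dense = {}
    sparse = {}
    for token in line.split():
        name, _, value = token.partition(":")
        if name == "label":
            label = int(value)
        elif name.startswith("I"):
            dense[name] = float(value)
        elif name.startswith("C"):
            sparse[name] = int(value)
    return label, dense, sparse


label, dense, sparse = parse_slot_line("label:1 I1:0.5 I2:0 C1:95 C2:398")
```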
models/rank/dcn/model.py (view file @ d190118a)
@@ -24,44 +24,21 @@ class Model(ModelBase):
     def __init__(self, config):
         ModelBase.__init__(self, config)
 
-    def init_network(self):
+    def _init_hyper_parameters(self):
         self.cross_num = envs.get_global_env("hyper_parameters.cross_num",
-                                             None, self._namespace)
+                                             None)
         self.dnn_hidden_units = envs.get_global_env(
-            "hyper_parameters.dnn_hidden_units", None, self._namespace)
+            "hyper_parameters.dnn_hidden_units", None)
         self.l2_reg_cross = envs.get_global_env(
-            "hyper_parameters.l2_reg_cross", None, self._namespace)
+            "hyper_parameters.l2_reg_cross", None)
         self.dnn_use_bn = envs.get_global_env("hyper_parameters.dnn_use_bn",
-                                              None, self._namespace)
+                                              None)
         self.clip_by_norm = envs.get_global_env(
-            "hyper_parameters.clip_by_norm", None, self._namespace)
-        cat_feat_num = envs.get_global_env("hyper_parameters.cat_feat_num",
-                                           None, self._namespace)
-
-        self.sparse_inputs = self._sparse_data_var[1:]
-        self.dense_inputs = self._dense_data_var
-        self.target_input = self._sparse_data_var[0]
-
-        cat_feat_dims_dict = OrderedDict()
-        for line in open(cat_feat_num):
-            spls = line.strip().split()
-            assert len(spls) == 2
-            cat_feat_dims_dict[spls[0]] = int(spls[1])
-        self.cat_feat_dims_dict = cat_feat_dims_dict if cat_feat_dims_dict else OrderedDict(
-        )
+            "hyper_parameters.clip_by_norm", None)
+        self.cat_feat_num = envs.get_global_env(
+            "hyper_parameters.cat_feat_num", None)
         self.is_sparse = envs.get_global_env("hyper_parameters.is_sparse",
-                                             None, self._namespace)
-
-        self.dense_feat_names = [i.name for i in self.dense_inputs]
-        self.sparse_feat_names = [i.name for i in self.sparse_inputs]
-
-        # {feat_name: dims}
-        self.feat_dims_dict = OrderedDict(
-            [(feat_name, 1) for feat_name in self.dense_feat_names])
-        self.feat_dims_dict.update(self.cat_feat_dims_dict)
-
-        self.net_input = None
-        self.loss = None
+                                             None)
 
     def _create_embedding_input(self):
         # sparse embedding
@@ -121,9 +98,29 @@ class Model(ModelBase):
     def _l2_loss(self, w):
         return fluid.layers.reduce_sum(fluid.layers.square(w))
 
-    def train_net(self):
-        self._init_slots()
-        self.init_network()
+    def net(self, inputs, is_infer=False):
+        self.sparse_inputs = self._sparse_data_var[1:]
+        self.dense_inputs = self._dense_data_var
+        self.target_input = self._sparse_data_var[0]
+
+        cat_feat_dims_dict = OrderedDict()
+        for line in open(self.cat_feat_num):
+            spls = line.strip().split()
+            assert len(spls) == 2
+            cat_feat_dims_dict[spls[0]] = int(spls[1])
+        self.cat_feat_dims_dict = cat_feat_dims_dict if cat_feat_dims_dict else OrderedDict(
+        )
+
+        self.dense_feat_names = [i.name for i in self.dense_inputs]
+        self.sparse_feat_names = [i.name for i in self.sparse_inputs]
+
+        # {feat_name: dims}
+        self.feat_dims_dict = OrderedDict(
+            [(feat_name, 1) for feat_name in self.dense_feat_names])
+        self.feat_dims_dict.update(self.cat_feat_dims_dict)
+
+        self.net_input = None
+        self.loss = None
 
         self.net_input = self._create_embedding_input()
@@ -146,6 +143,9 @@ class Model(ModelBase):
         self._metrics["AUC"] = auc_var
         self._metrics["BATCH_AUC"] = batch_auc_var
 
+        if is_infer:
+            self._infer_results["AUC"] = auc_var
+
         # logloss
         logloss = fluid.layers.log_loss(
             self.prob, fluid.layers.cast(
@@ -157,11 +157,7 @@ class Model(ModelBase):
         self.loss = self.avg_logloss + l2_reg_cross_loss
         self._cost = self.loss
 
-    def optimizer(self):
-        learning_rate = envs.get_global_env("hyper_parameters.learning_rate",
-                                            None, self._namespace)
-        optimizer = fluid.optimizer.Adam(learning_rate, lazy_mode=True)
-        return optimizer
-
-    def infer_net(self):
-        self.train_net()
+    #def optimizer(self):
+    #
+    #    optimizer = fluid.optimizer.Adam(self.learning_rate, lazy_mode=True)
+    #    return optimizer
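The `net` method above builds `cat_feat_dims_dict` by reading `cat_feat_num` line by line, expecting two whitespace-separated fields per line (feature name, dimension). A standalone sketch of that parsing step, assuming the same two-column format; `read_cat_feat_dims` is a hypothetical name, and unlike the diff it skips blank lines and raises `ValueError` instead of using `assert`:

```python
import io
from collections import OrderedDict


def read_cat_feat_dims(lines):
    """Parse 'name dim' lines into an OrderedDict, preserving file order.

    `lines` is any iterable of strings, for example an open file object.
    """
    dims = OrderedDict()
    for line in lines:
        fields = line.strip().split()
        if not fields:
            continue  # deviation from the diff: tolerate blank lines
        if len(fields) != 2:
            raise ValueError("expected 'name dim', got: %r" % line)
        dims[fields[0]] = int(fields[1])
    return dims


# io.StringIO stands in for the real file; the values here are made up.
sample = io.StringIO("C1 100\nC2 250\n\n")
cat_dims = read_cat_feat_dims(sample)
```

With a real file, wrapping the call in `with open(path) as f:` would also close the handle, which the `for line in open(...)` form in the diff leaves to the garbage collector.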
models/rank/deepfm/config.yaml (view file @ d190118a)
@@ -12,39 +12,65 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-train:
-  trainer:
-    # for cluster training
-    strategy: "async"
-
-  epochs: 10
-  workspace: "paddlerec.models.rank.deepfm"
-
-  reader:
-    batch_size: 2
-    train_data_path: "{workspace}/data/sample_data/train"
-    feat_dict_name: "{workspace}/data/sample_data/feat_dict_10.pkl2"
-    sparse_slots: "label feat_idx"
-    dense_slots: "feat_value:39"
-
-  model:
-    models: "{workspace}/model.py"
-    hyper_parameters:
-      sparse_feature_number: 1086460
-      sparse_feature_dim: 9
-      num_field: 39
-      fc_sizes: [400, 400, 400]
-      learning_rate: 0.0001
-      reg: 0.001
-      act: "relu"
-      optimizer: SGD
-
-  save:
-    increment:
-      dirname: "increment"
-      epoch_interval: 2
-      save_last: True
-    inference:
-      dirname: "inference"
-      epoch_interval: 4
-      save_last: True
+# global settings
+debug: false
+workspace: "paddlerec.models.rank.deepfm"
+
+dataset:
+  - name: train_sample
+    type: QueueDataset
+    batch_size: 5
+    data_path: "{workspace}/data/sample_data/train"
+    sparse_slots: "label feat_idx"
+    dense_slots: "feat_value:39"
+  - name: infer_sample
+    type: QueueDataset
+    batch_size: 5
+    data_path: "{workspace}/data/sample_data/train"
+    sparse_slots: "label feat_idx"
+    dense_slots: "feat_value:39"
+
+hyper_parameters:
+  optimizer:
+    class: SGD
+    learning_rate: 0.0001
+  sparse_feature_number: 1086460
+  sparse_feature_dim: 9
+  num_field: 39
+  fc_sizes: [400, 400, 400]
+  reg: 0.001
+  act: "relu"
+
+mode: train_runner
+# if infer, change mode to "infer_runner" and change phase to "infer_phase"
+
+runner:
+  - name: train_runner
+    trainer_class: single_train
+    epochs: 2
+    device: cpu
+    init_model_path: ""
+    save_checkpoint_interval: 1
+    save_inference_interval: 1
+    save_checkpoint_path: "increment"
+    save_inference_path: "inference"
+    print_interval: 1
+  - name: infer_runner
+    trainer_class: single_infer
+    epochs: 1
+    device: cpu
+    init_model_path: "increment/0"
+    print_interval: 1
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: train_sample
+  thread_num: 1
+#- name: infer_phase
+#  model: "{workspace}/model.py"
+#  dataset_name: infer_sample
+#  thread_num: 1
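The new layout ties things together by name: `mode` names one entry in `runner`, and each `phase` names a dataset through `dataset_name`. A rough sketch of that name-based lookup over a config already loaded into a dict; `select_runner` and `dataset_for_phase` are hypothetical helpers, and how PaddleRec itself resolves these names is not shown in this diff:

```python
def select_runner(config):
    """Return the runner entry whose name equals config['mode']."""
    for runner in config["runner"]:
        if runner["name"] == config["mode"]:
            return runner
    raise KeyError("no runner named %r" % config["mode"])


def dataset_for_phase(config, phase):
    """Return the dataset entry referenced by a phase's dataset_name."""
    for dataset in config["dataset"]:
        if dataset["name"] == phase["dataset_name"]:
            return dataset
    raise KeyError("no dataset named %r" % phase["dataset_name"])


# Excerpt of the deepfm config above as a Python dict.
config = {
    "mode": "train_runner",
    "dataset": [
        {"name": "train_sample", "type": "QueueDataset", "batch_size": 5},
        {"name": "infer_sample", "type": "QueueDataset", "batch_size": 5},
    ],
    "runner": [
        {"name": "train_runner", "trainer_class": "single_train", "epochs": 2},
        {"name": "infer_runner", "trainer_class": "single_infer", "epochs": 1},
    ],
    "phase": [{"name": "phase1", "dataset_name": "train_sample"}],
}

runner = select_runner(config)
dataset = dataset_for_phase(config, config["phase"][0])
```

Switching to inference then amounts to the two edits the config comment describes: set `mode` to `infer_runner` and enable the `infer_phase` entry.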
models/rank/deepfm/model.py (view file @ d190118a)
@@ -24,42 +24,46 @@ class Model(ModelBase):
     def __init__(self, config):
         ModelBase.__init__(self, config)
 
-    def deepfm_net(self):
+    def _init_hyper_parameters(self):
+        self.sparse_feature_number = envs.get_global_env(
+            "hyper_parameters.sparse_feature_number", None)
+        self.sparse_feature_dim = envs.get_global_env(
+            "hyper_parameters.sparse_feature_dim", None)
+        self.num_field = envs.get_global_env("hyper_parameters.num_field",
+                                             None)
+        self.reg = envs.get_global_env("hyper_parameters.reg", 1e-4)
+        self.layer_sizes = envs.get_global_env("hyper_parameters.fc_sizes",
+                                               None)
+        self.act = envs.get_global_env("hyper_parameters.act", None)
+
+    def net(self, inputs, is_infer=False):
         init_value_ = 0.1
         is_distributed = True if envs.get_trainer() == "CtrTrainer" else False
-        sparse_feature_number = envs.get_global_env(
-            "hyper_parameters.sparse_feature_number", None, self._namespace)
-        sparse_feature_dim = envs.get_global_env(
-            "hyper_parameters.sparse_feature_dim", None, self._namespace)
 
         # ------------------------- network input --------------------------
 
-        num_field = envs.get_global_env("hyper_parameters.num_field", None,
-                                        self._namespace)
-
         raw_feat_idx = self._sparse_data_var[1]
         raw_feat_value = self._dense_data_var[0]
         self.label = self._sparse_data_var[0]
 
         feat_idx = raw_feat_idx
         feat_value = fluid.layers.reshape(
-            raw_feat_value, [-1, num_field, 1])  # None * num_field * 1
+            raw_feat_value, [-1, self.num_field, 1])  # None * num_field * 1
 
-        reg = envs.get_global_env("hyper_parameters.reg", 1e-4,
-                                  self._namespace)
         first_weights_re = fluid.embedding(
             input=feat_idx,
             is_sparse=True,
             is_distributed=is_distributed,
             dtype='float32',
-            size=[sparse_feature_number + 1, 1],
+            size=[self.sparse_feature_number + 1, 1],
             padding_idx=0,
             param_attr=fluid.ParamAttr(
                 initializer=fluid.initializer.TruncatedNormalInitializer(
                     loc=0.0, scale=init_value_),
-                regularizer=fluid.regularizer.L1DecayRegularizer(reg)))
+                regularizer=fluid.regularizer.L1DecayRegularizer(self.reg)))
         first_weights = fluid.layers.reshape(
-            first_weights_re, shape=[-1, num_field, 1])  # None * num_field * 1
+            first_weights_re,
+            shape=[-1, self.num_field, 1])  # None * num_field * 1
         y_first_order = fluid.layers.reduce_sum((first_weights * feat_value),
                                                 1)
@@ -70,16 +74,17 @@ class Model(ModelBase):
             is_sparse=True,
             is_distributed=is_distributed,
             dtype='float32',
-            size=[sparse_feature_number + 1, sparse_feature_dim],
+            size=[self.sparse_feature_number + 1, self.sparse_feature_dim],
             padding_idx=0,
             param_attr=fluid.ParamAttr(
                 initializer=fluid.initializer.TruncatedNormalInitializer(
                     loc=0.0,
-                    scale=init_value_ / math.sqrt(float(sparse_feature_dim)))))
+                    scale=init_value_ /
+                    math.sqrt(float(self.sparse_feature_dim)))))
         feat_embeddings = fluid.layers.reshape(
             feat_embeddings_re,
-            shape=[-1, num_field,
-                   sparse_feature_dim])  # None * num_field * embedding_size
+            shape=[-1, self.num_field, self.sparse_feature_dim
+                   ])  # None * num_field * embedding_size
         feat_embeddings = feat_embeddings * feat_value  # None * num_field * embedding_size
 
         # sum_square part
@@ -101,17 +106,13 @@ class Model(ModelBase):
 
         # ------------------------- DNN --------------------------
 
-        layer_sizes = envs.get_global_env("hyper_parameters.fc_sizes", None,
-                                          self._namespace)
-        act = envs.get_global_env("hyper_parameters.act", None,
-                                  self._namespace)
-        y_dnn = fluid.layers.reshape(
-            feat_embeddings, [-1, num_field * sparse_feature_dim])
-        for s in layer_sizes:
+        y_dnn = fluid.layers.reshape(
+            feat_embeddings, [-1, self.num_field * self.sparse_feature_dim])
+        for s in self.layer_sizes:
             y_dnn = fluid.layers.fc(
                 input=y_dnn,
                 size=s,
-                act=act,
+                act=self.act,
                 param_attr=fluid.ParamAttr(
                     initializer=fluid.initializer.TruncatedNormalInitializer(
                         loc=0.0, scale=init_value_ / math.sqrt(float(10)))),
@@ -133,21 +134,12 @@ class Model(ModelBase):
         self.predict = fluid.layers.sigmoid(y_first_order + y_second_order +
                                             y_dnn)
 
-    def train_net(self):
-        self._init_slots()
-        self.deepfm_net()
-
-        # ------------------------- Cost(logloss) --------------------------
-
         cost = fluid.layers.log_loss(
             input=self.predict, label=fluid.layers.cast(self.label, "float32"))
         avg_cost = fluid.layers.reduce_sum(cost)
 
         self._cost = avg_cost
 
-        # ------------------------- Metric(Auc) --------------------------
-
         predict_2d = fluid.layers.concat([1 - self.predict, self.predict], 1)
         label_int = fluid.layers.cast(self.label, 'int64')
         auc_var, batch_auc_var, _ = fluid.layers.auc(input=predict_2d,
@@ -155,12 +147,5 @@ class Model(ModelBase):
                                                      slide_steps=0)
         self._metrics["AUC"] = auc_var
         self._metrics["BATCH_AUC"] = batch_auc_var
-
-    def optimizer(self):
-        learning_rate = envs.get_global_env("hyper_parameters.learning_rate",
-                                            None, self._namespace)
-        optimizer = fluid.optimizer.Adam(learning_rate, lazy_mode=True)
-        return optimizer
-
-    def infer_net(self):
-        self.train_net()
+        if is_infer:
+            self._infer_results["AUC"] = auc_var
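The first-order term in the diff looks up one scalar weight per feature index, multiplies it by the feature value, and sums over the `num_field` axis (`reduce_sum(first_weights * feat_value, 1)`). A plain-Python sketch of that arithmetic for a single sample, with made-up weights; `first_order_term` is a hypothetical name, and row 0 is set to zero here to mimic `padding_idx=0`:

```python
def first_order_term(weights, feat_idx, feat_value):
    """Sum of weights[idx] * value over the fields of one sample."""
    return sum(weights[i] * v for i, v in zip(feat_idx, feat_value))


# Toy embedding table of size [4, 1], flattened to a list; index 0 is padding.
weights = [0.0, 0.5, -1.0, 2.0]

# One sample with three fields: indices into the table and their values.
feat_idx = [1, 3, 0]
feat_value = [1.0, 0.5, 4.0]

y_first_order = first_order_term(weights, feat_idx, feat_value)
# 0.5 * 1.0 + 2.0 * 0.5 + 0.0 * 4.0 = 1.5
```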
models/rank/din/config.yaml (view file @ d190118a)
@@ -12,40 +12,60 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-train:
-  trainer:
-    # for cluster training
-    strategy: "async"
-
-  epochs: 10
-  workspace: "paddlerec.models.rank.din"
-
-  reader:
-    batch_size: 2
-    class: "{workspace}/reader.py"
-    train_data_path: "{workspace}/data/train_data"
-    dataset_class: "DataLoader"
-
-  model:
-    models: "{workspace}/model.py"
-    hyper_parameters:
-      use_DataLoader: True
-      item_emb_size: 64
-      cat_emb_size: 64
-      is_sparse: False
-      config_path: "data/config.txt"
-      fc_sizes: [400, 400, 400]
-      learning_rate: 0.0001
-      reg: 0.001
-      act: "sigmoid"
-      optimizer: SGD
-
-  save:
-    increment:
-      dirname: "increment"
-      epoch_interval: 2
-      save_last: True
-    inference:
-      dirname: "inference"
-      epoch_interval: 4
-      save_last: True
+# global settings
+debug: false
+workspace: "paddlerec.models.rank.din"
+
+dataset:
+  - name: sample_1
+    type: DataLoader
+    batch_size: 5
+    data_path: "{workspace}/data/train_data"
+    data_converter: "{workspace}/reader.py"
+  - name: infer_sample
+    type: DataLoader
+    batch_size: 5
+    data_path: "{workspace}/data/train_data"
+    data_converter: "{workspace}/reader.py"
+
+hyper_parameters:
+  optimizer:
+    class: SGD
+    learning_rate: 0.0001
+  use_DataLoader: True
+  item_emb_size: 64
+  cat_emb_size: 64
+  is_sparse: False
+  item_count: 63001
+  cat_count: 801
+  act: "sigmoid"
+
+mode: train_runner
+
+runner:
+  - name: train_runner
+    trainer_class: single_train
+    epochs: 1
+    device: cpu
+    init_model_path: ""
+    save_checkpoint_interval: 1
+    save_inference_interval: 1
+    save_checkpoint_path: "increment"
+    save_inference_path: "inference"
+    print_interval: 1
+  - name: infer_runner
+    trainer_class: single_infer
+    epochs: 1
+    device: cpu
+    init_model_path: "increment/0"
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: sample_1
+  thread_num: 1
+#- name: infer_phase
+#  model: "{workspace}/model.py"
+#  dataset_name: infer_sample
+#  thread_num: 1
models/rank/din/model.py
浏览文件 @
d190118a
...
@@ -22,12 +22,58 @@ class Model(ModelBase):
...
@@ -22,12 +22,58 @@ class Model(ModelBase):
def
__init__
(
self
,
config
):
def
__init__
(
self
,
config
):
ModelBase
.
__init__
(
self
,
config
)
ModelBase
.
__init__
(
self
,
config
)
def
config_read
(
self
,
config_path
):
def
_init_hyper_parameters
(
self
):
with
open
(
config_path
,
"r"
)
as
fin
:
self
.
item_emb_size
=
envs
.
get_global_env
(
user_count
=
int
(
fin
.
readline
().
strip
())
"hyper_parameters.item_emb_size"
,
64
)
item_count
=
int
(
fin
.
readline
().
strip
())
self
.
cat_emb_size
=
envs
.
get_global_env
(
cat_count
=
int
(
fin
.
readline
().
strip
())
"hyper_parameters.cat_emb_size"
,
64
)
return
user_count
,
item_count
,
cat_count
self
.
act
=
envs
.
get_global_env
(
"hyper_parameters.act"
,
"sigmoid"
)
self
.
is_sparse
=
envs
.
get_global_env
(
"hyper_parameters.is_sparse"
,
False
)
#significant for speeding up the training process
self
.
use_DataLoader
=
envs
.
get_global_env
(
"hyper_parameters.use_DataLoader"
,
False
)
self
.
item_count
=
envs
.
get_global_env
(
"hyper_parameters.item_count"
,
63001
)
self
.
cat_count
=
envs
.
get_global_env
(
"hyper_parameters.cat_count"
,
801
)
def
input_data
(
self
,
is_infer
=
False
,
**
kwargs
):
seq_len
=
-
1
self
.
data_var
=
[]
hist_item_seq
=
fluid
.
data
(
name
=
"hist_item_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
data_var
.
append
(
hist_item_seq
)
hist_cat_seq
=
fluid
.
data
(
name
=
"hist_cat_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
data_var
.
append
(
hist_cat_seq
)
target_item
=
fluid
.
data
(
name
=
"target_item"
,
shape
=
[
None
],
dtype
=
"int64"
)
self
.
data_var
.
append
(
target_item
)
target_cat
=
fluid
.
data
(
name
=
"target_cat"
,
shape
=
[
None
],
dtype
=
"int64"
)
self
.
data_var
.
append
(
target_cat
)
label
=
fluid
.
data
(
name
=
"label"
,
shape
=
[
None
,
1
],
dtype
=
"float32"
)
self
.
data_var
.
append
(
label
)
mask
=
fluid
.
data
(
name
=
"mask"
,
shape
=
[
None
,
seq_len
,
1
],
dtype
=
"float32"
)
self
.
data_var
.
append
(
mask
)
target_item_seq
=
fluid
.
data
(
name
=
"target_item_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
data_var
.
append
(
target_item_seq
)
target_cat_seq
=
fluid
.
data
(
name
=
"target_cat_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
data_var
.
append
(
target_cat_seq
)
train_inputs
=
[
hist_item_seq
]
+
[
hist_cat_seq
]
+
[
target_item
]
+
[
target_cat
]
+
[
label
]
+
[
mask
]
+
[
target_item_seq
]
+
[
target_cat_seq
]
return
train_inputs
def
din_attention
(
self
,
hist
,
target_expand
,
mask
):
def
din_attention
(
self
,
hist
,
target_expand
,
mask
):
"""activation weight"""
"""activation weight"""
...
@@ -59,104 +105,58 @@ class Model(ModelBase):
...
@@ -59,104 +105,58 @@ class Model(ModelBase):
out
=
fluid
.
layers
.
reshape
(
x
=
out
,
shape
=
[
0
,
hidden_size
])
out
=
fluid
.
layers
.
reshape
(
x
=
out
,
shape
=
[
0
,
hidden_size
])
return
out
return
out
def
train_net
(
self
):
def
net
(
self
,
inputs
,
is_infer
=
False
):
seq_len
=
-
1
hist_item_seq
=
inputs
[
0
]
self
.
item_emb_size
=
envs
.
get_global_env
(
hist_cat_seq
=
inputs
[
1
]
"hyper_parameters.item_emb_size"
,
64
,
self
.
_namespace
)
target_item
=
inputs
[
2
]
self
.
cat_emb_size
=
envs
.
get_global_env
(
target_cat
=
inputs
[
3
]
"hyper_parameters.cat_emb_size"
,
64
,
self
.
_namespace
)
label
=
inputs
[
4
]
self
.
act
=
envs
.
get_global_env
(
"hyper_parameters.act"
,
"sigmoid"
,
mask
=
inputs
[
5
]
self
.
_namespace
)
target_item_seq
=
inputs
[
6
]
#item_emb_size = 64
target_cat_seq
=
inputs
[
7
]
#cat_emb_size = 64
self
.
is_sparse
=
envs
.
get_global_env
(
"hyper_parameters.is_sparse"
,
False
,
self
.
_namespace
)
#significant for speeding up the training process
self
.
config_path
=
envs
.
get_global_env
(
"hyper_parameters.config_path"
,
"data/config.txt"
,
self
.
_namespace
)
self
.
use_DataLoader
=
envs
.
get_global_env
(
"hyper_parameters.use_DataLoader"
,
False
,
self
.
_namespace
)
user_count
,
item_count
,
cat_count
=
self
.
config_read
(
self
.
config_path
)
item_emb_attr
=
fluid
.
ParamAttr
(
name
=
"item_emb"
)
item_emb_attr
=
fluid
.
ParamAttr
(
name
=
"item_emb"
)
cat_emb_attr
=
fluid
.
ParamAttr
(
name
=
"cat_emb"
)
cat_emb_attr
=
fluid
.
ParamAttr
(
name
=
"cat_emb"
)
hist_item_seq
=
fluid
.
data
(
name
=
"hist_item_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
_data_var
.
append
(
hist_item_seq
)
hist_cat_seq
=
fluid
.
data
(
name
=
"hist_cat_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
_data_var
.
append
(
hist_cat_seq
)
target_item
=
fluid
.
data
(
name
=
"target_item"
,
shape
=
[
None
],
dtype
=
"int64"
)
self
.
_data_var
.
append
(
target_item
)
target_cat
=
fluid
.
data
(
name
=
"target_cat"
,
shape
=
[
None
],
dtype
=
"int64"
)
self
.
_data_var
.
append
(
target_cat
)
label
=
fluid
.
data
(
name
=
"label"
,
shape
=
[
None
,
1
],
dtype
=
"float32"
)
self
.
_data_var
.
append
(
label
)
mask
=
fluid
.
data
(
name
=
"mask"
,
shape
=
[
None
,
seq_len
,
1
],
dtype
=
"float32"
)
self
.
_data_var
.
append
(
mask
)
target_item_seq
=
fluid
.
data
(
name
=
"target_item_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
_data_var
.
append
(
target_item_seq
)
target_cat_seq
=
fluid
.
data
(
name
=
"target_cat_seq"
,
shape
=
[
None
,
seq_len
],
dtype
=
"int64"
)
self
.
_data_var
.
append
(
target_cat_seq
)
if
self
.
use_DataLoader
:
self
.
_data_loader
=
fluid
.
io
.
DataLoader
.
from_generator
(
feed_list
=
self
.
_data_var
,
capacity
=
10000
,
use_double_buffer
=
False
,
iterable
=
False
)
         hist_item_emb = fluid.embedding(
             input=hist_item_seq,
-            size=[item_count, self.item_emb_size],
+            size=[self.item_count, self.item_emb_size],
             param_attr=item_emb_attr,
             is_sparse=self.is_sparse)
         hist_cat_emb = fluid.embedding(
             input=hist_cat_seq,
-            size=[cat_count, self.cat_emb_size],
+            size=[self.cat_count, self.cat_emb_size],
             param_attr=cat_emb_attr,
             is_sparse=self.is_sparse)
         target_item_emb = fluid.embedding(
             input=target_item,
-            size=[item_count, self.item_emb_size],
+            size=[self.item_count, self.item_emb_size],
             param_attr=item_emb_attr,
             is_sparse=self.is_sparse)
         target_cat_emb = fluid.embedding(
             input=target_cat,
-            size=[cat_count, self.cat_emb_size],
+            size=[self.cat_count, self.cat_emb_size],
             param_attr=cat_emb_attr,
             is_sparse=self.is_sparse)
         target_item_seq_emb = fluid.embedding(
             input=target_item_seq,
-            size=[item_count, self.item_emb_size],
+            size=[self.item_count, self.item_emb_size],
             param_attr=item_emb_attr,
             is_sparse=self.is_sparse)
         target_cat_seq_emb = fluid.embedding(
             input=target_cat_seq,
-            size=[cat_count, self.cat_emb_size],
+            size=[self.cat_count, self.cat_emb_size],
             param_attr=cat_emb_attr,
             is_sparse=self.is_sparse)
         item_b = fluid.embedding(
             input=target_item,
-            size=[item_count, 1],
+            size=[self.item_count, 1],
             param_attr=fluid.initializer.Constant(value=0.0))
         hist_seq_concat = fluid.layers.concat(
@@ -195,12 +195,5 @@ class Model(ModelBase):
             slide_steps=0)
         self._metrics["AUC"] = auc_var
         self._metrics["BATCH_AUC"] = batch_auc_var
-    def optimizer(self):
-        learning_rate = envs.get_global_env("hyper_parameters.learning_rate",
-                                            None, self._namespace)
-        optimizer = fluid.optimizer.Adam(learning_rate, lazy_mode=True)
-        return optimizer
-    def infer_net(self, parameter_list):
-        self.deepfm_net()
+        if is_infer:
+            self._infer_results["AUC"] = auc_var
models/rank/din/reader.py
View file @ d190118a
@@ -29,8 +29,8 @@ from paddlerec.core.utils import envs
 class TrainReader(Reader):
     def init(self):
-        self.train_data_path = envs.get_global_env("train_data_path", None,
-                                                   "train.reader")
+        self.train_data_path = envs.get_global_env(
+            "dataset.sample_1.data_path", None)
         self.res = []
         self.max_len = 0
@@ -46,7 +46,8 @@ class TrainReader(Reader):
         fo = open("tmp.txt", "w")
         fo.write(str(self.max_len))
         fo.close()
-        self.batch_size = envs.get_global_env("batch_size", 32, "train.reader")
+        self.batch_size = envs.get_global_env("dataset.sample_1.batch_size", 32,
+                                              "train.reader")
         self.group_size = self.batch_size * 20
     def _process_line(self, line):
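The reader change above swaps the flat key `train_data_path` (looked up in the `train.reader` namespace) for the dotted key `dataset.sample_1.data_path`. As a rough illustration of how such a key could line up with the new YAML layout, the sketch below flattens a nested config into dotted keys and addresses list entries by their `name` field. `flatten_config` and the sample dictionary are hypothetical; this is not PaddleRec's actual `envs` implementation.

```python
def flatten_config(node, prefix=""):
    """Flatten nested dicts into {dotted_key: value}.

    Assumption for illustration only: a list of dicts that each carry a
    "name" field is addressed by that name (dataset.sample_1.batch_size).
    """
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_config(value, path))
        elif isinstance(value, list) and value and all(
                isinstance(item, dict) and "name" in item for item in value):
            for item in value:
                # Drop the "name" field itself; it becomes part of the key.
                rest = {k: v for k, v in item.items() if k != "name"}
                flat.update(flatten_config(rest, f"{path}.{item['name']}"))
        else:
            flat[path] = value
    return flat


# Hypothetical config mirroring the new-style YAML used in this commit.
config = {
    "workspace": "paddlerec.models.rank.din",
    "dataset": [{
        "name": "sample_1",
        "batch_size": 32,
        "data_path": "{workspace}/data/sample_data/train",
    }],
}

flat = flatten_config(config)
data_path = flat.get("dataset.sample_1.data_path")
batch_size = flat.get("dataset.sample_1.batch_size", 32)
```

Under this reading, the second argument of `get_global_env` plays the role of the `dict.get` default used in the last two lines.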
models/rank/readme.md
View file @ d190118a
@@ -56,7 +56,18 @@
 <img align="center" src="../../doc/imgs/din.png">
 <p>
-## Tutorial
+## Tutorial (Quick Start)
+To get started quickly with the sample data, see [Training](###训练) & [Prediction](###预测)
+## Tutorial (Reproducing the Papers)
+So that users can get every model running quickly, each model directory ships with sample data, and hyperparameters such as batch_size are tuned so that the training and test logs display well on that sample data. To reproduce the results reported in this readme, set batch_size and the other hyperparameters according to the table below, and use the provided scripts to download and preprocess the corresponding dataset.
+| Model | batch_size | thread_num | epoch_num |
+| :------------------: | :--------------------: | :--------------------: | :--------------------: |
+| DNN | 1000 | 10 | 1 |
+| DCN | 512 | 20 | 2 |
+| DeepFM | 100 | 10 | 30 |
+| DIN | 32 | 10 | 100 |
+| Wide&Deep | 40 | 1 | 40 |
+| xDeepFM | 100 | 1 | 10 |
 ### Data Processing
 See the data download & preprocessing scripts in each model directory
@@ -68,11 +79,21 @@ sh run.sh
 ### Training
 ```
-python -m paddlerec.run -m paddlerec.models.rank.dnn # using DNN as an example
+cd models/rank/dnn # enter the directory of the chosen ranking model, using DNN as an example
+python -m paddlerec.run -m paddlerec.models.rank.dnn # use the built-in config
+# To use a custom config, workspace in config.yaml must be the absolute path of this model directory
+# After customizing the hyperparameters, pass the config file to run with the custom config
+python -m paddlerec.run -m ./config.yaml
 ```
 ### Prediction
 ```
-python -m paddlerec.run -m paddlerec.models.rank.dnn # using DNN as an example
+# Edit the config.yaml of the model and set mode to infer_runner
+# Example: mode: runner1 -> mode: infer_runner
+# In infer_runner, set class to: class: single_infer
+# If the model inputs are the same for training and prediction, phase needs no change; reuse the train phase
+# After editing config.yaml, run:
+python -m paddlerec.run -m ./config.yaml # using DNN as an example
 ```
 ## Results Comparison
@@ -87,6 +108,7 @@ python -m paddlerec.run -m paddlerec.models.rank.dnn # using DNN as an example
 | Census-income Data | Wide&Deep | 0.76195 | 0.90577 | -- | -- |
 | Amazon Product | DIN | 0.47005 | 0.86379 | -- | -- |
 ## Distributed
 ### Model Training Performance (samples/s)
 | Dataset | Model | Single machine | Sync (4 nodes) | Sync (8 nodes) | Sync (16 nodes) | Sync (32 nodes) |
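The prediction steps in the readme boil down to changing one top-level `mode:` line in `config.yaml`. A minimal sketch of that edit, done as a plain-text rewrite so that comments and key order in the file are left alone; `set_mode` and the sample text are hypothetical and only modelled on the configs in this commit.

```python
import re


def set_mode(yaml_text, new_mode):
    """Return yaml_text with the value of the top-level `mode:` key replaced."""
    # Anchored at line start, so indented keys named "mode" are not touched.
    pattern = re.compile(r"^mode:[ \t]*\S+", flags=re.MULTILINE)
    updated, count = pattern.subn(lambda m: f"mode: {new_mode}", yaml_text,
                                  count=1)
    if count == 0:
        raise ValueError("no top-level 'mode:' key found")
    return updated


# Hypothetical excerpt of a config.yaml in the new layout.
sample = (
    'workspace: "paddlerec.models.rank.dnn"\n'
    "mode: train_runner\n"
    "runner:\n"
    "- name: train_runner\n"
    "  trainer_class: single_train\n"
    "- name: infer_runner\n"
    "  trainer_class: single_infer\n"
)

infer_config = set_mode(sample, "infer_runner")
```

The runner entries themselves stay untouched; only the selector line changes, which matches the "mode: runner1 -> mode: infer_runner" example in the readme.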
models/rank/wide_deep/config.yaml
View file @ d190118a
@@ -12,37 +12,59 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-train:
-  trainer:
-    # for cluster training
-    strategy: "async"
-  epochs: 10
-  workspace: "paddlerec.models.rank.wide_deep"
+# global settings
+debug: false
+workspace: "paddlerec.models.rank.wide_deep"
-  reader:
-    batch_size: 2
-    train_data_path: "{workspace}/data/sample_data/train"
-    sparse_slots: "label"
-    dense_slots: "wide_input:8 deep_input:58"
+dataset:
+- name: sample_1
+  type: QueueDataset
+  batch_size: 5
+  data_path: "{workspace}/data/sample_data/train"
+  sparse_slots: "label"
+  dense_slots: "wide_input:8 deep_input:58"
+- name: infer_sample
+  type: QueueDataset
+  batch_size: 5
+  data_path: "{workspace}/data/sample_data/train"
+  sparse_slots: "label"
+  dense_slots: "wide_input:8 deep_input:58"
-  model:
-    models: "{workspace}/model.py"
-    hyper_parameters:
-      hidden1_units: 75
-      hidden2_units: 50
-      hidden3_units: 25
-      learning_rate: 0.0001
-      reg: 0.001
-      act: "relu"
-      optimizer: SGD
+hyper_parameters:
+  optimizer:
+    class: SGD
+    learning_rate: 0.0001
+  hidden1_units: 75
+  hidden2_units: 50
+  hidden3_units: 25
+mode: train_runner
+# if infer, change mode to "infer_runner" and change phase to "infer_phase"
+runner:
+- name: train_runner
+  trainer_class: single_train
+  epochs: 1
+  device: cpu
+  init_model_path: ""
+  save_checkpoint_interval: 1
+  save_inference_interval: 1
+  save_checkpoint_path: "increment"
+  save_inference_path: "inference"
+- name: infer_runner
+  trainer_class: single_infer
+  epochs: 1
+  device: cpu
+  init_model_path: "increment/0"
-  save:
-    increment:
-      dirname: "increment"
-      epoch_interval: 2
-      save_last: True
-    inference:
-      dirname: "inference"
-      epoch_interval: 4
-      save_last: True
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: sample_1
+  thread_num: 1
+#- name: infer_phase
+#  model: "{workspace}/model.py"
+#  dataset_name: infer_sample
+#  thread_num: 1
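The `dense_slots` value in this config packs slot names and widths into a single string, `"wide_input:8 deep_input:58"`. A minimal sketch of how such a string could be split into ordered `(name, width)` pairs; `parse_dense_slots` is a hypothetical helper, and PaddleRec's own parsing may differ.

```python
def parse_dense_slots(spec):
    """Parse "name:dim name:dim ..." into an ordered list of (name, dim)."""
    slots = []
    for token in spec.split():
        name, sep, dim = token.partition(":")
        if not sep or not name or not dim.isdigit():
            raise ValueError(f"malformed dense slot: {token!r}")
        slots.append((name, int(dim)))
    return slots


slots = parse_dense_slots("wide_input:8 deep_input:58")
total_width = sum(dim for _, dim in slots)
```

Order matters here: the Wide&Deep `model.py` in this commit reads `self._dense_data_var[0]` as the wide input and `self._dense_data_var[1]` as the deep input, so a list is used rather than an unordered mapping.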
models/rank/wide_deep/model.py
View file @ d190118a
@@ -24,6 +24,14 @@ class Model(ModelBase):
     def __init__(self, config):
         ModelBase.__init__(self, config)
+    def _init_hyper_parameters(self):
+        self.hidden1_units = envs.get_global_env(
+            "hyper_parameters.hidden1_units", 75)
+        self.hidden2_units = envs.get_global_env(
+            "hyper_parameters.hidden2_units", 50)
+        self.hidden3_units = envs.get_global_env(
+            "hyper_parameters.hidden3_units", 25)
     def wide_part(self, data):
         out = fluid.layers.fc(
             input=data,
@@ -56,21 +64,14 @@ class Model(ModelBase):
         return l3
-    def train_net(self):
-        self._init_slots()
+    def net(self, inputs, is_infer=False):
         wide_input = self._dense_data_var[0]
         deep_input = self._dense_data_var[1]
         label = self._sparse_data_var[0]
-        hidden1_units = envs.get_global_env("hyper_parameters.hidden1_units",
-                                            75, self._namespace)
-        hidden2_units = envs.get_global_env("hyper_parameters.hidden2_units",
-                                            50, self._namespace)
-        hidden3_units = envs.get_global_env("hyper_parameters.hidden3_units",
-                                            25, self._namespace)
         wide_output = self.wide_part(wide_input)
-        deep_output = self.deep_part(deep_input, hidden1_units, hidden2_units,
-                                     hidden3_units)
+        deep_output = self.deep_part(deep_input, self.hidden1_units,
+                                     self.hidden2_units, self.hidden3_units)
         wide_model = fluid.layers.fc(
             input=wide_output,
@@ -109,18 +110,12 @@ class Model(ModelBase):
         self._metrics["AUC"] = auc_var
         self._metrics["BATCH_AUC"] = batch_auc
         self._metrics["ACC"] = acc
+        if is_infer:
+            self._infer_results["AUC"] = auc_var
+            self._infer_results["ACC"] = acc
         cost = fluid.layers.sigmoid_cross_entropy_with_logits(
             x=prediction, label=fluid.layers.cast(
                 label, dtype='float32'))
         avg_cost = fluid.layers.mean(cost)
         self._cost = avg_cost
-    def optimizer(self):
-        learning_rate = envs.get_global_env("hyper_parameters.learning_rate",
-                                            None, self._namespace)
-        optimizer = fluid.optimizer.Adam(learning_rate, lazy_mode=True)
-        return optimizer
-    def infer_net(self):
-        self.train_net()
models/rank/xdeepfm/config.yaml
View file @ d190118a
@@ -11,41 +11,61 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+debug: false
+workspace: "paddlerec.models.rank.xdeepfm"
-train:
-  trainer:
-    # for cluster training
-    strategy: "async"
-  epochs: 10
-  workspace: "paddlerec.models.rank.xdeepfm"
-  reader:
-    batch_size: 2
-    train_data_path: "{workspace}/data/sample_data/train"
-    sparse_slots: "label feat_idx"
-    dense_slots: "feat_value:39"
+dataset:
+- name: sample_1
+  type: QueueDataset # or DataLoader
+  batch_size: 5
+  data_path: "{workspace}/data/sample_data/train"
+  sparse_slots: "label feat_idx"
+  dense_slots: "feat_value:39"
+- name: infer_sample
+  type: QueueDataset # or DataLoader
+  batch_size: 5
+  data_path: "{workspace}/data/sample_data/train"
+  sparse_slots: "label feat_idx"
+  dense_slots: "feat_value:39"
-  model:
-    models: "{workspace}/model.py"
-    hyper_parameters:
-      layer_sizes_dnn: [10, 10, 10]
-      layer_sizes_cin: [10, 10]
-      sparse_feature_number: 1086460
-      sparse_feature_dim: 9
-      num_field: 39
-      fc_sizes: [400, 400, 400]
-      learning_rate: 0.0001
-      reg: 0.0001
-      act: "relu"
-      optimizer: SGD
+hyper_parameters:
+  optimizer:
+    class: SGD
+    learning_rate: 0.0001
+  layer_sizes_dnn: [10, 10, 10]
+  layer_sizes_cin: [10, 10]
+  sparse_feature_number: 1086460
+  sparse_feature_dim: 9
+  num_field: 39
+  fc_sizes: [400, 400, 400]
+  act: "relu"
+mode: train_runner
+# if infer, change mode to "infer_runner" and change phase to "infer_phase"
+runner:
+- name: train_runner
+  trainer_class: single_train
+  epochs: 1
+  device: cpu
+  init_model_path: ""
+  save_checkpoint_interval: 1
+  save_inference_interval: 1
+  save_checkpoint_path: "increment"
+  save_inference_path: "inference"
+- name: infer_runner
+  trainer_class: single_infer
+  epochs: 1
+  device: cpu
+  init_model_path: "increment/0"
-  save:
-    increment:
-      dirname: "increment"
-      epoch_interval: 2
-      save_last: True
-    inference:
-      dirname: "inference"
-      epoch_interval: 4
-      save_last: True
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: sample_1
+  thread_num: 1
+#- name: infer_phase
+#  model: "{workspace}/model.py"
+#  dataset_name: infer_sample
+#  thread_num: 1
models/rank/xdeepfm/model.py
View file @ d190118a
@@ -22,38 +22,45 @@ class Model(ModelBase):
     def __init__(self, config):
         ModelBase.__init__(self, config)
-    def xdeepfm_net(self):
+    def _init_hyper_parameters(self):
+        self.sparse_feature_number = envs.get_global_env(
+            "hyper_parameters.sparse_feature_number", None)
+        self.sparse_feature_dim = envs.get_global_env(
+            "hyper_parameters.sparse_feature_dim", None)
+        self.num_field = envs.get_global_env(
+            "hyper_parameters.num_field", None)
+        self.layer_sizes_cin = envs.get_global_env(
+            "hyper_parameters.layer_sizes_cin", None)
+        self.layer_sizes_dnn = envs.get_global_env(
+            "hyper_parameters.layer_sizes_dnn", None)
+        self.act = envs.get_global_env("hyper_parameters.act", None)
+    def net(self, inputs, is_infer=False):
+        raw_feat_idx = self._sparse_data_var[1]
+        raw_feat_value = self._dense_data_var[0]
+        self.label = self._sparse_data_var[0]
         init_value_ = 0.1
         initer = fluid.initializer.TruncatedNormalInitializer(
             loc=0.0, scale=init_value_)
         is_distributed = True if envs.get_trainer() == "CtrTrainer" else False
-        sparse_feature_number = envs.get_global_env(
-            "hyper_parameters.sparse_feature_number", None, self._namespace)
-        sparse_feature_dim = envs.get_global_env(
-            "hyper_parameters.sparse_feature_dim", None, self._namespace)
         # ------------------------- network input --------------------------
-        num_field = envs.get_global_env("hyper_parameters.num_field", None,
-                                        self._namespace)
-        raw_feat_idx = self._sparse_data_var[1]
-        raw_feat_value = self._dense_data_var[0]
-        self.label = self._sparse_data_var[0]
         feat_idx = raw_feat_idx
         feat_value = fluid.layers.reshape(
-            raw_feat_value, [-1, num_field, 1])  # None * num_field * 1
+            raw_feat_value, [-1, self.num_field, 1])  # None * num_field * 1
         feat_embeddings = fluid.embedding(
             input=feat_idx,
             is_sparse=True,
             dtype='float32',
-            size=[sparse_feature_number + 1, sparse_feature_dim],
+            size=[self.sparse_feature_number + 1, self.sparse_feature_dim],
             padding_idx=0,
             param_attr=fluid.ParamAttr(initializer=initer))
         feat_embeddings = fluid.layers.reshape(feat_embeddings, [
-            -1, num_field, sparse_feature_dim
+            -1, self.num_field, self.sparse_feature_dim
         ])  # None * num_field * embedding_size
         feat_embeddings = feat_embeddings * feat_value  # None * num_field * embedding_size
@@ -63,11 +70,11 @@ class Model(ModelBase):
             input=feat_idx,
             is_sparse=True,
             dtype='float32',
-            size=[sparse_feature_number + 1, 1],
+            size=[self.sparse_feature_number + 1, 1],
             padding_idx=0,
             param_attr=fluid.ParamAttr(initializer=initer))
         weights_linear = fluid.layers.reshape(
-            weights_linear, [-1, num_field, 1])  # None * num_field * 1
+            weights_linear, [-1, self.num_field, 1])  # None * num_field * 1
         b_linear = fluid.layers.create_parameter(
             shape=[1],
             dtype='float32',
@@ -77,31 +84,30 @@ class Model(ModelBase):
         # -------------------- CIN --------------------
-        layer_sizes_cin = envs.get_global_env(
-            "hyper_parameters.layer_sizes_cin", None, self._namespace)
         Xs = [feat_embeddings]
-        last_s = num_field
+        last_s = self.num_field
-        for s in layer_sizes_cin:
+        for s in self.layer_sizes_cin:
             # calculate Z^(k+1) with X^k and X^0
             X_0 = fluid.layers.reshape(
                 fluid.layers.transpose(Xs[0], [0, 2, 1]),
-                [-1, sparse_feature_dim, num_field,
+                [-1, self.sparse_feature_dim, self.num_field,
                  1])  # None, embedding_size, num_field, 1
             X_k = fluid.layers.reshape(
                 fluid.layers.transpose(Xs[-1], [0, 2, 1]),
-                [-1, sparse_feature_dim, 1,
+                [-1, self.sparse_feature_dim, 1,
                  last_s])  # None, embedding_size, 1, last_s
             Z_k_1 = fluid.layers.matmul(
                 X_0, X_k)  # None, embedding_size, num_field, last_s
             # compresses Z^(k+1) to X^(k+1)
             Z_k_1 = fluid.layers.reshape(Z_k_1, [
-                -1, sparse_feature_dim, last_s * num_field
+                -1, self.sparse_feature_dim, last_s * self.num_field
             ])  # None, embedding_size, last_s*num_field
             Z_k_1 = fluid.layers.transpose(
                 Z_k_1, [0, 2, 1])  # None, s*num_field, embedding_size
             Z_k_1 = fluid.layers.reshape(
-                Z_k_1, [-1, last_s * num_field, 1, sparse_feature_dim]
+                Z_k_1, [-1, last_s * self.num_field, 1, self.sparse_feature_dim]
             )  # None, last_s*num_field, 1, embedding_size  (None, channal_in, h, w)
             X_k_1 = fluid.layers.conv2d(
                 Z_k_1,
@@ -112,7 +118,8 @@ class Model(ModelBase):
                 param_attr=fluid.ParamAttr(
                     initializer=initer))  # None, s, 1, embedding_size
             X_k_1 = fluid.layers.reshape(
-                X_k_1, [-1, s, sparse_feature_dim])  # None, s, embedding_size
+                X_k_1,
+                [-1, s, self.sparse_feature_dim])  # None, s, embedding_size
             Xs.append(X_k_1)
             last_s = s
@@ -130,17 +137,13 @@ class Model(ModelBase):
         # -------------------- DNN --------------------
-        layer_sizes_dnn = envs.get_global_env(
-            "hyper_parameters.layer_sizes_dnn", None, self._namespace)
-        act = envs.get_global_env("hyper_parameters.act", None,
-                                  self._namespace)
-        y_dnn = fluid.layers.reshape(feat_embeddings,
-                                     [-1, num_field * sparse_feature_dim])
-        for s in layer_sizes_dnn:
+        y_dnn = fluid.layers.reshape(
+            feat_embeddings, [-1, self.num_field * self.sparse_feature_dim])
+        for s in self.layer_sizes_dnn:
             y_dnn = fluid.layers.fc(
                 input=y_dnn,
                 size=s,
-                act=act,
+                act=self.act,
                 param_attr=fluid.ParamAttr(initializer=initer),
                 bias_attr=None)
         y_dnn = fluid.layers.fc(input=y_dnn,
@@ -152,11 +155,6 @@ class Model(ModelBase):
         # ------------------- xDeepFM ------------------
         self.predict = fluid.layers.sigmoid(y_linear + y_cin + y_dnn)
-    def train_net(self):
-        self._init_slots()
-        self.xdeepfm_net()
         cost = fluid.layers.log_loss(
             input=self.predict,
             label=fluid.layers.cast(self.label, "float32"),
@@ -172,12 +170,5 @@ class Model(ModelBase):
             slide_steps=0)
         self._metrics["AUC"] = auc_var
         self._metrics["BATCH_AUC"] = batch_auc_var
-    def optimizer(self):
-        learning_rate = envs.get_global_env("hyper_parameters.learning_rate",
-                                            None, self._namespace)
-        optimizer = fluid.optimizer.Adam(learning_rate, lazy_mode=True)
-        return optimizer
-    def infer_net(self):
-        self.train_net()
+        if is_infer:
+            self._infer_results["AUC"] = auc_var
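The shape comments in the CIN loop of `models/rank/xdeepfm/model.py` can be checked without running Paddle. The sketch below traces the tensor shapes for each CIN layer using plain tuples, with the values from this commit's `config.yaml` (`num_field: 39`, `sparse_feature_dim: 9`, `layer_sizes_cin: [10, 10]`). `cin_shapes` is a hypothetical helper that only mirrors the reshape/transpose comments in the diff; it performs no tensor computation, and the post-convolution shape is taken from the source comment rather than derived from the `conv2d` arguments, which are not visible in the hunk.

```python
def cin_shapes(num_field, emb_dim, layer_sizes_cin):
    """Trace per-layer tensor shapes of the CIN loop (batch dim shown as None)."""
    trace = []
    last_s = num_field
    for s in layer_sizes_cin:
        x_0 = (None, emb_dim, num_field, 1)   # transposed and reshaped X^0
        x_k = (None, emb_dim, 1, last_s)      # transposed and reshaped X^k
        # Batched matmul of (num_field x 1) by (1 x last_s) per embedding slice.
        z = (None, emb_dim, num_field, last_s)
        # Reshape + transpose + reshape into the layout fed to conv2d.
        conv_in = (None, last_s * num_field, 1, emb_dim)
        # Per the source comments: conv2d yields s channels, then a reshape.
        x_next = (None, s, emb_dim)
        trace.append({"X_0": x_0, "X_k": x_k, "Z": z,
                      "conv_in": conv_in, "X_next": x_next})
        last_s = s
    return trace


trace = cin_shapes(num_field=39, emb_dim=9, layer_sizes_cin=[10, 10])
first_conv_channels = trace[0]["conv_in"][1]
second_conv_channels = trace[1]["conv_in"][1]
```

With these settings the first layer feeds 39 * 39 input channels to the convolution and the second feeds 10 * 39, which is why `last_s` has to be updated to `s` at the end of each iteration.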
setup.py
View file @ d190118a
@@ -62,7 +62,8 @@ def build(dirname):
     models_copy = [
         'data/*.txt', 'data/*/*.txt', '*.yaml', '*.sh', 'tree/*.npy',
-        'tree/*.txt', 'data/sample_data/*', 'data/sample_data/train/*'
+        'tree/*.txt', 'data/sample_data/*', 'data/sample_data/train/*',
+        'data/sample_data/infer/*'
     ]
     engine_copy = ['*/*.sh']
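The `setup.py` change adds `'data/sample_data/infer/*'` to `models_copy`. Assuming these patterns are expanded with ordinary glob semantics, where `*` does not descend into subdirectories, the existing `'data/sample_data/*'` entry would not pick up the new inference samples by itself. The sketch below illustrates this on a temporary directory tree; `matched_files` and the file name `sample_train.txt` are hypothetical, while `infer_sample_data` is the file name this commit adds under the DCN model.

```python
import glob
import os
import tempfile


def matched_files(root, patterns):
    """Return files under root matched by glob patterns, relative to root."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern)):
            if os.path.isfile(path):  # directories matched by '*' are skipped
                found.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree with one training sample and one inference sample.
    for rel in ("data/sample_data/train/sample_train.txt",
                "data/sample_data/infer/infer_sample_data"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    old_patterns = ["data/sample_data/*", "data/sample_data/train/*"]
    new_patterns = old_patterns + ["data/sample_data/infer/*"]
    old_matches = matched_files(root, old_patterns)
    new_matches = matched_files(root, new_patterns)
```

Under that assumption, only the new pattern list reaches the file in `infer/`, which is consistent with the commit adding both the sample file and the extra pattern.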