PaddlePaddle / book
Commit aaefadae (unverified)
Authored by lujun on Dec 05, 2018; committed by GitHub on Dec 05, 2018

Merge pull request #647 from junjun315/05-stuff

update to low level api--05 recommender system

Parents: 65036c6c, 072c8c29

Showing 3 changed files with 356 additions and 227 deletions:

05.recommender_system/README.cn.md     +108  -71
05.recommender_system/index.cn.html    +108  -71
05.recommender_system/train.py         +140  -85
05.recommender_system/README.cn.md @ aaefadae

````diff
@@ -225,15 +225,6 @@ import paddle
 import paddle.fluid as fluid
 import paddle.fluid.layers as layers
 import paddle.fluid.nets as nets
-try:
-    from paddle.fluid.contrib.trainer import *
-    from paddle.fluid.contrib.inferencer import *
-except ImportError:
-    print(
-        "In the fluid 1.0, the trainer and inferencer are moving to paddle.fluid.contrib",
-        file=sys.stderr)
-    from paddle.fluid.trainer import *
-    from paddle.fluid.inferencer import *
 
 IS_SPARSE = True
 USE_GPU = False
````
````diff
@@ -414,13 +405,8 @@ test_reader = paddle.batch(
     paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)
 ```
 
-### Construct the trainer
+### Construct the training process (trainer)
 
-The trainer takes a training program and a training optimizer function.
+Here we construct a training process, including the training optimizer function.
 
-```python
-trainer = Trainer(
-    train_func=train_program, place=place, optimizer_func=optimizer_func)
-```
-
 ### Feed data
````
````diff
@@ -433,56 +419,92 @@ feed_order = [
 ]
 ```
 
-### Event handler
+### Build the training program and the test program
 
-The callback `event_handler` is invoked after a predefined event occurs. For example, we can check the error at the end of each training step.
+Build the training program and the test program separately, and attach the training optimizer.
+
+```python
+main_program = fluid.default_main_program()
+star_program = fluid.default_startup_program()
+[avg_cost, scale_infer] = train_program()
+
+test_program = main_program.clone(for_test=True)
+sgd_optimizer = optimizer_func()
+sgd_optimizer.minimize(avg_cost)
+exe = fluid.Executor(place)
+
+def train_test(program, reader):
+    count = 0
+    feed_var_list = [
+        program.global_block().var(var_name) for var_name in feed_order
+    ]
+    feeder_test = fluid.DataFeeder(
+        feed_list=feed_var_list, place=place)
+    test_exe = fluid.Executor(place)
+    accumulated = len([avg_cost, scale_infer]) * [0]
+    for test_data in reader():
+        avg_cost_np = test_exe.run(program=program,
+                                   feed=feeder_test.feed(test_data),
+                                   fetch_list=[avg_cost, scale_infer])
+        accumulated = [x[0] + x[1][0] for x in zip(accumulated, avg_cost_np)]
+        count += 1
+    return [x / count for x in accumulated]
+```
````
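The `train_test` helper added in this hunk keeps one running total per fetched variable (`avg_cost` and `scale_infer`), adds the first element of each fetched array for every test batch (`x[1][0]`), and divides by the number of batches. The plain-Python sketch below reproduces only that arithmetic; `average_fetches` is a hypothetical name, and the nested lists are made-up stand-ins for what `test_exe.run` returns.

```python
def average_fetches(fetched_batches):
    """Average the first element of each fetched value over all batches.

    Each item of `fetched_batches` holds one entry per fetched variable
    (here: [avg_cost, scale_infer]); only the first element of each entry
    is used, mirroring `x[1][0]` in train_test.
    """
    if not fetched_batches:
        # train_test would divide by zero here; the sketch fails loudly instead.
        raise ValueError("no test batches were read")
    accumulated = [0.0] * len(fetched_batches[0])
    count = 0
    for fetched in fetched_batches:
        accumulated = [acc + value[0] for acc, value in zip(accumulated, fetched)]
        count += 1
    return [acc / count for acc in accumulated]


# Two hypothetical test batches: [avg_cost], [scale_infer predictions...]
batches = [
    [[2.0], [3.5, 4.0]],
    [[4.0], [1.5, 2.0]],
]
per_variable = average_fetches(batches)
print(per_variable)                           # [3.0, 2.5]
print(sum(per_variable) / len(per_variable))  # 2.75
```

The second print mirrors `np.array(avg_cost_set).mean()` in the training loop. As written in the diff, that mean combines the averaged test cost with the averaged first prediction of each batch, so the printed "avg_cost" is not purely the test cost.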
````diff
+### Build the main training loop and start training
+
+We use the number of training passes defined above (`PASS_NUM`) and a few other parameters to run the training loop, running a test in every iteration; when the test result is good enough, training exits and the trained parameters are saved.
+
 ```python
 # Specify the directory path to save the parameters
 params_dirname = "recommender_system.inference.model"
 
-def event_handler(event):
-    if isinstance(event, EndStepEvent):
-        test_reader = paddle.batch(
-            paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)
-        avg_cost_set = trainer.test(
-            reader=test_reader, feed_order=feed_order)
-        # get avg cost
-        avg_cost = np.array(avg_cost_set).mean()
-        print("avg_cost: %s" % avg_cost)
-        if float(avg_cost) < 4: # Change this number to adjust accuracy
-            trainer.save_params(params_dirname)
-            trainer.stop()
-        else:
-            print('BatchID {0}, Test Loss {1:0.2}'.format(event.epoch + 1,
-                                                           float(avg_cost)))
-            if math.isnan(float(avg_cost)):
-                sys.exit("got NaN loss, training failed.")
+from paddle.utils.plot import Ploter
+test_prompt = "Test cost"
+plot_cost = Ploter(test_prompt)
+
+def train_loop():
+    feed_list = [
+        main_program.global_block().var(var_name) for var_name in feed_order
+    ]
+    feeder = fluid.DataFeeder(feed_list, place)
+    exe.run(star_program)
+
+    for pass_id in range(PASS_NUM):
+        for batch_id, data in enumerate(train_reader()):
+            # train a mini-batch
+            outs = exe.run(program=main_program,
+                           feed=feeder.feed(data),
+                           fetch_list=[avg_cost])
+            out = np.array(outs[0])
+            avg_cost_set = train_test(test_program, test_reader)
+
+            # get test avg_cost
+            test_avg_cost = np.array(avg_cost_set).mean()
+            plot_cost.append(test_prompt, batch_id, outs[0])
+            plot_cost.plot()
+            print("avg_cost: %s" % test_avg_cost)
+
+            if batch_id == 20:
+                if params_dirname is not None:
+                    fluid.io.save_inference_model(params_dirname, [
+                        "user_id", "gender_id", "age_id", "job_id",
+                        "movie_id", "category_id", "movie_title"
+                    ], [scale_infer], exe)
+                return
+            else:
+                print('BatchID {0}, Test Loss {1:0.2}'.format(pass_id + 1,
+                                                               float(test_avg_cost)))
+
+            if math.isnan(float(out[0])):
+                sys.exit("got NaN loss, training failed.")
 ```
 
-### Start training
-
-Finally, we pass in the number of training epochs (`num_epoch`) and a few other parameters, and call `trainer.train` to start training.
-
 ```python
-trainer.train(
-    num_epochs=1,
-    event_handler=event_handler,
-    reader=train_reader,
-    feed_order=feed_order)
+train_loop()
 ```
 
 ## Apply the model
 
-### Build the inferencer
+### Generate test data
 
-Pass in `inference_program` and `params_dirname` to initialize an inferencer; `params_dirname` is where the parameters from training are stored.
-
-```python
-inferencer = Inferencer(
-    inference_program, param_path=params_dirname, place=place)
-```
-
-### Generate input data for testing
-
 Use the create_lod_tensor(data, lod, place) API to create a level-of-detail (LoD) tensor. `data` is a sequence whose elements are sequences of index numbers. `lod` is the level-of-detail information corresponding to `data`. For example, data = [[10, 2, 3], [2, 3]] means it contains two sequences, of lengths 3 and 2. Correspondingly, lod = [[3, 2]] indicates one level of detail, meaning `data` has two sequences of lengths 3 and 2.
 
 In this inference example, we try to predict the rating that the user with ID 1 gives the movie 'Hunchback of Notre Dame'.
````
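The `create_lod_tensor` paragraph describes LoD information as sequence lengths: data = [[10, 2, 3], [2, 3]] pairs with lod = [[3, 2]]. The plain-Python sketch below (all helper names are hypothetical, not Paddle APIs) shows how the lengths follow from the nested data, how they become cumulative offsets into the flattened data (an offset form that, as far as we know, Fluid also uses internally), and how the sequences are recovered from those offsets.

```python
def pack_sequences(data):
    """Flatten a list of index sequences; return (flat, length-based LoD)."""
    flat = [index for sequence in data for index in sequence]
    lod = [[len(sequence) for sequence in data]]  # a single level of detail
    return flat, lod


def lengths_to_offsets(lengths):
    """Turn sequence lengths such as [3, 2] into offsets such as [0, 3, 5]."""
    offsets = [0]
    for length in lengths:
        offsets.append(offsets[-1] + length)
    return offsets


def unpack_sequences(flat, lengths):
    """Recover the original sequences from flat data plus their lengths."""
    offsets = lengths_to_offsets(lengths)
    return [flat[start:end] for start, end in zip(offsets, offsets[1:])]


flat, lod = pack_sequences([[10, 2, 3], [2, 3]])
print(flat)                            # [10, 2, 3, 2, 3]
print(lod)                             # [[3, 2]]
print(lengths_to_offsets(lod[0]))      # [0, 3, 5]
print(unpack_sequences(flat, lod[0]))  # [[10, 2, 3], [2, 3]]
```

The same rule explains the inputs in the inference example: the movie title is one sequence of five word indices, so it is created with lod [[5]], and the three category IDs form one sequence with lod [[3]].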
````diff
@@ -500,27 +522,42 @@ movie_title = fluid.create_lod_tensor([[1069, 4140, 2923, 710, 988]], [[5]],
                                       place)  # 'hunchback','of','notre','dame','the'
 ```
 
+### Build the inference process and test it
+
+Similar to training, we need to build an inference process. `params_dirname` is the path where the parameters from training were stored.
+
+```python
+place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+exe = fluid.Executor(place)
+
+inference_scope = fluid.core.Scope()
+```
+
 ### Test
 
 Now we can run inference. The `feed_order` we provide should match the one used in training.
 
 ```python
-results = inferencer.infer(
-    {
-        'user_id': user_id,
-        'gender_id': gender_id,
-        'age_id': age_id,
-        'job_id': job_id,
-        'movie_id': movie_id,
-        'category_id': category_id,
-        'movie_title': movie_title
-    },
-    return_numpy=False)
-
-predict_rating = np.array(results[0])
-print("Predict Rating of user id 1 on movie \"" + infer_movie_name +
-      "\" is " + str(predict_rating[0][0]))
-print("Actual Rating of user id 1 on movie \"" + infer_movie_name +
-      "\" is 4.")
+with fluid.scope_guard(inference_scope):
+    [inferencer, feed_target_names,
+     fetch_targets] = fluid.io.load_inference_model(params_dirname, exe)
+
+    results = exe.run(inferencer,
+                      feed={
+                          'user_id': user_id,
+                          'gender_id': gender_id,
+                          'age_id': age_id,
+                          'job_id': job_id,
+                          'movie_id': movie_id,
+                          'category_id': category_id,
+                          'movie_title': movie_title
+                      },
+                      fetch_list=fetch_targets,
+                      return_numpy=False)
+    predict_rating = np.array(results[0])
+    print("Predict Rating of user id 1 on movie \"" + infer_movie_name +
+          "\" is " + str(predict_rating[0][0]))
+    print("Actual Rating of user id 1 on movie \"" + infer_movie_name +
+          "\" is 4.")
 ```
 
 ## Summary
````
05.recommender_system/index.cn.html @ aaefadae

This page carries the same Markdown text as README.cn.md, and its four hunks contain the same changes shown above for README.cn.md; only the line offsets differ:

@@ -267,15 +267,6 @@ import paddle
@@ -456,13 +447,8 @@ test_reader = paddle.batch(
@@ -475,56 +461,92 @@ feed_order = [
@@ -542,27 +564,42 @@ movie_title = fluid.create_lod_tensor([[1069, 4140, 2923, 710, 988]], [[5]],
05.recommender_system/train.py @ aaefadae

````diff
@@ -20,19 +20,11 @@ import paddle
 import paddle.fluid as fluid
 import paddle.fluid.layers as layers
 import paddle.fluid.nets as nets
-try:
-    from paddle.fluid.contrib.trainer import *
-    from paddle.fluid.contrib.inferencer import *
-except ImportError:
-    print(
-        "In the fluid 1.0, the trainer and inferencer are moving to paddle.fluid.contrib",
-        file=sys.stderr)
-    from paddle.fluid.trainer import *
-    from paddle.fluid.inferencer import *
 
 IS_SPARSE = True
 USE_GPU = False
 BATCH_SIZE = 256
+PASS_NUM = 100
 
 
 def get_usr_combined_features():
````
````diff
@@ -148,71 +140,101 @@ def inference_program():
     inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features)
     scale_infer = layers.scale(x=inference, scale=5.0)
 
-    return scale_infer
-
-
-def train_program():
-
-    scale_infer = inference_program()
-
     label = layers.data(name='score', shape=[1], dtype='float32')
     square_cost = layers.square_error_cost(input=scale_infer, label=label)
     avg_cost = layers.mean(square_cost)
 
-    return [avg_cost, scale_infer]
+    return scale_infer, avg_cost
 
 
 def optimizer_func():
     return fluid.optimizer.SGD(learning_rate=0.2)
 
 
````
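After this change `inference_program` returns both the prediction and the training cost. The prediction is the cosine similarity of the user and movie feature vectors multiplied by 5.0 (`layers.scale`), so it lies in [-5, 5], and the cost is the mean of the squared differences from the `score` label. A plain-Python sketch of that arithmetic on small hypothetical vectors (no Paddle calls; the function names are illustrative only):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_rating(user_features, movie_features, scale=5.0):
    """Scale cosine similarity from [-1, 1] to [-scale, scale]."""
    return scale * cosine_similarity(user_features, movie_features)


def mean_square_error(predictions, labels):
    """Mean of (prediction - label)^2, like square_error_cost followed by mean."""
    errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return sum(errors) / len(errors)


# Hypothetical 2-D feature vectors; the real model's features are longer.
same_direction = predict_rating([3.0, 4.0], [6.0, 8.0])
orthogonal = predict_rating([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)                                    # 5.0 0.0
print(mean_square_error([same_direction, orthogonal], [4.0, 2.0]))   # 2.5
```

Because cosine similarity ignores vector length, only the direction of the two feature vectors affects the predicted rating.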
````diff
-def train(use_cuda, train_program, params_dirname):
+def train(use_cuda, params_dirname):
     place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-    trainer = Trainer(
-        train_func=train_program, place=place, optimizer_func=optimizer_func)
+    train_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.movielens.train(), buf_size=8192),
+        batch_size=BATCH_SIZE)
+    test_reader = paddle.batch(
+        paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)
 
     feed_order = [
         'user_id', 'gender_id', 'age_id', 'job_id', 'movie_id', 'category_id',
         'movie_title', 'score'
     ]
 
-    def event_handler(event):
-        if isinstance(event, EndStepEvent):
-            test_reader = paddle.batch(
-                paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)
-            avg_cost_set = trainer.test(
-                reader=test_reader, feed_order=feed_order)
-            # get avg cost
-            avg_cost = np.array(avg_cost_set).mean()
-            print("avg_cost: %s" % avg_cost)
-            if float(avg_cost) < 4:  # Change this number to adjust accuracy
-                trainer.save_params(params_dirname)
-                trainer.stop()
-            else:
-                print('BatchID {0}, Test Loss {1:0.2}'.format(event.epoch + 1,
-                                                               float(avg_cost)))
-                if math.isnan(float(avg_cost)):
-                    sys.exit("got NaN loss, training failed.")
+    main_program = fluid.default_main_program()
+    star_program = fluid.default_startup_program()
+    scale_infer, avg_cost = inference_program()
+
+    test_program = main_program.clone(for_test=True)
+    sgd_optimizer = optimizer_func()
+    sgd_optimizer.minimize(avg_cost)
+    exe = fluid.Executor(place)
+
+    def train_test(program, reader):
+        count = 0
+        feed_var_list = [
+            program.global_block().var(var_name) for var_name in feed_order
+        ]
+        feeder_test = fluid.DataFeeder(feed_list=feed_var_list, place=place)
+        test_exe = fluid.Executor(place)
+        accumulated = len([avg_cost, scale_infer]) * [0]
+        for test_data in reader():
+            avg_cost_np = test_exe.run(
+                program=program,
+                feed=feeder_test.feed(test_data),
+                fetch_list=[avg_cost, scale_infer])
+            accumulated = [
+                x[0] + x[1][0] for x in zip(accumulated, avg_cost_np)
+            ]
+            count += 1
+        return [x / count for x in accumulated]
+
+    def train_loop():
+        feed_list = [
+            main_program.global_block().var(var_name) for var_name in feed_order
+        ]
+        feeder = fluid.DataFeeder(feed_list, place)
+        exe.run(star_program)
+
+        for pass_id in range(PASS_NUM):
+            for batch_id, data in enumerate(train_reader()):
+                # train a mini-batch
+                outs = exe.run(
+                    program=main_program,
+                    feed=feeder.feed(data),
+                    fetch_list=[avg_cost])
+                out = np.array(outs[0])
+
+                avg_cost_set = train_test(test_program, test_reader)
+
+                # get test avg_cost
+                test_avg_cost = np.array(avg_cost_set).mean()
+                print("avg_cost: %s" % test_avg_cost)
+
+                # if test_avg_cost < 4.0: # Change this number to adjust accuracy
+                if batch_id == 20:
+                    if params_dirname is not None:
+                        fluid.io.save_inference_model(params_dirname, [
+                            "user_id", "gender_id", "age_id", "job_id",
+                            "movie_id", "category_id", "movie_title"
+                        ], [scale_infer], exe)
+                    return
+                else:
+                    print('BatchID {0}, Test Loss {1:0.2}'.format(
+                        pass_id + 1, float(test_avg_cost)))
+
+                if math.isnan(float(out[0])):
+                    sys.exit("got NaN loss, training failed.")
 
-    train_reader = paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.movielens.train(), buf_size=8192),
-        batch_size=BATCH_SIZE)
-
-    trainer.train(
-        num_epochs=1,
-        event_handler=event_handler,
-        reader=train_reader,
-        feed_order=feed_order)
+    train_loop()
 
 
````
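In the new `train()`, `train_loop()` replaces `trainer.train()` plus `event_handler`: it runs up to `PASS_NUM` passes, evaluates on the test set after every mini-batch, saves an inference model and returns as soon as `batch_id == 20` (the old threshold of 4 survives only as a comment), and aborts on a NaN training cost. The sketch below captures only that control flow in plain Python; `run_training` and its callbacks are hypothetical stand-ins for the Executor calls, not Paddle APIs.

```python
import math


def run_training(num_passes, make_batches, train_step, evaluate, save,
                 stop_batch=20):
    """Sketch of the control flow of train_loop() in the diff.

    Returns (pass_id, batch_id, test_costs) when training stops at
    `stop_batch`, or None if no pass ever reaches that batch index.
    """
    test_costs = []
    for pass_id in range(num_passes):
        for batch_id, batch in enumerate(make_batches()):
            train_cost = train_step(batch)    # stands in for exe.run(main_program, ...)
            test_costs.append(evaluate())     # stands in for train_test(test_program, ...)
            if batch_id == stop_batch:
                save()                        # stands in for fluid.io.save_inference_model(...)
                return pass_id, batch_id, test_costs
            if math.isnan(train_cost):
                # train.py calls sys.exit() here; raising keeps the sketch testable.
                raise RuntimeError("got NaN loss, training failed.")
    return None


saved = []
result = run_training(
    num_passes=3,
    make_batches=lambda: range(30),   # hypothetical reader yielding 30 batches
    train_step=lambda batch: 1.0,     # hypothetical constant training cost
    evaluate=lambda: 0.5,             # hypothetical test cost
    save=lambda: saved.append("saved"))
print(result[:2], len(result[2]), saved)  # (0, 20) 21 ['saved']
```

Because `batch_id` restarts at 0 in every pass, a reader that yields 21 or more batches makes training stop during the first pass; `PASS_NUM` only comes into play when the reader yields fewer batches than that.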
````diff
-def infer(use_cuda, inference_program, params_dirname):
+def infer(use_cuda, params_dirname):
     place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
-    inferencer = Inferencer(
-        inference_program, param_path=params_dirname, place=place)
 
     # Use the first data from paddle.dataset.movielens.test() as input.
     # Use create_lod_tensor(data, lod, place) API to generate LoD Tensor,
````
````diff
@@ -225,46 +247,79 @@ def infer(use_cuda, inference_program, params_dirname):
     infer_movie_id = 783
     infer_movie_name = paddle.dataset.movielens.movie_info()[
         infer_movie_id].title
 
-    user_id = fluid.create_lod_tensor([[1]], [[1]], place)
-    gender_id = fluid.create_lod_tensor([[1]], [[1]], place)
-    age_id = fluid.create_lod_tensor([[0]], [[1]], place)
-    job_id = fluid.create_lod_tensor([[10]], [[1]], place)
-    movie_id = fluid.create_lod_tensor([[783]], [[1]], place)
-    category_id = fluid.create_lod_tensor([[10, 8, 9]], [[3]], place)
-    movie_title = fluid.create_lod_tensor([[1069, 4140, 2923, 710, 988]], [[5]],
-                                          place)
-
-    results = inferencer.infer(
-        {
-            'user_id': user_id,
-            'gender_id': gender_id,
-            'age_id': age_id,
-            'job_id': job_id,
-            'movie_id': movie_id,
-            'category_id': category_id,
-            'movie_title': movie_title
-        },
-        return_numpy=False)
-
-    predict_rating = np.array(results[0])
-    print("Predict Rating of user id 1 on movie \"" + infer_movie_name +
-          "\" is " + str(predict_rating[0][0]))
-    print("Actual Rating of user id 1 on movie \"" + infer_movie_name +
-          "\" is 4.")
+    exe = fluid.Executor(place)
+
+    inference_scope = fluid.core.Scope()
+
+    with fluid.scope_guard(inference_scope):
+        # Use fluid.io.load_inference_model to obtain the inference program desc,
+        # the feed_target_names (the names of variables that will be feeded
+        # data using feed operators), and the fetch_targets (variables that
+        # we want to obtain data from using fetch operators).
+        [inferencer, feed_target_names,
+         fetch_targets] = fluid.io.load_inference_model(params_dirname, exe)
+
+        # Use the first data from paddle.dataset.movielens.test() as input
+        assert feed_target_names[0] == "user_id"
+        # Use create_lod_tensor(data, recursive_sequence_lengths, place) API
+        # to generate LoD Tensor where `data` is a list of sequences of index
+        # numbers, `recursive_sequence_lengths` is the length-based level of detail
+        # (lod) info associated with `data`.
+        # For example, data = [[10, 2, 3], [2, 3]] means that it contains
+        # two sequences of indexes, of length 3 and 2, respectively.
+        # Correspondingly, recursive_sequence_lengths = [[3, 2]] contains one
+        # level of detail info, indicating that `data` consists of two sequences
+        # of length 3 and 2, respectively.
+        user_id = fluid.create_lod_tensor([[1]], [[1]], place)
+
+        assert feed_target_names[1] == "gender_id"
+        gender_id = fluid.create_lod_tensor([[1]], [[1]], place)
+
+        assert feed_target_names[2] == "age_id"
+        age_id = fluid.create_lod_tensor([[0]], [[1]], place)
+
+        assert feed_target_names[3] == "job_id"
+        job_id = fluid.create_lod_tensor([[10]], [[1]], place)
+
+        assert feed_target_names[4] == "movie_id"
+        movie_id = fluid.create_lod_tensor([[783]], [[1]], place)
+
+        assert feed_target_names[5] == "category_id"
+        category_id = fluid.create_lod_tensor([[10, 8, 9]], [[3]], place)
+
+        assert feed_target_names[6] == "movie_title"
+        movie_title = fluid.create_lod_tensor([[1069, 4140, 2923, 710, 988]],
+                                              [[5]], place)
+
+        # Construct feed as a dictionary of {feed_target_name: feed_target_data}
+        # and results will contain a list of data corresponding to fetch_targets.
+        results = exe.run(
+            inferencer,
+            feed={
+                feed_target_names[0]: user_id,
+                feed_target_names[1]: gender_id,
+                feed_target_names[2]: age_id,
+                feed_target_names[3]: job_id,
+                feed_target_names[4]: movie_id,
+                feed_target_names[5]: category_id,
+                feed_target_names[6]: movie_title
+            },
+            fetch_list=fetch_targets,
+            return_numpy=False)
+        predict_rating = np.array(results[0])
+        print("Predict Rating of user id 1 on movie \"" + infer_movie_name +
+              "\" is " + str(predict_rating[0][0]))
+        print("Actual Rating of user id 1 on movie \"" + infer_movie_name +
+              "\" is 4.")
 
 
 def main(use_cuda):
     if use_cuda and not fluid.core.is_compiled_with_cuda():
         return
     params_dirname = "recommender_system.inference.model"
-    train(
-        use_cuda=use_cuda,
-        train_program=train_program,
-        params_dirname=params_dirname)
-    infer(
-        use_cuda=use_cuda,
-        inference_program=inference_program,
-        params_dirname=params_dirname)
+    train(use_cuda=use_cuda, params_dirname=params_dirname)
+    infer(use_cuda=use_cuda, params_dirname=params_dirname)
 
 
 if __name__ == '__main__':
````
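The new `infer()` no longer hard-codes the feed keys: it asserts that each entry of `feed_target_names` returned by `fluid.io.load_inference_model` matches the name list passed to `save_inference_model` during training, then keys the feed dictionary by those names. A plain-Python sketch of that check follows; `build_feed` and `EXPECTED_FEED_NAMES` are hypothetical names, nested lists stand in for LoD tensors, and the values are copied from the hunk above.

```python
EXPECTED_FEED_NAMES = ["user_id", "gender_id", "age_id", "job_id",
                       "movie_id", "category_id", "movie_title"]


def build_feed(feed_target_names, inputs):
    """Map each feed target name to its input after checking the saved order.

    `inputs` maps names to data; plain nested lists stand in for LoD tensors.
    """
    names = list(feed_target_names)
    if names != EXPECTED_FEED_NAMES:
        raise ValueError("unexpected feed order: %s" % names)
    return {name: inputs[name] for name in names}


# Values taken from the infer() hunk (user 1, movie 783).
inputs = {
    "user_id": [[1]], "gender_id": [[1]], "age_id": [[0]], "job_id": [[10]],
    "movie_id": [[783]], "category_id": [[10, 8, 9]],
    "movie_title": [[1069, 4140, 2923, 710, 988]],
}
feed = build_feed(EXPECTED_FEED_NAMES, inputs)
print(list(feed) == EXPECTED_FEED_NAMES)  # True
```

Raising `ValueError` instead of using a bare `assert` keeps the check active when Python runs with `-O`, which strips assert statements.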