PaddlePaddle
PaddleRec
Commit 6d74ba9d
Authored Aug 13, 2020 by seiriosPlus
Merge branch 'master' of https://github.com/PaddlePaddle/PaddleRec into online_training
Parents: 84a9ad29, efeb9f96
Showing 60 changed files with 106 additions and 454 deletions
.travis.yml  +8 -3
README.md  +4 -1
README_EN.md  +4 -1
doc/custom_reader.md  +0 -362
doc/distributed_train.md  +1 -1
doc/model_develop.md  +1 -1
doc/train.md  +1 -1
models/contentunderstanding/classification/config.yaml  +1 -1
models/contentunderstanding/readme.md  +5 -2
models/contentunderstanding/tagspace/config.yaml  +1 -1
models/demo/movie_recommand/rank/config.yaml  +1 -1
models/demo/movie_recommand/recall/config.yaml  +1 -1
models/match/dssm/config.yaml  +1 -1
models/match/match-pyramid/config.yaml  +1 -1
models/match/multiview-simnet/config.yaml  +1 -1
models/match/readme.md  +5 -2
models/multitask/esmm/config.yaml  +1 -1
models/multitask/mmoe/config.yaml  +1 -1
models/multitask/readme.md  +6 -3
models/multitask/share-bottom/config.yaml  +1 -1
models/rank/AutoInt/config.yaml  +1 -1
models/rank/BST/config.yaml  +1 -1
models/rank/afm/config.yaml  +1 -1
models/rank/dcn/config.yaml  +1 -1
models/rank/deep_crossing/config.yaml  +1 -1
models/rank/deepfm/config.yaml  +1 -1
models/rank/dien/config.yaml  +1 -1
models/rank/din/config.yaml  +1 -1
models/rank/dnn/config.yaml  +1 -1
models/rank/ffm/config.yaml  +1 -1
models/rank/fgcnn/config.yaml  +1 -1
models/rank/fibinet/README.md  +1 -1
models/rank/fibinet/config.yaml  +1 -1
models/rank/flen/README.md  +1 -1
models/rank/flen/config.yaml  +1 -1
models/rank/fm/config.yaml  +1 -1
models/rank/fnn/config.yaml  +1 -1
models/rank/logistic_regression/config.yaml  +1 -1
models/rank/nfm/config.yaml  +1 -1
models/rank/pnn/config.yaml  +1 -1
models/rank/readme.md  +1 -1
models/rank/wide_deep/config.yaml  +1 -1
models/rank/xdeepfm/config.yaml  +1 -1
models/recall/fasttext/config.yaml  +1 -1
models/recall/gnn/config.yaml  +1 -1
models/recall/gnn/readme.md  +1 -1
models/recall/gru4rec/config.yaml  +1 -1
models/recall/look-alike_recall/README.md  +1 -1
models/recall/look-alike_recall/config.yaml  +1 -1
models/recall/ncf/config.yaml  +1 -1
models/recall/readme.md  +12 -6
models/recall/ssr/config.yaml  +1 -1
models/recall/word2vec/config.yaml  +1 -1
models/recall/youtube_dnn/config.yaml  +1 -1
models/rerank/listwise/config.yaml  +1 -1
models/rerank/readme.md  +4 -1
models/treebased/tdm/README.md  +4 -1
models/treebased/tdm/config.yaml  +1 -1
setup.cfg  +0 -2
setup.py  +6 -22
.travis.yml
浏览文件 @
6d74ba9d
...
@@ -16,15 +16,20 @@ before_install:
...
@@ -16,15 +16,20 @@ before_install:
# For pylint dockstring checker
# For pylint dockstring checker
-
sudo apt-get update
-
sudo apt-get update
-
sudo apt-get install -y python-pip libpython-dev
-
sudo apt-get install -y python-pip libpython-dev
-
sudo apt-get remove python-urllib3
-
sudo apt-get purge python-urllib3
-
sudo rm /usr/lib/python2.7/dist-packages/chardet-*
-
sudo pip install -U pip
-
sudo pip install -U pip
-
sudo pip install --upgrade setuptools
-
sudo pip install six --upgrade --ignore-installed six
-
sudo pip install six --upgrade --ignore-installed six
-
sudo pip install pillow
-
sudo pip install PyYAML
-
sudo pip install PyYAML
-
sudo pip install pylint pytest astroid isort pre-commit
-
sudo pip install pylint pytest astroid isort pre-commit
-
sudo pip install kiwisolver
-
sudo pip install kiwisolver
-
sudo pip install paddlepaddle==1.7.2 --ignore-installed urllib3
-
sudo pip install scikit-build
-
sudo pip uninstall -y rarfile
-
sudo pip install Pillow==5.3.0
-
sudo pip install opencv-python==3.4.3.18
-
sudo pip install rarfile==3.0
-
sudo pip install rarfile==3.0
-
sudo pip install paddlepaddle==1.7.2
-
sudo python setup.py install
-
sudo python setup.py install
-
|
-
|
function timeout() { perl -e 'alarm shift; exec @ARGV' "$@"; }
function timeout() { perl -e 'alarm shift; exec @ARGV' "$@"; }
...
...
README.md
@@ -124,7 +124,10 @@
 ```bash
 # Single-machine training on CPU
-python -m paddlerec.run -m paddlerec.models.rank.dnn
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/rank/dnn/config.yaml
 ```
README_EN.md
@@ -119,7 +119,10 @@ We take the `dnn` algorithm as an example to get start of `PaddleRec`, and we ta
 ```bash
 # Training with cpu
-python -m paddlerec.run -m paddlerec.models.rank.dnn
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/rank/dnn/config.yaml
 ```
doc/custom_reader.md
Deleted (100644 → 0), shown as of 84a9ad29
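# PaddleRec: Custom Datasets and Readers
To use a custom dataset with an asynchronous Reader, work through the following steps:
* [Organizing the dataset](#organizing-the-dataset)
* [Adding input placeholders to the model network](#adding-input-placeholders-to-the-model-network)
* [Implementing the Reader](#implementing-the-reader)
* [Configuring the Reader in the yaml file](#configuring-the-reader-in-the-yaml-file)

Using the CTR-DNN model as an example, we walk through the whole process: organizing the data, defining the variables, writing the Reader, and debugging.
* [Data and Reader example: DNN](#data-and-reader-example-dnn)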
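## Organizing the dataset
PaddleRec lets each model use a custom dataset.
Tips about the data:
1. Data volume:
   PaddleRec is designed for large-scale data and can read datasets with hundreds of millions of samples. Its industrial-grade data I/O API, `dataset`, has been hardened in search, recommendation, and news-feed workloads.
2. File type:
   Any directly readable text data is supported. `dataset` also reads `.gz`-compressed text directly, with no extra code. Samples should be organized one per line, terminated by `\n`.
3. File location:
   Files are usually stored locally on the training node, but `dataset` can also read data remotely through `hadoop` without downloading it; just configure the hadoop account and address for the dataset.
4. Data types:
   The Reader processes `string` data line by line. Data fed to the network must be converted to numeric `int` or `float` values; feeding `string` to the network is not supported. Storing and processing training data in plain text is not recommended.
5. Tips:
   In Dataset mode, training threads and data-reading threads are tightly coupled. To make full use of multithreading, `it is strongly recommended to split the data into a reasonable number of smaller files`. This matters most in distributed training, where it balances the data volume across nodes and speeds up data download.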
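
The advice above about splitting the data into several smaller files can be sketched as follows. This is a minimal illustration under stated assumptions, not a PaddleRec utility: the function name `split_into_shards`, the round-robin assignment, and the `part-00000` file naming are all hypothetical choices.

```python
import os


def split_into_shards(src_path, out_dir, num_shards):
    """Distribute the lines of src_path round-robin over num_shards files.

    Hypothetical helper for illustration; not part of PaddleRec.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, "part-%05d" % i) for i in range(num_shards)]
    outputs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            for line_no, line in enumerate(src):
                # Line i goes to shard i mod num_shards, so shard sizes
                # differ by at most one line.
                outputs[line_no % num_shards].write(line)
    finally:
        for f in outputs:
            f.close()
    return paths
```

Round-robin assignment keeps the shards nearly equal in size, which is the property the tip relies on for balancing work across threads and nodes.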
## Adding input placeholders to the model network
After the Reader reads a file, the data it produces is fed to the network, which needs placeholders to receive it. In Paddle, placeholders are defined with `fluid.data` or `fluid.layers.data`. For the definition of `data`, see [fluid.data](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/data_cn.html#data) and [fluid.layers.data](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/layers_cn/data_cn.html#data).

Suppose you want three inputs: data A with dimension 32, variable-length sparse data B, and a one-dimensional label C, and you want gradients to be able to propagate through that last variable. An example follows.

Definition of data A:
```python
var_a = fluid.data(name='A', shape=[-1, 32], dtype='float32')
```
Definition of data B; for how to use variable-length data, see [LoDTensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#cn-user-guide-lod-tensor):
```python
var_b = fluid.data(name='B', shape=[-1, 1], lod_level=1, dtype='int64')
```
Definition of data C:
```python
var_c = fluid.data(name='C', shape=[-1, 1], dtype='int32')
var_c.stop_gradient = False
```
Once these three inputs are defined, they must also be added to `self._data_var`, a member variable of the model base class, in the PaddleRec model definition:
```python
self._data_var.append(var_a)
self._data_var.append(var_b)
self._data_var.append(var_c)
```
This completes the definition of the input data in the network.
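## Implementing the Reader
### The Reader pattern
The Reader logic lives in a separate Python file. As an exercise we write a `test_reader.py`; the steps are as follows:
1. First, import the Reader base class
```python
from paddlerec.core.reader import ReaderBase
```
2. Create a subclass that inherits from the Reader base class. The Reader used for training is named `TrainerReader`
```python
class TrainerReader(ReaderBase):
    def init(self):
        pass

    def generate_sample(self, line):
        pass
```
3. In `init(self)`, declare any variables needed while reading data. Where necessary, configure them in `config.yaml` and fetch them with `envs.get_global_env()`.
For example, suppose we want to read a preprocessing variable `avg=10` from the yaml file in order to scale data A down by a factor of 10. First edit the yaml file and add the variable under some namespace:
```yaml
...
train:
  reader:
    avg: 10
...
```
Then update the Reader's `init` function:
```python
from paddlerec.core.reader import ReaderBase
from paddlerec.core.utils import envs

class TrainerReader(ReaderBase):
    def init(self):
        self.avg = envs.get_global_env("avg", None, "train.reader")

    def generate_sample(self, line):
        pass
```
4. Override the base class's `generate_sample(self, line)` function to read the data line by line.
- The function should return an iterable reader (a function containing `yield` is no longer an ordinary function but a generator, which can be iterated like an array, linked list, file, or string).
- Inside this iterable function (`def reader()` in the sample code) we define the data-reading logic: slicing, converting, and preprocessing each line.
- Finally, the data must be arranged in a specific format so that PaddleRec's Reader can read it correctly and feed it to the training network. In short, the output order must match the `inputs` created in the network exactly, one to one, and the data is converted into a dictionary-like form.

Example: suppose A, B, and C are stored in a text file with each line in this form:
```shell
0.1,0.2,0.3...3.0,3.1,3.2\t99999,99998,99997\t1\n
```
The sample code is then:
```python
from paddlerec.core.reader import ReaderBase
from paddlerec.core.utils import envs

class TrainerReader(ReaderBase):
    def init(self):
        self.avg = envs.get_global_env("avg", None, "train.reader")

    def generate_sample(self, line):

        def reader():
            # Strip the trailing '\n', then split on '\t' into a list
            variables = line.strip('\n').split('\t')

            # A is the first element; its values are separated by ','
            var_a = variables[0].split(',')  # list
            var_a = [float(i) / self.avg for i in var_a]  # convert str to float

            # B is the second element, also separated by ','
            var_b = variables[1].split(',')  # list
            var_b = [int(i) for i in var_b]  # convert str to int

            # C is the third element, a single value with no separator
            var_c = variables[2]
            var_c = int(var_c)  # convert str to int
            var_c = [var_c]  # wrap the single value in a list

            # Pair the data with the variable names in dict-like form,
            # i.e. output looks like {A: var_a, B: var_b, C: var_c}
            variable_name = ['A', 'B', 'C']
            output = zip(variable_name, [var_a] + [var_b] + [var_c])

            # yield turns this function into an iterable generator
            yield output

        # Return the generator function itself, not its result
        return reader
```
This completes the implementation of the Reader.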
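
Because the parsing inside `reader()` is plain Python, it can be checked without PaddleRec installed. The sketch below pulls that logic into a standalone function; the name `parse_line` and the `avg=10.0` default are assumptions made for illustration, and the sample line is shortened to three values for A.

```python
def parse_line(line, avg=10.0):
    """Parse one 'A<TAB>B<TAB>C' text line into (name, values) pairs.

    Hypothetical standalone version of the reader logic, for testing only.
    """
    fields = line.rstrip("\n").split("\t")
    var_a = [float(x) / avg for x in fields[0].split(",")]  # dense floats, scaled
    var_b = [int(x) for x in fields[1].split(",")]          # variable-length ids
    var_c = [int(fields[2])]                                # single label in a list
    return list(zip(["A", "B", "C"], [var_a, var_b, var_c]))


sample = parse_line("10.0,20.0,30.0\t99999,99998,99997\t1\n")
print(sample)
```

Running a few real lines from the training file through such a function is a quick way to catch separator or type mistakes before wiring the Reader into a training job.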
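### Configuring the Reader in the yaml file
In the model's yaml configuration file, there are four main settings to change:
```yaml
reader:
    batch_size: 2
    class: "{workspace}/reader.py"
    train_data_path: "{workspace}/data/train_data"
    reader_debug_mode: False
```
batch_size: the number of samples in each mini-batch during training
class: the path to the reader required to run this model
train_data_path: the directory containing the training data
reader_debug_mode: switch for a debug mode that checks the reader's syntax and whether its output is as expected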
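## Data and Reader example: DNN
The Reader code comes from [criteo_reader.py](../models/rank/criteo_reader.py), and the network code comes from [model.py](../models/rank/dnn/model.py).
### Criteo dataset format
CTR-DNN is trained and tested on the Criteo dataset used in the [Display Advertising Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/). The dataset has two parts: a training set and a test set. The training set contains part of Criteo's traffic over a period of time, and the test set corresponds to the ad-click traffic of the day after the training data.
Each line of data has the following format:
```bash
<label> <integer feature 1> ... <integer feature 13> <categorical feature 1> ... <categorical feature 26>
```
Here `<label>` indicates whether the ad was clicked: 1 for clicked, 0 for not clicked. `<integer feature>` denotes a numeric (continuous) feature; there are 13 of them. `<categorical feature>` denotes a categorical (discrete) feature; there are 26 of them. Adjacent features are separated by `\t`, and missing features are left blank. The `<label>` field has been removed from the test set.
### Preprocessing the Criteo dataset
Preprocessing consists of two steps:
- Split the original training set 9:1 into a training set and a validation set.
- Normalize the numeric (continuous) features. Note that for each feature `<integer feature i>`, the maximum used for normalization is not the global maximum but the value at the 95% position after sorting; extreme values are kept.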
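
The percentile-based normalization described above can be sketched in plain Python. This is an illustration only: the function names, the exact index convention for "the 95% position", and the handling of constant columns are assumptions, not the preprocessing script the authors used.

```python
def percentile_max(values, fraction=0.95):
    """Return the value located `fraction` of the way through the sorted list."""
    ordered = sorted(values)
    # Assumed convention: truncate len * fraction to an index, clamped to the end.
    index = min(int(len(ordered) * fraction), len(ordered) - 1)
    return ordered[index]


def normalize(values, fraction=0.95):
    """Scale by (x - min) / (p95 - min); outliers are kept and may exceed 1."""
    low = min(values)
    high = percentile_max(values, fraction)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return [(v - low) / span for v in values]
```

Using the 95th-percentile value rather than the global maximum keeps a handful of very large values from squeezing the bulk of the data into a narrow range near zero, while the outliers themselves simply map to values above 1.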
### Defining the CTR network inputs
As described above, the Criteo dataset contains continuous data and discrete (sparse) data, so the input layer of the CTR-DNN model has three parts. `dense_input` receives the continuous data; its dimension is set by the hyperparameter `dense_feature_dim`, and its data type is normalized floating point. `sparse_input_ids` records the discrete data; the Criteo dataset has 26 slots, so we create 26 sparse inputs named `C1~C26` (the code below builds the names with an `"S"` prefix) and set `lod_level=1` to mark them as variable-length data of integer type. Finally, each sample has a `label` indicating whether the ad was clicked; it is an integer, with 0 for a negative example and 1 for a positive example.
In Paddle, data inputs are declared with `paddle.fluid.layers.data()`, which creates a placeholder of the given type; data I/O then feeds data according to this definition.
Definition of the sparse inputs:
```python
def sparse_inputs():
    ids = envs.get_global_env("hyper_parameters.sparse_inputs_slots", None)

    sparse_input_ids = [
        fluid.layers.data(name="S" + str(i),
                          shape=[1],
                          lod_level=1,
                          dtype="int64") for i in range(1, ids)
    ]
    return sparse_input_ids
```
Definition of the dense input:
```python
def dense_input():
    dim = envs.get_global_env("hyper_parameters.dense_input_dim", None)

    dense_input_var = fluid.layers.data(name="D",
                                        shape=[dim],
                                        dtype="float32")
    return dense_input_var
```
Definition of the label:
```python
def label_input():
    label = fluid.layers.data(name="click", shape=[1], dtype="int64")
    return label
```
Putting them together and declaring them correctly:
```python
self.sparse_inputs = sparse_inputs()
self.dense_input = dense_input()
self.label_input = label_input()

self._data_var.append(self.dense_input)

for input in self.sparse_inputs:
    self._data_var.append(input)

self._data_var.append(self.label_input)
```
### Writing the Criteo Reader
```python
# Import the PaddleRec Reader base class
from paddlerec.core.reader import ReaderBase
# Import PaddleRec's helper for reading the yaml configuration
from paddlerec.core.utils import envs


# Define the training Reader; it must inherit from paddlerec.core.reader.ReaderBase
class Reader(ReaderBase):
    # Preprocessing setup, overriding the base class
    # If nothing needs to be set up, use `pass` to skip this function
    def init(self):
        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        self.cont_max_ = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
        self.cont_diff_ = [20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
        self.hash_dim_ = envs.get_global_env(
            "hyper_parameters.sparse_feature_number", None, "train.model")
        self.continuous_range_ = range(1, 14)
        self.categorical_range_ = range(14, 40)

    # Data-reading method, overriding the base class
    # Implements an iterable reader function that processes the data line by line
    def generate_sample(self, line):
        """
        Read the data line by line and process it as a dictionary
        """

        def reader():
            """
            This function needs to be implemented by the user, based on data format
            """
            features = line.rstrip('\n').split('\t')

            dense_feature = []
            sparse_feature = []
            for idx in self.continuous_range_:
                if features[idx] == "":
                    dense_feature.append(0.0)
                else:
                    dense_feature.append(
                        (float(features[idx]) - self.cont_min_[idx - 1]) /
                        self.cont_diff_[idx - 1])

            for idx in self.categorical_range_:
                sparse_feature.append(
                    [hash(str(idx) + features[idx]) % self.hash_dim_])
            label = [int(features[0])]
            feature_name = ["D"]
            for idx in self.categorical_range_:
                feature_name.append("S" + str(idx - 13))
            feature_name.append("label")
            yield zip(feature_name, [dense_feature] + sparse_feature + [label])

        return reader
```
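
The reader above buckets categorical features with Python's built-in `hash()`. On Python 3, string hashing is randomized per process unless `PYTHONHASHSEED` is fixed, so the same feature value can land in different buckets in different runs. A minimal sketch of a stable alternative follows; it swaps in `hashlib.md5`, which is not what the original reader does, and `stable_bucket` is a hypothetical name.

```python
import hashlib


def stable_bucket(slot_index, raw_value, hash_dim):
    """Map a (slot, value) pair to a bucket in [0, hash_dim), stable across runs.

    Hypothetical replacement for `hash(str(idx) + features[idx]) % hash_dim_`.
    """
    key = (str(slot_index) + raw_value).encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % hash_dim
```

Whether this matters depends on the workflow: it is relevant whenever training and inference (or preprocessing and training) run in separate Python 3 processes and must agree on feature ids.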
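### Debugging the Reader
On Linux, `Dataset` mode is used by default; on Windows/Mac, `Dataloader` mode is used by default.
Adding or setting `reader_debug_mode=True` in `config.yaml` turns on debug mode. In this mode only the reader part runs (together with the network definition); it reads 10 samples and prints them, so you can check that the format is as expected and spot hidden bugs.
```yaml
reader:
    batch_size: 2
    class: "{workspace}/../criteo_reader.py"
    train_data_path: "{workspace}/data/train"
    reader_debug_mode: True
```
After the change, run the modified yaml file with paddlerec.run and inspect the output.
```bash
python -m paddlerec.run -m ./models/rank/dnn/config.yaml -e single
```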
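### Debugging in Dataset mode
The data output by dataset has the following format:
` dense_input:size ; dense_input:value ; sparse_input:size ; sparse_input:value ; ... ; sparse_input:size ; sparse_input:value ; label:size ; label:value `
The basic rule is that, for each variable, its size is printed first, followed by its values.
Debugging `criteo_reader` directly, the ideal output is (excerpt):
```bash
...
13 0.0 0.00497512437811 0.05 0.08 0.207421875 0.028 0.35 0.08 0.082 0.0 0.4 0.0 0.08 1 737395 1 210498 1 903564 1 286224 1 286835 1 906818 1 906116 1 67180 1 27346 1 51086 1 142177 1 95024 1 157883 1 873363 1 600281 1 812592 1 228085 1 35900 1 880474 1 984402 1 100885 1 26235 1 410878 1 798162 1 499868 1 306163 1 0
...
```
The output starts with the 13-dimensional dense values, followed by the individual sparse values, and ends with the 1-dimensional label, whose value is 0. This is as expected.
>Notes on using Dataset
> - How Dataset works: the data is printed to a buffer and then read by C++ code. For this reason, the dataset reading code must not print anything unrelated to the data, or the C++ side will receive corrupted data.
> - dataset currently works only in standard Linux environments such as `Ubuntu` and `CentOS`. Using it on `Windows` or `Mac` produces unexpected errors.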
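
The "size first, then values" layout of the Dataset debug output can be checked mechanically. The sketch below is a hypothetical helper, not part of PaddleRec: `parse_dataset_line` and its `names` and `dtypes` arguments are assumptions introduced for illustration.

```python
def parse_dataset_line(text, names, dtypes):
    """Split a 'size v1 ... vN size v1 ...' debug line into named variables."""
    tokens = text.split()
    pos = 0
    parsed = {}
    for name, dtype in zip(names, dtypes):
        size = int(tokens[pos])  # each variable starts with its element count
        values = [dtype(t) for t in tokens[pos + 1:pos + 1 + size]]
        if len(values) != size:
            raise ValueError("truncated values for %s" % name)
        parsed[name] = values
        pos += 1 + size
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return parsed


# A shortened line with a 2-dim dense input, one sparse slot, and a label.
parsed = parse_dataset_line("2 0.5 0.25 1 7 1 0",
                            ["D", "S1", "label"], [float, int, int])
print(parsed)
```

For the Criteo reader, `names` would hold the dense input, the 26 sparse slots, and the label, in the same order in which they were appended to `self._data_var`.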
### Debugging in DataLoader mode
The dataloader output format is `list: [ list[var_1], list[var_2], ... , list[var_n]]`. The data of each sample is placed in a **list[list]**, where list[0] is the first variable.
Debugging `criteo_reader` directly, the ideal output is (excerpt):
```bash
...
[[0.0, 0.004975124378109453, 0.05, 0.08, 0.207421875, 0.028, 0.35, 0.08, 0.082, 0.0, 0.4, 0.0, 0.08], [560746], [902436], [262029], [182633], [368411], [735166], [321120], [39572], [185732], [140298], [926671], [81559], [461249], [728372], [915018], [907965], [818961], [850958], [311492], [980340], [254960], [175041], [524857], [764893], [526288], [220126], [0]]
...
```
The output starts with the list of 13 dense values, followed by the individual sparse values, each in its own list, and ends with the 1-dimensional label list, whose value is 0. This is as expected.
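
A DataLoader sample of the shape shown above can be validated with a few lines of Python. This is a sketch under assumptions: `check_sample` is a hypothetical helper, and the defaults of 13 dense values and 26 sparse slots are taken from the Criteo example in this document.

```python
def check_sample(sample, dense_dim=13, sparse_slots=26):
    """Return True if sample looks like [dense, sparse_1..sparse_n, label]."""
    if len(sample) != 1 + sparse_slots + 1:
        return False
    dense, sparse, label = sample[0], sample[1:-1], sample[-1]
    # The dense list holds dense_dim floats.
    if len(dense) != dense_dim or not all(isinstance(v, float) for v in dense):
        return False
    # Each sparse slot is a one-element list holding an integer id.
    if not all(len(s) == 1 and isinstance(s[0], int) for s in sparse):
        return False
    # The label is a one-element list holding 0 or 1.
    return len(label) == 1 and label[0] in (0, 1)
```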
doc/distributed_train.md
@@ -48,7 +48,7 @@
 ```yaml
 # workspace
-workspace: "paddlerec.models.rank.dnn"
+workspace: "models/rank/dnn"
 mode: [single_cpu_train]
 runner:
doc/model_develop.md
@@ -92,7 +92,7 @@ def input_data(self, is_infer=False, **kwargs):
     return train_inputs
 ```
-For more tutorials on data reading, see [Custom datasets and Readers](custom_dataset_reader.md)
+For more tutorials on data reading, see [Custom datasets and Readers](custom_reader.md)
 ### Defining the network
doc/train.md
@@ -20,7 +20,7 @@ python -m paddlerec.run -m paddlerec.models.xxx.yyy
 For example, to launch the default configuration of the `word2vec` model under `recall`:
 ```shell
-python -m paddlerec.run -m paddlerec.models.recall.word2vec
+python -m paddlerec.run -m models/recall/word2vec
 ```
 ### 2. Training a built-in model with a customized configuration
models/contentunderstanding/classification/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.contentunderstanding.classification"
+workspace: "models/contentunderstanding/classification"
 dataset:
 - name: data1
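
The `workspace` change repeated across the `config.yaml` hunks in this commit follows one visible pattern: the dotted `paddlerec.models.…` value becomes a relative path `models/…`. A minimal sketch of that mapping follows; `workspace_to_path` is a hypothetical helper, not something this commit adds, and it assumes model directory names contain no dots.

```python
def workspace_to_path(workspace):
    """Convert 'paddlerec.models.rank.dnn' to 'models/rank/dnn'.

    Values that do not start with 'paddlerec.' are returned unchanged.
    """
    prefix = "paddlerec."
    if not workspace.startswith(prefix):
        return workspace
    return workspace[len(prefix):].replace(".", "/")


print(workspace_to_path("paddlerec.models.rank.dnn"))
```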
models/contentunderstanding/readme.md
@@ -39,8 +39,11 @@
 ## Tutorial (quick start)
 ```
-python -m paddlerec.run -m paddlerec.models.contentunderstanding.tagspace
-python -m paddlerec.run -m paddlerec.models.contentunderstanding.classification
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/contentunderstanding/tagspace/config.yaml
+python -m paddlerec.run -m models/contentunderstanding/classification/config.yaml
 ```
 ## Tutorial (reproducing the paper)
models/contentunderstanding/tagspace/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.contentunderstanding.tagspace"
+workspace: "models/contentunderstanding/tagspace"
 dataset:
 - name: sample_1
models/demo/movie_recommand/rank/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.demo.movie_recommand"
+workspace: "models/demo/movie_recommand"
 # list of dataset
 dataset:
models/demo/movie_recommand/recall/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.demo.movie_recommand"
+workspace: "models/demo/movie_recommand"
 # list of dataset
 dataset:
models/match/dssm/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
-workspace: "paddlerec.models.match.dssm"
+workspace: "models/match/dssm"
 dataset:
 - name: dataset_train
models/match/match-pyramid/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
-workspace: "paddlerec.models.match.match-pyramid"
+workspace: "models/match/match-pyramid"
 dataset:
 - name: dataset_train
models/match/multiview-simnet/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
 # workspace
-workspace: "paddlerec.models.match.multiview-simnet"
+workspace: "models/match/multiview-simnet"
 # list of dataset
 dataset:
models/match/readme.md
@@ -34,8 +34,11 @@
 ## Tutorial (quick start)
 ### Training
 ```shell
-python -m paddlerec.run -m paddlerec.models.match.dssm # dssm
-python -m paddlerec.run -m paddlerec.models.match.multiview-simnet # multiview-simnet
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/match/dssm/config.yaml # dssm
+python -m paddlerec.run -m models/match/multiview-simnet/config.yaml # multiview-simnet
 ```
 ### Prediction
models/multitask/esmm/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
-workspace: "paddlerec.models.multitask.esmm"
+workspace: "models/multitask/esmm"
 dataset:
 - name: dataset_train
models/multitask/mmoe/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.multitask.mmoe"
+workspace: "models/multitask/mmoe"
 dataset:
 - name: dataset_train
models/multitask/readme.md
@@ -44,9 +44,12 @@
 ## Tutorial (quick start)
 ```shell
-python -m paddlerec.run -m paddlerec.models.multitask.mmoe # mmoe
-python -m paddlerec.run -m paddlerec.models.multitask.share-bottom # share-bottom
-python -m paddlerec.run -m paddlerec.models.multitask.esmm # esmm
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/multitask/mmoe/config.yaml # mmoe
+python -m paddlerec.run -m models/multitask/share-bottom/config.yaml # share-bottom
+python -m paddlerec.run -m models/multitask/esmm/config.yaml # esmm
 ```
 ## Tutorial (reproducing the paper)
models/multitask/share-bottom/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.multitask.share-bottom"
+workspace: "models/multitask/share-bottom"
 dataset:
 - name: dataset_train
models/rank/AutoInt/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.AutoInt"
+workspace: "models/rank/AutoInt"
 dataset:
models/rank/BST/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.BST"
+workspace: "models/rank/BST"
 dataset:
 - name: sample_1
models/rank/afm/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.afm"
+workspace: "models/rank/afm"
 dataset:
 - name: train_sample
models/rank/dcn/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.dcn"
+workspace: "models/rank/dcn"
 dataset:
 - name: train_sample
models/rank/deep_crossing/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.deep_crossing"
+workspace: "models/rank/deep_crossing"
 dataset:
 - name: train_sample
models/rank/deepfm/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.deepfm"
+workspace: "models/rank/deepfm"
 dataset:
models/rank/dien/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.dien"
+workspace: "models/rank/dien"
 dataset:
 - name: sample_1
models/rank/din/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.din"
+workspace: "models/rank/din"
 dataset:
 - name: sample_1
models/rank/dnn/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
 # workspace
-workspace: "paddlerec.models.rank.dnn"
+workspace: "models/rank/dnn"
 # list of dataset
 dataset:
models/rank/ffm/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.ffm"
+workspace: "models/rank/ffm"
 dataset:
 - name: train_sample
models/rank/fgcnn/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.fgcnn"
+workspace: "models/rank/fgcnn"
 dataset:
 - name: train_sample
models/rank/fibinet/README.md
@@ -132,7 +132,7 @@ CPU environment
 ### Run
 ```
-python -m paddlerec.run -m paddlerec.models.rank.fibinet
+python -m paddlerec.run -m models/rank/fibinet
 ```
models/rank/fibinet/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
 # workspace
-workspace: "paddlerec.models.rank.fibinet"
+workspace: "models/rank/fibinet"
 # list of dataset
 dataset:
models/rank/flen/README.md
@@ -110,7 +110,7 @@ CPU environment
 ### Run
 ```
-python -m paddlerec.run -m paddlerec.models.rank.flen
+python -m paddlerec.run -m models/rank/flen
 ```
 ## Reproducing the paper
models/rank/flen/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
 # workspace
-workspace: "paddlerec.models.rank.flen"
+workspace: "models/rank/flen"
 # list of dataset
 dataset:
models/rank/fm/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.fm"
+workspace: "models/rank/fm"
 dataset:
 - name: train_sample
models/rank/fnn/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.fnn"
+workspace: "models/rank/fnn"
 dataset:
 - name: train_sample
models/rank/logistic_regression/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.logistic_regression"
+workspace: "models/rank/logistic_regression"
 dataset:
models/rank/nfm/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.nfm"
+workspace: "models/rank/nfm"
 dataset:
 - name: train_sample
models/rank/pnn/config.yaml
@@ -15,7 +15,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.pnn"
+workspace: "models/rank/pnn"
 dataset:
 - name: train_sample
models/rank/readme.md
@@ -107,7 +107,7 @@ sh run.sh
 ### Training
 ```
 cd modles/rank/dnn # enter the directory of the chosen ranking model, using DNN as an example
-python -m paddlerec.run -m paddlerec.models.rank.dnn # use the built-in configuration
+python -m paddlerec.run -m models/rank/dnn/config.yaml # use the built-in configuration
 # To use a custom configuration, workspace in config.yaml must be the absolute path of this model's directory
 # After modifying the hyperparameters, point to the configuration file to use the custom configuration
 python -m paddlerec.run -m ./config.yaml
models/rank/wide_deep/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.rank.wide_deep"
+workspace: "models/rank/wide_deep"
 dataset:
models/rank/xdeepfm/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 debug: false
-workspace: "paddlerec.models.rank.xdeepfm"
+workspace: "models/rank/xdeepfm"
 dataset:
 - name: sample_1
models/recall/fasttext/config.yaml
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.recall.fasttext"
+workspace: "models/recall/fasttext"
 # list of dataset
 dataset:
models/recall/gnn/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
 # workspace
-workspace: "paddlerec.models.recall.gnn"
+workspace: "models/recall/gnn"
 # list of dataset
 dataset:
models/recall/gnn/readme.md
@@ -165,7 +165,7 @@ CPU environment
 ### Run
 ```
-python -m paddlerec.run -m paddlerec.models.recall.gnn
+python -m paddlerec.run -m models/recall/gnn/config.yaml
 ```
 ### Results
models/recall/gru4rec/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.recall.gru4rec"
+workspace: "models/recall/gru4rec"
 dataset:
 - name: dataset_train
models/recall/look-alike_recall/README.md
@@ -129,7 +129,7 @@ CPU environment
 ### Run
 ```
-python -m paddlerec.run -m paddlerec.models.recall.look-alike_recall
+python -m paddlerec.run -m models/recall/look-alike_recall/config.yaml
 ```
models/recall/look-alike_recall/config.yaml
@@ -14,7 +14,7 @@
 # global settings
 debug: false
-workspace: "paddlerec.models.recall.look-alike_recall"
+workspace: "models/recall/look-alike_recall"
 dataset:
 - name: sample_1
models/recall/ncf/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.recall.ncf"
+workspace: "models/recall/ncf"
 dataset:
 - name: dataset_train
models/recall/readme.md
@@ -62,12 +62,15 @@
 ## Tutorial (Quick Start)
 ###
 ```shell
-python -m paddlerec.run -m paddlerec.models.recall.word2vec # word2vec
-python -m paddlerec.run -m paddlerec.models.recall.ssr # ssr
-python -m paddlerec.run -m paddlerec.models.recall.gru4rec # gru4rec
-python -m paddlerec.run -m paddlerec.models.recall.gnn # gnn
-python -m paddlerec.run -m paddlerec.models.recall.ncf # ncf
-python -m paddlerec.run -m paddlerec.models.recall.youtube_dnn # youtube_dnn
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/recall/word2vec/config.yaml # word2vec
+python -m paddlerec.run -m models/recall/ssr/config.yaml # ssr
+python -m paddlerec.run -m models/recall/gru4rec/config.yaml # gru4rec
+python -m paddlerec.run -m models/recall/gnn/config.yaml # gnn
+python -m paddlerec.run -m models/recall/ncf/config.yaml # ncf
+python -m paddlerec.run -m models/recall/youtube_dnn/config.yaml # youtube_dnn
 ```
 ## Tutorial (Reproducing the Paper)
@@ -87,6 +90,9 @@ sh data_prepare.sh
 ### Training
 ```bash
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
 cd modles/recall/gnn # enter the directory of the chosen recall model, using gnn as the example
 python -m paddlerec.run -m ./config.yaml # after modifying the hyperparameters, pass the config file to run with the custom configuration
 ```
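Because the quick-start commands now take paths relative to the cloned repository, a quick check that the config files exist under the clone can catch a wrong working directory early. A minimal sketch, assuming the `models/recall/<model>/config.yaml` layout shown in the hunk above; `missing_configs` is a hypothetical helper and the demo runs against a throwaway directory rather than a real clone.

```python
import tempfile
from pathlib import Path

# Recall models listed in the quick-start hunk above.
RECALL_MODELS = ("word2vec", "ssr", "gru4rec", "gnn", "ncf", "youtube_dnn")


def missing_configs(repo_root, models=RECALL_MODELS):
    """Return the models whose models/recall/<model>/config.yaml is absent under repo_root."""
    root = Path(repo_root)
    return [
        name for name in models
        if not (root / "models" / "recall" / name / "config.yaml").is_file()
    ]


# Demo on a temporary tree that contains only the gnn config.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "models" / "recall" / "gnn" / "config.yaml"
    cfg.parent.mkdir(parents=True)
    cfg.write_text('workspace: "models/recall/gnn"\n')
    missing = missing_configs(tmp)

print(missing)  # ['word2vec', 'ssr', 'gru4rec', 'ncf', 'youtube_dnn']
```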
models/recall/ssr/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.recall.ssr"
+workspace: "models/recall/ssr"
 dataset:
 - name: dataset_train
models/recall/word2vec/config.yaml
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.recall.word2vec"
+workspace: "models/recall/word2vec"
 # list of dataset
 dataset:
models/recall/youtube_dnn/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
-workspace: "paddlerec.models.recall.youtube_dnn"
+workspace: "models/recall/youtube_dnn"
 dataset:
 - name: dataset_train
models/rerank/listwise/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.
-workspace: "paddlerec.models.rerank.listwise"
+workspace: "models/rerank/listwise"
 dataset:
 - name: dataset_train
models/rerank/readme.md
@@ -28,7 +28,10 @@
 ## Tutorial (Quick Start)
 ```shell
-python -m paddlerec.run -m paddlerec.models.rerank.listwise # listwise
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/rerank/listwise/config.yaml # listwise
 ```
 ## Tutorial (Reproducing the Paper)
models/treebased/tdm/README.md
@@ -8,7 +8,10 @@
 2. Building on the single-machine model, distributed parameter-server training can be run
 ```shell
-python -m paddlerec.run -m paddlerec.models.treebased.tdm
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+python -m paddlerec.run -m models/treebased/tdm/config.yaml
 ```
 ## Preparing the Tree Structure
models/treebased/tdm/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-workspace: "paddlerec.models.treebased.tdm"
+workspace: "models/treebased/tdm"
 # list of dataset
 dataset:
setup.cfg
deleted 100644 → 0
-[easy_install]
-index_url=http://pip.baidu.com/pypi/simple
\ No newline at end of file
setup.py
@@ -38,15 +38,18 @@ readme = ""
 def build(dirname):
     package_dir = os.path.dirname(os.path.abspath(__file__))
     shutil.copytree(
-        package_dir, dirname, ignore=shutil.ignore_patterns(".git"))
+        package_dir,
+        dirname,
+        ignore=shutil.ignore_patterns(".git", "models", "build", "dist",
+                                      "*.md"))
     os.mkdir(os.path.join(dirname, "paddlerec"))
     shutil.move(
         os.path.join(dirname, "core"), os.path.join(dirname, "paddlerec"))
     shutil.move(
         os.path.join(dirname, "doc"), os.path.join(dirname, "paddlerec"))
-    shutil.move(
-        os.path.join(dirname, "models"), os.path.join(dirname, "paddlerec"))
     shutil.move(
         os.path.join(dirname, "tests"), os.path.join(dirname, "paddlerec"))
     shutil.move(
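The hunk above widens the `ignore` argument of `shutil.copytree` so that `models`, `build`, `dist` and Markdown files are no longer copied into the staging directory. A small sketch of how `shutil.ignore_patterns` filters a tree, run against a throwaway directory; the file and directory names in the demo are made up for illustration.

```python
import os
import shutil
import tempfile


def copy_for_packaging(src, dst):
    """Copy src to dst, skipping the same names the updated setup.py ignores."""
    shutil.copytree(
        src,
        dst,
        ignore=shutil.ignore_patterns(".git", "models", "build", "dist", "*.md"))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    # Illustrative tree: two directories that should be skipped, one that should not.
    for d in ("core", "models", "dist"):
        os.makedirs(os.path.join(src, d))
    for f in ("README.md", "setup.py"):
        open(os.path.join(src, f), "w").close()

    dst = os.path.join(tmp, "out")  # must not exist before copytree runs
    copy_for_packaging(src, dst)
    copied = sorted(os.listdir(dst))

print(copied)  # ['core', 'setup.py']
```

The patterns are matched against entry names at every directory level, so a nested `build` directory or a `*.md` file deeper in the tree is skipped as well.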
@@ -63,17 +66,8 @@ def build(dirname):
     package_dir = {'': dirname}
     package_data = {}
-    models_copy = [
-        'data/*.txt', 'data/*/*.txt', '*.yaml', '*.sh', 'tree/*.npy',
-        'tree/*.txt', 'data/sample_data/*', 'data/sample_data/train/*',
-        'data/sample_data/infer/*', 'data/*/*.csv', 'Criteo_data/*',
-        'Criteo_data/sample_data/train/*'
-    ]
     engine_copy = ['*/*.sh', '*/*.template']
     for package in packages:
-        if package.startswith("paddlerec.models."):
-            package_data[package] = models_copy
         if package.startswith("paddlerec.core.engine"):
             package_data[package] = engine_copy
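After this hunk, only packages under `paddlerec.core.engine` receive extra `package_data`; the model data globs are gone because the models are no longer packaged. The loop can be sketched in isolation as below. The function wrapper and the package list in the demo are hypothetical; in `setup.py` the loop runs inline over the real `packages` list.

```python
def build_package_data(packages):
    """Mirror the post-commit loop: attach data globs to engine packages only."""
    engine_copy = ['*/*.sh', '*/*.template']
    package_data = {}
    for package in packages:
        if package.startswith("paddlerec.core.engine"):
            package_data[package] = engine_copy
    return package_data


# Hypothetical package names, chosen only to exercise both branches.
data = build_package_data([
    "paddlerec.core.engine",
    "paddlerec.core.engine.cluster",
    "paddlerec.core.utils",
])
print(sorted(data))  # ['paddlerec.core.engine', 'paddlerec.core.engine.cluster']
```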
@@ -98,16 +92,6 @@ build(dirname)
 shutil.rmtree(dirname)
 print(u'''
-\033[32m
-██████╗ █████╗ ██████╗ ██████╗ ██╗ ███████╗██████╗ ███████╗ ██████╗
-██╔══██╗██╔══██╗██╔══██╗██╔══██╗██║ ██╔════╝██╔══██╗██╔════╝██╔════╝
-██████╔╝███████║██║ ██║██║ ██║██║ █████╗ ██████╔╝█████╗ ██║
-██╔═══╝ ██╔══██║██║ ██║██║ ██║██║ ██╔══╝ ██╔══██╗██╔══╝ ██║
-██║ ██║ ██║██████╔╝██████╔╝███████╗███████╗██║ ██║███████╗╚██████╗
-╚═╝ ╚═╝ ╚═╝╚═════╝ ╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═╝╚══════╝ ╚═════╝
-\033[0m
-\033[34m
 Installation Complete. Congratulations!
 How to use it ? Please visit our webside: https://github.com/PaddlePaddle/PaddleRec
-\033[0m
 ''')