Unverified commit 70d0b27a. Author: tangwei12. Committer: GitHub.

Merge branch 'master' into classification

@@ -50,11 +50,6 @@ ESMM is the paper published at SIGIR’2018, [《Entire Space Multi-Task Model: An E
 Dataset: [Ali-CCP:Alibaba Click and Conversion Prediction]( https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408 )
-```
-cd data
-sh run.sh
-```
 For the data format, see the demo data: data/train
@@ -108,11 +103,25 @@ CPU environment
 ## Reproducing the paper
-Reproducing the paper's results with the full original dataset requires setting batch_size=1000, thread_num=8, epoch_num=4 in config.yaml
+Because the original paper's dataset is very large, we selected a subset of it as training and test data. Training on a GPU is recommended.
+Our test ctr auc is 0.79+ and ctcvr auc is 0.82+.
+```
+wget https://paddlerec.bj.bcebos.com/esmm/traindata_10w.csv
+wget https://paddlerec.bj.bcebos.com/esmm/testdata_10w.csv
+mkdir data/train_data data/test_data
+mv traindata_10w.csv data/train_data
+mv testdata_10w.csv data/test_data
+```
-How to run after the changes: set 'workspace' in config.yaml to the directory that contains config.yaml, then execute
+Reproducing the paper's results with the full original dataset requires setting batch_size=1024, epoch=10, device=gpu, selected_gpus:"0" in config.yaml
+For the exact configuration, download the config_10w.yaml file
+```
+wget https://paddlerec.bj.bcebos.com/esmm/config_10w.yaml
+```
+Run after making the changes
 ```
 python -m paddlerec.run -m /home/your/dir/config.yaml # debug mode: pass the absolute path of the local config directly
 ```
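The download-and-stage steps that this README change adds (`wget`, `mkdir`, `mv`) can also be sketched in Python for environments without those tools. This is a minimal sketch, not part of the commit: the function names `plan_downloads` and `fetch_all` are hypothetical, and only the URLs and directory names are taken from the README text.

```python
import os
import shutil
import urllib.request

# Base URL and file names as given in the README; the mapping to target
# directories mirrors the `mv` commands there.
BASE_URL = "https://paddlerec.bj.bcebos.com/esmm"
FILES = {
    "traindata_10w.csv": os.path.join("data", "train_data"),
    "testdata_10w.csv": os.path.join("data", "test_data"),
}


def plan_downloads(base_url=BASE_URL, files=FILES):
    """Return (url, relative_destination) pairs without touching the network."""
    return [
        (f"{base_url}/{name}", os.path.join(dest_dir, name))
        for name, dest_dir in files.items()
    ]


def fetch_all(root="."):
    """Download each file into its target directory under `root` (hypothetical helper)."""
    for url, rel_path in plan_downloads():
        target = os.path.join(root, rel_path)
        # Create data/train_data or data/test_data (and data/ itself) if missing.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
```

Calling `fetch_all()` from the model directory would leave the two CSV files where the shell commands put them. `plan_downloads()` can be inspected first to confirm the paths.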
......
@@ -17,19 +17,19 @@ workspace: "models/multitask/esmm"
 dataset:
 - name: dataset_train
-  batch_size: 1
+  batch_size: 5
   type: QueueDataset
   data_path: "{workspace}/data/train"
   data_converter: "{workspace}/esmm_reader.py"
 - name: dataset_infer
-  batch_size: 1
+  batch_size: 5
   type: QueueDataset
   data_path: "{workspace}/data/test"
   data_converter: "{workspace}/esmm_reader.py"
 hyper_parameters:
-  vocab_size: 10000
-  embed_size: 128
+  vocab_size: 737946
+  embed_size: 12
   optimizer:
     class: adam
     learning_rate: 0.001
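To get a feel for what the `vocab_size` and `embed_size` change means, the size of a single embedding table can be computed directly. This is only an arithmetic sketch: it assumes one table of shape `vocab_size x embed_size`, says nothing about how many such tables the model actually builds, and the helper name is hypothetical.

```python
def embedding_table_params(vocab_size: int, embed_size: int) -> int:
    """Number of weights in one vocab_size x embed_size embedding table."""
    return vocab_size * embed_size


# Old and new hyper_parameters values from the diff.
old_params = embedding_table_params(10000, 128)
new_params = embedding_table_params(737946, 12)
print(old_params, new_params)  # 1280000 8855352
```

Under that assumption, the new setting covers a much larger vocabulary with narrower vectors, and one table grows from about 1.3 million to about 8.9 million weights.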
@@ -43,15 +43,15 @@ runner:
   class: train
   device: cpu
   epochs: 3
-  save_checkpoint_interval: 2
+  save_checkpoint_interval: 1
   save_inference_interval: 4
-  save_checkpoint_path: "increment"
+  save_checkpoint_path: "increment_esmm"
   save_inference_path: "inference"
   print_interval: 10
   phases: [train]
 - name: infer_runner
   class: infer
-  init_model_path: "increment/1"
+  init_model_path: "increment_esmm/1"
   device: cpu
   print_interval: 1
   phases: [infer]
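Two things in this config change are easy to get wrong by hand: applying the README's full-data settings (batch_size=1024, epoch=10, device=gpu, selected_gpus "0"), and keeping `init_model_path` in step with `save_checkpoint_path` when the checkpoint directory is renamed. The sketch below works on a plain dict that holds only a subset of the keys shown in the diff. The helper names are hypothetical, and it assumes that the README's "epoch" maps to the runner's `epochs` key, that `selected_gpus` sits on the train runner, and that the batch size applies to every dataset entry.

```python
import copy

# Subset of the new-side config; keys not needed for the sketch are omitted.
CONFIG = {
    "dataset": [
        {"name": "dataset_train", "batch_size": 5},
        {"name": "dataset_infer", "batch_size": 5},
    ],
    "runner": [
        {"class": "train", "device": "cpu", "epochs": 3,
         "save_checkpoint_path": "increment_esmm"},
        {"name": "infer_runner", "class": "infer", "device": "cpu",
         "init_model_path": "increment_esmm/1"},
    ],
}


def apply_full_data_overrides(config, batch_size=1024, epochs=10,
                              device="gpu", selected_gpus="0"):
    """Return a copy of `config` with the README's full-data settings applied."""
    updated = copy.deepcopy(config)
    for dataset in updated["dataset"]:
        # Assumption: the batch size applies to every dataset entry.
        dataset["batch_size"] = batch_size
    for runner in updated["runner"]:
        if runner.get("class") == "train":
            # Assumption: these keys belong on the train runner.
            runner["epochs"] = epochs
            runner["device"] = device
            runner["selected_gpus"] = selected_gpus
    return updated


def mismatched_init_paths(config):
    """Names of infer runners whose init_model_path is outside every checkpoint dir."""
    checkpoint_dirs = [
        r["save_checkpoint_path"] for r in config["runner"]
        if r.get("class") == "train" and "save_checkpoint_path" in r
    ]
    problems = []
    for runner in config["runner"]:
        path = runner.get("init_model_path")
        if runner.get("class") != "infer" or path is None:
            continue
        if not any(path.startswith(d + "/") for d in checkpoint_dirs):
            problems.append(runner.get("name", "<unnamed>"))
    return problems
```

With the values from the diff, `mismatched_init_paths(CONFIG)` returns an empty list. Leaving `init_model_path` at the old `"increment/1"` while saving to `"increment_esmm"` would report `infer_runner`.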
......