Unverified commit 1a1b2f18, authored by Chengmo and committed by GitHub

Update ctr dnn (#4265)

* update ctr/dnn for v/1.7
Parent de81b6fb
#!/bin/bash
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
#cudaid=${face_detection:=0} # use 0-th card as default
#export CUDA_VISIBLE_DEVICES=$cudaid
export CPU_NUM=1
export NUM_THREADS=1
FLAGS_benchmark=true python train.py --is_local 1 --cloud_train 0 --train_data_path data/raw/train.txt --enable_ce | python _ce.py
export CPU_NUM=1
export NUM_THREADS=8
FLAGS_benchmark=true python train.py --is_local 1 --cloud_train 0 --train_data_path data/raw/train.txt --enable_ce | python _ce.py
export CPU_NUM=8
export NUM_THREADS=8
FLAGS_benchmark=true python train.py --is_local 1 --cloud_train 0 --train_data_path data/raw/train.txt --enable_ce | python _ce.py
# DNN-based Click-Through Rate (CTR) Prediction Model
## Introduction
This model implements the DNN model proposed in the following paper:
```text
@inproceedings{guo2017deepfm,
title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
pages={1725--1731},
year={2017}
}
```
## Runtime Environment
**PaddlePaddle 1.6 or later (or a suitable develop version) is required.**
Install PaddlePaddle Fluid first, then run:
```shell
pip install -r requirements.txt
```
## Dataset
This example uses the Criteo dataset from the Kaggle [Display Advertising Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/).
Each line holds the features of one ad impression, and the first column is a label indicating whether the impression was clicked. There are 39 features in total: 13 integer features and 26 categorical features. The test set contains no labels.
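For reference, each raw line is tab-separated: the label comes first, followed by the 13 integer features and the 26 categorical features (this layout matches how reader.py splits each line; the field names below are placeholders):
```text
<label>\t<I1>\t<I2>\t...\t<I13>\t<C1>\t<C2>\t...\t<C26>
```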
Download the dataset:
```bash
cd data && python download.py && cd ..
```
## Model
This example implements only the DNN part of the model described in the DeepFM paper; DeepFM itself is covered in a separate example.
## Data Preparation
The raw dataset is preprocessed as follows: integer features are scaled to [0, 1] with min-max normalization, and categorical features are one-hot encoded. The raw data is split into two parts: 90% for training and the remaining 10% for validation during training.
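A minimal sketch of the min-max step is shown below (the real logic, including clipping at the 95th-percentile values, lives in preprocess.py; the helper name and numbers are only illustrative):
```python
def min_max_normalize(raw_value, feat_min, feat_max):
    """Illustrative helper: scale one integer feature into [0, 1]; empty fields fall back to 0.0."""
    if raw_value == '':
        return 0.0
    return (float(raw_value) - feat_min) / (feat_max - feat_min)

# A feature observed in the range [0, 600] with raw value 150 maps to 0.25.
print(min_max_normalize('150', 0, 600))  # 0.25
```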
## Training
The command-line options for training can be listed with `python train.py -h`.
### Local training
```bash
python train.py \
--train_data_path data/raw/train.txt \
2>&1 | tee train.log
```
### Distributed training
Launch a distributed training job locally with 2 trainers and 2 pservers. In distributed mode the training data is sharded by trainer id, so the trainers never read overlapping data, which improves training efficiency (a sketch of the file-level sharding follows the command below).
```bash
# This script does not support Windows
sh cluster_train.sh
```
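The sharding itself is done by `fleet.split_files` inside train.py; conceptually it hands whole files out to the trainers, roughly like the round-robin sketch below (an illustrative approximation, not the fleet implementation):
```python
def split_files_by_trainer(file_list, trainer_id, trainer_num):
    """Illustrative only: assign whole training files to one trainer in round-robin order."""
    return [f for i, f in enumerate(sorted(file_list)) if i % trainer_num == trainer_id]

# With 2 trainers, trainer 0 reads part-0/part-2 and trainer 1 reads part-1/part-3.
print(split_files_by_trainer(["part-0", "part-1", "part-2", "part-3"], 0, 2))
```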
## Inference
The command-line options for inference can be listed with `python infer.py -h`.
Run inference on the test set:
```bash
python infer.py \
--model_path models/pass-2/ \
--data_path data/raw/train.txt
```
This loads the pass-2 model; the expected test AUC is `0.794`.
Note: the AUC printed at the very end of infer.py is the overall AUC for the entire prediction file. reader.py splits train.txt into a training part and a test part, so the data used here does not overlap with the training data.
## Running Cluster Training on Baidu Cloud
1. Follow the document [Launching Fluid distributed training on Baidu Cloud](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst) to deploy a CPU cluster on Baidu Cloud.
1. Process the training data with preprocess.py to generate train.txt.
1. Split train.txt into one shard per cluster machine and place a shard on each machine (a minimal splitting sketch follows this list).
1. Start the distributed training job with the command line shown in `Distributed training` above.
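A minimal sketch for step 3, splitting train.txt into one line-level shard per machine (any equivalent tool such as GNU split works just as well; the helper and shard naming below are made up for illustration):
```python
def shard_file(path, num_shards):
    """Illustrative helper: distribute the lines of `path` round-robin into num_shards shard files."""
    outs = [open("{}.part-{}".format(path, i), "w") for i in range(num_shards)]
    with open(path) as f:
        for idx, line in enumerate(f):
            outs[idx % num_shards].write(line)
    for out in outs:
        out.close()

# shard_file("train.txt", 4) produces train.txt.part-0 ... train.txt.part-3,
# one shard to copy onto each of the 4 machines.
```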
## Running Cluster Training on PaddleCloud
If you run cluster training on PaddleCloud, you can use ```cloud.py``` to help submit the job; the arguments required by ```train.py``` can be passed in through PaddleCloud environment variables.
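For example, cloud.py builds the train.py command from PaddleCloud environment variables roughly as follows (a trimmed sketch of the pattern used in that file):
```python
import os

# Trimmed sketch: optional PaddleCloud environment variables are forwarded
# to train.py as command-line flags, as done in cloud.py.
cmd = "python -u train.py "
if os.getenv("BATCH_SIZE", ""):
    cmd += " --batch_size %s " % os.getenv("BATCH_SIZE")
if os.getenv("EMBEDDING_SIZE", ""):
    cmd += " --embedding_size %s " % os.getenv("EMBEDDING_SIZE")
print(cmd)
```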
# this file is only used for continuous evaluation test!
import os
import sys
sys.path.append(os.environ['ceroot'])
from kpi import CostKpi
from kpi import DurationKpi
from kpi import AccKpi
each_pass_duration_cpu1_thread1_kpi = DurationKpi(
'each_pass_duration_cpu1_thread1', 0.08, 0, actived=True)
train_loss_cpu1_thread1_kpi = CostKpi('train_loss_cpu1_thread1', 0.08, 0)
train_auc_val_cpu1_thread1_kpi = AccKpi('train_auc_val_cpu1_thread1', 0.08, 0)
train_batch_auc_val_cpu1_thread1_kpi = AccKpi(
'train_batch_auc_val_cpu1_thread1', 0.08, 0)
each_pass_duration_cpu1_thread8_kpi = DurationKpi(
'each_pass_duration_cpu1_thread8', 0.08, 0, actived=True)
train_loss_cpu1_thread8_kpi = CostKpi('train_loss_cpu1_thread8', 0.08, 0)
train_auc_val_cpu1_thread8_kpi = AccKpi('train_auc_val_cpu1_thread8', 0.08, 0)
train_batch_auc_val_cpu1_thread8_kpi = AccKpi(
'train_batch_auc_val_cpu1_thread8', 0.08, 0)
each_pass_duration_cpu8_thread8_kpi = DurationKpi(
'each_pass_duration_cpu8_thread8', 0.08, 0, actived=True)
train_loss_cpu8_thread8_kpi = CostKpi('train_loss_cpu8_thread8', 0.08, 0)
train_auc_val_cpu8_thread8_kpi = AccKpi('train_auc_val_cpu8_thread8', 0.08, 0)
train_batch_auc_val_cpu8_thread8_kpi = AccKpi(
'train_batch_auc_val_cpu8_thread8', 0.08, 0)
tracking_kpis = [
each_pass_duration_cpu1_thread1_kpi,
train_loss_cpu1_thread1_kpi,
train_auc_val_cpu1_thread1_kpi,
train_batch_auc_val_cpu1_thread1_kpi,
each_pass_duration_cpu1_thread8_kpi,
train_loss_cpu1_thread8_kpi,
train_auc_val_cpu1_thread8_kpi,
train_batch_auc_val_cpu1_thread8_kpi,
each_pass_duration_cpu8_thread8_kpi,
train_loss_cpu8_thread8_kpi,
train_auc_val_cpu8_thread8_kpi,
train_batch_auc_val_cpu8_thread8_kpi,
]
def parse_log(log):
'''
This method should be implemented by model developers.
Suggested format:
each line in the log should be a tab-separated key/value pair, for example:
"
train_cost\t1.0
test_cost\t1.0
train_cost\t1.0
train_cost\t1.0
train_acc\t1.2
"
'''
for line in log.split('\n'):
fs = line.strip().split('\t')
print(fs)
if len(fs) == 3 and fs[0] == 'kpis':
kpi_name = fs[1]
kpi_value = float(fs[2])
yield kpi_name, kpi_value
def log_to_ce(log):
kpi_tracker = {}
for kpi in tracking_kpis:
kpi_tracker[kpi.name] = kpi
for (kpi_name, kpi_value) in parse_log(log):
print(kpi_name, kpi_value)
kpi_tracker[kpi_name].add_record(kpi_value)
kpi_tracker[kpi_name].persist()
if __name__ == '__main__':
log = sys.stdin.read()
log_to_ce(log)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# ======================================================================
#
# Copyright (c) 2017 Baidu.com, Inc. All Rights Reserved
#
# ======================================================================
"""this file is only for PaddleCloud"""
import os
import logging
import paddle.fluid.contrib.utils.hdfs_utils as hdfs_utils
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("cloud")
logger.setLevel(logging.INFO)
def run():
cmd = "python -u train.py "
cmd += " --train_data_path %s " % "data/train.txt"
cmd += " --test_data_path %s " % "data/test.txt"
if os.getenv("BATCH_SIZE", ""):
cmd += " --batch_size %s " % os.getenv("BATCH_SIZE")
if os.getenv("EMBEDDING_SIZE", ""):
cmd += " --embedding_size %s " % os.getenv("EMBEDDING_SIZE")
if os.getenv("NUM_PASSES", ""):
cmd += " --num_passes %s " % os.getenv("NUM_PASSES")
if os.getenv("MODEL_OUTPUT_DIR", ""):
cmd += " --model_output_dir %s " % os.getenv("MODEL_OUTPUT_DIR")
if os.getenv("SPARSE_FEATURE_DIM", ""):
cmd += " --sparse_feature_dim %s " % os.getenv("SPARSE_FEATURE_DIM")
if os.getenv("ASYNC_MODE", ""):
cmd += " --async_mode "
if os.getenv("NO_SPLIT_VAR", ""):
cmd += " --no_split_var "
is_local = int(os.getenv("PADDLE_IS_LOCAL", "1"))
if is_local:
cmd += " --is_local 1 "
cmd += " --cloud_train 0 "
else:
cmd += " --is_local 0 "
cmd += " --cloud_train 1 "
trainer_id = int(os.environ["PADDLE_TRAINER_ID"])
trainers = int(os.environ["PADDLE_TRAINERS"])
training_role = os.environ["PADDLE_TRAINING_ROLE"]
port = os.getenv("PADDLE_PSERVER_PORT", "6174")
pserver_ips = os.getenv("PADDLE_PSERVER_IPS", "")
eplist = []
for ip in pserver_ips.split(","):
eplist.append(':'.join([ip, port]))
pserver_endpoints = ",".join(eplist)
current_endpoint = os.getenv("PADDLE_CURRENT_IP", "") + ":" + port
if training_role == "PSERVER":
cmd += " --role pserver "
else:
cmd += " --role trainer "
cmd += " --endpoints %s " % pserver_endpoints
cmd += " --current_endpoint %s " % current_endpoint
cmd += " --trainer_id %s " % trainer_id
cmd += " --trainers %s " % trainers
logging.info("run cluster commands: {}".format(cmd))
exit(os.system(cmd))
def download():
hadoop_home = os.getenv("HADOOP_HOME")
configs = {}
configs["fs.default.name"] = os.getenv("DATA_FS_NAME")
configs["hadoop.job.ugi"] = os.getenv("DATA_FS_UGI")
client = hdfs_utils.HDFSClient(hadoop_home, configs)
local_train_data_dir = os.getenv("TRAIN_DATA_LOCAL", "data")
hdfs_train_data_dir = os.getenv("TRAIN_DATA_HDFS", "")
downloads = hdfs_utils.multi_download(
client,
hdfs_train_data_dir,
local_train_data_dir,
0,
1,
multi_processes=1)
print(downloads)
for d in downloads:
base_dir = os.path.dirname(d)
tar_cmd = "tar -zxvf {} -C {}".format(d, base_dir)
print(tar_cmd)
for d in downloads:
base_dir = os.path.dirname(d)
tar_cmd = "tar -zxvf {} -C {}".format(d, base_dir)
logging.info("DOWNLOAD DATA: {}, AND TAR IT: {}".format(d, tar_cmd))
os.system(tar_cmd)
def env_declar():
logging.info("******** Rename Cluster Env to PaddleFluid Env ********")
if os.environ["TRAINING_ROLE"] == "PSERVER" or os.environ[
"PADDLE_IS_LOCAL"] == "0":
os.environ["PADDLE_TRAINING_ROLE"] = os.environ["TRAINING_ROLE"]
os.environ["PADDLE_PSERVER_PORT"] = os.environ["PADDLE_PORT"]
os.environ["PADDLE_PSERVER_IPS"] = os.environ["PADDLE_PSERVERS"]
os.environ["PADDLE_TRAINERS"] = os.environ["PADDLE_TRAINERS_NUM"]
os.environ["PADDLE_CURRENT_IP"] = os.environ["POD_IP"]
os.environ["PADDLE_TRAINER_ID"] = os.environ["PADDLE_TRAINER_ID"]
os.environ["CPU_NUM"] = os.getenv("CPU_NUM", "12")
os.environ["NUM_THREADS"] = os.getenv("NUM_THREADS", "12")
logging.info("Content-Type: text/plain\n\n")
for key in os.environ.keys():
logging.info("%30s %s \n" % (key, os.environ[key]))
logging.info("****** Rename Cluster Env to PaddleFluid Env END ******")
if __name__ == '__main__':
env_declar()
if os.getenv("NEED_CUSTOM_DOWNLOAD", ""):
if os.environ["PADDLE_TRAINING_ROLE"] == "PSERVER":
logging.info("PSERVER do not need to download datas")
else:
logging.info(
"NEED_CUSTOM_DOWNLOAD is True, will download train data with hdfs_utils"
)
download()
run()
#!/bin/bash
# start pserver0
python train.py \
--train_data_path /paddle/data/train.txt \
--is_local 0 \
--role pserver \
--endpoints 127.0.0.1:6000,127.0.0.1:6001 \
--current_endpoint 127.0.0.1:6000 \
--trainers 2 \
> pserver0.log 2>&1 &
# start pserver1
python train.py \
--train_data_path /paddle/data/train.txt \
--is_local 0 \
--role pserver \
--endpoints 127.0.0.1:6000,127.0.0.1:6001 \
--current_endpoint 127.0.0.1:6001 \
--trainers 2 \
> pserver1.log 2>&1 &
# start trainer0
python train.py \
--train_data_path /paddle/data/train.txt \
--is_local 0 \
--role trainer \
--endpoints 127.0.0.1:6000,127.0.0.1:6001 \
--trainers 2 \
--trainer_id 0 \
> trainer0.log 2>&1 &
# start trainer1
python train.py \
--train_data_path /paddle/data/train.txt \
--is_local 0 \
--role trainer \
--endpoints 127.0.0.1:6000,127.0.0.1:6001 \
--trainers 2 \
--trainer_id 1 \
> trainer1.log 2>&1 &
\ No newline at end of file
import os
import shutil
import sys
import glob
LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
TOOLS_PATH = os.path.join(LOCAL_PATH, "..", "..", "tools")
sys.path.append(TOOLS_PATH)
from tools import download_file_and_uncompress
if __name__ == '__main__':
url = "https://s3-eu-west-1.amazonaws.com/kaggle-display-advertising-challenge-dataset/dac.tar.gz"
print("download and extract starting...")
download_file_and_uncompress(url)
print("download and extract finished")
if os.path.exists("raw"):
shutil.rmtree("raw")
os.mkdir("raw")
# mv ./*.txt raw/
files = glob.glob("*.txt")
for f in files:
shutil.move(f, "raw")
print("done")
#!/bin/bash
wget --no-check-certificate https://s3-eu-west-1.amazonaws.com/kaggle-display-advertising-challenge-dataset/dac.tar.gz
tar zxf dac.tar.gz >/dev/null 2>&1
rm -f dac.tar.gz
mkdir raw
mv ./*.txt raw/
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid.incubate.data_generator as dg
cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
cont_max_ = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
cont_diff_ = [20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
hash_dim_ = 1000001
continuous_range_ = range(1, 14)
categorical_range_ = range(14, 40)
class CriteoDataset(dg.MultiSlotDataGenerator):
"""
CriteoDataset: inherits MultiSlotDataGenerator and implements data reading for the DAC dataset.
Help document: http://wiki.baidu.com/pages/viewpage.action?pageId=728820675
"""
def generate_sample(self, line):
"""
Read the data line by line and process it as a dictionary
"""
def reader():
"""
This function needs to be implemented by the user, based on data format
"""
features = line.rstrip('\n').split('\t')
dense_feature = []
sparse_feature = []
for idx in continuous_range_:
if features[idx] == "":
dense_feature.append(0.0)
else:
dense_feature.append(
(float(features[idx]) - cont_min_[idx - 1]) /
cont_diff_[idx - 1])
for idx in categorical_range_:
sparse_feature.append(
[hash(str(idx) + features[idx]) % hash_dim_])
label = [int(features[0])]
process_line = dense_feature, sparse_feature, label
feature_name = ["dense_feature"]
for idx in categorical_range_:
feature_name.append("C" + str(idx - 13))
feature_name.append("label")
yield zip(feature_name, [dense_feature] + sparse_feature + [label])
return reader
d = CriteoDataset()
d.run_from_stdin()
wget --no-check-certificate https://fleet.bj.bcebos.com/ctr_data.tar.gz
tar -zxvf ctr_data.tar.gz
mv ./raw_data ./train_data_full
mkdir train_data && cd train_data
cp ../train_data_full/part-0 ../train_data_full/part-1 ./ && cd ..
mv ./test_data ./test_data_full
mkdir test_data && cd test_data
cp ../test_data_full/part-220 ./ && cd ..
echo "Complete data download."
echo "Full Train data stored in ./train_data_full "
echo "Full Test data stored in ./test_data_full "
echo "Rapid Verification train data stored in ./train_data "
echo "Rapid Verification test data stored in ./test_data "
\ No newline at end of file
import mmh3
#!/usr/bin/python
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# There are 13 integer features and 26 categorical features
continous_features = range(1, 14)
categorial_features = range(14, 40)
continous_clip = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
class Dataset:
def __init__(self):
pass
class CriteoDataset(Dataset):
class CriteoDataset(object):
def __init__(self, sparse_feature_dim):
self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
self.cont_max_ = [
@@ -28,12 +41,6 @@ class CriteoDataset(Dataset):
line_idx = 0
for line in f:
line_idx += 1
if is_train and line_idx > self.train_idx_:
break
elif not is_train and line_idx <= self.train_idx_:
continue
if line_idx % trainer_num != trainer_id:
continue
features = line.rstrip('\n').split('\t')
dense_feature = []
sparse_feature = []
@@ -41,13 +48,13 @@ class CriteoDataset(Dataset):
if features[idx] == '':
dense_feature.append(0.0)
else:
dense_feature.append((float(features[idx]) -
self.cont_min_[idx - 1]) /
self.cont_diff_[idx - 1])
dense_feature.append(
(float(features[idx]) -
self.cont_min_[idx - 1]) /
self.cont_diff_[idx - 1])
for idx in self.categorical_range_:
sparse_feature.append([
mmh3.hash(str(idx) + features[idx]) %
self.hash_dim_
hash(str(idx) + features[idx]) % self.hash_dim_
])
label = [int(features[0])]
@@ -60,6 +67,3 @@ class CriteoDataset(Dataset):
def test(self, file_list):
return self._reader_creator(file_list, False, 1, 0)
def infer(self, file_list):
return self._reader_creator(file_list, False, 1, 0)
import argparse
import logging
import numpy as np
# disable gpu training for this example
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
import time
import numpy as np
import logging
import argparse
import paddle
import paddle.fluid as fluid
import reader
from network_conf import ctr_dnn_model
import utils
from network_conf import CTR
import feed_generator as generator
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
@@ -18,17 +29,43 @@ logger.setLevel(logging.INFO)
def parse_args():
parser = argparse.ArgumentParser(description="PaddlePaddle DeepFM example")
parser = argparse.ArgumentParser(
description="PaddlePaddle CTR-DNN example")
# -------------Data & Model Path-------------
parser.add_argument(
'--model_path',
'--train_files_path',
type=str,
required=True,
help="The path of model parameters gz file")
default='./train_data',
help="The path of training dataset")
parser.add_argument(
'--data_path',
'--test_files_path',
type=str,
default='./test_data',
help="The path of testing dataset")
parser.add_argument(
'--model_path',
type=str,
required=True,
help="The path of the dataset to infer")
default='models',
help='The path for model to store (default: models)')
# -------------Training parameter-------------
parser.add_argument(
'--learning_rate',
type=float,
default=1e-4,
help="Initial learning rate for training")
parser.add_argument(
'--batch_size',
type=int,
default=1000,
help="The size of mini-batch (default:1000)")
parser.add_argument(
"--epochs",
type=int,
default=1,
help="Number of epochs for training.")
# -------------Network parameter-------------
parser.add_argument(
'--embedding_size',
type=int,
@@ -38,59 +75,118 @@ def parse_args():
'--sparse_feature_dim',
type=int,
default=1000001,
help="The size for embedding layer (default:1000001)")
help='sparse feature hashing space for index processing')
parser.add_argument(
'--batch_size',
'--dense_feature_dim',
type=int,
default=1000,
help="The size of mini-batch (default:1000)")
default=13,
help='dense feature shape')
# -------------device parameter-------------
parser.add_argument(
'--is_local',
type=int,
default=0,
help='Local train or distributed train (default: 1)')
parser.add_argument(
'--is_cloud',
type=int,
default=0,
help='Local train or distributed train on paddlecloud (default: 0)')
parser.add_argument(
'--save_model',
type=int,
default=0,
help='Save training model or not')
parser.add_argument(
'--enable_ce',
action='store_true',
help='If set, run the task with continuous evaluation logs.')
parser.add_argument(
'--cpu_num',
type=int,
default=2,
help='threads for ctr training')
return parser.parse_args()
def infer():
args = parse_args()
def run_infer(args, model_path):
place = fluid.CPUPlace()
inference_scope = fluid.Scope()
dataset = reader.CriteoDataset(args.sparse_feature_dim)
test_reader = paddle.batch(
dataset.test([args.data_path]), batch_size=args.batch_size)
train_generator = generator.CriteoDataset(args.sparse_feature_dim)
file_list = [
str(args.test_files_path) + "/%s" % x
for x in os.listdir(args.test_files_path)
]
test_reader = paddle.batch(train_generator.test(file_list),
batch_size=args.batch_size)
startup_program = fluid.framework.Program()
test_program = fluid.framework.Program()
with fluid.scope_guard(inference_scope):
with fluid.framework.program_guard(test_program, startup_program):
loss, auc_var, batch_auc_var, _, data_list, auc_states = ctr_dnn_model(
args.embedding_size, args.sparse_feature_dim, False)
exe = fluid.Executor(place)
feeder = fluid.DataFeeder(feed_list=data_list, place=place)
fluid.io.load_persistables(
executor=exe,
dirname=args.model_path,
main_program=fluid.default_main_program())
def set_zero(var_name):
param = inference_scope.var(var_name).get_tensor()
ctr_model = CTR()
def set_zero():
auc_states_names = [
'_generated_var_0', '_generated_var_1', '_generated_var_2',
'_generated_var_3'
]
for name in auc_states_names:
param = fluid.global_scope().var(name).get_tensor()
if param:
param_array = np.zeros(param._get_dims()).astype("int64")
param.set(param_array, place)
for var in auc_states:
set_zero(var.name)
with fluid.framework.program_guard(test_program, startup_program):
with fluid.unique_name.guard():
inputs = ctr_model.input_data(args)
loss, auc_var = ctr_model.net(inputs, args)
exe = fluid.Executor(place)
feeder = fluid.DataFeeder(feed_list=inputs, place=place)
if args.is_cloud:
fluid.io.load_persistables(
executor=exe,
dirname=model_path,
main_program=fluid.default_main_program())
elif args.is_local:
fluid.load(fluid.default_main_program(),
model_path + "/checkpoint", exe)
set_zero()
run_index = 0
infer_auc = 0
L = []
for batch_id, data in enumerate(test_reader()):
loss_val, auc_val = exe.run(test_program,
feed=feeder.feed(data),
fetch_list=[loss, auc_var])
run_index += 1
infer_auc = auc_val
L.append(loss_val / args.batch_size)
if batch_id % 100 == 0:
logger.info("TEST --> batch: {} loss: {} auc: {}".format(
batch_id, loss_val / args.batch_size, auc_val))
infer_loss = np.mean(L)
infer_result = {}
infer_result['loss'] = infer_loss
infer_result['auc'] = infer_auc
log_path = model_path + '/infer_result.log'
logger.info(str(infer_result))
with open(log_path, 'w+') as f:
f.write(str(infer_result))
logger.info("Inference complete")
return infer_result
if __name__ == '__main__':
utils.check_version()
infer()
if __name__ == "__main__":
args = parse_args()
model_list = []
for _, dir, _ in os.walk(args.model_path):
for model in dir:
if "epoch" in model:
path = "/".join([args.model_path, model])
model_list.append(path)
for model in model_list:
logger.info("Test model {}".format(model))
run_infer(args, model)
#!/bin/bash
echo "WARNING: This script only for run Paddle Paddle CTR distribute training locally"
if [ ! -d "./models" ]; then
mkdir ./models
echo "Create model folder for store infer model"
fi
if [ ! -d "./log" ]; then
mkdir ./log
echo "Create log floder for store running log"
fi
if [ ! -d "./output" ]; then
mkdir ./output
echo "Create output floder"
fi
# kill existing server process
ps -ef|grep python|awk '{print $2}'|xargs kill -9
# environment variables for fleet distribute training
export PADDLE_TRAINER_ID=0
export PADDLE_TRAINERS_NUM=2
export OUTPUT_PATH="output"
export FLAGS_communicator_thread_pool_size=5
export FLAGS_communicator_fake_rpc=0
export FLAGS_communicator_is_sgd_optimizer=0
export FLAGS_rpc_retry_times=3
export PADDLE_PSERVERS_IP_PORT_LIST="127.0.0.1:36011,127.0.0.1:36012"
export PADDLE_PSERVER_PORT_ARRAY=(36011 36012)
export PADDLE_PSERVER_NUMS=2
export PADDLE_TRAINERS=2
export TRAINING_ROLE=PSERVER
export GLOG_v=0
export GLOG_logtostderr=1
train_mode=$1
for((i=0;i<$PADDLE_PSERVER_NUMS;i++))
do
cur_port=${PADDLE_PSERVER_PORT_ARRAY[$i]}
echo "PADDLE WILL START PSERVER "$cur_port
export PADDLE_PORT=${cur_port}
export POD_IP=127.0.0.1
python -u train.py --save_model=1 --is_cloud=1 --cpu_num=10 &> ./log/pserver.$i.log &
done
export TRAINING_ROLE=TRAINER
export GLOG_v=0
export GLOG_logtostderr=1
for((i=0;i<$PADDLE_TRAINERS;i++))
do
echo "PADDLE WILL START Trainer "$i
PADDLE_TRAINER_ID=$i
python -u train.py --save_model=1 --is_cloud=1 --cpu_num=10 &> ./log/trainer.$i.log &
done
echo "Training log stored in ./log/"
\ No newline at end of file
#!/usr/bin/python
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
import math
dense_feature_dim = 13
def ctr_deepfm_model(factor_size, sparse_feature_dim, dense_feature_dim,
sparse_input):
def dense_fm_layer(input, emb_dict_size, factor_size, fm_param_attr):
"""
dense_fm_layer
"""
first_order = fluid.layers.fc(input=input, size=1)
emb_table = fluid.layers.create_parameter(
shape=[emb_dict_size, factor_size],
dtype='float32',
attr=fm_param_attr)
input_mul_factor = fluid.layers.matmul(input, emb_table)
input_mul_factor_square = fluid.layers.square(input_mul_factor)
input_square = fluid.layers.square(input)
factor_square = fluid.layers.square(emb_table)
input_square_mul_factor_square = fluid.layers.matmul(input_square,
factor_square)
second_order = 0.5 * (
input_mul_factor_square - input_square_mul_factor_square)
return first_order, second_order
def sparse_fm_layer(input, emb_dict_size, factor_size, fm_param_attr):
"""
sparse_fm_layer
"""
first_embeddings = fluid.embedding(
input=input,
dtype='float32',
size=[emb_dict_size, 1],
is_sparse=True)
first_embeddings = fluid.layers.squeeze(
input=first_embeddings, axes=[1])
first_order = fluid.layers.sequence_pool(
input=first_embeddings, pool_type='sum')
nonzero_embeddings = fluid.embedding(
input=input,
dtype='float32',
size=[emb_dict_size, factor_size],
param_attr=fm_param_attr,
is_sparse=True)
nonzero_embeddings = fluid.layers.squeeze(
input=nonzero_embeddings, axes=[1])
summed_features_emb = fluid.layers.sequence_pool(
input=nonzero_embeddings, pool_type='sum')
summed_features_emb_square = fluid.layers.square(summed_features_emb)
squared_features_emb = fluid.layers.square(nonzero_embeddings)
squared_sum_features_emb = fluid.layers.sequence_pool(
input=squared_features_emb, pool_type='sum')
second_order = 0.5 * (
summed_features_emb_square - squared_sum_features_emb)
return first_order, second_order
dense_input = fluid.data(
name="dense_input", shape=[None, dense_feature_dim], dtype='float32')
sparse_input_ids = [
fluid.layers.data(
name="C" + str(i), shape=[1], lod_level=1, dtype='int64')
for i in range(1, 27)
]
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
datas = [dense_input] + sparse_input_ids + [label]
py_reader = fluid.layers.create_py_reader_by_data(
capacity=64, feed_list=datas, name='py_reader', use_double_buffer=True)
words = fluid.layers.read_file(py_reader)
sparse_fm_param_attr = fluid.param_attr.ParamAttr(
name="SparseFeatFactors",
initializer=fluid.initializer.Normal(scale=1 /
math.sqrt(sparse_feature_dim)))
dense_fm_param_attr = fluid.param_attr.ParamAttr(
name="DenseFeatFactors",
initializer=fluid.initializer.Normal(scale=1 /
math.sqrt(dense_feature_dim)))
sparse_fm_first, sparse_fm_second = sparse_fm_layer(
sparse_input, sparse_feature_dim, factor_size, sparse_fm_param_attr)
dense_fm_first, dense_fm_second = dense_fm_layer(
dense_input, dense_feature_dim, factor_size, dense_fm_param_attr)
def embedding_layer(input):
"""embedding_layer"""
emb = fluid.layers.embedding(
input=input,
dtype='float32',
size=[sparse_feature_dim, factor_size],
param_attr=sparse_fm_param_attr,
is_sparse=True)
emb = fluid.layers.squeeze(input=emb, axes=[1])
return fluid.layers.sequence_pool(input=emb, pool_type='average')
sparse_embed_seq = list(map(embedding_layer, sparse_input_ids))
concated = fluid.layers.concat(sparse_embed_seq + [dense_input], axis=1)
fc1 = fluid.layers.fc(input=concated,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(concated.shape[1]))))
fc2 = fluid.layers.fc(input=fc1,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc1.shape[1]))))
fc3 = fluid.layers.fc(input=fc2,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc2.shape[1]))))
predict = fluid.layers.fc(input=[
sparse_fm_first, sparse_fm_second, dense_fm_first, dense_fm_second, fc3
],
size=2,
act="softmax",
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc3.shape[1]))))
cost = fluid.layers.cross_entropy(input=predict, label=words[-1])
avg_cost = fluid.layers.reduce_sum(cost)
accuracy = fluid.layers.accuracy(input=predict, label=words[-1])
auc_var, batch_auc_var, auc_states = \
fluid.layers.auc(input=predict, label=words[-1], num_thresholds=2 ** 12, slide_steps=20)
return avg_cost, auc_var, batch_auc_var, py_reader
def ctr_dnn_model(embedding_size, sparse_feature_dim, use_py_reader=True):
def embedding_layer(input):
"""embedding_layer"""
emb = fluid.embedding(
input=input,
is_sparse=True,
# you need to patch https://github.com/PaddlePaddle/Paddle/pull/14190
# if you want to set is_distributed to True
is_distributed=False,
size=[sparse_feature_dim, embedding_size],
param_attr=fluid.ParamAttr(
name="SparseFeatFactors",
initializer=fluid.initializer.Uniform()))
emb = fluid.layers.squeeze(input=emb, axes=[1])
return fluid.layers.sequence_pool(input=emb, pool_type='average')
dense_input = fluid.data(
name="dense_input", shape=[None, dense_feature_dim], dtype='float32')
sparse_input_ids = [
fluid.data(
name="C" + str(i), shape=[None, 1], lod_level=1, dtype='int64')
for i in range(1, 27)
]
label = fluid.data(name='label', shape=[None, 1], dtype='int64')
words = [dense_input] + sparse_input_ids + [label]
py_reader = None
if use_py_reader:
py_reader = fluid.layers.create_py_reader_by_data(
capacity=64,
feed_list=words,
name='py_reader',
use_double_buffer=True)
words = fluid.layers.read_file(py_reader)
sparse_embed_seq = list(map(embedding_layer, words[1:-1]))
concated = fluid.layers.concat(sparse_embed_seq + words[0:1], axis=1)
fc1 = fluid.layers.fc(input=concated,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(concated.shape[1]))))
fc2 = fluid.layers.fc(input=fc1,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc1.shape[1]))))
fc3 = fluid.layers.fc(input=fc2,
size=400,
act='relu',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc2.shape[1]))))
predict = fluid.layers.fc(input=fc3,
size=1,
act='sigmoid',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc3.shape[1]))))
cost = fluid.layers.log_loss(input=predict, label=fluid.layers.cast(words[-1], dtype="float32"))
avg_cost = fluid.layers.reduce_sum(cost)
accuracy = fluid.layers.accuracy(input=predict, label=words[-1])
auc_var, batch_auc_var, auc_states = \
fluid.layers.auc(input=predict, label=words[-1], num_thresholds=2 ** 12, slide_steps=20)
return avg_cost, auc_var, batch_auc_var, py_reader, words, auc_states
class CTR(object):
"""
DNN for Click-Through Rate prediction
"""
def input_data(self, args):
dense_input = fluid.data(name="dense_input",
shape=[-1, args.dense_feature_dim],
dtype="float32")
sparse_input_ids = [
fluid.data(name="C" + str(i),
shape=[-1, 1],
lod_level=1,
dtype="int64") for i in range(1, 27)
]
label = fluid.data(name="label", shape=[-1, 1], dtype="int64")
inputs = [dense_input] + sparse_input_ids + [label]
return inputs
def net(self, inputs, args):
def embedding_layer(input):
return fluid.layers.embedding(
input=input,
is_sparse=True,
size=[args.sparse_feature_dim, args.embedding_size],
param_attr=fluid.ParamAttr(
name="SparseFeatFactors",
initializer=fluid.initializer.Uniform()),
)
sparse_embed_seq = list(map(embedding_layer, inputs[1:-1]))
concated = fluid.layers.concat(sparse_embed_seq + inputs[0:1], axis=1)
fc1 = fluid.layers.fc(
input=concated,
size=400,
act="relu",
param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(concated.shape[1]))),
)
fc2 = fluid.layers.fc(
input=fc1,
size=400,
act="relu",
param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc1.shape[1]))),
)
fc3 = fluid.layers.fc(
input=fc2,
size=400,
act="relu",
param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc2.shape[1]))),
)
predict = fluid.layers.fc(
input=fc3,
size=2,
act="softmax",
param_attr=fluid.ParamAttr(initializer=fluid.initializer.Normal(
scale=1 / math.sqrt(fc3.shape[1]))),
)
cost = fluid.layers.cross_entropy(input=predict, label=inputs[-1])
avg_cost = fluid.layers.reduce_sum(cost)
auc_var, _, _ = fluid.layers.auc(input=predict,
label=inputs[-1],
num_thresholds=2**12,
slide_steps=20)
return avg_cost, auc_var
"""
Preprocess Criteo dataset. This dataset was used for the Display Advertising
Challenge (https://www.kaggle.com/c/criteo-display-ad-challenge).
"""
import os
import sys
import click
import random
import collections
# There are 13 integer features and 26 categorical features
continous_features = range(1, 14)
categorial_features = range(14, 40)
# Clip integer features. The clip point for each integer feature
# is derived from the 95% quantile of the total values in each feature
continous_clip = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
class CategoryDictGenerator:
"""
Generate dictionary for each of the categorical features
"""
def __init__(self, num_feature):
self.dicts = []
self.num_feature = num_feature
for i in range(0, num_feature):
self.dicts.append(collections.defaultdict(int))
def build(self, datafile, categorial_features, cutoff=0):
with open(datafile, 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
for i in range(0, self.num_feature):
if features[categorial_features[i]] != '':
self.dicts[i][features[categorial_features[i]]] += 1
for i in range(0, self.num_feature):
self.dicts[i] = filter(lambda x: x[1] >= cutoff,
self.dicts[i].items())
self.dicts[i] = sorted(self.dicts[i], key=lambda x: (-x[1], x[0]))
vocabs, _ = list(zip(*self.dicts[i]))
self.dicts[i] = dict(zip(vocabs, range(1, len(vocabs) + 1)))
self.dicts[i]['<unk>'] = 0
def gen(self, idx, key):
if key not in self.dicts[idx]:
res = self.dicts[idx]['<unk>']
else:
res = self.dicts[idx][key]
return res
def dicts_sizes(self):
return list(map(len, self.dicts))
class ContinuousFeatureGenerator:
"""
Normalize the integer features to [0, 1] by min-max normalization
"""
def __init__(self, num_feature):
self.num_feature = num_feature
self.min = [sys.maxsize] * num_feature
self.max = [-sys.maxsize] * num_feature
def build(self, datafile, continous_features):
with open(datafile, 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
for i in range(0, self.num_feature):
val = features[continous_features[i]]
if val != '':
val = int(val)
if val > continous_clip[i]:
val = continous_clip[i]
self.min[i] = min(self.min[i], val)
self.max[i] = max(self.max[i], val)
def gen(self, idx, val):
if val == '':
return 0.0
val = float(val)
return (val - self.min[idx]) / (self.max[idx] - self.min[idx])
@click.command("preprocess")
@click.option("--datadir", type=str, help="Path to raw criteo dataset")
@click.option("--outdir", type=str, help="Path to save the processed data")
def preprocess(datadir, outdir):
"""
All 13 integer features are normalized to continuous values and these continuous
features are combined into one vector with dimension of 13.
Each of the 26 categorical features are one-hot encoded and all the one-hot
vectors are combined into one sparse binary vector.
"""
dists = ContinuousFeatureGenerator(len(continous_features))
dists.build(os.path.join(datadir, 'train.txt'), continous_features)
dicts = CategoryDictGenerator(len(categorial_features))
dicts.build(
os.path.join(datadir, 'train.txt'), categorial_features, cutoff=200)
dict_sizes = dicts.dicts_sizes()
categorial_feature_offset = [0]
for i in range(1, len(categorial_features)):
offset = categorial_feature_offset[i - 1] + dict_sizes[i - 1]
categorial_feature_offset.append(offset)
random.seed(0)
# 90% of the data are used for training, and 10% of the data are used
# for validation.
with open(os.path.join(outdir, 'train.txt'), 'w') as out_train:
with open(os.path.join(outdir, 'valid.txt'), 'w') as out_valid:
with open(os.path.join(datadir, 'train.txt'), 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
continous_vals = []
for i in range(0, len(continous_features)):
val = dists.gen(i, features[continous_features[i]])
continous_vals.append("{0:.6f}".format(val).rstrip('0')
.rstrip('.'))
categorial_vals = []
for i in range(0, len(categorial_features)):
val = dicts.gen(i, features[categorial_features[
i]]) + categorial_feature_offset[i]
categorial_vals.append(str(val))
continous_vals = ','.join(continous_vals)
categorial_vals = ','.join(categorial_vals)
label = features[0]
if random.randint(0, 9999) % 10 != 0:
out_train.write('\t'.join(
[continous_vals, categorial_vals, label]) + '\n')
else:
out_valid.write('\t'.join(
[continous_vals, categorial_vals, label]) + '\n')
with open(os.path.join(outdir, 'test.txt'), 'w') as out:
with open(os.path.join(datadir, 'test.txt'), 'r') as f:
for line in f:
features = line.rstrip('\n').split('\t')
continous_vals = []
for i in range(0, len(continous_features)):
val = dists.gen(i, features[continous_features[i] - 1])
continous_vals.append("{0:.6f}".format(val).rstrip('0')
.rstrip('.'))
categorial_vals = []
for i in range(0, len(categorial_features)):
val = dicts.gen(i, features[categorial_features[
i] - 1]) + categorial_feature_offset[i]
categorial_vals.append(str(val))
continous_vals = ','.join(continous_vals)
categorial_vals = ','.join(categorial_vals)
out.write('\t'.join([continous_vals, categorial_vals]) + '\n')
if __name__ == "__main__":
preprocess()
#!/usr/bin/python
# -*- coding=utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import argparse
import logging
import os
import time
import random
import numpy as np
import paddle
import paddle.fluid as fluid
import reader
from network_conf import ctr_dnn_model
from multiprocessing import cpu_count
from network_conf import CTR
import utils
import paddle.fluid.incubate.fleet.base.role_maker as role_maker
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
from paddle.fluid.transpiler.distribute_transpiler import DistributeTranspilerConfig
# disable gpu training for this example
os.environ["CUDA_VISIBLE_DEVICES"] = ""
@@ -24,249 +41,224 @@ logger.setLevel(logging.INFO)
def parse_args():
parser = argparse.ArgumentParser(description="PaddlePaddle CTR example")
parser = argparse.ArgumentParser(
description="PaddlePaddle CTR-DNN example")
# -------------Data & Model Path-------------
parser.add_argument(
'--train_data_path',
'--train_files_path',
type=str,
default='./data/raw/train.txt',
default='./train_data',
help="The path of training dataset")
parser.add_argument(
'--test_data_path',
'--test_files_path',
type=str,
default='./data/raw/valid.txt',
default='./test_data',
help="The path of testing dataset")
parser.add_argument(
'--model_path',
type=str,
default='models',
help='The path for model to store (default: models)')
# -------------Training parameter-------------
parser.add_argument(
'--learning_rate',
type=float,
default=1e-4,
help="Initial learning rate for training")
parser.add_argument(
'--batch_size',
type=int,
default=1000,
help="The size of mini-batch (default:1000)")
parser.add_argument(
'--embedding_size',
"--epochs",
type=int,
default=10,
help="The size for embedding layer (default:10)")
default=1,
help="Number of epochs for training.")
# -------------Network parameter-------------
parser.add_argument(
'--num_passes',
'--embedding_size',
type=int,
default=10,
help="The number of passes to train (default: 10)")
parser.add_argument(
'--model_output_dir',
type=str,
default='models',
help='The path for model to store (default: models)')
help="The size for embedding layer (default:10)")
parser.add_argument(
'--sparse_feature_dim',
type=int,
default=1000001,
help='sparse feature hashing space for index processing')
parser.add_argument(
'--dense_feature_dim',
type=int,
default=13,
help='dense feature shape')
# -------------device parameter-------------
parser.add_argument(
'--is_local',
type=int,
default=1,
default=0,
help='Local train or distributed train (default: 1)')
parser.add_argument(
'--cloud_train',
'--is_cloud',
type=int,
default=0,
help='Local train or distributed train on paddlecloud (default: 0)')
parser.add_argument(
'--async_mode',
action='store_true',
default=False,
help='Whether start pserver in async mode to support ASGD')
parser.add_argument(
'--no_split_var',
action='store_true',
default=False,
help='Whether split variables into blocks when update_method is pserver')
# the following arguments is used for distributed train, if is_local == false, then you should set them
parser.add_argument(
'--role',
type=str,
default='pserver', # trainer or pserver
help='The path for model to store (default: models)')
parser.add_argument(
'--endpoints',
type=str,
default='127.0.0.1:6000',
help='The pserver endpoints, like: 127.0.0.1:6000,127.0.0.1:6001')
parser.add_argument(
'--current_endpoint',
type=str,
default='127.0.0.1:6000',
help='The path for model to store (default: 127.0.0.1:6000)')
parser.add_argument(
'--trainer_id',
'--save_model',
type=int,
default=0,
help='The path for model to store (default: models)')
help='Save training model or not')
parser.add_argument(
'--trainers',
'--cpu_num',
type=int,
default=1,
help='The num of trianers, (default: 1)')
parser.add_argument(
'--enable_ce',
action='store_true',
help='If set, run the task with continuous evaluation logs.')
default=2,
help='threads for ctr training')
return parser.parse_args()
def train_loop(args, train_program, py_reader, loss, auc_var, batch_auc_var,
trainer_num, trainer_id):
if args.enable_ce:
SEED = 102
train_program.random_seed = SEED
fluid.default_startup_program().random_seed = SEED
dataset = reader.CriteoDataset(args.sparse_feature_dim)
train_reader = paddle.batch(
paddle.reader.shuffle(
dataset.train([args.train_data_path], trainer_num, trainer_id),
buf_size=args.batch_size * 100),
batch_size=args.batch_size)
py_reader.decorate_paddle_reader(train_reader)
data_name_list = []
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exec_strategy = fluid.ExecutionStrategy()
build_strategy = fluid.BuildStrategy()
if os.getenv("NUM_THREADS", ""):
exec_strategy.num_threads = int(os.getenv("NUM_THREADS"))
cpu_num = int(os.environ.get('CPU_NUM', cpu_count()))
build_strategy.reduce_strategy = \
fluid.BuildStrategy.ReduceStrategy.Reduce if cpu_num > 1 \
else fluid.BuildStrategy.ReduceStrategy.AllReduce
def get_dataset(inputs, args):
dataset = fluid.DatasetFactory().create_dataset()
dataset.set_use_var(inputs)
dataset.set_pipe_command("python dataset_generator.py")
dataset.set_batch_size(args.batch_size)
thread_num = int(args.cpu_num)
dataset.set_thread(thread_num)
file_list = [
str(args.train_files_path) + "/%s" % x
for x in os.listdir(args.train_files_path)
]
# Make sure each training node holds a different set of training files.
# When simulating distributed training with local multi-process, each process must get different files.
# fleet.split_files conveniently assigns training files to nodes by node id, at file granularity.
if int(args.is_cloud):
file_list = fleet.split_files(file_list)
logger.info("file list: {}".format(file_list))
return dataset, file_list
def local_train(args):
# Build the model network
ctr_model = CTR()
inputs = ctr_model.input_data(args)
avg_cost, auc_var = ctr_model.net(inputs, args)
# Choose the optimization strategy for backward updates
optimizer = fluid.optimizer.Adam(args.learning_rate)
optimizer.minimize(avg_cost)
# Create the training executor on CPU and initialize the parameters
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
pe = fluid.ParallelExecutor(
use_cuda=False,
loss_name=loss.name,
main_program=train_program,
build_strategy=build_strategy,
exec_strategy=exec_strategy)
total_time = 0
for pass_id in range(args.num_passes):
pass_start = time.time()
batch_id = 0
py_reader.start()
try:
while True:
loss_val, auc_val, batch_auc_val = pe.run(
fetch_list=[loss.name, auc_var.name, batch_auc_var.name])
loss_val = np.mean(loss_val)
auc_val = np.mean(auc_val)
batch_auc_val = np.mean(batch_auc_val)
logger.info(
"TRAIN --> pass: {} batch: {} loss: {} auc: {}, batch_auc: {}"
.format(pass_id, batch_id, loss_val / args.batch_size,
auc_val, batch_auc_val))
if batch_id % 1000 == 0 and batch_id != 0:
model_dir = os.path.join(args.model_output_dir,
'batch-' + str(batch_id))
if args.trainer_id == 0:
fluid.io.save_persistables(
executor=exe,
dirname=model_dir,
main_program=fluid.default_main_program())
batch_id += 1
except fluid.core.EOFException:
py_reader.reset()
print("pass_id: %d, pass_time_cost: %f" %
(pass_id, time.time() - pass_start))
total_time += time.time() - pass_start
model_dir = os.path.join(args.model_output_dir, 'pass-' + str(pass_id))
if args.trainer_id == 0:
fluid.io.save_persistables(
executor=exe,
dirname=model_dir,
main_program=fluid.default_main_program())
# only for ce
if args.enable_ce:
threads_num, cpu_num = get_cards(args)
epoch_idx = args.num_passes
print("kpis\teach_pass_duration_cpu%s_thread%s\t%s" %
(cpu_num, threads_num, total_time / epoch_idx))
print("kpis\ttrain_loss_cpu%s_thread%s\t%s" %
(cpu_num, threads_num, loss_val / args.batch_size))
print("kpis\ttrain_auc_val_cpu%s_thread%s\t%s" %
(cpu_num, threads_num, auc_val))
print("kpis\ttrain_batch_auc_val_cpu%s_thread%s\t%s" %
(cpu_num, threads_num, batch_auc_val))
# Create the training dataset and the training file list
dataset, file_list = get_dataset(inputs, args)
logger.info("Training Begin")
for epoch in range(args.epochs):
# Shuffle at file granularity
random.shuffle(file_list)
dataset.set_filelist(file_list)
# Use train_from_dataset for multi-threaded concurrent training
start_time = time.time()
exe.train_from_dataset(program=fluid.default_main_program(),
dataset=dataset,
fetch_list=[auc_var],
fetch_info=["Epoch {} auc ".format(epoch)],
print_period=100,
debug=False)
end_time = time.time()
logger.info("epoch %d finished, use time=%d\n" %
((epoch), end_time - start_time))
if args.save_model:
model_path = (str(args.model_path) + "/" +
"epoch_" + str(epoch) + "/")
if not os.path.isdir(model_path):
os.mkdir(model_path)
fluid.save(fluid.default_main_program(), model_path + "checkpoint")
logger.info("Train Success!")
def distribute_train(args):
# Determine the role of the current machine/process in distributed training from environment variables,
# then initialize this node with the fleet API's init() method
role = role_maker.PaddleCloudRoleMaker()
fleet.init(role)
# The distributed run mode can be further configured via DistributeTranspilerConfig.
# Below we set the distributed mode to async and split the parameters across different nodes.
strategy = DistributeTranspilerConfig()
strategy.sync_mode = False
strategy.runtime_split_send_recv = True
ctr_model = CTR()
inputs = ctr_model.input_data(args)
avg_cost, auc_var = ctr_model.net(inputs, args)
# Configure the distributed optimizer with the strategy above and build the program
optimizer = fluid.optimizer.Adam(args.learning_rate)
optimizer = fleet.distributed_optimizer(optimizer, strategy)
optimizer.minimize(avg_cost)
# Run different logic according to the node role
if fleet.is_server():
# Initialize and run the parameter server node
fleet.init_server()
fleet.run_server()
elif fleet.is_worker():
# Initialize the worker node
fleet.init_worker()
exe = fluid.Executor(fluid.CPUPlace())
# Initialize fleet.startup_program, which contains the distributed startup logic
exe.run(fleet.startup_program)
dataset, file_list = get_dataset(inputs, args)
for epoch in range(args.epochs):
# Shuffle at file granularity
random.shuffle(file_list)
dataset.set_filelist(file_list)
# The trainer runs fleet.main_program, which has been pruned for distributed training
start_time = time.time()
exe.train_from_dataset(program=fleet.main_program,
dataset=dataset,
fetch_list=[auc_var],
fetch_info=["Epoch {} auc ".format(epoch)],
print_period=100,
debug=False)
end_time = time.time()
logger.info("epoch %d finished, use time=%d\n" %
((epoch), end_time - start_time))
# By default, worker 0 saves the model
if args.save_model and fleet.is_first_worker():
model_path = (str(args.model_path) + "/" + "epoch_" +
str(epoch))
fleet.save_persistables(executor=exe, dirname=model_path)
fleet.stop_worker()
logger.info("Distribute Train Success!")
def train():
args = parse_args()
if not os.path.isdir(args.model_output_dir):
os.mkdir(args.model_output_dir)
loss, auc_var, batch_auc_var, py_reader, _, auc_states = ctr_dnn_model(
args.embedding_size, args.sparse_feature_dim)
optimizer = fluid.optimizer.Adam(learning_rate=1e-4)
optimizer.minimize(loss)
if args.cloud_train:
# the port of all pservers, needed by both trainer and pserver
port = os.getenv("PADDLE_PORT", "6174")
# comma separated ips of all pservers, needed by trainer and
pserver_ips = os.getenv("PADDLE_PSERVERS", "")
eplist = []
for ip in pserver_ips.split(","):
eplist.append(':'.join([ip, port]))
args.endpoints = ",".join(eplist)
args.trainers = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
args.current_endpoint = os.getenv("POD_IP", "localhost") + ":" + port
args.role = os.getenv("TRAINING_ROLE", "TRAINER")
args.trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
args.is_local = bool(int(os.getenv("PADDLE_IS_LOCAL", 0)))
if args.is_local:
if not os.path.isdir(args.model_path):
os.mkdir(args.model_path)
if args.is_cloud:
logger.info("run cloud training")
distribute_train(args)
elif args.is_local:
logger.info("run local training")
main_program = fluid.default_main_program()
train_loop(args, main_program, py_reader, loss, auc_var, batch_auc_var,
1, 0)
else:
logger.info("run dist training")
t = fluid.DistributeTranspiler()
t.transpile(
args.trainer_id, pservers=args.endpoints, trainers=args.trainers)
if args.role == "pserver" or args.role == "PSERVER":
logger.info("run pserver")
prog = t.get_pserver_program(args.current_endpoint)
startup = t.get_startup_program(
args.current_endpoint, pserver_program=prog)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(startup)
exe.run(prog)
elif args.role == "trainer" or args.role == "TRAINER":
logger.info("run trainer")
train_prog = t.get_trainer_program()
train_loop(args, train_prog, py_reader, loss, auc_var,
batch_auc_var, args.trainers, args.trainer_id)
else:
raise ValueError(
'PADDLE_TRAINING_ROLE environment variable must be either TRAINER or PSERVER'
)
def get_cards(args):
threads_num = os.environ.get('NUM_THREADS', 1)
cpu_num = os.environ.get('CPU_NUM', 1)
return int(threads_num), int(cpu_num)
local_train(args)
if __name__ == '__main__':
......
@@ -13,12 +13,12 @@ def check_version():
Log error and exit when the installed version of paddlepaddle is
not satisfied.
"""
err = "PaddlePaddle version 1.6 or higher is required, " \
err = "PaddlePaddle version 1.6.1 or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code." \
try:
fluid.require_version('1.6.0')
fluid.require_version('1.6.1')
except Exception as e:
logger.error(err)
sys.exit(1)