Merge branch 'master' of https://github.com/PaddlePaddle/PaddleRec into gru4rec

3e43efca · malin10 · c718dc5d · 12fc8c82
124 changed files
--- a/.travis.yml
+++ b/.travis.yml
@@ -16,13 +16,20 @@ before_install:
  # For pylint docstring checker
  - sudo apt-get update 
  - sudo apt-get install -y python-pip libpython-dev
+  - sudo apt-get remove python-urllib3
+  - sudo apt-get purge python-urllib3
+  - sudo rm /usr/lib/python2.7/dist-packages/chardet-*
  - sudo pip install -U pip
+  - sudo pip install --upgrade setuptools
  - sudo pip install six --upgrade --ignore-installed six
-  - sudo pip install pillow
  - sudo pip install PyYAML
  - sudo pip install pylint pytest astroid isort pre-commit
  - sudo pip install kiwisolver
-  - sudo pip install paddlepaddle==1.7.2 --ignore-installed urllib3
+  - sudo pip install scikit-build
+  - sudo pip install Pillow==5.3.0
+  - sudo pip install opencv-python==3.4.3.18
+  - sudo pip install rarfile==3.0
+  - sudo pip install paddlepaddle==1.7.2
  - sudo python setup.py install
  - |
    function timeout() { perl -e 'alarm shift; exec @ARGV' "$@"; }

--- a/README.md
+++ b/README.md
--- a/README_CN.md
+++ b/README_CN.md
-(简体中文|[English](./README.md))
-
-<p align="center">
-<img align="center" src="doc/imgs/logo.png">
-<p>
-<p align="center">
-<img align="center" src="doc/imgs/structure.png">
-<p>
-<p align="center">
-<img align="center" src="doc/imgs/overview.png">
-<p>
-
-
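-<h2 align="center">What is a recommender system?</h2>
-<p align="center">
-<img align="center" src="doc/imgs/rec-overview.png">
-<p>
-
- Recommender systems are key to helping users efficiently find the information they care about in an era of explosive growth of information on the Internet;
-
- Recommender systems are also a silver bullet for helping products attract and retain users, increase user stickiness, and raise conversion rates as far as possible.
-
- Countless excellent products have built a strong reputation on recommender systems that users can perceive, and countless companies have secured a place in their industry with recommender systems that address users' pain points directly.
-
-  > It is fair to say that whoever masters and makes good use of recommender systems gains a head start in the fierce competition of information distribution.
-  > At the same time, many problems trouble recommender-system developers, for example: huge data volumes, complex model structures, inefficient distributed training environments, fluctuating online/offline consistency, demanding deployment requirements, and many more.
-
-<h2 align="center">What is PaddleRec?</h2>
-
-
- A **one-stop, out-of-the-box tool** for search and recommendation models, built on the PaddlePaddle ecosystem
- A full-pipeline recommender-system solution for beginners, developers, and researchers
- A complete recommendation and search algorithm library covering content understanding, matching, recall, ranking, multi-task learning, re-ranking, and other tasks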
-
-
-    | Type | Model | Single-machine CPU | Single-machine GPU | Distributed CPU | Distributed GPU | Paper |
-    | :---: | :---: | :---: | :---: | :---: | :---: | :--- |
-    | Content Understanding | [Text-Classification](models/contentunderstanding/classification/model.py) | ✓ | ✓ | ✓ | x | [EMNLP 2014][Convolutional neural networks for sentence classification](https://www.aclweb.org/anthology/D14-1181.pdf) |
-    | Content Understanding | [TagSpace](models/contentunderstanding/tagspace/model.py) | ✓ | ✓ | ✓ | x | [EMNLP 2014][TagSpace: Semantic Embeddings from Hashtags](https://www.aclweb.org/anthology/D14-1194.pdf) |
-    | Match | [DSSM](models/match/dssm/model.py) | ✓ | ✓ | ✓ | x | [CIKM 2013][Learning Deep Structured Semantic Models for Web Search using Clickthrough Data](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf) |
-    | Match | [MultiView-Simnet](models/match/multiview-simnet/model.py) | ✓ | ✓ | ✓ | x | [WWW 2015][A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf) |
-    | Recall | [TDM](models/treebased/tdm/model.py) | ✓ | >=1.8.0 | ✓ | >=1.8.0 | [KDD 2018][Learning Tree-based Deep Model for Recommender Systems](https://arxiv.org/pdf/1801.02294.pdf) |
-    | Recall | [fasttext](models/recall/fasttext/model.py) | ✓ | ✓ | x | x | [EACL 2017][Bag of Tricks for Efficient Text Classification](https://www.aclweb.org/anthology/E17-2068.pdf) |
-    | Recall | [Word2Vec](models/recall/word2vec/model.py) | ✓ | ✓ | ✓ | x | [NIPS 2013][Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) |
-    | Recall | [SSR](models/recall/ssr/model.py) | ✓ | ✓ | ✓ | ✓ | [SIGIR 2016][Multi-Rate Deep Learning for Temporal Recommendation](http://sonyis.me/paperpdf/spr209-song_sigir16.pdf) |
-    | Recall | [Gru4Rec](models/recall/gru4rec/model.py) | ✓ | ✓ | ✓ | ✓ | [2015][Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939) |
-    | Recall | [Youtube_dnn](models/recall/youtube_dnn/model.py) | ✓ | ✓ | ✓ | ✓ | [RecSys 2016][Deep Neural Networks for YouTube Recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf) |
-    | Recall | [NCF](models/recall/ncf/model.py) | ✓ | ✓ | ✓ | ✓ | [WWW 2017][Neural Collaborative Filtering](https://arxiv.org/pdf/1708.05031.pdf) |
-    | Recall | [GNN](models/recall/gnn/model.py) | ✓ | ✓ | ✓ | ✓ | [AAAI 2019][Session-based Recommendation with Graph Neural Networks](https://arxiv.org/abs/1811.00855) |
-    | Rank | [Logistic Regression](models/rank/logistic_regression/model.py) | ✓ | x | ✓ | x | / |
-    | Rank | [Dnn](models/rank/dnn/model.py) | ✓ | ✓ | ✓ | ✓ | / |
-    | Rank | [FM](models/rank/fm/model.py) | ✓ | x | ✓ | x | [IEEE Data Mining 2010][Factorization machines](https://analyticsconsultores.com.mx/wp-content/uploads/2019/03/Factorization-Machines-Steffen-Rendle-Osaka-University-2010.pdf) |
-    | Rank | [FFM](models/rank/ffm/model.py) | ✓ | x | ✓ | x | [RECSYS 2016][Field-aware Factorization Machines for CTR Prediction](https://dl.acm.org/doi/pdf/10.1145/2959100.2959134) |
-    | Rank | [FNN](models/rank/fnn/model.py) | ✓ | x | ✓ | x | [ECIR 2016][Deep Learning over Multi-field Categorical Data](https://arxiv.org/pdf/1601.02376.pdf) |
-    | Rank | [Deep Crossing](models/rank/deep_crossing/model.py) | ✓ | x | ✓ | x | [ACM 2016][Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features](https://www.kdd.org/kdd2016/papers/files/adf0975-shanA.pdf) |
-    | Rank | [Pnn](models/rank/pnn/model.py) | ✓ | x | ✓ | x | [ICDM 2016][Product-based Neural Networks for User Response Prediction](https://arxiv.org/pdf/1611.00144.pdf) |
-    | Rank | [DCN](models/rank/dcn/model.py) | ✓ | x | ✓ | x | [KDD 2017][Deep & Cross Network for Ad Click Predictions](https://dl.acm.org/doi/pdf/10.1145/3124749.3124754) |
-    | Rank | [NFM](models/rank/nfm/model.py) | ✓ | x | ✓ | x | [SIGIR 2017][Neural Factorization Machines for Sparse Predictive Analytics](https://dl.acm.org/doi/pdf/10.1145/3077136.3080777) |
-    | Rank | [AFM](models/rank/afm/model.py) | ✓ | x | ✓ | x | [IJCAI 2017][Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks](https://arxiv.org/pdf/1708.04617.pdf) |
-    | Rank | [DeepFM](models/rank/deepfm/model.py) | ✓ | x | ✓ | x | [IJCAI 2017][DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/pdf/1703.04247.pdf) |
-    | Rank | [xDeepFM](models/rank/xdeepfm/model.py) | ✓ | x | ✓ | x | [KDD 2018][xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/3219819.3220023) |
-    | Rank | [DIN](models/rank/din/model.py) | ✓ | x | ✓ | x | [KDD 2018][Deep Interest Network for Click-Through Rate Prediction](https://dl.acm.org/doi/pdf/10.1145/3219819.3219823) |
-    | Rank | [DIEN](models/rank/dien/model.py) | ✓ | x | ✓ | x | [AAAI 2019][Deep Interest Evolution Network for Click-Through Rate Prediction](https://www.aaai.org/ojs/index.php/AAAI/article/view/4545/4423) |
-    | Rank | [BST](models/rank/BST/model.py) | ✓ | x | ✓ | x | [DLP_KDD 2019][Behavior Sequence Transformer for E-commerce Recommendation in Alibaba](https://arxiv.org/pdf/1905.06874v1.pdf) |
-    | Rank | [AutoInt](models/rank/AutoInt/model.py) | ✓ | x | ✓ | x | [CIKM 2019][AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks](https://arxiv.org/pdf/1810.11921.pdf) |
-    | Rank | [Wide&Deep](models/rank/wide_deep/model.py) | ✓ | x | ✓ | x | [DLRS 2016][Wide & Deep Learning for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/2988450.2988454) |
-    | Rank | [FGCNN](models/rank/fgcnn/model.py) | ✓ | ✓ | ✓ | ✓ | [WWW 2019][Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1904.04447.pdf) |
-    | Rank | [Fibinet](models/rank/fibinet/model.py) | ✓ | ✓ | ✓ | ✓ | [RecSys19][FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction]( https://arxiv.org/pdf/1905.09433.pdf) |
-    | Rank | [Flen](models/rank/flen/model.py) | ✓ | ✓ | ✓ | ✓ | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf) |
-    | Multi-Task | [ESMM](models/multitask/esmm/model.py) | ✓ | ✓ | ✓ | ✓ | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931) |
-    | Multi-Task | [MMOE](models/multitask/mmoe/model.py) | ✓ | ✓ | ✓ | ✓ | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007) |
-    | Multi-Task | [ShareBottom](models/multitask/share-bottom/model.py) | ✓ | ✓ | ✓ | ✓ | [1998][Multitask learning](http://reports-archive.adm.cs.cmu.edu/anon/1997/CMU-CS-97-203.pdf) |
-    | Re-Rank | [Listwise](models/rerank/listwise/model.py) | ✓ | ✓ | ✓ | x | [2019][Sequential Evaluation and Generation Framework for Combinatorial Recommender System](https://arxiv.org/pdf/1902.00245.pdf) |
-
-
-
-
-
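-<h2 align="center">Quick installation</h2>
-
-### Requirements
-* Python 2.7/ 3.5 / 3.6 / 3.7
-* PaddlePaddle  >= 1.7.2
-* Operating system: Windows/Mac/Linux
-
-  > On Windows, PaddleRec currently supports only single-machine training; a Linux environment is recommended for distributed training
-  
-### Installation commands
-
- Method 1: **Install directly from pip**
-  ```bash
-  python -m pip install paddle-rec
-  ```
-    > This method downloads and installs the `paddlepaddle v1.7.2 CPU version` by default. If `PaddlePaddle` fails to install, install `PaddlePaddle` first using one of the methods below, then install `PaddleRec`:
-    > - Download PaddlePaddle from [this page](https://pypi.org/project/paddlepaddle/1.7.2/#files) and install the whl package manually
-    > - Or install `PaddlePaddle` with pip first: `python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple`
-    > - For other installation problems, open an issue at [Paddle Issue](https://github.com/PaddlePaddle/Paddle/issues) or [PaddleRec Issue](https://github.com/PaddlePaddle/PaddleRec/issues), and an engineer will respond promptly
-
- Method 2: **Build and install from source**
-  
-  - Install PaddlePaddle  **Note: PaddlePaddle version == 1.7.2 is required**
-
-    ```shell
-    python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple
-    ```
-
-  - Install PaddleRec from source
-
-    ```
-    git clone https://github.com/PaddlePaddle/PaddleRec/
-    cd PaddleRec
-    python setup.py install
-    ```
-
- Installing PaddleRec for GPU
-
-  After installing PaddleRec with Method 1 or Method 2, you also need to install `paddlepaddle-gpu` manually, choosing the version that matches your environment (CUDA/cuDNN). For installation instructions, see [PaddlePaddle - Quick Start](https://www.paddlepaddle.org.cn/install/quick)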
-
-
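-<h2 align="center">One-command launch</h2>
-
-We use the `dnn` model from the ranking models as an example to introduce PaddleRec's one-command launch. The training data comes from the [Criteo dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/), from which we sampled 100 records:
-
-```bash
-# Train on a single machine using CPU
-python -m paddlerec.run -m paddlerec.models.rank.dnn  
-```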
-
-
-<h2 align="center">Documentation</h2>
-
-### Background
-* [Introduction to recommender systems](doc/rec_background.md)
-* [Introduction to distributed deep learning](doc/ps_background.md)
-
-### Quick start
-* [Get started with PaddleRec in ten minutes](https://aistudio.baidu.com/aistudio/projectdetail/559336)
-
-### Introductory tutorials
-* [Data preparation](doc/slot_reader.md)
-* [Model tuning](doc/model.md)
-* [Launch single-machine training](doc/train.md)
-* [Launch distributed training](doc/distributed_train.md)
-* [Launch prediction](doc/predict.md)
-* [Quick deployment](doc/serving.md)
-
-
-### Advanced tutorials
-* [Custom Reader](doc/custom_reader.md)
-* [Custom model](doc/model_develop.md)
-* [Custom workflow](doc/trainer_develop.md)
-* [yaml configuration reference](doc/yaml.md)
-* [PaddleRec design document](doc/design.md)
-
-### Benchmark
-* [Benchmark](doc/benchmark.md)
-
-### FAQ
-* [FAQ](doc/faq.md)
-
-
-<h2 align="center">Community</h2>
-
-<p align="center">
-    <br>
-    <img alt="Release" src="https://img.shields.io/badge/Release-0.1.0-yellowgreen">
-    <img alt="License" src="https://img.shields.io/github/license/PaddlePaddle/PaddleRec">
-    <img alt="Slack" src="https://img.shields.io/badge/Join-Slack-green">
-    <br>
-<p>
-
-### Release history
- 2020.06.17 - PaddleRec v0.1.0
- 2020.06.03 - PaddleRec v0.0.2
- 2020.05.14 - PaddleRec v0.0.1
-  
-### License
-This project is released under the [Apache 2.0 license](LICENSE).
-
-### Contact us
-
-For comments, suggestions, and bugs found in use, please submit a [GitHub Issue](https://github.com/PaddlePaddle/PaddleRec/issues)
-
-You can also reach us through the following channels:
-
- QQ group number: `861717190`
- WeChat assistant ID: `paddlerec2020`
-
-<p align="center"><img width="200" height="200" margin="500" src="./doc/imgs/QQ_group.png"/>&#8194;&#8194;&#8194;&#8194;&#8194<img width="200" height="200"  src="doc/imgs/weixin_supporter.png"/></p>
-<p align="center">PaddleRec QQ discussion group&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;PaddleRec WeChat assistant</p>
--- a/README_EN.md
+++ b/README_EN.md
+([简体中文](./README.md)|English)
+<p align="center">
+<img align="center" src="doc/imgs/logo.png">
+<p>
+<p align="center">
+<img align="center" src="doc/imgs/overview_en.png">
+<p>
+
+
+<h2 align="center">What is a recommendation system?</h2>
+<p align="center">
+<img align="center" src="doc/imgs/rec-overview-en.png">
+<p>
+
+- A recommendation system helps users quickly find useful and interesting information in massive amounts of data.
+
+- A recommendation system is also a silver bullet for attracting users, retaining users, and increasing user stickiness and conversion.
+
+  > Whoever makes better use of recommendation systems gains an advantage in the fierce competition.
+  >
+  > At the same time, using a recommendation system poses many problems, such as huge data volumes, complex models, and inefficient distributed training.
+
+<h2 align="center">What is PaddleRec?</h2>
+
+
+- A quick-start tool for search and recommendation algorithms, based on [PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/index_en.html)
+- A complete recommendation system solution for beginners, developers, and researchers.
+- A recommendation algorithm library covering content understanding, match, recall, rank, multi-task, re-rank, etc.
+
+
+    |         Type          |                                 Algorithm                                 |  CPU  |   GPU   | Parameter-Server | Multi-GPU | Paper                                                                                                                                                                                                       |
+    | :-------------------: | :-----------------------------------------------------------------------: | :---: | :-----: | :--------------: | :-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+    | Content-Understanding | [Text-Classification](models/contentunderstanding/classification/model.py) |   ✓   |    ✓    |        ✓         |     x     | [EMNLP 2014][Convolutional neural networks for sentence classification](https://www.aclweb.org/anthology/D14-1181.pdf)                                                                                       |
+    | Content-Understanding |         [TagSpace](models/contentunderstanding/tagspace/model.py)         |   ✓   |    ✓    |        ✓         |     x     | [EMNLP 2014][TagSpace: Semantic Embeddings from Hashtags](https://www.aclweb.org/anthology/D14-1194.pdf)                                                                                                    |
+    |         Match         |                    [DSSM](models/match/dssm/model.py)                     |   ✓   |    ✓    |        ✓         |     x     | [CIKM 2013][Learning Deep Structured Semantic Models for Web Search using Clickthrough Data](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf)             |
+    |         Match         |        [MultiView-Simnet](models/match/multiview-simnet/model.py)         |   ✓   |    ✓    |        ✓         |     x     | [WWW 2015][A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)             |
+    |        Recall         |                   [TDM](models/treebased/tdm/model.py)                    |   ✓   | >=1.8.0 |        ✓         |  >=1.8.0  | [KDD 2018][Learning Tree-based Deep Model for Recommender Systems](https://arxiv.org/pdf/1801.02294.pdf)                                                                                                    |
+    |        Recall         |                [fasttext](models/recall/fasttext/model.py)                |   ✓   |    ✓    |        x         |     x     | [EACL 2017][Bag of Tricks for Efficient Text Classification](https://www.aclweb.org/anthology/E17-2068.pdf)                                                                                                 |
+    |        Recall         |                [Word2Vec](models/recall/word2vec/model.py)                |   ✓   |    ✓    |        ✓         |     x     | [NIPS 2013][Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) |
+    |        Recall         |                     [SSR](models/recall/ssr/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [SIGIR 2016][Multi-Rate Deep Learning for Temporal Recommendation](http://sonyis.me/paperpdf/spr209-song_sigir16.pdf)                                                                                       |
+    |        Recall         |                 [Gru4Rec](models/recall/gru4rec/model.py)                 |   ✓   |    ✓    |        ✓         |     ✓     | [2015][Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)                                                                                                      |
+    |        Recall         |             [Youtube_dnn](models/recall/youtube_dnn/model.py)             |   ✓   |    ✓    |        ✓         |     ✓     | [RecSys 2016][Deep Neural Networks for YouTube Recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf)                                               |
+    |        Recall         |                     [NCF](models/recall/ncf/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [WWW 2017][Neural Collaborative Filtering](https://arxiv.org/pdf/1708.05031.pdf)                                                                                                                            |
+    |        Recall         |                     [GNN](models/recall/gnn/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [AAAI 2019][Session-based Recommendation with Graph Neural Networks](https://arxiv.org/abs/1811.00855)                                                                                                      |
+    |        Recall         |                     [RALM](models/recall/look-alike_recall/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [KDD 2019][Real-time Attention Based Look-alike Model for Recommender System](https://arxiv.org/pdf/1906.05022.pdf)                                                                                                      |
+    |         Rank          |      [Logistic Regression](models/rank/logistic_regression/model.py)      |   ✓   |    x    |        ✓         |     x     | /                                                                                                                                                                                                           |
+    |         Rank          |                      [Dnn](models/rank/dnn/model.py)                      |   ✓   |    ✓    |        ✓         |     ✓     | /                                                                                                                                                                                                           |
+    |         Rank          |                       [FM](models/rank/fm/model.py)                       |   ✓   |    x    |        ✓         |     x     | [IEEE Data Mining 2010][Factorization machines](https://analyticsconsultores.com.mx/wp-content/uploads/2019/03/Factorization-Machines-Steffen-Rendle-Osaka-University-2010.pdf)                             |
+    |         Rank          |                      [FFM](models/rank/ffm/model.py)                      |   ✓   |    x    |        ✓         |     x     | [RECSYS 2016][Field-aware Factorization Machines for CTR Prediction](https://dl.acm.org/doi/pdf/10.1145/2959100.2959134)                                                                                    |
+    |         Rank          |                      [FNN](models/rank/fnn/model.py)                      |   ✓   |    x    |        ✓         |     x     | [ECIR 2016][Deep Learning over Multi-field Categorical Data](https://arxiv.org/pdf/1601.02376.pdf)                                                                                                          |
+    |         Rank          |            [Deep Crossing](models/rank/deep_crossing/model.py)            |   ✓   |    x    |        ✓         |     x     | [ACM 2016][Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features](https://www.kdd.org/kdd2016/papers/files/adf0975-shanA.pdf)                                                   |
+    |         Rank          |                      [Pnn](models/rank/pnn/model.py)                      |   ✓   |    x    |        ✓         |     x     | [ICDM 2016][Product-based Neural Networks for User Response Prediction](https://arxiv.org/pdf/1611.00144.pdf)                                                                                               |
+    |         Rank          |                      [DCN](models/rank/dcn/model.py)                      |   ✓   |    x    |        ✓         |     x     | [KDD 2017][Deep & Cross Network for Ad Click Predictions](https://dl.acm.org/doi/pdf/10.1145/3124749.3124754)                                                                                               |
+    |         Rank          |                      [NFM](models/rank/nfm/model.py)                      |   ✓   |    x    |        ✓         |     x     | [SIGIR 2017][Neural Factorization Machines for Sparse Predictive Analytics](https://dl.acm.org/doi/pdf/10.1145/3077136.3080777)                                                                             |
+    |         Rank          |                      [AFM](models/rank/afm/model.py)                      |   ✓   |    x    |        ✓         |     x     | [IJCAI 2017][Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks](https://arxiv.org/pdf/1708.04617.pdf)                                                  |
+    |         Rank          |                   [DeepFM](models/rank/deepfm/model.py)                   |   ✓   |    x    |        ✓         |     x     | [IJCAI 2017][DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/pdf/1703.04247.pdf)                                                                                 |
+    |         Rank          |                  [xDeepFM](models/rank/xdeepfm/model.py)                  |   ✓   |    x    |        ✓         |     x     | [KDD 2018][xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/3219819.3220023)                                                       |
+    |         Rank          |                      [DIN](models/rank/din/model.py)                      |   ✓   |    x    |        ✓         |     x     | [KDD 2018][Deep Interest Network for Click-Through Rate Prediction](https://dl.acm.org/doi/pdf/10.1145/3219819.3219823)                                                                                     |
+    |         Rank          |                     [DIEN](models/rank/dien/model.py)                     |   ✓   |    x    |        ✓         |     x     | [AAAI 2019][Deep Interest Evolution Network for Click-Through Rate Prediction](https://www.aaai.org/ojs/index.php/AAAI/article/view/4545/4423)                                                              |
+    |         Rank          |                      [BST](models/rank/BST/model.py)                      |   ✓   |    x    |        ✓         |     x     | [DLP-KDD 2019][Behavior Sequence Transformer for E-commerce Recommendation in Alibaba](https://arxiv.org/pdf/1905.06874v1.pdf)                                                                              |
+    |         Rank          |                  [AutoInt](models/rank/AutoInt/model.py)                  |   ✓   |    x    |        ✓         |     x     | [CIKM 2019][AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks](https://arxiv.org/pdf/1810.11921.pdf)                                                                       |
+    |         Rank          |                [Wide&Deep](models/rank/wide_deep/model.py)                |   ✓   |    x    |        ✓         |     x     | [DLRS 2016][Wide & Deep Learning for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/2988450.2988454)                                                                                               |
+    |         Rank          |                    [FGCNN](models/rank/fgcnn/model.py)                    |   ✓   |    ✓    |        ✓         |     ✓     | [WWW 2019][Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1904.04447.pdf)                                                                      |
+    |         Rank          |                  [Fibinet](models/rank/fibinet/model.py)                  |   ✓   |    ✓    |        ✓         |     ✓     | [RecSys19][FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction]( https://arxiv.org/pdf/1905.09433.pdf)                                                 |
+    |         Rank          |                     [Flen](models/rank/flen/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [2019][FLEN: Leveraging Field for Scalable CTR Prediction]( https://arxiv.org/pdf/1911.04690.pdf)                                                                                                           |
+    |      Multi-Task       |                  [ESMM](models/multitask/esmm/model.py)                   |   ✓   |    ✓    |        ✓         |     ✓     | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931)                                                              |
+    |      Multi-Task       |                  [MMOE](models/multitask/mmoe/model.py)                   |   ✓   |    ✓    |        ✓         |     ✓     | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007)                                                       |
+    |      Multi-Task       |           [ShareBottom](models/multitask/share-bottom/model.py)           |   ✓   |    ✓    |        ✓         |     ✓     | [1998][Multitask learning](http://reports-archive.adm.cs.cmu.edu/anon/1997/CMU-CS-97-203.pdf)                                                                                                               |
+    |        Re-Rank        |                [Listwise](models/rerank/listwise/model.py)                |   ✓   |    ✓    |        ✓         |     x     | [2019][Sequential Evaluation and Generation Framework for Combinatorial Recommender System](https://arxiv.org/pdf/1902.00245.pdf)                                                                           |
+
+
+
+
+
+<h2 align="center">Getting Started</h2>
+
+### Environmental requirements
+* Python 2.7 / 3.5 / 3.6 / 3.7
+* PaddlePaddle >= 1.7.2
+* Operating system: Windows / Mac / Linux
+
+  > Linux is recommended for distributed training
+  
+### Installation
+
+1. **Install by pip**
+  ```bash
+  python -m pip install paddle-rec
+  ```
+  > This method downloads and installs `paddlepaddle-v1.7.2-cpu`. If `PaddlePaddle` cannot be installed automatically, install `PaddlePaddle` manually and then install `PaddleRec` again:
+  > - Download [PaddlePaddle](https://pypi.org/project/paddlepaddle/1.7.2/#files) and install it with pip.
+  > - Install `PaddlePaddle` with pip: `python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple`
+  > - Report other installation problems in [Paddle Issue](https://github.com/PaddlePaddle/Paddle/issues) or [PaddleRec Issue](https://github.com/PaddlePaddle/PaddleRec/issues)
+
+2. **Install by source code**
+
+  - Install PaddlePaddle  
+
+    ```shell
+    python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple
+    ```
+
+  - Install PaddleRec by source code
+
+    ```shell
+    git clone https://github.com/PaddlePaddle/PaddleRec/
+    cd PaddleRec
+    python setup.py install
+    ```
+
+- Install PaddleRec-GPU  
+
+  After installing `PaddleRec`, install the version of `paddlepaddle-gpu` that matches your environment (CUDA / cuDNN); see the [Installation Manuals](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html).
+
+
+<h2 align="center">Quick Start</h2>
+
+We use the `dnn` algorithm as an example to get started with `PaddleRec`, training on 100 samples taken from the [Criteo Dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/):
+
+```bash
+# Training with cpu
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+
+python -m paddlerec.run -m models/rank/dnn/config.yaml  
+```
+
+
+<h2 align="center">Documentation</h2>
+
+### Background
+* [Recommendation System](doc/rec_background.md)
+* [Distributed deep learning](doc/ps_background.md)
+
+### Introductory Project
+* [Get started with PaddleRec in ten minutes](https://aistudio.baidu.com/aistudio/projectdetail/559336)
+
+### Introductory tutorial
+* [Data](doc/slot_reader.md)
+* [Model](doc/model.md)
+* [Local Train](doc/train.md)
+* [Distributed Train](doc/distributed_train.md)
+* [Predict](doc/predict.md)
+* [Serving](doc/serving.md)
+
+
+### Advanced tutorial
+* [Custom Reader](doc/custom_reader.md)
+* [Custom Model](doc/model_develop.md)
+* [Custom Training Process](doc/trainer_develop.md)
+* [Configuration description of yaml](doc/yaml.md)
+* [Design document of PaddleRec](doc/design.md)
+
+### Benchmark
+* [Benchmark](doc/benchmark.md)
+
+### FAQ
+* [Common Problem FAQ](doc/faq.md)
+
+
+<h2 align="center">Community</h2>
+
+<p align="center">
+    <br>
+    <img alt="Release" src="https://img.shields.io/badge/Release-0.1.0-yellowgreen">
+    <img alt="License" src="https://img.shields.io/github/license/PaddlePaddle/PaddleRec">
+    <img alt="Slack" src="https://img.shields.io/badge/Join-Slack-green">
+    <br>
+</p>
+
+### Version history
+- 2020.06.17 - PaddleRec v0.1.0
+- 2020.06.03 - PaddleRec v0.0.2
+- 2020.05.14 - PaddleRec v0.0.1
+  
+### License
+[Apache 2.0 license](LICENSE)
+
+### Contact us
+
+For any feedback, please open a [GitHub Issue](https://github.com/PaddlePaddle/PaddleRec/issues).
+
+You can also reach us in the following ways:
+
+- QQ group ID: `861717190`
+- WeChat account: `paddlerec2020`
+
+<p align="center"><img width="200" height="200" margin="500" src="./doc/imgs/QQ_group.png"/>&#8194;&#8194;&#8194;&#8194;&#8194;<img width="200" height="200"  src="doc/imgs/weixin_supporter.png"/></p>
+<p align="center">PaddleRec QQ Group&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;PaddleRec WeChat account</p>
--- a/core/engine/cluster/cloud/before_hook_cpu.sh.template
+++ b/core/engine/cluster/cloud/before_hook_cpu.sh.template
 echo "Run before_hook.sh ..."

-wget https://paddlerec.bj.bcebos.com/whl/PaddleRec.tar.gz
+wget https://paddlerec.bj.bcebos.com/whl/PaddleRec.tar.gz --no-check-certificate 

 tar -xf PaddleRec.tar.gz

@@ -10,6 +10,6 @@ python setup.py install

 pip uninstall -y paddlepaddle

-pip install paddlepaddle-gpu==<$ PADDLEPADDLE_VERSION $> --index-url=http://pip.baidu.com/pypi/simple --trusted-host pip.baidu.com
+pip install paddlepaddle==<$ PADDLEPADDLE_VERSION $> --index-url=http://pip.baidu.com/pypi/simple --trusted-host pip.baidu.com

 echo "End before_hook.sh ..."
--- a/core/engine/cluster/cloud/before_hook_gpu.sh.template
+++ b/core/engine/cluster/cloud/before_hook_gpu.sh.template
 echo "Run before_hook.sh ..."

-wget https://paddlerec.bj.bcebos.com/whl/PaddleRec.tar.gz
+wget https://paddlerec.bj.bcebos.com/whl/PaddleRec.tar.gz --no-check-certificate

 tar -xf PaddleRec.tar.gz


--- a/core/engine/cluster/cloud/cluster.sh
+++ b/core/engine/cluster/cloud/cluster.sh
@@ -39,7 +39,12 @@ function _before_submit() {
  elif [ ${DISTRIBUTE_MODE} == "COLLECTIVE_GPU_K8S" ]; then
    _gen_gpu_before_hook
    _gen_k8s_config
-    _gen_k8s_job
+    _gen_k8s_gpu_job
+    _gen_end_hook
+  elif [ ${DISTRIBUTE_MODE} == "PS_CPU_K8S" ]; then
+    _gen_cpu_before_hook
+    _gen_k8s_config
+    _gen_k8s_cpu_job
    _gen_end_hook
  fi
  
@@ -54,6 +59,7 @@ function _gen_mpi_config() {
      -e "s#<$ OUTPUT_PATH $>#$OUTPUT_PATH#g" \
      -e "s#<$ THIRDPARTY_PATH $>#$THIRDPARTY_PATH#g" \
      -e "s#<$ CPU_NUM $>#$max_thread_num#g" \
+      -e "s#<$ USE_PYTHON3 $>#$USE_PYTHON3#g" \
      -e "s#<$ FLAGS_communicator_is_sgd_optimizer $>#$FLAGS_communicator_is_sgd_optimizer#g" \
      -e "s#<$ FLAGS_communicator_send_queue_size $>#$FLAGS_communicator_send_queue_size#g" \
      -e "s#<$ FLAGS_communicator_thread_pool_size $>#$FLAGS_communicator_thread_pool_size#g" \
@@ -71,6 +77,7 @@ function _gen_k8s_config() {
      -e "s#<$ AFS_REMOTE_MOUNT_POINT $>#$AFS_REMOTE_MOUNT_POINT#g" \
      -e "s#<$ OUTPUT_PATH $>#$OUTPUT_PATH#g" \
      -e "s#<$ CPU_NUM $>#$max_thread_num#g" \
+      -e "s#<$ USE_PYTHON3 $>#$USE_PYTHON3#g" \
      -e "s#<$ FLAGS_communicator_is_sgd_optimizer $>#$FLAGS_communicator_is_sgd_optimizer#g" \
      -e "s#<$ FLAGS_communicator_send_queue_size $>#$FLAGS_communicator_send_queue_size#g" \
      -e "s#<$ FLAGS_communicator_thread_pool_size $>#$FLAGS_communicator_thread_pool_size#g" \
@@ -101,6 +108,7 @@ function _gen_end_hook() {
 function _gen_mpi_job() {
  echo "gen mpi_job.sh"
  sed -e "s#<$ GROUP_NAME $>#$GROUP_NAME#g" \
+      -e "s#<$ JOB_NAME $>#$OLD_JOB_NAME#g" \
      -e "s#<$ AK $>#$AK#g" \
      -e "s#<$ SK $>#$SK#g" \
      -e "s#<$ MPI_PRIORITY $>#$PRIORITY#g" \
@@ -109,18 +117,34 @@ function _gen_mpi_job() {
      ${abs_dir}/cloud/mpi_job.sh.template >${PWD}/job.sh
 }

-function _gen_k8s_job() {
+function _gen_k8s_gpu_job() {
  echo "gen k8s_job.sh"
  sed -e "s#<$ GROUP_NAME $>#$GROUP_NAME#g" \
+      -e "s#<$ JOB_NAME $>#$OLD_JOB_NAME#g" \
      -e "s#<$ AK $>#$AK#g" \
      -e "s#<$ SK $>#$SK#g" \
      -e "s#<$ K8S_PRIORITY $>#$PRIORITY#g" \
      -e "s#<$ K8S_TRAINERS $>#$K8S_TRAINERS#g" \
+      -e "s#<$ K8S_CPU_CORES $>#$K8S_CPU_CORES#g" \
      -e "s#<$ K8S_GPU_CARD $>#$K8S_GPU_CARD#g" \
      -e "s#<$ START_CMD $>#$START_CMD#g" \
      ${abs_dir}/cloud/k8s_job.sh.template >${PWD}/job.sh
 }

+function _gen_k8s_cpu_job() {
+  echo "gen k8s_cpu_job.sh"
+  sed -e "s#<$ GROUP_NAME $>#$GROUP_NAME#g" \
+      -e "s#<$ JOB_NAME $>#$OLD_JOB_NAME#g" \
+      -e "s#<$ AK $>#$AK#g" \
+      -e "s#<$ SK $>#$SK#g" \
+      -e "s#<$ K8S_PRIORITY $>#$PRIORITY#g" \
+      -e "s#<$ K8S_TRAINERS $>#$K8S_TRAINERS#g" \
+      -e "s#<$ K8S_PS_NUM $>#$K8S_PS_NUM#g" \
+      -e "s#<$ K8S_PS_CORES $>#$K8S_PS_CORES#g" \
+      -e "s#<$ K8S_CPU_CORES $>#$K8S_CPU_CORES#g" \
+      -e "s#<$ START_CMD $>#$START_CMD#g" \
+      ${abs_dir}/cloud/k8s_cpu_job.sh.template >${PWD}/job.sh
+}


 #-----------------------------------------------------------------------------------------------------------------
@@ -145,6 +169,7 @@ function _submit() {
 function package_hook() {
  cur_time=`date  +"%Y%m%d%H%M"`
  new_job_name="${JOB_NAME}_${cur_time}"
+  export OLD_JOB_NAME=${JOB_NAME}
  export JOB_NAME=${new_job_name}
  export job_file_path="${PWD}/${new_job_name}"
  mkdir ${job_file_path}

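The `sed -e "s#<$ KEY $>#$VALUE#g"` chains in `cluster.sh` fill `<$ KEY $>` placeholders in the `*.template` files. The same idea can be sketched in Python as below. This is a minimal illustration, not code from the repository: `render_template` and the sample values are hypothetical, and leaving unknown placeholders untouched is a choice made for this sketch (the `sed` calls would substitute an empty string for an unset shell variable).

```python
import re

# Matches placeholders of the form `<$ KEY $>` as used in the *.template files.
PLACEHOLDER = re.compile(r"<\$ (\w+) \$>")


def render_template(text, values):
    """Hypothetical helper: replace each `<$ KEY $>` with str(values[KEY]).

    Placeholders whose key is missing from `values` are left unchanged,
    which makes forgotten substitutions easy to spot in the output.
    """

    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, text)


template = "job_name=<$ JOB_NAME $>\nk8s_ps_num=<$ K8S_PS_NUM $>\nak=<$ AK $>\n"
rendered = render_template(template, {"JOB_NAME": "demo", "K8S_PS_NUM": 2})
print(rendered)
```

Here `AK` is deliberately omitted from the values, so its placeholder survives in the rendered text.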
--- a/core/engine/cluster/cloud/k8s_config.ini.template
+++ b/core/engine/cluster/cloud/k8s_config.ini.template
@@ -19,6 +19,8 @@ afs_local_mount_point="/root/paddlejob/workspace/env_run/afs/"
 # Help document for mounting AFS on the new k8s: http://wiki.baidu.com/pages/viewpage.action?pageId=906443193

 PADDLE_PADDLEREC_ROLE=WORKER
+PADDLEREC_CLUSTER_TYPE=K8S
+use_python3=<$ USE_PYTHON3 $>
 CPU_NUM=<$ CPU_NUM $>
 GLOG_v=0


--- a/core/engine/cluster/cloud/k8s_cpu_job.sh.template
+++ b/core/engine/cluster/cloud/k8s_cpu_job.sh.template
+#!/bin/bash
+###############################################################
+##            Attention -- Attention -- Attention            ##
+##             K8S PS-CPU multi-node job example             ##
+###############################################################
+job_name=<$ JOB_NAME $>
+
+# Job parameters
+group_name="<$ GROUP_NAME $>"               
+job_version="paddle-fluid-v1.7.1"
+start_cmd="<$ START_CMD $>"
+wall_time="2000:00:00"
+
+k8s_priority=<$ K8S_PRIORITY $>
+k8s_trainers=<$ K8S_TRAINERS $>
+k8s_cpu_cores=<$ K8S_CPU_CORES $>
+k8s_ps_num=<$ K8S_PS_NUM $>
+k8s_ps_cores=<$ K8S_PS_CORES $>
+
+# Your ak/sk (available under "Personal Center" on the paddlecloud web page)
+ak=<$ AK $>
+sk=<$ SK $>
+
+paddlecloud job --ak ${ak} --sk ${sk} \
+        train --job-name ${job_name} \
+        --group-name ${group_name} \
+        --job-conf config.ini \
+        --start-cmd "${start_cmd}" \
+        --files ./*  \
+        --job-version ${job_version}  \
+        --k8s-priority ${k8s_priority} \
+        --wall-time ${wall_time} \
+        --k8s-trainers ${k8s_trainers} \
+        --k8s-cpu-cores ${k8s_cpu_cores} \
+        --k8s-ps-num ${k8s_ps_num} \
+        --k8s-ps-cores ${k8s_ps_cores} \
+        --is-standalone 0 \
+        --distribute-job-type "PSERVER" \
+        --json
+        
\ No newline at end of file
--- a/core/engine/cluster/cloud/k8s_job.sh.template
+++ b/core/engine/cluster/cloud/k8s_job.sh.template
@@ -3,18 +3,30 @@
 ##            Attention -- Attention -- Attention            ##
 ##             K8S NCCL2 multi-node job example              ##
 ###############################################################
-job_name=${JOB_NAME}
+job_name=<$ JOB_NAME $>

 # Job parameters
 group_name="<$ GROUP_NAME $>"               
 job_version="paddle-fluid-v1.7.1"
 start_cmd="<$ START_CMD $>"
-wall_time="10:00:00"
+wall_time="2000:00:00"

 k8s_priority=<$ K8S_PRIORITY $>
 k8s_trainers=<$ K8S_TRAINERS $>
+k8s_cpu_cores=<$ K8S_CPU_CORES $>
 k8s_gpu_cards=<$ K8S_GPU_CARD $>

+is_stand_alone=0
+nccl="--distribute-job-type NCCL2"
+if [ ${k8s_trainers} == 1 ];then
+    is_stand_alone=1
+    nccl="--job-remark single-trainer"
+    if [ ${k8s_gpu_cards} == 1 ];then
+        nccl="--job-remark single-gpu"
+        echo "Attention: Use single GPU card for PaddleRec distributed training, please set runner class from 'cluster_train' to 'train' in config.yaml."
+    fi
+fi
+
 # Your ak/sk (available under "Personal Center" on the paddlecloud web page)
 ak=<$ AK $>
 sk=<$ SK $>
@@ -27,9 +39,11 @@ paddlecloud job --ak ${ak} --sk ${sk} \
        --files ./*  \
        --job-version ${job_version}  \
        --k8s-trainers ${k8s_trainers} \
+        --k8s-cpu-cores ${k8s_cpu_cores} \
        --k8s-gpu-cards ${k8s_gpu_cards} \
        --k8s-priority ${k8s_priority} \
        --wall-time ${wall_time} \
-        --is-standalone 0 \
-        --distribute-job-type "NCCL2" \
-        --json
\ No newline at end of file
+        --is-standalone ${is_stand_alone} \
+        --json \
+        ${nccl} 
+        
\ No newline at end of file
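The branch added to `k8s_job.sh.template` chooses between a distributed NCCL2 job and a standalone job based on the trainer and GPU card counts. A minimal Python sketch of that decision follows; `job_mode_flags` is a hypothetical name used only for illustration, and the returned argument lists mirror the values assigned to `is_stand_alone` and `nccl` in the template.

```python
def job_mode_flags(k8s_trainers, k8s_gpu_cards):
    """Return (is_standalone, extra_args) following the template's branch.

    More than one trainer   -> distributed NCCL2 job.
    One trainer, many cards -> standalone job remarked "single-trainer".
    One trainer, one card   -> standalone job remarked "single-gpu".
    """
    if k8s_trainers != 1:
        return 0, ["--distribute-job-type", "NCCL2"]
    if k8s_gpu_cards == 1:
        return 1, ["--job-remark", "single-gpu"]
    return 1, ["--job-remark", "single-trainer"]


for trainers, cards in [(2, 8), (1, 8), (1, 1)]:
    print(trainers, cards, job_mode_flags(trainers, cards))
```

In the single-GPU case the template also prints a hint to switch the runner class from `cluster_train` to `train` in `config.yaml`; the sketch leaves that message out.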
--- a/core/engine/cluster/cloud/mpi_config.ini.template
+++ b/core/engine/cluster/cloud/mpi_config.ini.template
@@ -17,6 +17,8 @@ output_path=<$ OUTPUT_PATH $>
 thirdparty_path=<$ THIRDPARTY_PATH $>

 PADDLE_PADDLEREC_ROLE=WORKER
+PADDLEREC_CLUSTER_TYPE=MPI
+use_python3=<$ USE_PYTHON3 $>
 CPU_NUM=<$ CPU_NUM $>
 GLOG_v=0


--- a/core/engine/cluster/cloud/mpi_job.sh.template
+++ b/core/engine/cluster/cloud/mpi_job.sh.template
@@ -3,13 +3,13 @@
 ##            Attention -- Attention -- Attention            ##
 ##                      MPI job example                      ##
 ###############################################################
-job_name=${JOB_NAME}
+job_name=<$ JOB_NAME $>

 # Job parameters
 group_name=<$ GROUP_NAME $>
 job_version="paddle-fluid-v1.7.1"
 start_cmd="<$ START_CMD $>"
-wall_time="2:00:00"
+wall_time="2000:00:00"

 # Your ak/sk (available under "Personal Center" on the paddlecloud web page)
 ak=<$ AK $>

--- a/core/engine/cluster/cluster.py
+++ b/core/engine/cluster/cluster.py
@@ -67,10 +67,10 @@ class ClusterEngine(Engine):

    @staticmethod
    def workspace_replace():
-        workspace = envs.get_runtime_environ("workspace")
+        remote_workspace = envs.get_runtime_environ("remote_workspace")

        for k, v in os.environ.items():
-            v = v.replace("{workspace}", workspace)
+            v = v.replace("{workspace}", remote_workspace)
            os.environ[k] = str(v)

    def run(self):
@@ -98,14 +98,12 @@ class ClusterEngine(Engine):
                cluster_env_check_tool = PaddleCloudMpiEnv()
            else:
                raise ValueError(
-                    "Paddlecloud with Mpi don't support GPU training, check your config"
+                    "Paddlecloud with Mpi don't support GPU training, check your config.yaml & backend.yaml"
                )
        elif cluster_type.upper() == "K8S":
            if fleet_mode == "PS":
                if device == "CPU":
-                    raise ValueError(
-                        "PS-CPU on paddlecloud is not supported at this time, comming soon"
-                    )
+                    cluster_env_check_tool = CloudPsCpuEnv()
                elif device == "GPU":
                     raise ValueError(
                         "PS-GPU on paddlecloud is not supported at this time, coming soon"
@@ -115,7 +113,7 @@ class ClusterEngine(Engine):
                    cluster_env_check_tool = CloudCollectiveEnv()
                elif device == "CPU":
                    raise ValueError(
-                        "Unexpected config -> device: CPU with fleet_mode: Collective, check your config"
+                        "Unexpected config -> device: CPU with fleet_mode: Collective, check your config.yaml"
                    )
        else:
            raise ValueError("cluster_type {} error, must in MPI/K8S".format(
@@ -161,23 +159,30 @@ class ClusterEnvBase(object):
        self.cluster_env["PADDLE_VERSION"] = self.backend_env.get(
            "config.paddle_version", "1.7.2")

+        # python_version
+        self.cluster_env["USE_PYTHON3"] = self.backend_env.get(
+            "config.use_python3", "0")
+
        # communicator
+        max_thread_num = int(envs.get_runtime_environ("max_thread_num"))
        self.cluster_env[
            "FLAGS_communicator_is_sgd_optimizer"] = self.backend_env.get(
                "config.communicator.FLAGS_communicator_is_sgd_optimizer", 0)
        self.cluster_env[
            "FLAGS_communicator_send_queue_size"] = self.backend_env.get(
-                "config.communicator.FLAGS_communicator_send_queue_size", 5)
+                "config.communicator.FLAGS_communicator_send_queue_size",
+                max_thread_num)
        self.cluster_env[
            "FLAGS_communicator_thread_pool_size"] = self.backend_env.get(
                "config.communicator.FLAGS_communicator_thread_pool_size", 32)
        self.cluster_env[
            "FLAGS_communicator_max_merge_var_num"] = self.backend_env.get(
-                "config.communicator.FLAGS_communicator_max_merge_var_num", 5)
+                "config.communicator.FLAGS_communicator_max_merge_var_num",
+                max_thread_num)
        self.cluster_env[
            "FLAGS_communicator_max_send_grad_num_before_recv"] = self.backend_env.get(
                "config.communicator.FLAGS_communicator_max_send_grad_num_before_recv",
-                5)
+                max_thread_num)
        self.cluster_env["FLAGS_communicator_fake_rpc"] = self.backend_env.get(
            "config.communicator.FLAGS_communicator_fake_rpc", 0)
        self.cluster_env["FLAGS_rpc_retry_times"] = self.backend_env.get(
@@ -234,7 +239,7 @@ class PaddleCloudMpiEnv(ClusterEnvBase):
            "config.train_data_path", "")
        if self.cluster_env["TRAIN_DATA_PATH"] == "":
            raise ValueError(
-                "No -- TRAIN_DATA_PATH -- found in your backend.yaml, please check."
+                "No -- TRAIN_DATA_PATH -- found in your backend.yaml, please add train_data_path in your backend yaml."
            )
        # test_data_path
        self.cluster_env["TEST_DATA_PATH"] = self.backend_env.get(
@@ -274,7 +279,7 @@ class PaddleCloudK8sEnv(ClusterEnvBase):
                category=UserWarning,
                stacklevel=2)
        warnings.warn(
-            "The remote mount point will be mounted to the ./afs/",
+            "The remote afs path will be mounted to the ./afs/",
            category=UserWarning,
            stacklevel=2)

@@ -293,3 +298,21 @@ class CloudCollectiveEnv(PaddleCloudK8sEnv):
            "submit.k8s_gpu_card", 1)
        self.cluster_env["K8S_CPU_CORES"] = self.backend_env.get(
            "submit.k8s_cpu_cores", 1)
+
+
+class CloudPsCpuEnv(PaddleCloudK8sEnv):
+    def __init__(self):
+        super(CloudPsCpuEnv, self).__init__()
+
+    def env_check(self):
+        super(CloudPsCpuEnv, self).env_check()
+
+        self.cluster_env["DISTRIBUTE_MODE"] = "PS_CPU_K8S"
+        self.cluster_env["K8S_TRAINERS"] = self.backend_env.get(
+            "submit.k8s_trainers", 1)
+        self.cluster_env["K8S_CPU_CORES"] = self.backend_env.get(
+            "submit.k8s_cpu_cores", 2)
+        self.cluster_env["K8S_PS_NUM"] = self.backend_env.get(
+            "submit.k8s_ps_num", 1)
+        self.cluster_env["K8S_PS_CORES"] = self.backend_env.get(
+            "submit.k8s_ps_cores", 2)
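With `CloudPsCpuEnv` added, the combinations accepted by `ClusterEngine` can be summarized as a lookup table. The sketch below is an illustration under assumptions, not the repository's code: `pick_env_checker` is a hypothetical function, class names are plain strings so it runs without PaddleRec installed, the `"COLLECTIVE"` spelling of the fleet mode is assumed, and the exact condition on the MPI branch is not visible in this diff.

```python
# Supported (cluster_type, fleet_mode, device) combinations as read from the diff.
# The MPI entry and the "COLLECTIVE" spelling are assumptions of this sketch.
SUPPORTED = {
    ("MPI", "PS", "CPU"): "PaddleCloudMpiEnv",
    ("K8S", "PS", "CPU"): "CloudPsCpuEnv",
    ("K8S", "COLLECTIVE", "GPU"): "CloudCollectiveEnv",
}


def pick_env_checker(cluster_type, fleet_mode, device):
    """Return the name of the env-check class for a configuration, or raise."""
    key = (cluster_type.upper(), fleet_mode.upper(), device.upper())
    if key[0] not in ("MPI", "K8S"):
        raise ValueError(
            "cluster_type {} error, must be MPI or K8S".format(cluster_type))
    if key not in SUPPORTED:
        raise ValueError("unsupported combination: {}".format(key))
    return SUPPORTED[key]


print(pick_env_checker("k8s", "ps", "cpu"))
```

A table like this keeps the unsupported cases (PS on GPU, Collective on CPU) in one place instead of spreading them over nested `if`/`elif` branches.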
--- a/core/factory.py
+++ b/core/factory.py
@@ -22,6 +22,19 @@ trainers = {}


 def trainer_registry():
+    trainers["SingleTrainer"] = os.path.join(trainer_abs, "single_trainer.py")
+    trainers["ClusterTrainer"] = os.path.join(trainer_abs,
+                                              "cluster_trainer.py")
+    trainers["CtrCodingTrainer"] = os.path.join(trainer_abs,
+                                                "ctr_coding_trainer.py")
+    trainers["CtrModulTrainer"] = os.path.join(trainer_abs,
+                                               "ctr_modul_trainer.py")
+    trainers["TDMSingleTrainer"] = os.path.join(trainer_abs,
+                                                "tdm_single_trainer.py")
+    trainers["TDMClusterTrainer"] = os.path.join(trainer_abs,
+                                                 "tdm_cluster_trainer.py")
+    trainers["OnlineLearningTrainer"] = os.path.join(
+        trainer_abs, "online_learning_trainer.py")
    # Definition of procedure execution process
    trainers["CtrCodingTrainer"] = os.path.join(trainer_abs,
                                                "ctr_coding_trainer.py")

--- a/core/metric.py
+++ b/core/metric.py
@@ -23,34 +23,58 @@ class Metric(object):
    __metaclass__ = abc.ABCMeta

    def __init__(self, config):
-        """ """
+        """R
+        """
        pass

-    def clear(self, scope=None, **kwargs):
-        """
-        clear current value
-        Args:
-            scope: value container
-            params: extend varilable for clear
+    def clear(self, scope=None):
+        """R
        """
        if scope is None:
            scope = fluid.global_scope()

        place = fluid.CPUPlace()
-        for (varname, dtype) in self._need_clear_list:
-            if scope.find_var(varname) is None:
+        for key in self._global_metric_state_vars:
+            varname, dtype = self._global_metric_state_vars[key]
+            var = scope.find_var(varname)
+            if not var:
                continue
-            var = scope.var(varname).get_tensor()
+            var = var.get_tensor()
            data_array = np.zeros(var._get_dims()).astype(dtype)
            var.set(data_array, place)

-    def calculate(self, scope, params):
+    def _get_global_metric_state(self, fleet, scope, metric_name, mode="sum"):
+        """R
        """
-        calculate result
-        Args:
-            scope: value container
-            params: extend varilable for clear
+        var = scope.find_var(metric_name)
+        if not var:
+            return None
+        input = np.array(var.get_tensor())
+        if fleet is None:
+            return input
+        fleet._role_maker._barrier_worker()
+        old_shape = np.array(input.shape)
+        input = input.reshape(-1)
+        output = np.copy(input) * 0
+        fleet._role_maker._all_reduce(input, output, mode=mode)
+        output = output.reshape(old_shape)
+        return output
+
+    def calc_global_metrics(self, fleet, scope=None):
+        """R
        """
+        if scope is None:
+            scope = fluid.global_scope()
+
+        global_metrics = dict()
+        for key in self._global_metric_state_vars:
+            varname, dtype = self._global_metric_state_vars[key]
+            global_metrics[key] = self._get_global_metric_state(fleet, scope,
+                                                                varname)
+
+        return self._calculate(global_metrics)
+
+    def _calculate(self, global_metrics):
        pass

    @abc.abstractmethod

--- a/core/metrics/__init__.py
+++ b/core/metrics/__init__.py
@@ -12,6 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-from precision import Precision
+from .recall_k import RecallK
+from .pairwise_pn import PosNegRatio
+from .precision_recall import PrecisionRecall
+from .auc import AUC

-__all__ = ['Precision']
+__all__ = ['RecallK', 'PosNegRatio', 'AUC', 'PrecisionRecall']
--- a/core/metrics/auc_metrics.py
+++ b/core/metrics/auc_metrics.py
@@ -18,102 +18,60 @@ import numpy as np
 import paddle.fluid as fluid

 from paddlerec.core.metric import Metric
+from paddle.fluid.layers.tensor import Variable


-class AUCMetric(Metric):
+class AUC(Metric):
    """
    Metric For Fluid Model
    """

-    def __init__(self, config, fleet):
+    def __init__(self,
+                 input,
+                 label,
+                 curve='ROC',
+                 num_thresholds=2**12 - 1,
+                 topk=1,
+                 slide_steps=1):
        """ """
-        self.config = config
-        self.fleet = fleet
-
-    def clear(self, scope, params):
-        """
-        Clear current metric value, usually set to zero
-        Args:
-            scope : paddle runtime var container
-            params(dict) :
-                label : a group name for metric
-                metric_dict : current metric_items in group
-        Return:
-            None
-        """
-        self._label = params['label']
-        self._metric_dict = params['metric_dict']
-        self._result = {}
-        place = fluid.CPUPlace()
-        for metric_name in self._metric_dict:
-            metric_config = self._metric_dict[metric_name]
-            if scope.find_var(metric_config['var'].name) is None:
-                continue
-            metric_var = scope.var(metric_config['var'].name).get_tensor()
-            data_type = 'float32'
-            if 'data_type' in metric_config:
-                data_type = metric_config['data_type']
-            data_array = np.zeros(metric_var._get_dims()).astype(data_type)
-            metric_var.set(data_array, place)
-
-    def get_metric(self, scope, metric_name):
-        """
-        reduce metric named metric_name from all worker
-        Return:
-            metric reduce result
-        """
-        metric = np.array(scope.find_var(metric_name).get_tensor())
-        old_metric_shape = np.array(metric.shape)
-        metric = metric.reshape(-1)
-        global_metric = np.copy(metric) * 0
-        self.fleet._role_maker.all_reduce_worker(metric, global_metric)
-        global_metric = global_metric.reshape(old_metric_shape)
-        return global_metric[0]
-
-    def get_global_metrics(self, scope, metric_dict):
-        """
-        reduce all metric in metric_dict from all worker
-        Return:
-            dict : {matric_name : metric_result}
-        """
-        self.fleet._role_maker._barrier_worker()
-        result = {}
-        for metric_name in metric_dict:
-            metric_item = metric_dict[metric_name]
-            if scope.find_var(metric_item['var'].name) is None:
-                result[metric_name] = None
-                continue
-            result[metric_name] = self.get_metric(scope,
-                                                  metric_item['var'].name)
-        return result
-
-    def calculate_auc(self, global_pos, global_neg):
-        """R
-        """
-        num_bucket = len(global_pos)
-        area = 0.0
-        pos = 0.0
-        neg = 0.0
-        new_pos = 0.0
-        new_neg = 0.0
-        total_ins_num = 0
-        for i in range(num_bucket):
-            index = num_bucket - 1 - i
-            new_pos = pos + global_pos[index]
-            total_ins_num += global_pos[index]
-            new_neg = neg + global_neg[index]
-            total_ins_num += global_neg[index]
-            area += (new_neg - neg) * (pos + new_pos) / 2
-            pos = new_pos
-            neg = new_neg
-        auc_value = None
-        if pos * neg == 0 or total_ins_num == 0:
-            auc_value = 0.5
-        else:
-            auc_value = area / (pos * neg)
-        return auc_value
-
-    def calculate_bucket_error(self, global_pos, global_neg):
+        if not isinstance(input, Variable):
+            raise ValueError("input must be Variable, but received %s" %
+                             type(input))
+        if not isinstance(label, Variable):
+            raise ValueError("label must be Variable, but received %s" %
+                             type(label))
+
+        auc_out, batch_auc_out, [
+            batch_stat_pos, batch_stat_neg, stat_pos, stat_neg
+        ] = fluid.layers.auc(input,
+                             label,
+                             curve=curve,
+                             num_thresholds=num_thresholds,
+                             topk=topk,
+                             slide_steps=slide_steps)
+
+        prob = fluid.layers.slice(input, axes=[1], starts=[1], ends=[2])
+        label_cast = fluid.layers.cast(label, dtype="float32")
+        label_cast.stop_gradient = True
+        sqrerr, abserr, prob, q, pos, total = \
+            fluid.contrib.layers.ctr_metric_bundle(prob, label_cast)
+
+        self._global_metric_state_vars = dict()
+        self._global_metric_state_vars['stat_pos'] = (stat_pos.name, "float32")
+        self._global_metric_state_vars['stat_neg'] = (stat_neg.name, "float32")
+        self._global_metric_state_vars['total_ins_num'] = (total.name,
+                                                           "float32")
+        self._global_metric_state_vars['pos_ins_num'] = (pos.name, "float32")
+        self._global_metric_state_vars['q'] = (q.name, "float32")
+        self._global_metric_state_vars['prob'] = (prob.name, "float32")
+        self._global_metric_state_vars['abserr'] = (abserr.name, "float32")
+        self._global_metric_state_vars['sqrerr'] = (sqrerr.name, "float32")
+
+        self.metrics = dict()
+        self.metrics["AUC"] = auc_out
+        self.metrics["BATCH_AUC"] = batch_auc_out
+
+    def _calculate_bucket_error(self, global_pos, global_neg):
        """R
        """
        num_bucket = len(global_pos)
@@ -161,56 +119,69 @@ class AUCMetric(Metric):
        bucket_error = error_sum / error_count if error_count > 0 else 0.0
        return bucket_error

-    def calculate(self, scope, params):
-        """ """
-        self._label = params['label']
-        self._metric_dict = params['metric_dict']
-        self.fleet._role_maker._barrier_worker()
-        result = self.get_global_metrics(scope, self._metric_dict)
+    def _calculate_auc(self, global_pos, global_neg):
+        """R
+        """
+        num_bucket = len(global_pos)
+        area = 0.0
+        pos = 0.0
+        neg = 0.0
+        new_pos = 0.0
+        new_neg = 0.0
+        total_ins_num = 0
+        for i in range(num_bucket):
+            index = num_bucket - 1 - i
+            new_pos = pos + global_pos[index]
+            total_ins_num += global_pos[index]
+            new_neg = neg + global_neg[index]
+            total_ins_num += global_neg[index]
+            area += (new_neg - neg) * (pos + new_pos) / 2
+            pos = new_pos
+            neg = new_neg
+        auc_value = None
+        if pos * neg == 0 or total_ins_num == 0:
+            auc_value = 0.5
+        else:
+            auc_value = area / (pos * neg)
+        return auc_value
+
+    def _calculate(self, global_metrics):
+        result = dict()
+        for key in self._global_metric_state_vars:
+            if key not in global_metrics:
+                raise ValueError("%s not existed" % key)
+            result[key] = global_metrics[key][0]
+
        if result['total_ins_num'] == 0:
-            self._result = result
-            self._result['auc'] = 0
-            self._result['bucket_error'] = 0
-            self._result['actual_ctr'] = 0
-            self._result['predict_ctr'] = 0
-            self._result['mae'] = 0
-            self._result['rmse'] = 0
-            self._result['copc'] = 0
-            self._result['mean_q'] = 0
-            return self._result
-        if 'stat_pos' in result and 'stat_neg' in result:
-            result['auc'] = self.calculate_auc(result['stat_pos'],
-                                               result['stat_neg'])
-            result['bucket_error'] = self.calculate_auc(result['stat_pos'],
-                                                        result['stat_neg'])
-        if 'pos_ins_num' in result:
+            result['auc'] = 0
+            result['bucket_error'] = 0
+            result['actual_ctr'] = 0
+            result['predict_ctr'] = 0
+            result['mae'] = 0
+            result['rmse'] = 0
+            result['copc'] = 0
+            result['mean_q'] = 0
+        else:
+            result['auc'] = self._calculate_auc(result['stat_pos'],
+                                                result['stat_neg'])
+            result['bucket_error'] = self._calculate_bucket_error(
+                result['stat_pos'], result['stat_neg'])
            result['actual_ctr'] = result['pos_ins_num'] / result[
                'total_ins_num']
-        if 'abserr' in result:
            result['mae'] = result['abserr'] / result['total_ins_num']
-        if 'sqrerr' in result:
            result['rmse'] = math.sqrt(result['sqrerr'] /
                                       result['total_ins_num'])
-        if 'prob' in result:
            result['predict_ctr'] = result['prob'] / result['total_ins_num']
            if abs(result['predict_ctr']) > 1e-6:
                result['copc'] = result['actual_ctr'] / result['predict_ctr']
-
-        if 'q' in result:
            result['mean_q'] = result['q'] / result['total_ins_num']
-        self._result = result
-        return result
-
-    def get_result(self):
-        """ """
-        return self._result

-    def __str__(self):
-        """ """
-        result = self.get_result()
-        result_str = "%s AUC=%.6f BUCKET_ERROR=%.6f MAE=%.6f RMSE=%.6f " \
+        result_str = "AUC=%.6f BUCKET_ERROR=%.6f MAE=%.6f RMSE=%.6f " \
                     "Actural_CTR=%.6f Predicted_CTR=%.6f COPC=%.6f MEAN Q_VALUE=%.6f Ins number=%s" % \
-                     (self._label, result['auc'], result['bucket_error'], result['mae'], result['rmse'],
+                     (result['auc'], result['bucket_error'], result['mae'], result['rmse'],
                      result['actual_ctr'],
                      result['predict_ctr'], result['copc'], result['mean_q'], result['total_ins_num'])
        return result_str
+
+    def get_result(self):
+        return self.metrics
--- a/core/metrics/pairwise_pn.py
+++ b/core/metrics/pairwise_pn.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+
+import numpy as np
+import paddle.fluid as fluid
+
+from paddlerec.core.metric import Metric
+from paddle.fluid.initializer import Constant
+from paddle.fluid.layer_helper import LayerHelper
+from paddle.fluid.layers.tensor import Variable
+
+
+class PosNegRatio(Metric):
+    """
+    Metric For Fluid Model
+    """
+
+    def __init__(self, pos_score, neg_score):
+        """ """
+        kwargs = locals()
+        del kwargs['self']
+
+        helper = LayerHelper("PaddleRec_PosNegRatio", **kwargs)
+        if "pos_score" not in kwargs or "neg_score" not in kwargs:
+            raise ValueError(
+                "PosNegRatio expects pos_score and neg_score as inputs.")
+        pos_score = kwargs.get('pos_score')
+        neg_score = kwargs.get('neg_score')
+
+        if not isinstance(pos_score, Variable):
+            raise ValueError("pos_score must be Variable, but received %s" %
+                             type(pos_score))
+        if not isinstance(neg_score, Variable):
+            raise ValueError("neg_score must be Variable, but received %s" %
+                             type(neg_score))
+
+        wrong = fluid.layers.cast(
+            fluid.layers.less_equal(pos_score, neg_score), dtype='float32')
+        wrong_cnt = fluid.layers.reduce_sum(wrong)
+        right = fluid.layers.cast(
+            fluid.layers.less_than(neg_score, pos_score), dtype='float32')
+        right_cnt = fluid.layers.reduce_sum(right)
+
+        global_right_cnt, _ = helper.create_or_get_global_variable(
+            name="right_cnt", persistable=True, dtype='float32', shape=[1])
+        global_wrong_cnt, _ = helper.create_or_get_global_variable(
+            name="wrong_cnt", persistable=True, dtype='float32', shape=[1])
+
+        for var in [global_right_cnt, global_wrong_cnt]:
+            helper.set_variable_initializer(
+                var, Constant(
+                    value=0.0, force_cpu=True))
+
+        helper.append_op(
+            type="elementwise_add",
+            inputs={"X": [global_right_cnt],
+                    "Y": [right_cnt]},
+            outputs={"Out": [global_right_cnt]})
+        helper.append_op(
+            type="elementwise_add",
+            inputs={"X": [global_wrong_cnt],
+                    "Y": [wrong_cnt]},
+            outputs={"Out": [global_wrong_cnt]})
+        self.pn = (global_right_cnt + 1.0) / (global_wrong_cnt + 1.0)
+
+        self._global_metric_state_vars = dict()
+        self._global_metric_state_vars['right_cnt'] = (global_right_cnt.name,
+                                                       "float32")
+        self._global_metric_state_vars['wrong_cnt'] = (global_wrong_cnt.name,
+                                                       "float32")
+
+        self.metrics = dict()
+        self.metrics['WrongCnt'] = global_wrong_cnt
+        self.metrics['RightCnt'] = global_right_cnt
+        self.metrics['PN'] = self.pn
+
+    def _calculate(self, global_metrics):
+        for key in self._global_metric_state_vars:
+            if key not in global_metrics:
+                raise ValueError("%s does not exist" % key)
+        pn = (global_metrics['right_cnt'][0] + 1.0) / (
+            global_metrics['wrong_cnt'][0] + 1.0)
+        return "RightCnt=%s WrongCnt=%s PN=%s" % (
+            str(global_metrics['right_cnt'][0]),
+            str(global_metrics['wrong_cnt'][0]), str(pn))
+
+    def get_result(self):
+        return self.metrics
--- a/core/metrics/precision_recall.py
+++ b/core/metrics/precision_recall.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+
+import numpy as np
+import paddle.fluid as fluid
+
+from paddlerec.core.metric import Metric
+from paddle.fluid.initializer import Constant
+from paddle.fluid.layer_helper import LayerHelper
+from paddle.fluid.layers.tensor import Variable
+
+
+class PrecisionRecall(Metric):
+    """
+    Metric For Fluid Model
+    """
+
+    def __init__(self, input, label, class_num):
+        """Build ops that accumulate per-class [TP, FP, TN, FN] states.
+        """
+        kwargs = locals()
+        del kwargs['self']
+
+        self.num_cls = class_num
+
+        if not isinstance(input, Variable):
+            raise ValueError("input must be Variable, but received %s" %
+                             type(input))
+        if not isinstance(label, Variable):
+            raise ValueError("label must be Variable, but received %s" %
+                             type(label))
+
+        helper = LayerHelper("PaddleRec_PrecisionRecall", **kwargs)
+        label = fluid.layers.cast(label, dtype="int32")
+        label.stop_gradient = True
+        max_probs, indices = fluid.layers.nn.topk(input, k=1)
+        indices = fluid.layers.cast(indices, dtype="int32")
+        indices.stop_gradient = True
+
+        states_info, _ = helper.create_or_get_global_variable(
+            name="states_info",
+            persistable=True,
+            dtype='float32',
+            shape=[self.num_cls, 4])
+        states_info.stop_gradient = True
+
+        helper.set_variable_initializer(
+            states_info, Constant(
+                value=0.0, force_cpu=True))
+
+        batch_metrics, _ = helper.create_or_get_global_variable(
+            name="batch_metrics",
+            persistable=False,
+            dtype='float32',
+            shape=[6])
+        accum_metrics, _ = helper.create_or_get_global_variable(
+            name="global_metrics",
+            persistable=False,
+            dtype='float32',
+            shape=[6])
+
+        batch_states = fluid.layers.fill_constant(
+            shape=[self.num_cls, 4], value=0.0, dtype="float32")
+        batch_states.stop_gradient = True
+
+        helper.append_op(
+            type="precision_recall",
+            attrs={'class_number': self.num_cls},
+            inputs={
+                'MaxProbs': [max_probs],
+                'Indices': [indices],
+                'Labels': [label],
+                'StatesInfo': [states_info]
+            },
+            outputs={
+                'BatchMetrics': [batch_metrics],
+                'AccumMetrics': [accum_metrics],
+                'AccumStatesInfo': [batch_states]
+            })
+        helper.append_op(
+            type="assign",
+            inputs={'X': [batch_states]},
+            outputs={'Out': [states_info]})
+
+        batch_states.stop_gradient = True
+        states_info.stop_gradient = True
+
+        self._global_metric_state_vars = dict()
+        self._global_metric_state_vars['states_info'] = (states_info.name,
+                                                         "float32")
+
+        self.metrics = dict()
+        self.metrics["precision_recall_f1"] = accum_metrics
+        self.metrics["[TP FP TN FN]"] = states_info
+
+    def _calculate(self, global_metrics):
+        for key in self._global_metric_state_vars:
+            if key not in global_metrics:
+                raise ValueError("%s does not exist" % key)
+
+        def calc_precision(tp_count, fp_count):
+            if tp_count > 0.0 or fp_count > 0.0:
+                return tp_count / (tp_count + fp_count)
+            return 1.0
+
+        def calc_recall(tp_count, fn_count):
+            if tp_count > 0.0 or fn_count > 0.0:
+                return tp_count / (tp_count + fn_count)
+            return 1.0
+
+        def calc_f1_score(precision, recall):
+            if precision > 0.0 or recall > 0.0:
+                return 2 * precision * recall / (precision + recall)
+            return 0.0
+
+        states = global_metrics["states_info"]
+        total_tp_count = 0.0
+        total_fp_count = 0.0
+        total_fn_count = 0.0
+        macro_avg_precision = 0.0
+        macro_avg_recall = 0.0
+        for i in range(self.num_cls):
+            total_tp_count += states[i][0]
+            total_fp_count += states[i][1]
+            total_fn_count += states[i][3]
+            macro_avg_precision += calc_precision(states[i][0], states[i][1])
+            macro_avg_recall += calc_recall(states[i][0], states[i][3])
+        metrics = []
+        macro_avg_precision /= self.num_cls
+        macro_avg_recall /= self.num_cls
+        metrics.append(macro_avg_precision)
+        metrics.append(macro_avg_recall)
+        metrics.append(calc_f1_score(macro_avg_precision, macro_avg_recall))
+        micro_avg_precision = calc_precision(total_tp_count, total_fp_count)
+        metrics.append(micro_avg_precision)
+        micro_avg_recall = calc_recall(total_tp_count, total_fn_count)
+        metrics.append(micro_avg_recall)
+        metrics.append(calc_f1_score(micro_avg_precision, micro_avg_recall))
+        return "total metrics: [TP, FP, TN, FN]=%s; precision_recall_f1=%s" % (
+            str(states), str(np.array(metrics).astype('float32')))
+
+    def get_result(self):
+        return self.metrics
--- a/core/metrics/precision.py
+++ b/core/metrics/precision.py
@@ -18,92 +18,86 @@ import numpy as np
 import paddle.fluid as fluid

 from paddlerec.core.metric import Metric
-from paddle.fluid.layers import nn, accuracy
+from paddle.fluid.layers import accuracy
 from paddle.fluid.initializer import Constant
 from paddle.fluid.layer_helper import LayerHelper
+from paddle.fluid.layers.tensor import Variable


-class Precision(Metric):
+class RecallK(Metric):
    """
    Metric For Fluid Model
    """

-    def __init__(self, **kwargs):
+    def __init__(self, input, label, k=20):
        """ """
-        helper = LayerHelper("PaddleRec_Precision", **kwargs)
-        self.batch_accuracy = accuracy(
-            kwargs.get("input"), kwargs.get("label"), kwargs.get("k"))
-        local_ins_num, _ = helper.create_or_get_global_variable(
-            name="local_ins_num", persistable=True, dtype='float32',
-            shape=[1])
-        local_pos_num, _ = helper.create_or_get_global_variable(
-            name="local_pos_num", persistable=True, dtype='float32',
-            shape=[1])
-
-        batch_pos_num, _ = helper.create_or_get_global_variable(
-            name="batch_pos_num",
-            persistable=False,
-            dtype='float32',
-            shape=[1])
-        batch_ins_num, _ = helper.create_or_get_global_variable(
-            name="batch_ins_num",
-            persistable=False,
-            dtype='float32',
-            shape=[1])
-
-        tmp_ones = helper.create_global_variable(
-            name="batch_size_like_ones",
-            persistable=False,
-            dtype='float32',
-            shape=[-1])
-
-        for var in [
-                batch_pos_num, batch_ins_num, local_pos_num, local_ins_num
-        ]:
-            print(var, type(var))
+        kwargs = locals()
+        del kwargs['self']
+        self.k = k
+
+        if not isinstance(input, Variable):
+            raise ValueError("input must be Variable, but received %s" %
+                             type(input))
+        if not isinstance(label, Variable):
+            raise ValueError("label must be Variable, but received %s" %
+                             type(label))
+
+        helper = LayerHelper("PaddleRec_RecallK", **kwargs)
+        batch_accuracy = accuracy(input, label, self.k)
+        global_ins_cnt, _ = helper.create_or_get_global_variable(
+            name="ins_cnt", persistable=True, dtype='float32', shape=[1])
+        global_pos_cnt, _ = helper.create_or_get_global_variable(
+            name="pos_cnt", persistable=True, dtype='float32', shape=[1])
+
+        for var in [global_ins_cnt, global_pos_cnt]:
            helper.set_variable_initializer(
                var, Constant(
                    value=0.0, force_cpu=True))

-        helper.append_op(
-            type='fill_constant_batch_size_like',
-            inputs={"Input": kwargs.get("label")},
-            outputs={'Out': [tmp_ones]},
-            attrs={
-                'shape': [-1, 1],
-                'dtype': tmp_ones.dtype,
-                'value': float(1.0),
-            })
-        helper.append_op(
-            type="reduce_sum",
-            inputs={"X": [tmp_ones]},
-            outputs={"Out": [batch_ins_num]})
-
-        helper.append_op(
-            type="elementwise_mul",
-            inputs={"X": [batch_ins_num],
-                    "Y": [self.batch_accuracy]},
-            outputs={"Out": [batch_pos_num]})
+        tmp_ones = fluid.layers.fill_constant(
+            shape=fluid.layers.shape(label), dtype="float32", value=1.0)
+        batch_ins = fluid.layers.reduce_sum(tmp_ones)
+        batch_pos = batch_ins * batch_accuracy

        helper.append_op(
            type="elementwise_add",
-            inputs={"X": [local_pos_num],
-                    "Y": [batch_pos_num]},
-            outputs={"Out": [local_pos_num]})
+            inputs={"X": [global_ins_cnt],
+                    "Y": [batch_ins]},
+            outputs={"Out": [global_ins_cnt]})

        helper.append_op(
            type="elementwise_add",
-            inputs={"X": [local_ins_num],
-                    "Y": [batch_ins_num]},
-            outputs={"Out": [local_ins_num]})
+            inputs={"X": [global_pos_cnt],
+                    "Y": [batch_pos]},
+            outputs={"Out": [global_pos_cnt]})
+
+        self.acc = global_pos_cnt / global_ins_cnt

-        self.accuracy = local_pos_num / local_ins_num
+        self._global_metric_state_vars = dict()
+        self._global_metric_state_vars['ins_cnt'] = (global_ins_cnt.name,
+                                                     "float32")
+        self._global_metric_state_vars['pos_cnt'] = (global_pos_cnt.name,
+                                                     "float32")

-        self._need_clear_list = [("local_ins_num", "float32"),
-                                 ("local_pos_num", "float32")]
+        metric_name = "Acc(Recall@%d)" % self.k
        self.metrics = dict()
-        metric_varname = "P@%d" % kwargs.get("k")
-        self.metrics[metric_varname] = self.accuracy
+        self.metrics["InsCnt"] = global_ins_cnt
+        self.metrics["RecallCnt"] = global_pos_cnt
+        self.metrics[metric_name] = self.acc
+
+    # self.metrics["batch_metrics"] = batch_metrics
+    def _calculate(self, global_metrics):
+        for key in self._global_metric_state_vars:
+            if key not in global_metrics:
+                raise ValueError("%s does not exist" % key)
+        ins_cnt = global_metrics['ins_cnt'][0]
+        pos_cnt = global_metrics['pos_cnt'][0]
+        if ins_cnt == 0:
+            acc = 0
+        else:
+            acc = float(pos_cnt) / ins_cnt
+        return "InsCnt=%s RecallCnt=%s Acc(Recall@%d)=%s" % (
+            str(ins_cnt), str(pos_cnt), self.k, str(acc))

    def get_result(self):
        return self.metrics
--- a/core/trainer.py
+++ b/core/trainer.py
@@ -107,6 +107,7 @@ class Trainer(object):
            self.device = Device.GPU
            gpu_id = int(os.environ.get('FLAGS_selected_gpus', 0))
            self._place = fluid.CUDAPlace(gpu_id)
+            print("PaddleRec run on device GPU: {}".format(gpu_id))
            self._exe = fluid.Executor(self._place)
        elif device == "CPU":
            self.device = Device.CPU
@@ -146,6 +147,7 @@ class Trainer(object):
        elif engine.upper() == "CLUSTER":
            self.engine = EngineMode.CLUSTER
            self.is_fleet = True
+            self.which_cluster_type()
        else:
            raise ValueError("Not Support Engine {}".format(engine))
        self._context["is_fleet"] = self.is_fleet
@@ -165,6 +167,14 @@ class Trainer(object):
        self._context["is_pslib"] = (fleet_mode.upper() == "PSLIB")
        self._context["fleet_mode"] = fleet_mode

+    def which_cluster_type(self):
+        cluster_type = os.getenv("PADDLEREC_CLUSTER_TYPE", "MPI")
+        print("PADDLEREC_CLUSTER_TYPE: {}".format(cluster_type))
+        if cluster_type and cluster_type.upper() == "K8S":
+            self._context["cluster_type"] = "K8S"
+        else:
+            self._context["cluster_type"] = "MPI"
+
    def which_executor_mode(self):
        executor_mode = envs.get_runtime_environ("train.trainer.executor_mode")
        if executor_mode.upper() not in ["TRAIN", "INFER"]:

--- a/core/trainers/finetuning_trainer.py
+++ b/core/trainers/finetuning_trainer.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Fine-tuning Trainer; the default components support EngineMode.SINGLE only
+"""
+from __future__ import print_function
+
+import os
+
+from paddlerec.core.utils import envs
+from paddlerec.core.trainer import Trainer, EngineMode, FleetMode
+
+
+class FineTuningTrainer(Trainer):
+    """
+    Trainer for fine-tuning (single-machine training only by default)
+    """
+
+    def __init__(self, config=None):
+        Trainer.__init__(self, config)
+        self.processor_register()
+        self.abs_dir = os.path.dirname(os.path.abspath(__file__))
+        self.runner_env_name = "runner." + self._context["runner_name"]
+
+    def processor_register(self):
+        print("processor_register begin")
+        self.regist_context_processor('uninit', self.instance)
+        self.regist_context_processor('network_pass', self.network)
+        self.regist_context_processor('startup_pass', self.startup)
+        self.regist_context_processor('train_pass', self.runner)
+        self.regist_context_processor('terminal_pass', self.terminal)
+
+    def instance(self, context):
+        instance_class_path = envs.get_global_env(
+            self.runner_env_name + ".instance_class_path", default_value=None)
+        if instance_class_path:
+            instance_class = envs.lazy_instance_by_fliename(
+                instance_class_path, "Instance")(context)
+        else:
+            if self.engine == EngineMode.SINGLE:
+                instance_class_name = "SingleInstance"
+            else:
+                raise ValueError(
+                    "FineTuningTrainer can only support SingleTraining.")
+
+            instance_path = os.path.join(self.abs_dir, "framework",
+                                         "instance.py")
+
+            instance_class = envs.lazy_instance_by_fliename(
+                instance_path, instance_class_name)(context)
+
+        instance_class.instance(context)
+
+    def network(self, context):
+        network_class_path = envs.get_global_env(
+            self.runner_env_name + ".network_class_path", default_value=None)
+        if network_class_path:
+            network_class = envs.lazy_instance_by_fliename(network_class_path,
+                                                           "Network")(context)
+        else:
+            if self.engine == EngineMode.SINGLE:
+                network_class_name = "FineTuningNetwork"
+            else:
+                raise ValueError(
+                    "FineTuningTrainer can only support SingleTraining.")
+
+            network_path = os.path.join(self.abs_dir, "framework",
+                                        "network.py")
+            network_class = envs.lazy_instance_by_fliename(
+                network_path, network_class_name)(context)
+
+        network_class.build_network(context)
+
+    def startup(self, context):
+        startup_class_path = envs.get_global_env(
+            self.runner_env_name + ".startup_class_path", default_value=None)
+        if startup_class_path:
+            startup_class = envs.lazy_instance_by_fliename(startup_class_path,
+                                                           "Startup")(context)
+        else:
+            if self.engine == EngineMode.SINGLE and not context["is_infer"]:
+                startup_class_name = "FineTuningStartup"
+            else:
+                raise ValueError(
+                    "FineTuningTrainer can only support SingleTraining.")
+
+            startup_path = os.path.join(self.abs_dir, "framework",
+                                        "startup.py")
+
+            startup_class = envs.lazy_instance_by_fliename(
+                startup_path, startup_class_name)(context)
+        startup_class.startup(context)
+
+    def runner(self, context):
+        runner_class_path = envs.get_global_env(
+            self.runner_env_name + ".runner_class_path", default_value=None)
+        if runner_class_path:
+            runner_class = envs.lazy_instance_by_fliename(runner_class_path,
+                                                          "Runner")(context)
+        else:
+            if self.engine == EngineMode.SINGLE and not context["is_infer"]:
+                runner_class_name = "SingleRunner"
+            else:
+                raise ValueError(
+                    "FineTuningTrainer can only support SingleTraining.")
+
+            runner_path = os.path.join(self.abs_dir, "framework", "runner.py")
+            runner_class = envs.lazy_instance_by_fliename(
+                runner_path, runner_class_name)(context)
+        runner_class.run(context)
+
+    def terminal(self, context):
+        terminal_class_path = envs.get_global_env(
+            self.runner_env_name + ".terminal_class_path", default_value=None)
+        if terminal_class_path:
+            terminal_class = envs.lazy_instance_by_fliename(
+                terminal_class_path, "Terminal")(context)
+        else:
+            terminal_class_name = "TerminalBase"
+            if self.engine != EngineMode.SINGLE and self.fleet_mode != FleetMode.COLLECTIVE:
+                terminal_class_name = "PSTerminal"
+
+            terminal_path = os.path.join(self.abs_dir, "framework",
+                                         "terminal.py")
+            terminal_class = envs.lazy_instance_by_fliename(
+                terminal_path, terminal_class_name)(context)
+        terminal_class.terminal(context)
+        context['is_exit'] = True
--- a/core/trainers/framework/dataset.py
+++ b/core/trainers/framework/dataset.py
@@ -123,10 +123,21 @@ class QueueDataset(DatasetBase):
            os.path.join(train_data_path, x)
            for x in os.listdir(train_data_path)
        ]
+        file_list.sort()
+        need_split_files = False
        if context["engine"] == EngineMode.LOCAL_CLUSTER:
+            # for local cluster: split files for multi process
+            need_split_files = True
+        elif context["engine"] == EngineMode.CLUSTER and context[
+                "cluster_type"] == "K8S":
+            # for k8s mount afs, split files for every node
+            need_split_files = True
+
+        if need_split_files:
            file_list = split_files(file_list, context["fleet"].worker_index(),
                                    context["fleet"].worker_num())
        print("File_list: {}".format(file_list))
+
        dataset.set_filelist(file_list)
        for model_dict in context["phases"]:
            if model_dict["dataset_name"] == dataset_name:

--- a/core/trainers/framework/network.py
+++ b/core/trainers/framework/network.py
@@ -23,7 +23,7 @@ from paddlerec.core.trainers.framework.dataset import DataLoader, QueueDataset

 __all__ = [
    "NetworkBase", "SingleNetwork", "PSNetwork", "PslibNetwork",
-    "CollectiveNetwork"
+    "CollectiveNetwork", "FineTuningNetwork"
 ]


@@ -99,7 +99,90 @@ class SingleNetwork(NetworkBase):
        context["dataset"] = {}
        for dataset in context["env"]["dataset"]:
            type = envs.get_global_env("dataset." + dataset["name"] + ".type")
-            if type != "DataLoader":
+
+            if type == "QueueDataset":
+                dataset_class = QueueDataset(context)
+                context["dataset"][dataset[
+                    "name"]] = dataset_class.create_dataset(dataset["name"],
+                                                            context)
+
+        context["status"] = "startup_pass"
+
+
+class FineTuningNetwork(NetworkBase):
+    """Network builder for single-machine fine-tuning.
+    """
+
+    def __init__(self, context):
+        print("Running FineTuningNetwork.")
+
+    def build_network(self, context):
+        context["model"] = {}
+        for model_dict in context["phases"]:
+            context["model"][model_dict["name"]] = {}
+            train_program = fluid.Program()
+            startup_program = fluid.Program()
+            scope = fluid.Scope()
+            dataset_name = model_dict["dataset_name"]
+
+            with fluid.program_guard(train_program, startup_program):
+                with fluid.unique_name.guard():
+                    with fluid.scope_guard(scope):
+                        model_path = envs.os_path_adapter(
+                            envs.workspace_adapter(model_dict["model"]))
+                        model = envs.lazy_instance_by_fliename(
+                            model_path, "Model")(context["env"])
+
+                        model._data_var = model.input_data(
+                            dataset_name=model_dict["dataset_name"])
+
+                        if envs.get_global_env("dataset." + dataset_name +
+                                               ".type") == "DataLoader":
+                            model._init_dataloader(
+                                is_infer=context["is_infer"])
+                            data_loader = DataLoader(context)
+                            data_loader.get_dataloader(context, dataset_name,
+                                                       model._data_loader)
+
+                        model.net(model._data_var, context["is_infer"])
+
+                        finetuning_varnames = envs.get_global_env(
+                            "runner." + context["runner_name"] +
+                            ".finetuning_aspect_varnames",
+                            default_value=[])
+
+                        if len(finetuning_varnames) == 0:
+                            raise ValueError(
+                                "no variable to fine-tune; use another training mode"
+                            )
+
+                        if len(finetuning_varnames) != 1:
+                            raise ValueError(
+                                "fine-tuning mode currently accepts only one varname"
+                            )
+
+                        varname = finetuning_varnames[0]
+                        finetuning_vars = train_program.global_block().vars[
+                            varname]
+                        finetuning_vars.stop_gradient = True
+                        optimizer = model.optimizer()
+                        optimizer.minimize(model._cost)
+
+            context["model"][model_dict["name"]][
+                "main_program"] = train_program
+            context["model"][model_dict["name"]][
+                "startup_program"] = startup_program
+            context["model"][model_dict["name"]]["scope"] = scope
+            context["model"][model_dict["name"]]["model"] = model
+            context["model"][model_dict["name"]][
+                "default_main_program"] = train_program.clone()
+            context["model"][model_dict["name"]]["compiled_program"] = None
+
+        context["dataset"] = {}
+        for dataset in context["env"]["dataset"]:
+            type = envs.get_global_env("dataset." + dataset["name"] + ".type")
+
+            if type == "QueueDataset":
                dataset_class = QueueDataset(context)
                context["dataset"][dataset[
                    "name"]] = dataset_class.create_dataset(dataset["name"],
@@ -133,9 +216,7 @@ class PSNetwork(NetworkBase):
        if envs.get_global_env("dataset." + dataset_name +
                               ".type") == "DataLoader":
            model._init_dataloader(is_infer=False)
-            data_loader = DataLoader(context)
-            data_loader.get_dataloader(context, dataset_name,
-                                       model._data_loader)
+
        model.net(model._data_var, False)
        optimizer = model.optimizer()
        strategy = self._build_strategy(context)
@@ -160,7 +241,11 @@ class PSNetwork(NetworkBase):
            for dataset in context["env"]["dataset"]:
                type = envs.get_global_env("dataset." + dataset["name"] +
                                           ".type")
-                if type != "DataLoader":
+                if type == "DataLoader":
+                    data_loader = DataLoader(context)
+                    data_loader.get_dataloader(context, dataset_name,
+                                               model._data_loader)
+                elif type == "QueueDataset":
                    dataset_class = QueueDataset(context)
                    context["dataset"][dataset[
                        "name"]] = dataset_class.create_dataset(
@@ -229,9 +314,6 @@ class PslibNetwork(NetworkBase):
                    if envs.get_global_env("dataset." + dataset_name +
                                           ".type") == "DataLoader":
                        model._init_dataloader(is_infer=False)
-                        data_loader = DataLoader(context)
-                        data_loader.get_dataloader(context, dataset_name,
-                                                   model._data_loader)
                    model.net(model._data_var, False)
                    optimizer = model.optimizer()

@@ -257,7 +339,11 @@ class PslibNetwork(NetworkBase):
            for dataset in context["env"]["dataset"]:
                type = envs.get_global_env("dataset." + dataset["name"] +
                                           ".type")
-                if type != "DataLoader":
+                if type == "DataLoader":
+                    data_loader = DataLoader(context)
+                    data_loader.get_dataloader(context, dataset_name, context[
+                        "model"][model_dict["name"]]["model"]._data_loader)
+                elif type == "QueueDataset":
                    dataset_class = QueueDataset(context)
                    context["dataset"][dataset[
                        "name"]] = dataset_class.create_dataset(
@@ -323,7 +409,10 @@ class CollectiveNetwork(NetworkBase):
        context["dataset"] = {}
        for dataset in context["env"]["dataset"]:
            type = envs.get_global_env("dataset." + dataset["name"] + ".type")
-            if type != "DataLoader":
+            if type == "QueueDataset":
+                raise ValueError(
+                    "Collective doesn't support QueueDataset training, please use DataLoader."
+                )
                dataset_class = QueueDataset(context)
                context["dataset"][dataset[
                    "name"]] = dataset_class.create_dataset(dataset["name"],
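The hunks above replace the old `type != "DataLoader"` test with an explicit dispatch on the dataset type, and make the collective network reject `QueueDataset`. A minimal standalone sketch of that dispatch follows. `build_datasets` and its string return values are hypothetical stand-ins for the real `DataLoader` and `QueueDataset` objects, not PaddleRec API.

```python
def build_datasets(dataset_confs, network="single"):
    """Map each dataset name to the kind of reader that would be built.

    Hypothetical helper: the real code constructs DataLoader /
    QueueDataset objects instead of returning strings.
    """
    created = {}
    for conf in dataset_confs:
        kind = conf["type"]
        if kind == "DataLoader":
            created[conf["name"]] = "dataloader"
        elif kind == "QueueDataset":
            if network == "collective":
                # Mirrors the ValueError added to CollectiveNetwork.
                raise ValueError(
                    "Collective doesn't support QueueDataset training, "
                    "please use DataLoader.")
            created[conf["name"]] = "queue_dataset"
        # Any other type is skipped, as in the if/elif chain of the diff.
    return created


confs = [
    {"name": "train_a", "type": "DataLoader"},
    {"name": "train_b", "type": "QueueDataset"},
]
print(build_datasets(confs))
```

Testing for equality against each known type, rather than inequality against one, keeps an unrecognized type from silently falling into the `QueueDataset` branch.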

--- a/core/trainers/framework/runner.py
+++ b/core/trainers/framework/runner.py
@@ -16,10 +16,12 @@ from __future__ import print_function

 import os
 import time
+import warnings
 import numpy as np
 import paddle.fluid as fluid

 from paddlerec.core.utils import envs
+from paddlerec.core.metric import Metric

 __all__ = [
    "RunnerBase", "SingleRunner", "PSRunner", "CollectiveRunner", "PslibRunner"
@@ -77,9 +79,10 @@ class RunnerBase(object):
        name = "dataset." + reader_name + "."

        if envs.get_global_env(name + "type") == "DataLoader":
-            self._executor_dataloader_train(model_dict, context)
+            return self._executor_dataloader_train(model_dict, context)
        else:
            self._executor_dataset_train(model_dict, context)
+            return None

    def _executor_dataset_train(self, model_dict, context):
        reader_name = model_dict["dataset_name"]
@@ -137,8 +140,10 @@ class RunnerBase(object):

        metrics_varnames = []
        metrics_format = []
+        metrics_names = ["total_batch"]
        metrics_format.append("{}: {{}}".format("batch"))
        for name, var in metrics.items():
+            metrics_names.append(name)
            metrics_varnames.append(var.name)
            metrics_format.append("{}: {{}}".format(name))
        metrics_format = ", ".join(metrics_format)
@@ -147,6 +152,7 @@ class RunnerBase(object):
        reader.start()
        batch_id = 0
        scope = context["model"][model_name]["scope"]
+        result = None
        with fluid.scope_guard(scope):
            try:
                while True:
@@ -168,6 +174,10 @@ class RunnerBase(object):
            except fluid.core.EOFException:
                reader.reset()

+        if batch_id > 0:
+            result = dict(zip(metrics_names, metrics))
+        return result
+
    def _get_dataloader_program(self, model_dict, context):
        model_name = model_dict["name"]
        if context["model"][model_name]["compiled_program"] == None:
@@ -275,6 +285,7 @@ class RunnerBase(object):
            return (epoch_id + 1) % epoch_interval == 0

        def save_inference_model():
+            # get global env
            name = "runner." + context["runner_name"] + "."
            save_interval = int(
                envs.get_global_env(name + "save_inference_interval", -1))
@@ -287,18 +298,44 @@ class RunnerBase(object):
            if feed_varnames is None or fetch_varnames is None or feed_varnames == "" or fetch_varnames == "" or \
                    len(feed_varnames) == 0 or len(fetch_varnames) == 0:
                return
-            fetch_vars = [
-                fluid.default_main_program().global_block().vars[varname]
-                for varname in fetch_varnames
-            ]
+
+            # check feed var exist
+            for var_name in feed_varnames:
+                if var_name not in fluid.default_main_program().global_block(
+                ).vars:
+                    raise ValueError(
+                        "Feed variable: {} not in default_main_program, global block has the following vars: {}".
+                        format(var_name,
+                               fluid.default_main_program().global_block()
+                               .vars.keys()))
+
+            # check fetch var exist
+            fetch_vars = []
+            for var_name in fetch_varnames:
+                if var_name not in fluid.default_main_program().global_block(
+                ).vars:
+                    raise ValueError(
+                        "Fetch variable: {} not in default_main_program, global block has the following vars: {}".
+                        format(var_name,
+                               fluid.default_main_program().global_block()
+                               .vars.keys()))
+                else:
+                    fetch_vars.append(fluid.default_main_program()
+                                      .global_block().vars[var_name])
+
            dirname = envs.get_global_env(name + "save_inference_path", None)

            assert dirname is not None
            dirname = os.path.join(dirname, str(epoch_id))

            if is_fleet:
-                context["fleet"].save_inference_model(
-                    context["exe"], dirname, feed_varnames, fetch_vars)
+                warnings.warn(
+                    "Saving an inference model in cluster training is not recommended! Use save checkpoint instead.",
+                    category=UserWarning,
+                    stacklevel=2)
+                if context["fleet"].worker_index() == 0:
+                    context["fleet"].save_inference_model(
+                        context["exe"], dirname, feed_varnames, fetch_vars)
            else:
                fluid.io.save_inference_model(dirname, feed_varnames,
                                              fetch_vars, context["exe"])
@@ -314,7 +351,8 @@ class RunnerBase(object):
                return
            dirname = os.path.join(dirname, str(epoch_id))
            if is_fleet:
-                context["fleet"].save_persistables(context["exe"], dirname)
+                if context["fleet"].worker_index() == 0:
+                    context["fleet"].save_persistables(context["exe"], dirname)
            else:
                fluid.io.save_persistables(context["exe"], dirname)

@@ -336,11 +374,28 @@ class SingleRunner(RunnerBase):
                                ".epochs"))
        for epoch in range(epochs):
            for model_dict in context["phases"]:
+                model_class = context["model"][model_dict["name"]]["model"]
+                metrics = model_class._metrics
+
                begin_time = time.time()
-                self._run(context, model_dict)
+                result = self._run(context, model_dict)
                end_time = time.time()
                seconds = end_time - begin_time
-                print("epoch {} done, use time: {}".format(epoch, seconds))
+                message = "epoch {} done, use time: {}".format(epoch, seconds)
+                metrics_result = []
+                for key in metrics:
+                    if isinstance(metrics[key], Metric):
+                        _str = metrics[key].calc_global_metrics(
+                            None,
+                            context["model"][model_dict["name"]]["scope"])
+                        metrics_result.append(_str)
+                    elif result is not None:
+                        _str = "{}={}".format(key, result[key])
+                        metrics_result.append(_str)
+                if len(metrics_result) > 0:
+                    message += ", global metrics: " + ", ".join(metrics_result)
+                print(message)
+
                with fluid.scope_guard(context["model"][model_dict["name"]][
                        "scope"]):
                    train_prog = context["model"][model_dict["name"]][
@@ -362,12 +417,32 @@ class PSRunner(RunnerBase):
            envs.get_global_env("runner." + context["runner_name"] +
                                ".epochs"))
        model_dict = context["env"]["phase"][0]
+        model_class = context["model"][model_dict["name"]]["model"]
+        metrics = model_class._metrics
        for epoch in range(epochs):
            begin_time = time.time()
-            self._run(context, model_dict)
+            result = self._run(context, model_dict)
            end_time = time.time()
            seconds = end_time - begin_time
-            print("epoch {} done, use time: {}".format(epoch, seconds))
+            message = "epoch {} done, use time: {}".format(epoch, seconds)
+
+            # TODO, wait for PaddleCloudRoleMaker supports gloo
+            from paddle.fluid.incubate.fleet.base.role_maker import GeneralRoleMaker
+            if context["fleet"] is not None and isinstance(context["fleet"],
+                                                           GeneralRoleMaker):
+                metrics_result = []
+                for key in metrics:
+                    if isinstance(metrics[key], Metric):
+                        _str = metrics[key].calc_global_metrics(
+                            context["fleet"],
+                            context["model"][model_dict["name"]]["scope"])
+                        metrics_result.append(_str)
+                    elif result is not None:
+                        _str = "{}={}".format(key, result[key])
+                        metrics_result.append(_str)
+                if len(metrics_result) > 0:
+                    message += ", global metrics: " + ", ".join(metrics_result)
+            print(message)
            with fluid.scope_guard(context["model"][model_dict["name"]][
                    "scope"]):
                train_prog = context["model"][model_dict["name"]][
@@ -476,14 +551,30 @@ class SingleInferRunner(RunnerBase):

        for index, epoch_name in enumerate(self.epoch_model_name_list):
            for model_dict in context["phases"]:
+                model_class = context["model"][model_dict["name"]]["model"]
+                metrics = model_class._infer_results
                self._load(context, model_dict,
                           self.epoch_model_path_list[index])
                begin_time = time.time()
-                self._run(context, model_dict)
+                result = self._run(context, model_dict)
                end_time = time.time()
                seconds = end_time - begin_time
-                print("Infer {} of {} done, use time: {}".format(model_dict[
-                    "name"], epoch_name, seconds))
+                message = "Infer {} of epoch {} done, use time: {}".format(
+                    model_dict["name"], epoch_name, seconds)
+                metrics_result = []
+                for key in metrics:
+                    if isinstance(metrics[key], Metric):
+                        _str = metrics[key].calc_global_metrics(
+                            None,
+                            context["model"][model_dict["name"]]["scope"])
+                        metrics_result.append(_str)
+                    elif result is not None:
+                        _str = "{}={}".format(key, result[key])
+                        metrics_result.append(_str)
+                if len(metrics_result) > 0:
+                    message += ", global metrics: " + ", ".join(metrics_result)
+                print(message)
+
        context["status"] = "terminal_pass"

    def _load(self, context, model_dict, model_path):
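The runner hunks above repeat the same pattern three times: build the per-epoch message, then append either a `Metric` object's global result or the last fetched value for a plain variable. A self-contained sketch of that assembly follows. The `Metric` class here is a stand-in whose interface is assumed from the calls in the diff (`calc_global_metrics(fleet, scope)` returning a string), and `FixedAUC` and `epoch_message` are hypothetical names for illustration.

```python
class Metric(object):
    """Stand-in for paddlerec.core.metric.Metric (interface assumed)."""

    def calc_global_metrics(self, fleet, scope):
        raise NotImplementedError


class FixedAUC(Metric):
    """Hypothetical metric that returns a canned string."""

    def calc_global_metrics(self, fleet, scope):
        return "AUC=0.75"


def epoch_message(epoch, seconds, metrics, result):
    """Build the log line the way SingleRunner does in the diff."""
    message = "epoch {} done, use time: {}".format(epoch, seconds)
    parts = []
    for key in metrics:
        if isinstance(metrics[key], Metric):
            # Metric objects compute their own global value.
            parts.append(metrics[key].calc_global_metrics(None, None))
        elif result is not None:
            # Plain variables fall back to the last fetched batch result.
            parts.append("{}={}".format(key, result[key]))
    if parts:
        message += ", global metrics: " + ", ".join(parts)
    return message


metrics = {"auc": FixedAUC(), "loss": "loss_var"}
result = {"total_batch": 10, "loss": 0.5}
print(epoch_message(0, 1.5, metrics, result))
```

Since the same loop appears in `SingleRunner`, `PSRunner`, and `SingleInferRunner`, a shared helper along these lines might reduce the duplication.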

--- a/core/trainers/framework/startup.py
+++ b/core/trainers/framework/startup.py
@@ -17,9 +17,13 @@ from __future__ import print_function
 import warnings

 import paddle.fluid as fluid
+import paddle.fluid.core as core
 from paddlerec.core.utils import envs

-__all__ = ["StartupBase", "SingleStartup", "PSStartup", "CollectiveStartup"]
+__all__ = [
+    "StartupBase", "SingleStartup", "PSStartup", "CollectiveStartup",
+    "FineTuningStartup"
+]


 class StartupBase(object):
@@ -65,6 +69,122 @@ class SingleStartup(StartupBase):
        context["status"] = "train_pass"


+class FineTuningStartup(StartupBase):
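+    """Startup for fine-tuning: loads from `init_model_path` the persistable
+    variables that are not updated by the optimizer.
+    """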
+
+    def __init__(self, context):
+        self.op_name_scope = "op_namescope"
+        self.clip_op_name_scope = "@CLIP"
+        self.op_role_var_attr_name = core.op_proto_and_checker_maker.kOpRoleVarAttrName(
+        )
+
+        print("Running FineTuningStartup.")
+
+    def _is_opt_role_op(self, op):
+        # NOTE: depend on oprole to find out whether this op is for
+        # optimize
+        op_maker = core.op_proto_and_checker_maker
+        optimize_role = core.op_proto_and_checker_maker.OpRole.Optimize
+        if op_maker.kOpRoleAttrName() in op.attr_names and \
+                int(op.all_attrs()[op_maker.kOpRoleAttrName()]) == int(optimize_role):
+            return True
+        return False
+
+    def _get_params_grads(self, program):
+        """
+        Get the parameters and gradients handled by optimizer ops in `program`.
+        Returns:
+            params_grads (list): [parameter, gradient] variable pairs.
+        """
+        block = program.global_block()
+        params_grads = []
+        # tmp set to dedup
+        optimize_params = set()
+        origin_var_dict = program.global_block().vars
+        for op in block.ops:
+            if self._is_opt_role_op(op):
+                # Todo(chengmo): Whether clip related op belongs to Optimize guard should be discussed
+                # delete clip op from opt_ops when run in Parameter Server mode
+                if self.op_name_scope in op.all_attrs(
+                ) and self.clip_op_name_scope in op.attr(self.op_name_scope):
+                    op._set_attr(
+                        "op_role",
+                        int(core.op_proto_and_checker_maker.OpRole.Backward))
+                    continue
+
+                if op.attr(self.op_role_var_attr_name):
+                    param_name = op.attr(self.op_role_var_attr_name)[0]
+                    grad_name = op.attr(self.op_role_var_attr_name)[1]
+                    if param_name not in optimize_params:
+                        optimize_params.add(param_name)
+                        params_grads.append([
+                            origin_var_dict[param_name],
+                            origin_var_dict[grad_name]
+                        ])
+        return params_grads
+
+    @staticmethod
+    def is_persistable(var):
+        """
+        Check whether the given variable is persistable.
+
+        Args:
+            var(Variable): The variable to be checked.
+
+        Returns:
+            bool: True if the given `var` is persistable
+            False if not.
+
+        Examples:
+            .. code-block:: python
+
+                import paddle.fluid as fluid
+                param = fluid.default_main_program().global_block().var('fc.b')
+                res = fluid.io.is_persistable(param)
+        """
+        if var.desc.type() == core.VarDesc.VarType.FEED_MINIBATCH or \
+                var.desc.type() == core.VarDesc.VarType.FETCH_LIST or \
+                var.desc.type() == core.VarDesc.VarType.READER:
+            return False
+        return var.persistable
+
+    def load(self, context, is_fleet=False, main_program=None):
+        dirname = envs.get_global_env(
+            "runner." + context["runner_name"] + ".init_model_path", None)
+        if dirname is None or dirname == "":
+            return
+        print("going to load ", dirname)
+
+        params_grads = self._get_params_grads(main_program)
+        update_params = [p for p, _ in params_grads]
+        need_load_vars = []
+        parameters = list(
+            filter(FineTuningStartup.is_persistable, main_program.list_vars()))
+
+        for param in parameters:
+            if param not in update_params:
+                need_load_vars.append(param)
+
+        fluid.io.load_vars(context["exe"], dirname, main_program,
+                           need_load_vars)
+        print("load from {} success".format(dirname))
+
+    def startup(self, context):
+        for model_dict in context["phases"]:
+            with fluid.scope_guard(context["model"][model_dict["name"]][
+                    "scope"]):
+                train_prog = context["model"][model_dict["name"]][
+                    "main_program"]
+                startup_prog = context["model"][model_dict["name"]][
+                    "startup_program"]
+                with fluid.program_guard(train_prog, startup_prog):
+                    context["exe"].run(startup_prog)
+                    self.load(context, main_program=train_prog)
+        context["status"] = "train_pass"
+
+
 class PSStartup(StartupBase):
    def __init__(self, context):
        print("Running PSStartup.")
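`FineTuningStartup.load` above restores only the persistable variables that the optimizer does not update. A minimal sketch of that selection follows, using plain tuples instead of Paddle `Variable` objects. `Var` and `vars_to_load` are hypothetical names, and the sketch compares by name, whereas the diff compares the variable objects directly.

```python
from collections import namedtuple

# Hypothetical stand-in for a Paddle Variable: only the fields used here.
Var = namedtuple("Var", ["name", "persistable"])


def vars_to_load(all_vars, params_grads):
    """Return persistable vars that no optimizer op updates."""
    updated = {param.name for param, _ in params_grads}
    return [
        v for v in all_vars
        if v.persistable and v.name not in updated
    ]


emb = Var("embedding", True)      # frozen, e.g. stop_gradient = True
fc_w = Var("fc.w", True)          # trained by the optimizer
tmp = Var("tmp_0", False)         # not persistable, never loaded
fc_w_grad = Var("fc.w@GRAD", False)

print(vars_to_load([emb, fc_w, tmp], [(fc_w, fc_w_grad)]))
```

In this example only `embedding` is selected: `fc.w` is excluded because it appears in `params_grads`, and `tmp_0` because it is not persistable.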

--- a/core/utils/dataloader_instance.py
+++ b/core/utils/dataloader_instance.py
@@ -39,9 +39,21 @@ def dataloader_by_name(readerclass,
        data_path = os.path.join(package_base, data_path.split("::")[1])

    files = [str(data_path) + "/%s" % x for x in os.listdir(data_path)]
+    files.sort()
+
+    need_split_files = False
    if context["engine"] == EngineMode.LOCAL_CLUSTER:
+        # for local cluster: split files for multi process
+        need_split_files = True
+    elif context["engine"] == EngineMode.CLUSTER and context[
+            "cluster_type"] == "K8S":
+        # for k8s mount mode, split files for every node
+        need_split_files = True
+    print("need_split_files: {}".format(need_split_files))
+    if need_split_files:
        files = split_files(files, context["fleet"].worker_index(),
                            context["fleet"].worker_num())
+
    print("file_list : {}".format(files))

    reader = reader_class(yaml_file)
@@ -81,10 +93,20 @@ def slotdataloader_by_name(readerclass, dataset_name, yaml_file, context):
        data_path = os.path.join(package_base, data_path.split("::")[1])

    files = [str(data_path) + "/%s" % x for x in os.listdir(data_path)]
+    files.sort()
+
+    need_split_files = False
    if context["engine"] == EngineMode.LOCAL_CLUSTER:
+        # for local cluster: split files for multi process
+        need_split_files = True
+    elif context["engine"] == EngineMode.CLUSTER and context[
+            "cluster_type"] == "K8S":
+        # for k8s mount mode, split files for every node
+        need_split_files = True
+
+    if need_split_files:
        files = split_files(files, context["fleet"].worker_index(),
                            context["fleet"].worker_num())
-        print("file_list: {}".format(files))

    sparse = get_global_env(name + "sparse_slots", "#")
    if sparse == "":
@@ -135,10 +157,20 @@ def slotdataloader(readerclass, train, yaml_file, context):
        data_path = os.path.join(package_base, data_path.split("::")[1])

    files = [str(data_path) + "/%s" % x for x in os.listdir(data_path)]
+    files.sort()
+
+    need_split_files = False
    if context["engine"] == EngineMode.LOCAL_CLUSTER:
+        # for local cluster: split files for multi process
+        need_split_files = True
+    elif context["engine"] == EngineMode.CLUSTER and context[
+            "cluster_type"] == "K8S":
+        # for k8s mount mode, split files for every node
+        need_split_files = True
+
+    if need_split_files:
        files = split_files(files, context["fleet"].worker_index(),
                            context["fleet"].worker_num())
-        print("file_list: {}".format(files))

    sparse = get_global_env("sparse_slots", "#", namespace)
    if sparse == "":
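The `files.sort()` added in each loader matters because `os.listdir` returns entries in arbitrary order, so workers could otherwise disagree on which file sits at which index. The sketch below shows one way deterministic sharding can work. `shard_files` is a hypothetical helper, and round-robin assignment is an assumption, since the body of `split_files` is not shown in this diff.

```python
def shard_files(files, worker_index, worker_num):
    """Give each worker a disjoint slice of the file list.

    Assumption: round-robin assignment; the real split_files may
    distribute files differently.
    """
    # Sorting first makes every worker see the same ordering.
    ordered = sorted(files)
    return ordered[worker_index::worker_num]


listing = ["part-2", "part-0", "part-3", "part-1"]
for index in range(2):
    print(index, shard_files(listing, index, 2))
```

With a shared ordering, the shards are disjoint and together cover every file, whichever order each node's filesystem returns the listing in.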

--- a/doc/custom_reader.md
+++ b/doc/custom_reader.md
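-# PaddleRec Custom Datasets and Readers
-
-To use a custom dataset and configure an asynchronous Reader, work through the following steps:
-
-* [Preparing the dataset](#数据集整理)
-* [Adding input placeholders to the model network](#在模型组网中加入输入占位符)
-* [Implementing the Reader](#Reader的实现)
-* [Configuring the Reader in the yaml file](#在yaml文件中配置reader)
-
-Using the CTR-DNN model as an example, we walk through the full process: data preparation, variable definition, Reader implementation, and debugging.
-
-* [Data and Reader example: DNN](#数据及Reader示例-DNN)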
-
-
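-## Preparing the dataset
-
-PaddleRec supports custom datasets for models.
-
-Tips about the data:
-1. Data volume:
-
-    PaddleRec is designed for large-scale data and can read datasets with hundreds of millions of samples. Its industrial-grade data I/O API, `dataset`, has been refined in search, recommendation, and news-feed workloads.
-2. File types:
-
-    Any directly readable text data is supported. `dataset` also reads `.gz`-compressed text data directly, with no extra code. Samples should be organized one per line, delimited by `\n`.
-
-3. File location:
-
-    Files are usually stored locally on the training nodes, but `dataset` can also read data remotely through `hadoop`, so the data need not be downloaded; just configure the hadoop account and address for the dataset.
-4. Data types:
-
-    A Reader processes `string` data line by line. Data fed to the network must be converted to numeric `int` or `float` values; feeding `string` data to the network is not supported, and storing or processing training data in plain text is not recommended.
-5. Tips
-
-    In Dataset mode, training threads and data-reading threads are tightly coupled. To make full use of multithreading, `we strongly recommend splitting the data into multiple smaller files`; in distributed training in particular, this balances the data volume across nodes and speeds up data download.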
-
-## Adding input placeholders to the model network
-
-After the Reader reads a file, the data it produces is fed into the network, which needs placeholders to receive it. In Paddle, placeholders are defined with `fluid.data` or `fluid.layers.data`. For the definition of `data`, see [fluid.data](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/data_cn.html#data) and [fluid.layers.data](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/layers_cn/data_cn.html#data).
-
-Suppose you want three inputs: data A with dimension 32, variable-length sparse data B, and a one-dimensional label C, and you want gradients to flow through that variable. An example follows:
-
-Definition of data A:
-```python
-var_a = fluid.data(name='A', shape= [-1, 32], dtype='float32')
-```
-
-Definition of data B (for variable-length data, see [LoDTensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#cn-user-guide-lod-tensor)):
-```python
-var_b = fluid.data(name='B', shape=[-1, 1], lod_level=1, dtype='int64')
-```
-
-Definition of data C:
-```python
-var_c = fluid.data(name='C', shape=[-1, 1], dtype='int32')
-var_c.stop_gradient = False
-```
-
-After defining these three inputs, you also need to add them to the model base class's member variable `self._data_var` in the PaddleRec model definition:
-
-```python
-self._data_var.append(var_a)
-self._data_var.append(var_b)
-self._data_var.append(var_c)
-```
-This completes the definition of input data in the network.
-
-## Implementing the Reader
-
-### Reader implementation pattern
-
-The Reader logic lives in a separate Python file. As an exercise, we write a `test_reader.py`; the steps are as follows:
-1. First, import the Reader base class
-
-    ```python
-    from paddlerec.core.reader import ReaderBase
-    ```
-2. Create a subclass that inherits from the Reader base class; name the Reader used for training `TrainerReader`
-    ```python
-    class TrainerReader(ReaderBase):
-        def init(self):
-            pass
-
-        def generate_sample(self, line):
-            pass
-    ```
-
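-3. In `init(self)`, declare any variables needed for data reading. If necessary, configure them in `config.yaml` and fetch them with `envs.get_global_env()`.
-
-    For example, suppose we want to read a preprocessing variable `avg=10` from the yaml file in order to scale data A down by a factor of 10. This can be done as follows:
-
-    First, edit the yaml file and add the variable under some namespace
-
-    ```yaml
-    ...
-    train:
-        reader:
-            avg: 10
-    ...
-    ```
-
-
-    Then update the Reader's init function
-
-    ```python
-    from paddlerec.core.reader import ReaderBase
-    from paddlerec.core.utils import envs
-    class TrainerReader(ReaderBase):
-        def init(self):
-            self.avg = envs.get_global_env("avg", None, "train.reader")
-
-        def generate_sample(self, line):
-            pass
-    ```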
-
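-4. Override the base-class function `generate_sample(self, line)` to read data line by line.
-   - The function should return an iterable reader method (a function containing yield is no longer an ordinary function but a generator, an iterable object comparable to an array, linked list, file, or string).
-   - In this iterable function, such as `def reader()` in the sample code, define the data-reading logic: slice, convert, and preprocess the data line by line.
-   - Finally, arrange the data in a specific format so that PaddleRec's Reader can read it correctly and feed it into the training network. In short, the output order must correspond one-to-one with the `inputs` created in the network, and the data must be converted to a dict-like form.
-
-    Example: suppose data A, B, and C are stored in a text file, one sample per line, in this form:
-    ```shell
-    0.1,0.2,0.3...3.0,3.1,3.2 \t 99999,99998,99997 \t 1 \n
-    ```
-
-    The sample code is then:
-    ```python
-    from paddlerec.core.reader import ReaderBase
-    from paddlerec.core.utils import envs
-    class TrainerReader(ReaderBase):
-        def init(self):
-            self.avg = envs.get_global_env("avg", None, "train.reader")
-
-        def generate_sample(self, line):
-
-            def reader():
-                # Strip '\n' first, then split on '\t' into a list
-                variables = (line.strip('\n')).split('\t')
-
-                # A is the first element; its values are separated by ','
-                var_a = variables[0].split(',') # list
-                var_a = [float(i) / self.avg for i in var_a] # convert str to float
-
-
-                # B is the second element, also separated by ','
-                var_b = variables[1].split(',') # list
-                var_b = [int(i) for i in var_b] # convert str to int
-
-                # C is the third element: a single value with no separator
-                var_c = variables[2]
-                var_c = int(var_c) # convert str to int
-                var_c = [var_c] # wrap the single value in a list
-
-                # Pair the data with their names, organized in dict-like form
-                # The output has the form { A: var_a, B: var_b, C: var_c }
-                variable_name = ['A', 'B', 'C']
-                output = zip(variable_name, [var_a] + [var_b] + [var_c])
-
-                # Emit the data with yield, which makes this function an iterable object
-                yield output
-
-            return reader
-    ```
-
-    This completes the Reader implementation.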
-
-
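-### Configuring the Reader in the yaml file
-
-In the model's yaml config file, the main changes are as follows:
-
-```yaml
-reader:
-    batch_size: 2
-    class: "{workspace}/reader.py"
-    train_data_path: "{workspace}/data/train_data"
-    reader_debug_mode: False
-```
-
-batch_size: the number of samples per mini-batch during training
-class: path to the reader required to run the model
-train_data_path: directory containing the training data
-reader_debug_mode: switch for debug mode, which checks the reader's syntax and whether its output matches expectations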
-
-
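-## Data and Reader example: DNN
-
-The Reader code comes from [criteo_reader.py](../models/rank/criteo_reader.py), and the network code from [model.py](../models/rank/dnn/model.py).
-
-### Criteo dataset format
-
-CTR-DNN trains and tests on the Criteo dataset from the [Display Advertising Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/). The dataset has two parts: a training set and a test set. The training set contains a portion of Criteo's traffic over a period of time; the test set corresponds to ad-click traffic on the day after the training data.
-Each line has the following format:
-```bash
-<label> <integer feature 1> ... <integer feature 13> <categorical feature 1> ... <categorical feature 26>
-```
-Here ```<label>``` indicates whether the ad was clicked: 1 for clicked, 0 for not clicked. ```<integer feature>``` denotes numeric (continuous) features, 13 in total. ```<categorical feature>``` denotes categorical (discrete) features, 26 in total. Adjacent features are separated by ```\t```, and missing features are represented by a blank. The ```<label>``` field has been removed from the test set.
-
-### Preprocessing the Criteo dataset
-
-Preprocessing has two steps:
- Split the original training set 9:1 into training and validation sets
- Normalize the numeric (continuous) features. Note that for each feature ```<integer feature i>```, the maximum used for normalization is not the global maximum but the value at the 95th percentile after sorting; extreme values are retained.
-
-### Defining the CTR network inputs
-
-As noted above, the Criteo dataset contains continuous data and discrete (sparse) data, so the CTR-DNN input layer has three parts. `dense_input` takes the continuous data; its dimension is set by the hyperparameter `dense_feature_dim`, and its type is normalized floating-point data. `sparse_input_ids` holds the discrete data; the Criteo dataset has 26 slots, so we create 26 sparse inputs named `C1~C26` and set `lod_level=1` to mark them as variable-length integer data. Finally, each sample's `label` indicates whether the ad was clicked; it is an integer, 0 for a negative example and 1 for a positive one.
-
-Data inputs in Paddle are declared with `paddle.fluid.layers.data()`, which creates a placeholder of the specified type; data I/O feeds data according to this definition.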
-
-Definition of the sparse inputs:
-```python
-def sparse_inputs():
-    ids = envs.get_global_env("hyper_parameters.sparse_inputs_slots", None)
-
-    sparse_input_ids = [
-        fluid.layers.data(name="S" + str(i),
-                            shape=[1],
-                            lod_level=1,
-                            dtype="int64") for i in range(1, ids)
-    ]
-    return sparse_input_ids
-```
-
-Definition of the dense input:
-```python
-def dense_input():
-    dim = envs.get_global_env("hyper_parameters.dense_input_dim", None)
-
-    dense_input_var = fluid.layers.data(name="D",
-                                        shape=[dim],
-                                        dtype="float32")
-    return dense_input_var
-```
-
-Definition of the label:
-```python
-def label_input():
-    label = fluid.layers.data(name="click", shape=[1], dtype="int64")
-    return label
-```
-
-Putting them together and declaring them correctly:
-```python
-self.sparse_inputs = sparse_inputs()
-self.dense_input = dense_input()
-self.label_input = label_input()
-
-self._data_var.append(self.dense_input)
-
-for input in self.sparse_inputs:
-    self._data_var.append(input)
-
-self._data_var.append(self.label_input)
-
-```
-
-
-### Writing the Criteo Reader
-
-```python
-# Import PaddleRec's Reader base class
-from paddlerec.core.reader import ReaderBase
-# Import PaddleRec's helper for reading the yaml config
-from paddlerec.core.utils import envs
-
-# Define the Reader; it must inherit from paddlerec.core.reader.ReaderBase
-class Reader(ReaderBase):
-
-    # Data preprocessing logic, inherited from the base class
-    # If no processing is needed, use pass to skip this function
-    def init(self):
-        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-        self.cont_max_ = [20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
-        self.cont_diff_ = [20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50]
-        self.hash_dim_ = envs.get_global_env("hyper_parameters.sparse_feature_number", None, "train.model")
-        self.continuous_range_ = range(1, 14)
-        self.categorical_range_ = range(14, 40)
-
-    # Data-reading method, inherited from the base class
-    # Implement an iterable reader function that processes data line by line
-    def generate_sample(self, line):
-        """
-        Read the data line by line and process it as a dictionary
-        """
-
-        def reader():
-            """
-            This function needs to be implemented by the user, based on data format
-            """
-            features = line.rstrip('\n').split('\t')
-
-            dense_feature = []
-            sparse_feature = []
-            for idx in self.continuous_range_:
-                if features[idx] == "":
-                    dense_feature.append(0.0)
-                else:
-                    dense_feature.append(
-                        (float(features[idx]) - self.cont_min_[idx - 1]) /
-                        self.cont_diff_[idx - 1])
-
-            for idx in self.categorical_range_:
-                sparse_feature.append(
-                    [hash(str(idx) + features[idx]) % self.hash_dim_])
-            label = [int(features[0])]
-            feature_name = ["D"]
-            for idx in self.categorical_range_:
-                feature_name.append("S" + str(idx - 13))
-            feature_name.append("label")
-            yield zip(feature_name, [dense_feature] + sparse_feature + [label])
-
-        return reader
-```
-
-
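-### Debugging the Reader
-
-On Linux, `Dataset` mode starts by default; on Win/Mac, `Dataloader` mode starts by default.
-
-Add or set `reader_debug_mode=True` in `config.yaml` to turn on debug mode. Only the reader portion runs together with the network: it reads 10 samples and prints them, so you can check whether the format matches expectations and spot hidden bugs.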
-```yaml
-reader:
-    batch_size: 2
-    class: "{workspace}/../criteo_reader.py"
-    train_data_path: "{workspace}/data/train"
-    reader_debug_mode: True
-```
-
-After the change, run the modified yaml file with paddlerec.run and observe the output.
-```bash
-python -m paddlerec.run -m ./models/rank/dnn/config.yaml -e single
-```
-
-### Debugging Dataset
-
-The dataset output format is:
-` dense_input:size ; dense_input:value ; sparse_input:size ; sparse_input:value ; ... ; sparse_input:size ; sparse_input:value ; label:size ; label:value `
-
-The general rule is that for each variable, its size is printed first, followed by its values.
-
-Debugging `criteo_reader` directly, the ideal output is (excerpt):
-```bash
-...
-13 0.0 0.00497512437811 0.05 0.08 0.207421875 0.028 0.35 0.08 0.082 0.0 0.4 0.0 0.08 1 737395 1 210498 1 903564 1 286224 1 286835 1 906818 1 90
-6116 1 67180 1 27346 1 51086 1 142177 1 95024 1 157883 1 873363 1 600281 1 812592 1 228085 1 35900 1 880474 1 984402 1 100885 1 26235 1 410878 1 798162 1 499868 1 306163 1 0
-...
-```
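-Here the 13-dimensional dense values are printed first, then the individual sparse values, and finally the 1-dimensional label with value 0; the output matches expectations.
-
->Notes on using Dataset
-> - How Dataset works: data is printed to a buffer and then read by C++ code. Therefore, do not add print statements unrelated to data reading in the dataset reader code; they would cause the C++ side to receive incorrect data.
-> - dataset is currently supported only on standard Linux environments such as `Ubuntu` and `CentOS`; using it on `Windows` or `Mac` produces unexpected errors.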
-
-### Debugging DataLoader
-
-The dataloader output format is `list: [ list[var_1], list[var_2], ... , list[var_n]]`; each sample's data is placed in a **list[list]**, where list[0] is the first variable.
-
-Debugging `criteo_reader` directly, the ideal output is (excerpt):
-```bash
-...
-[[0.0, 0.004975124378109453, 0.05, 0.08, 0.207421875, 0.028, 0.35, 0.08, 0.082, 0.0, 0.4, 0.0, 0.08], [560746], [902436], [262029], [182633], [368411], [735166], [321120], [39572], [185732], [140298], [926671], [81559], [461249], [728372], [915018], [907965], [818961], [850958], [311492], [980340], [254960], [175041], [524857], [764893], [526288], [220126], [0]]
-...
-```
-可以看到首先输出的是13维的dense参数的list，随后是分立的sparse参数，各自在一个list中，最后一个是1维的label的list，数值为0，输出符合预期。
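
The Dataset debug format described above (each variable prints its size, then its values) can be decoded with a few lines of Python. This is a minimal sketch: `parse_dataset_line` is a hypothetical helper, not part of PaddleRec, and the sample record is a shortened made-up line rather than real criteo output.

```python
def parse_dataset_line(line):
    """Split a Dataset debug line into one list of values per variable.

    Assumes the layout described above: for every variable, its size
    comes first, followed by that many values.
    """
    tokens = line.split()
    variables = []
    pos = 0
    while pos < len(tokens):
        size = int(tokens[pos])
        values = tokens[pos + 1:pos + 1 + size]
        if len(values) != size:
            raise ValueError("truncated record at token %d" % pos)
        variables.append(values)
        pos += 1 + size
    return variables


# Hypothetical shortened record: a 3-dim dense input, two sparse ids, one label.
sample = "3 0.0 0.05 0.08 1 737395 1 210498 1 0"
parsed = parse_dataset_line(sample)
print(parsed)
```

A record that parses cleanly, with the expected number of variables and the label as the last entry, suggests the reader emits the layout that the C++ side expects.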
--- a/doc/distributed_train.md
+++ b/doc/distributed_train.md
@@ -9,6 +9,7 @@
    - [第三步：增加集群运行`backend.yaml`配置](#第三步增加集群运行backendyaml配置)
      - [MPI集群的Parameter Server模式配置](#mpi集群的parameter-server模式配置)
      - [K8S集群的Collective模式配置](#k8s集群的collective模式配置)
+      - [K8S集群的PS-CPU模式配置](#k8s集群的ps-cpu模式配置)
    - [第四步：任务提交](#第四步任务提交)
  - [使用PaddleCloud Client提交](#使用paddlecloud-client提交)
    - [第一步：在`before_hook.sh`里手动安装PaddleRec](#第一步在before_hooksh里手动安装paddlerec)
@@ -34,10 +35,10 @@

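 To run distributed training, first modify `config.yaml`. The main changes are:

- workspace: set to the working directory used when running on the node
- runner_class: change from the single-machine "train" to "cluster_train"
- fleet_mode: choose either parameter server mode or GPU Collective mode
- distribute_strategy: optional; selects the distributed training strategy
+- workspace: set to the working directory used when running on the remote node; `"./"` is usually sufficient
+- runner_class: change from the single-machine "train" to "cluster_train" when moving from single-machine to distributed training (exception: single-machine single-GPU training on k8s still uses "train"; support will follow later)
+- fleet_mode: choose parameter server mode (ps) or GPU all-reduce mode (collective)
+- distribute_strategy: optional; selects the distributed training strategy. It currently takes effect only in parameter server mode. Allowed values: `sync`, `async`, `half_async`, `geo`

 For the specific parameters of each option, see the [yaml configuration reference](./yaml.md)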

@@ -47,50 +48,72 @@

 ```yaml
 # workspace
-workspace: "paddlerec.models.rank.dnn"
+workspace: "models/rank/dnn"

 mode: [single_cpu_train]
-# config of each runner.
-# runner is a kind of paddle training class, which wraps the train/infer process.
 runner:
 - name: single_cpu_train
  class: train
-  # num of epochs
  epochs: 4
-  # device to run training or infer
  device: cpu
-  save_checkpoint_interval: 2 # save model interval of epochs
-  save_checkpoint_path: "increment_dnn" # save checkpoint path
-  init_model_path: "" # load model path
+  save_checkpoint_interval: 2 
+  save_checkpoint_path: "increment_dnn" 
+  init_model_path: "" 
  print_interval: 10
  phases: [phase1]
+
+dataset:
+- name: dataloader_train 
+  batch_size: 2
+  type: DataLoader 
+  data_path: "{workspace}/data/sample_data/train"
+  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
+  dense_slots: "dense_var:13"
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: dataloader_train 
+  thread_num: 1
 ```

 分布式的训练配置可以改为：
 ```yaml
-# workspace
-# 改变一：代码上传至节点后，与运行shell同在一个默认目录下
+# 改变一：代码上传至节点后，在默认目录下
 workspace: "./" 

 mode: [ps_cluster]
-# config of each runner.
-# runner is a kind of paddle training class, which wraps the train/infer process.
 runner:
 - name: ps_cluster
  # 改变二：调整runner的class
  class: cluster_train
-  # num of epochs
  epochs: 4
-  # device to run training or infer
  device: cpu
  # 改变三 & 四： 指定fleet_mode 与 distribute_strategy
  fleet_mode: ps
  distribute_strategy: async
-  save_checkpoint_interval: 2 # save model interval of epochs
-  save_checkpoint_path: "increment_dnn" # save checkpoint path
-  init_model_path: "" # load model path
+  save_checkpoint_interval: 2 
+  save_checkpoint_path: "increment_dnn" 
+  init_model_path: "" 
  print_interval: 10
  phases: [phase1]
+
+dataset:
+- name: dataloader_train 
+  batch_size: 2
+  type: DataLoader 
+  # 改变五： 改变数据的读取目录
+  # 通常而言，mpi模式下，数据会下载到远程节点执行目录的'./train_data'下， k8s则与挂载位置有关
+  data_path: "{workspace}/train_data"
+  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
+  dense_slots: "dense_var:13"
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: dataloader_train 
+  # 分布式训练节点的CPU_NUM环境变量与thread_num相等，多个phase时，取最大的thread_num
+  thread_num: 1
 ```
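
The five changes marked in the distributed config above can be sketched as a small transformation over the parsed config. This is an illustration only: `to_cluster_config` is a hypothetical helper, the config is represented as a plain dict (as a YAML loader would produce), and the keys mirror the ones shown in the two listings.

```python
import copy


def to_cluster_config(single_cfg, fleet_mode="ps", strategy="async",
                      data_path="{workspace}/train_data"):
    """Return a copy of a single-machine config adjusted for cluster training."""
    cfg = copy.deepcopy(single_cfg)
    runner_name = "ps_cluster" if fleet_mode == "ps" else "collective_cluster"
    cfg["workspace"] = "./"                           # change 1: default directory on the node
    cfg["mode"] = [runner_name]
    for runner in cfg["runner"]:
        runner["name"] = runner_name
        runner["class"] = "cluster_train"             # change 2: runner class
        runner["fleet_mode"] = fleet_mode             # change 3: ps or collective
        if fleet_mode == "ps":
            runner["distribute_strategy"] = strategy  # change 4: only used in ps mode
    for dataset in cfg["dataset"]:
        dataset["data_path"] = data_path              # change 5: data location on the node
    return cfg


single = {
    "workspace": "models/rank/dnn",
    "mode": ["single_cpu_train"],
    "runner": [{"name": "single_cpu_train", "class": "train",
                "epochs": 4, "device": "cpu", "phases": ["phase1"]}],
    "dataset": [{"name": "dataloader_train", "type": "DataLoader",
                 "data_path": "{workspace}/data/sample_data/train"}],
}
cluster = to_cluster_config(single)
print(cluster["runner"][0])
```

The helper copies the input so the single-machine config stays usable for local debugging.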

 除此之外，还需关注数据及模型加载的路径，一般而言：
@@ -110,6 +133,8 @@ cluster_type: mpi # k8s 可选
 config:
  # 填写任务运行的paddle官方版本号 >= 1.7.2， 默认1.7.2
  paddle_version: "1.7.2" 
+  # 是否使用PaddleCloud运行环境下的Python3，默认使用python2
+  use_python3: 1

  # hdfs/afs的配置信息填写
  fs_name: "afs://xxx.com"
@@ -130,11 +155,13 @@ config:

  # paddle参数服务器分布式底层超参，无特殊需求不理不改
  communicator:
+    # 使用SGD优化器时，建议设置为1
    FLAGS_communicator_is_sgd_optimizer: 0
+    # 以下三个变量默认都等于训练时的线程数：CPU_NUM
    FLAGS_communicator_send_queue_size: 5
-    FLAGS_communicator_thread_pool_size: 32
    FLAGS_communicator_max_merge_var_num: 5
    FLAGS_communicator_max_send_grad_num_before_recv: 5
+    FLAGS_communicator_thread_pool_size: 32
    FLAGS_communicator_fake_rpc: 0
    FLAGS_rpc_retry_times: 3
  
@@ -165,7 +192,14 @@ submit:
  # for k8s gpu        
  # k8s gpu 模式下，训练节点数，及每个节点上的GPU卡数
  k8s_trainers: 2
+  k8s_cpu_cores: 4
  k8s_gpu_card: 1
+
+  # for k8s ps-cpu
+  k8s_trainers: 2
+  k8s_cpu_cores: 4
+  k8s_ps_num: 2
+  k8s_ps_cores: 4
  
 ```

@@ -173,18 +207,51 @@ submit:

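 In addition, check the paths of the files uploaded to the working directory (the `files` option). The example uses `./*.py`, which means the job is submitted from the same directory as those py files. If it is not, adjust the `files` paths accordingly or use absolute paths.

-Uploading data files through `files` is not recommended. Data can be moved to the nodes by setting `train_data_path` for automatic download, or by setting `afs_remote_mount_point` to mount it.
+Uploading large data files through `files` is not recommended. Data can be moved to the nodes by setting `train_data_path` for automatic download, or, in k8s mode, by setting `afs_remote_mount_point` to mount it.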

 #### MPI集群的Parameter Server模式配置

 下面是一个利用PaddleCloud提交MPI参数服务器模式任务的`backend.yaml`示例

+首先调整`config.yaml`:
+```yaml
+workspace: "./"
+mode: [ps_cluster]
+
+dataset:
+- name: dataloader_train 
+  batch_size: 2
+  type: DataLoader 
+  data_path: "{workspace}/train_data"
+  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
+  dense_slots: "dense_var:13"
+
+runner:
+- name: ps_cluster
+  class: cluster_train
+  epochs: 2
+  device: cpu
+  fleet_mode: ps
+  save_checkpoint_interval: 1 
+  save_checkpoint_path: "increment_dnn" 
+  init_model_path: "" 
+  print_interval: 1
+  phases: [phase1]
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: dataloader_train 
+  thread_num: 1
+```
+
+
+再新增`backend.yaml`
 ```yaml
 backend: "PaddleCloud"
-cluster_type: mpi # k8s 可选
+cluster_type: mpi # k8s可选

 config:
-  # 填写任务运行的paddle官方版本号 >= 1.7.2， 默认1.7.2
  paddle_version: "1.7.2" 

  # hdfs/afs的配置信息填写
@@ -229,9 +296,45 @@ submit:

 下面是一个利用PaddleCloud提交K8S集群进行GPU训练的`backend.yaml`示例

+首先调整`config.yaml`
+
+```yaml
+workspace: "./"
+mode: [collective_cluster]
+
+dataset:
+- name: dataloader_train 
+  batch_size: 2
+  type: DataLoader 
+  data_path: "{workspace}/afs/挂载数据文件夹的路径"
+  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
+  dense_slots: "dense_var:13"
+
+runner:
+- name: collective_cluster
+  class: cluster_train
+  epochs: 2
+  device: gpu
+  fleet_mode: collective
+  save_checkpoint_interval: 1 # save model interval of epochs
+  save_checkpoint_path: "increment_dnn" # save checkpoint path
+  init_model_path: "" # load model path
+  print_interval: 1
+  phases: [phase1]
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: dataloader_train 
+  thread_num: 1
+```
+
+
+再增加`backend.yaml`
+
 ```yaml
 backend: "PaddleCloud"
-cluster_type: mpi # k8s 可选
+cluster_type: k8s # mpi 可选

 config:
  # 填写任务运行的paddle官方版本号 >= 1.7.2， 默认1.7.2
@@ -271,9 +374,93 @@ submit:
  # for k8s gpu        
  # k8s gpu 模式下，训练节点数，及每个节点上的GPU卡数
  k8s_trainers: 2
+  k8s_cpu_cores: 4
  k8s_gpu_card: 1
 ```

+#### K8S集群的PS-CPU模式配置
+下面是一个利用PaddleCloud提交K8S集群进行参数服务器CPU训练的`backend.yaml`示例
+
+首先调整`config.yaml`:
+```yaml
+workspace: "./"
+mode: [ps_cluster]
+
+dataset:
+- name: dataloader_train 
+  batch_size: 2
+  type: DataLoader 
+  data_path: "{workspace}/afs/挂载数据文件夹的路径"
+  sparse_slots: "click 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26"
+  dense_slots: "dense_var:13"
+
+runner:
+- name: ps_cluster
+  class: cluster_train
+  epochs: 2
+  device: cpu
+  fleet_mode: ps
+  save_checkpoint_interval: 1 
+  save_checkpoint_path: "increment_dnn" 
+  init_model_path: "" 
+  print_interval: 1
+  phases: [phase1]
+
+phase:
+- name: phase1
+  model: "{workspace}/model.py"
+  dataset_name: dataloader_train 
+  thread_num: 1
+```
+
+再新增`backend.yaml`
+```yaml
+backend: "PaddleCloud"
+cluster_type: k8s # mpi 可选
+
+config:
+  # 填写任务运行的paddle官方版本号 >= 1.7.2， 默认1.7.2
+  paddle_version: "1.7.2" 
+
+  # hdfs/afs的配置信息填写
+  fs_name: "afs://xxx.com"
+  fs_ugi: "usr,pwd"
+
+  # 填任务输出目录的远程地址，如afs:/user/your/path/ 则此处填 /user/your/path
+  output_path: "" 
+  
+  # for k8s
+  # 填远程挂载地址，如afs:/user/your/path/ 则此处填 /user/your/path
+  afs_remote_mount_point: "" 
+  
+submit:
+  # PaddleCloud 个人信息 AK 及 SK
+  ak: ""
+  sk: ""
+  
+  # 任务运行优先级，默认high
+  priority: "high"
+  
+  # 任务名称
+  job_name: "PaddleRec_CTR"
+
+  # 训练资源所在组
+  group: ""
+
+  # 节点上的任务启动命令
+  start_cmd: "python -m paddlerec.run -m ./config.yaml"
+  
+  # 本地需要上传到节点工作目录的文件
+  files: ./*.py ./*.yaml
+  
+  # for k8s ps-cpu
+  # In k8s ps-cpu mode: number of trainer nodes, number of parameter server nodes, and the CPU cores and memory limit of each node
+  k8s_trainers: 2
+  k8s_cpu_cores: 4
+  k8s_ps_num: 2
+  k8s_ps_cores: 4
+```
+
 ### 第四步：任务提交

 当我们准备好`config.yaml`与`backend.yaml`，便可以进行一键任务提交，命令为：

--- a/doc/metrics.md
+++ b/doc/metrics.md
+# How to add a Metric to a model
+
+## PaddleRec Metric usage example
+```
+from paddlerec.core.model import ModelBase
+from paddlerec.core.metrics import RecallK
+
+class Model(ModelBase):
+    def __init__(self, config):
+        ModelBase.__init__(self, config)
+
+    def net(self, inputs, is_infer=False):
+        ...
+        acc = RecallK(input=logits, label=label, k=20)
+        self._metrics["Train_P@20"] = acc
+```
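+## The Metric class
+### Member variables
+> _global_metric_state_vars (dict),
+A dict that stores the intermediate state variables needed while computing the metric. These states are normally variables with Persistable=True, so they are saved together with the model. The infer stage therefore has to reset them to zero manually to keep the prediction results correct.
+
+### Member functions
+> clear(self, scope):
+Resets every state value in self._global_metric_state_vars to zero within the given scope. It is normally called at the start of the **infer** stage to keep the prediction metrics correct.
+
+> calc_global_metrics(self, fleet, scope=None):
+Runs an all_reduce over the state values in self._global_metric_state_vars across all training nodes, then calls _calculate() to compute the global metric. If fleet=None, the all_reduce result is the node's own values, i.e. the global metric is computed on a single machine.
+
+> get_result(self): returns the variables that are fetched during training and periodically printed to the screen. The return type is dict.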
+
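+## Metrics
+### AUC
+> AUC(input, label, curve='ROC', num_thresholds=2**12 - 1, topk=1, slide_steps=1)
+
+AUC stands for Area Under the Curve. This layer computes AUC from the forward output and the labels, and is widely used for evaluating binary classification. For the definition, see https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve .
+
+#### Parameters
+- **input(Tensor|LoDTensor)**: data type float32 or float64. A 2-D floating-point variable holding the network's predictions, with shape [batch_size, 2].
+- **label(Tensor|LoDTensor)**: data type int64 or int32. The dataset labels, with shape [batch_size, 1].
+- **curve(str)**: curve type, either ROC or PR. Defaults to ROC.
+- **num_thresholds(int)**: number of thresholds used to discretize the ROC curve. Defaults to 2**12 - 1, as in the signature above.
+- **topk(int)**: use the top-k output values for the computation.
+- **slide_steps(int)**: when computing the batch AUC, earlier steps are used in addition to the current one. slide_steps=1 uses only the current step; slide_steps=3 uses the current step and the two before it; slide_steps=0 uses all steps.
+
+#### Return values
+This metric periodically outputs two variables during training:
+- **AUC**: the overall AUC value
+- **BATCH_AUC**: the AUC value of the current batch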
+
+
+### PrecisionRecall
+> PrecisionRecall(input, label, class_num)
+
+Computes precision, recall and F1.
+
+#### Parameters
+- **input(Tensor|LoDTensor)**: data type float32 or float64. The network's predictions, with shape [batch_size, class_num].
+- **label(Tensor|LoDTensor)**: data type int32. The dataset labels, with shape [batch_size, 1].
+- **class_num(int)**: number of classes.
+
+#### Return values
+- **[TP FP TN FN]**: a variable of shape [class_num, 4] holding the TP, FP, TN and FN counts of each class. TP = true positive, FP = false positive, TN = true negative, FN = false negative. The per-class precision, recall and F1 follow from:
+precision = TP / (TP + FP); recall = TP / (TP + FN); F1 = 2 * precision * recall / (precision + recall).
+
+- **precision_recall_f1**: shape [6], holding [macro_avg_precision, macro_avg_recall, macro_avg_f1, micro_avg_precision, micro_avg_recall, micro_avg_f1]. Macro means computing precision, recall and F1 for each class first and then averaging them. Micro means first summing TP, TN, FP and FN over all classes and then computing precision, recall and F1 from the totals.
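
The macro and micro averages described above can be checked by hand with plain Python. This is a sketch of the arithmetic only, following the textual definition given here (macro = mean of per-class values, micro = values from summed counts). The helper names `prf` and `macro_micro` and the sample counts are hypothetical, and the PaddleRec operator itself may handle edge cases differently.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def macro_micro(states):
    """states: one [TP, FP, TN, FN] row per class.

    Returns (macro_p, macro_r, macro_f1, micro_p, micro_r, micro_f1).
    """
    per_class = [prf(tp, fp, fn) for tp, fp, _tn, fn in states]
    n = len(per_class)
    # Macro: average the per-class precision, recall and F1.
    macro = tuple(sum(m[i] for m in per_class) / n for i in range(3))
    # Micro: sum the counts over all classes, then compute once.
    total_tp = sum(row[0] for row in states)
    total_fp = sum(row[1] for row in states)
    total_fn = sum(row[3] for row in states)
    micro = prf(total_tp, total_fp, total_fn)
    return macro + micro


# Made-up counts for two classes, for illustration only.
result = macro_micro([[8, 2, 85, 5], [3, 1, 90, 6]])
print(result)
```

With these counts the macro precision is the mean of 0.8 and 0.75, while the micro precision is 11 / 14, which shows how the two averages diverge when classes differ in size.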
+
+
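+### RecallK
+> RecallK(input, label, k=20)
+
+Top-K recall accuracy. A sample counts as a positive (correctly recalled) sample if its correct label is among the top_k classification results.
+
+#### Parameters
+- **input(Tensor|LoDTensor)**: data type float32 or float64. The network's predictions, with shape [batch_size, class_dim].
+- **label(Tensor|LoDTensor)**: data type int64 or int32. The dataset labels, with shape [batch_size, 1].
+- **k(int)**: number of top-scoring predictions (top_k) used to compute the recall accuracy.
+
+#### Return values
+- **InsCnt**: total number of samples
+- **RecallCnt**: number of samples whose label is correctly recalled within the top k
+- **Acc(Recall@k)**: RecallCnt/InsCnt, i.e. the top-k recall accuracy.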
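
The three return values above can be reproduced on a toy batch. This is a minimal sketch in plain Python, not the PaddleRec operator: `recall_at_k` is a hypothetical helper, the scores and labels are made up, and ties are broken by class index here, which the real implementation may do differently.

```python
def recall_at_k(scores, labels, k=20):
    """Return (InsCnt, RecallCnt, Recall@k) for one batch.

    scores: per-sample lists of class scores, shape [batch_size, class_dim]
    labels: one integer class id per sample
    """
    ins_cnt = len(labels)
    recall_cnt = 0
    for row, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            recall_cnt += 1
    acc = recall_cnt / ins_cnt if ins_cnt else 0.0
    return ins_cnt, recall_cnt, acc


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 2, 1]
print(recall_at_k(scores, labels, k=2))
```

In this batch the first and third samples have their label inside the top 2, the second does not, so RecallCnt is 2 out of InsCnt 3.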
+
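+### PairWise_PN
+> PosNegRatio(pos_score, neg_score)
+
+Positive-negative order ratio, typically used in models with pairwise input. For example, when the input contains both positive and negative samples, the model learns to maximize the gap between their scores.
+
+#### Parameters
+- **pos_score(Tensor|LoDTensor)**: scores of the positive samples. Data type float32 or float64. A 2-D floating-point variable with values in [0,1].
+- **neg_score(Tensor|LoDTensor)**: scores of the negative samples. Data type float32 or float64. A 2-D floating-point variable with values in [0,1].
+
+#### Return values
+- **RightCnt**: number of samples with pos_score > neg_score
+- **WrongCnt**: number of samples with pos_score <= neg_score
+- **PN**: (RightCnt + 1.0) / (WrongCnt + 1.0), the positive-negative order ratio; the +1.0 avoids division by zero.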
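
The PN formula above is easy to verify on a handful of score pairs. This is a minimal sketch in plain Python: `pos_neg_ratio` is a hypothetical helper and the scores are made up.

```python
def pos_neg_ratio(pos_scores, neg_scores):
    """Return (RightCnt, WrongCnt, PN) for paired positive/negative scores."""
    pairs = list(zip(pos_scores, neg_scores))
    right = sum(1 for p, n in pairs if p > n)
    wrong = sum(1 for p, n in pairs if p <= n)
    # The +1.0 smoothing keeps the ratio defined when wrong == 0.
    return right, wrong, (right + 1.0) / (wrong + 1.0)


pos = [0.9, 0.4, 0.7, 0.5]
neg = [0.2, 0.6, 0.7, 0.1]
print(pos_neg_ratio(pos, neg))
```

Note that the tie (0.7 versus 0.7) counts as wrong, matching the `<=` in the WrongCnt definition.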
+
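+### Customized_Metric
+To define your own metric, follow these steps:
+1. Inherit from paddlerec.core.Metric and define your MyMetric class.
+2. In the MyMetric constructor, build the metric's network and declare the private variable self._global_metric_state_vars.
+3. Define _calculate(global_metrics) to compute the global metric. Its input, global_metrics, holds the global statistics of every intermediate state variable in self._global_metric_state_vars. The final result is returned as a str.
+
+A template for a custom Metric follows. Refer to the comments, or to the metrics already implemented under paddlerec.core.metrics (precision_recall, auc, pairwise_pn, recall_k and so on), when writing your own Metric class.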
+```
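+from paddlerec.core.Metric import Metric
+
+class MyMetric(Metric):
+    def __init__(self):
+        # 1. Build the metric's network here
+        # ** 1. your code **
+
+        # 2. Set up the dict of intermediate state variables
+        self._global_metric_state_vars = dict()
+        # ** 2. your code **
+
+    def get_result(self):
+        # 3. Define the variables printed during training and return them as a dict
+        self._metrics = dict()
+        # ** 3. your code **
+        return self._metrics
+
+    def _calculate(self, global_metrics):
+        # 4. Compute the global metric. global_metrics is a dict holding the global statistics
+        #    of every intermediate state variable in self._global_metric_state_vars.
+        #    Return the result as a str.
+        # ** 4. your code **
+        pass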
+```
--- a/doc/model_develop.md
+++ b/doc/model_develop.md
@@ -92,7 +92,7 @@ def input_data(self, is_infer=False, **kwargs):
        return train_inputs
 ```

-更多数据读取教程，请参考[自定义数据集及Reader](custom_dataset_reader.md)
+更多数据读取教程，请参考[自定义数据集及Reader](custom_reader.md)


 ### 组网的定义
@@ -113,6 +113,8 @@ def input_data(self, is_infer=False, **kwargs):

 可以参考官方模型的示例学习net的构造方法。

+Besides Paddle's Metrics API, PaddleRec also wraps a set of common evaluation metrics and lets developers define their own Metrics classes. See the [Metrics development guide](metrics.md).
+
 ## 如何运行自定义模型

 记录`model.py`,`config.yaml`及数据读取`reader.py`的文件路径，建议置于同一文件夹下，如`/home/custom_model`下，更改`config.yaml`中的配置选项

--- a/doc/pre_train_model.md
+++ b/doc/pre_train_model.md
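+# PaddleRec pre-trained models
+
+Drawing on production practice and real data, PaddleRec provides several pre-trained models for recommendation algorithms to help developers with algorithm research.
+
+## Pre-trained text classification model
+
+### Download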
+
+```bash
+wget xxx.tar.gz
+```
+
+### Usage
+
+After extraction you get a Paddle model directory. Load it with the `PaddleRec/models/contentunderstanding/classification_finetue` model.
--- a/doc/train.md
+++ b/doc/train.md
@@ -20,7 +20,7 @@ python -m paddlerec.run -m paddlerec.models.xxx.yyy
 例如启动`recall`下的`word2vec`模型的默认配置;

 ```shell
-python -m paddlerec.run -m paddlerec.models.recall.word2vec
+python -m paddlerec.run -m models/recall/word2vec
 ```

 ### 2. 启动内置模型的个性化配置训练

--- a/doc/yaml.md
+++ b/doc/yaml.md
-# PaddleRec yaml配置说明
+# PaddleRec config.yaml配置说明

 ## 全局变量

@@ -12,31 +12,31 @@

 ## runner变量

-|             名称              |     类型     |                     取值                      | 是否必须 |                               作用描述                               |
-| :---------------------------: | :----------: | :-------------------------------------------: | :------: | :------------------------------------------------------------------: |
-|             name              |    string    |                     任意                      |    是    |                            指定runner名称                            |
+|             名称              |     类型     |                           取值                            | 是否必须 |                               作用描述                               |
+| :---------------------------: | :----------: | :-------------------------------------------------------: | :------: | :------------------------------------------------------------------: |
+|             name              |    string    |                           任意                            |    是    |                            指定runner名称                            |
 |             class             |    string    | train(默认) / infer / local_cluster_train / cluster_train |    是    |           指定运行runner的类别（单机/分布式， 训练/预测）            |
-|            device             |    string    |                cpu(默认) / gpu                |    否    |                             程序执行设备                             |
-|          fleet_mode           |    string    |         ps(默认) / pslib / collective         |    否    |                            分布式运行模式                            |
-|         selected_gpus         |    string    |                   "0"(默认)                   |    否    | 程序运行GPU卡号，若以"0,1"的方式指定多卡，则会默认启用collective模式 |
-|          worker_num           |     int      |                    1(默认)                    |    否    |                     参数服务器模式下worker的数量                     |
-|          server_num           |     int      |                    1(默认)                    |    否    |                     参数服务器模式下server的数量                     |
-|      distribute_strategy      |    string    |        async(默认)/sync/half_async/geo        |    否    |                    参数服务器模式下训练模式的选择                    |
-|            epochs             |     int      |                     >= 1                      |    否    |                           模型训练迭代轮数                           |
-|            phases             | list[string] |            由phase name组成的list             |    否    |                  当前runner的训练过程列表，顺序执行                  |
-|        init_model_path        |    string    |                     路径                      |    否    |                            初始化模型地址                            |
-|   save_checkpoint_interval    |     int      |                     >= 1                      |    否    |                          Save参数的轮数间隔                          |
-|     save_checkpoint_path      |    string    |                     路径                      |    否    |                            Save参数的地址                            |
-|    save_inference_interval    |     int      |                     >= 1                      |    否    |                        Save预测模型的轮数间隔                        |
-|      save_inference_path      |    string    |                     路径                      |    否    |                          Save预测模型的地址                          |
-| save_inference_feed_varnames  | list[string] |           组网中指定Variable的name            |    否    |                        预测模型的入口变量name                        |
-| save_inference_fetch_varnames | list[string] |           组网中指定Variable的name            |    否    |                        预测模型的出口变量name                        |
-|        print_interval         |     int      |                     >= 1                      |    否    |                        训练指标打印batch间隔                         |
-|      instance_class_path      |    string    |                     路径                      |    否    |                     自定义instance流程实现的地址                     |
-|      network_class_path       |    string    |                     路径                      |    否    |                     自定义network流程实现的地址                      |
-|      startup_class_path       |    string    |                     路径                      |    否    |                     自定义startup流程实现的地址                      |
-|       runner_class_path       |    string    |                     路径                      |    否    |                      自定义runner流程实现的地址                      |
-|      terminal_class_path      |    string    |                     路径                      |    否    |                     自定义terminal流程实现的地址                     |
+|            device             |    string    |                      cpu(默认) / gpu                      |    否    |                             程序执行设备                             |
+|          fleet_mode           |    string    |               ps(默认) / pslib / collective               |    否    |                            分布式运行模式                            |
+|         selected_gpus         |    string    |                         "0"(默认)                         |    否    | 程序运行GPU卡号，若以"0,1"的方式指定多卡，则会默认启用collective模式 |
+|          worker_num           |     int      |                          1(默认)                          |    否    |                     参数服务器模式下worker的数量                     |
+|          server_num           |     int      |                          1(默认)                          |    否    |                     参数服务器模式下server的数量                     |
+|      distribute_strategy      |    string    |              async(默认)/sync/half_async/geo              |    否    |                    参数服务器模式下训练模式的选择                    |
+|            epochs             |     int      |                           >= 1                            |    否    |                           模型训练迭代轮数                           |
+|            phases             | list[string] |                  由phase name组成的list                   |    否    |                  当前runner的训练过程列表，顺序执行                  |
+|        init_model_path        |    string    |                           路径                            |    否    |                            初始化模型地址                            |
+|   save_checkpoint_interval    |     int      |                           >= 1                            |    否    |                          Save参数的轮数间隔                          |
+|     save_checkpoint_path      |    string    |                           路径                            |    否    |                            Save参数的地址                            |
+|    save_inference_interval    |     int      |                           >= 1                            |    否    |                        Save预测模型的轮数间隔                        |
+|      save_inference_path      |    string    |                           路径                            |    否    |                          Save预测模型的地址                          |
+| save_inference_feed_varnames  | list[string] |                 组网中指定Variable的name                  |    否    |                        预测模型的入口变量name                        |
+| save_inference_fetch_varnames | list[string] |                 组网中指定Variable的name                  |    否    |                        预测模型的出口变量name                        |
+|        print_interval         |     int      |                           >= 1                            |    否    |                        训练指标打印batch间隔                         |
+|      instance_class_path      |    string    |                           路径                            |    否    |                     自定义instance流程实现的地址                     |
+|      network_class_path       |    string    |                           路径                            |    否    |                     自定义network流程实现的地址                      |
+|      startup_class_path       |    string    |                           路径                            |    否    |                     自定义startup流程实现的地址                      |
+|       runner_class_path       |    string    |                           路径                            |    否    |                      自定义runner流程实现的地址                      |
+|      terminal_class_path      |    string    |                           路径                            |    否    |                     自定义terminal流程实现的地址                     |



@@ -70,3 +70,55 @@
 | optimizer.learning_rate | float  |       > 0        |    否    |         指定学习率          |
 |           reg           | float  |       > 0        |    否    | L2正则化参数，只在SGD下生效 |
 |         others          |   /    |        /         |    /     |   由各个模型组网独立指定    |
+
+
+# PaddleRec backend.yaml配置说明
+
+## 全局变量
+
+ |     名称     |  类型  |      取值       | 是否必须 |                     作用描述                     |
+ | :----------: | :----: | :-------------: | :------: | :----------------------------------------------: |
+ |   backend    | string | paddlecloud/k8s |    是    | 使用PaddleCloud平台提交，还是在公有云K8S集群提交 |
+ | cluster_type | string |     mpi/k8s     |    是    |        指定运行的计算集群： mpi 还是 k8s         |
+
+ ## config
+ |          名称          |  类型  |                  取值                   | 是否必须 |                                           作用描述                                           |
+ | :--------------------: | :----: | :-------------------------------------: | :------: | :------------------------------------------------------------------------------------------: |
+ |     paddle_version     | string | paddle官方版本号，如1.7.2/1.8.0/1.8.3等 |    否    |                           指定运行训练使用的Paddle版本，默认1.7.2                            |
+ |      use_python3       |  int   |               0（默认）/1               |    否    |                                 指定是否使用python3进行训练                                  |
+ |        fs_name         | string |             "afs://xxx.com"             |    是    |                                   hdfs/afs集群名称所需配置                                   |
+ |         fs_ugi         | string |                "usr,pwd"                |    是    |                                   hdfs/afs集群密钥所需配置                                   |
+ |      output_path       | string |            "/user/your/path"            |    否    |                                      任务输出的远程目录                                      |
+ |    train_data_path     | string |            "/user/your/path"            |    是    | mpi集群下指定训练数据路径，paddlecloud会自动将数据分片并下载到工作目录的`./train_data`文件夹 |
+ |     test_data_path     | string |            "/user/your/path"            |    否    |             mpi集群下指定测试数据路径，会自动下载到工作目录的`./test_data`文件夹             |
+ |    thirdparty_path     | string |            "/user/your/path"            |    否    |           mpi集群下指定thirdparty路径，会自动下载到工作目录的`./thirdparty`文件夹            |
+ | afs_remote_mount_point | string |            "/user/your/path"            |    是    |                  k8s集群下指定远程路径的地址，会挂载到工作目录的`./afs/`下                   |
+ 
+ ### config.communicator
+ |                       名称                       | 类型  |      取值      | 是否必须 |                        作用描述                        |
+ | :----------------------------------------------: | :---: | :------------: | :------: | :----------------------------------------------------: |
+ |       FLAGS_communicator_is_sgd_optimizer        |  int  |  0（默认）/1   |    否    | 异步分布式训练时的多线程的梯度融合方式是否使用SGD模式  |
+ |        FLAGS_communicator_send_queue_size        |  int  | 线程数（默认） |    否    |               分布式训练时发送队列的大小               |
+ |       FLAGS_communicator_max_merge_var_num       |  int  | 线程数（默认） |    否    |        分布式训练多线程梯度融合时，线程数的配置        |
+ | FLAGS_communicator_max_send_grad_num_before_recv |  int  | 线程数（默认） |    否    | 分布式训练使用独立recv参数线程时，与send的步调配置超参 |
+ |       FLAGS_communicator_thread_pool_size        |  int  |   32（默认）   |    否    |        分布式训练时，多线程发送参数的线程池大小        |
+ |           FLAGS_communicator_fake_rpc            |  int  |  0（默认）/1   |    否    |              分布式训练时，选择不进行通信              |
+ |              FLAGS_rpc_retry_times               |  int  |    3(默认)     |    否    |            分布式训练时，GRPC的失败重试次数            |
+
+
+## submit
+|     名称      |  类型  |            取值             | 是否必须 |                         作用描述                         |
+| :-----------: | :----: | :-------------------------: | :------: | :------------------------------------------------------: |
+|      ak       | string | PaddleCloud平台提供的ak密钥 |    是    |                   paddlecloud用户配置                    |
+|      sk       | string | PaddleCloud平台提供的sk密钥 |    否    |                   paddlecloud用户配置                    |
+|   priority    | string |    normal/high/very_high    |    否    |                        任务优先级                        |
+|   job_name    | string |            任意             |    是    |                         任务名称                         |
+|     group     | string |     计算资源所在组名称      |    是    |                          组名称                          |
+|   start_cmd   | string |            任意             |    是    | 启动命令，默认`python -m paddlerec.run -m ./config.yaml` |
+|     files     | string |            任意             |    是    |         随任务提交上传的文件，给出相对或绝对路径         |
+|     nodes     |  int   |        >=1（默认1）         |    否    |                    mpi集群下的节点数                     |
+| k8s_trainers  |  int   |        >=1（默认1）         |    否    |                 k8s集群下worker的节点数                  |
+| k8s_cpu_cores |  int   |        >=1（默认1）         |    否    |                 k8s集群下worker的CPU核数                 |
+| k8s_gpu_card  |  int   |        >=1（默认1）         |    否    |                 k8s集群下worker的GPU卡数                 |
+|  k8s_ps_num   |  int   |        >=1（默认1）         |    否    |                 k8s集群下server的节点数                  |
+| k8s_ps_cores  |  int   |        >=1（默认1）         |    否    |                 k8s集群下server的CPU核数                 |
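
The "required" columns in the backend.yaml tables above can be turned into a quick pre-submission check. This is a sketch under stated assumptions: `missing_fields` is a hypothetical helper, the config is a plain dict as a YAML loader would produce, and treating `train_data_path` as required only on mpi and `afs_remote_mount_point` only on k8s is one reading of the table descriptions, not a documented rule.

```python
# Required keys as marked in the tables above (an interpretation, see lead-in).
REQUIRED_TOP = ["backend", "cluster_type"]
REQUIRED_CONFIG = ["fs_name", "fs_ugi"]
REQUIRED_SUBMIT = ["ak", "job_name", "group", "start_cmd", "files"]
REQUIRED_BY_CLUSTER = {
    "mpi": ["train_data_path"],
    "k8s": ["afs_remote_mount_point"],
}


def missing_fields(cfg):
    """Return dotted names of required backend.yaml fields that are empty or absent."""
    missing = [key for key in REQUIRED_TOP if not cfg.get(key)]
    config = cfg.get("config") or {}
    submit = cfg.get("submit") or {}
    needed = REQUIRED_CONFIG + REQUIRED_BY_CLUSTER.get(cfg.get("cluster_type"), [])
    missing += ["config." + key for key in needed if not config.get(key)]
    missing += ["submit." + key for key in REQUIRED_SUBMIT if not submit.get(key)]
    return missing


backend_cfg = {
    "backend": "PaddleCloud",
    "cluster_type": "k8s",
    "config": {"fs_name": "afs://xxx.com", "fs_ugi": "usr,pwd",
               "afs_remote_mount_point": ""},
    "submit": {"ak": "", "sk": "", "job_name": "PaddleRec_CTR", "group": "g",
               "start_cmd": "python -m paddlerec.run -m ./config.yaml",
               "files": "./*.py ./*.yaml"},
}
print(missing_fields(backend_cfg))
```

Running such a check before submitting catches the empty placeholders (`""`) that the example configs leave for the user to fill in.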
--- a/models/contentunderstanding/classification/config.yaml
+++ b/models/contentunderstanding/classification/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-workspace: "paddlerec.models.contentunderstanding.classification"
+workspace: "models/contentunderstanding/classification"

 dataset:
 - name: data1

--- a/models/contentunderstanding/readme.md
+++ b/models/contentunderstanding/readme.md
@@ -39,8 +39,11 @@

 ##使用教程(快速开始)
 ```
-python -m paddlerec.run -m paddlerec.models.contentunderstanding.tagspace
-python -m paddlerec.run -m paddlerec.models.contentunderstanding.classification
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+
+python -m paddlerec.run -m models/contentunderstanding/tagspace/config.yaml
+python -m paddlerec.run -m models/contentunderstanding/classification/config.yaml
 ```

 ## 使用教程（复现论文）

--- a/models/contentunderstanding/tagspace/config.yaml
+++ b/models/contentunderstanding/tagspace/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-workspace: "paddlerec.models.contentunderstanding.tagspace"
+workspace: "models/contentunderstanding/tagspace"

 dataset:
 - name: sample_1

--- a/models/demo/movie_recommand/rank/config.yaml
+++ b/models/demo/movie_recommand/rank/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-workspace: "paddlerec.models.demo.movie_recommand"
+workspace: "models/demo/movie_recommand"

 # list of dataset
 dataset:

--- a/models/demo/movie_recommand/recall/config.yaml
+++ b/models/demo/movie_recommand/recall/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-workspace: "paddlerec.models.demo.movie_recommand"
+workspace: "models/demo/movie_recommand"

 # list of dataset
 dataset:

--- a/models/match/dssm/config.yaml
+++ b/models/match/dssm/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.


-workspace: "paddlerec.models.match.dssm"
+workspace: "models/match/dssm"

 dataset:
 - name: dataset_train

--- a/models/match/match-pyramid/__init__.py
+++ b/models/match/match-pyramid/__init__.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
--- a/models/match/match-pyramid/config.yaml
+++ b/models/match/match-pyramid/config.yaml
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+# 
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+workspace: "models/match/match-pyramid"
+
+dataset:
+- name: dataset_train
+  batch_size: 128
+  type: DataLoader
+  data_path: "{workspace}/data/train" 
+  data_converter: "{workspace}/train_reader.py"
+- name: dataset_infer
+  batch_size: 1
+  type: DataLoader
+  data_path: "{workspace}/data/test"
+  data_converter: "{workspace}/test_reader.py"
+
+
+hyper_parameters:
+  optimizer:
+    class: adam
+    learning_rate: 0.001
+    strategy: async
+  emb_path: "./data/embedding.npy"
+  sentence_left_size: 20
+  sentence_right_size: 500
+  vocab_size: 193368
+  emb_size: 50
+  kernel_num: 8
+  hidden_size: 20
+  hidden_act: "relu"
+  out_size: 1
+  channels: 1
+  conv_filter: [2,10]
+  conv_act: "relu"
+  pool_size: [6,50]
+  pool_stride: [6,50]
+  pool_type: "max"
+  pool_padding: "VALID"
+
+mode: [train_runner , infer_runner]
+# config of each runner.
+# runner is a kind of paddle training class, which wraps the train/infer process.
+runner:
+- name: train_runner
+  class: train
+  # num of epochs
+  epochs: 2
+  # device to run training or infer
+  device: cpu
+  save_checkpoint_interval: 1 # save model interval of epochs
+  save_inference_interval: 1 # save inference
+  save_checkpoint_path: "inference" # save checkpoint path
+  save_inference_path: "inference" # save inference path
+  save_inference_feed_varnames: [] # feed vars of save inference
+  save_inference_fetch_varnames: [] # fetch vars of save inference
+  init_model_path: "" # load model path
+  print_interval: 2
+  phases: phase_train
+- name: infer_runner
+  class: infer
+  # device to run training or infer
+  device: cpu
+  print_interval: 1
+  init_model_path: "inference/1" # load model path
+  phases: phase_infer
+
+# runner will run all the phases in each epoch
+phase:
+- name: phase_train
+  model: "{workspace}/model.py" # user-defined model
+  dataset_name: dataset_train # select dataset by name
+  thread_num: 1
+- name: phase_infer
+  model: "{workspace}/model.py" # user-defined model
+  dataset_name: dataset_infer # select dataset by name
+  thread_num: 1
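The hyper-parameters in this config fix the tensor shapes that flow through the model. The sketch below works through that arithmetic in plain Python. It assumes the standard output-size formulas for "SAME" convolution (stride 1 keeps the size) and "VALID" pooling without ceil mode; the helper names `same_conv_size` and `valid_pool_size` are hypothetical and not part of PaddleRec.

```python
def same_conv_size(size, stride=1):
    # "SAME" padding: output = ceil(input / stride)
    return -(-size // stride)


def valid_pool_size(size, window, stride):
    # "VALID" padding, no ceil mode: output = floor((input - window) / stride) + 1
    return (size - window) // stride + 1


# Values copied from hyper_parameters in config.yaml
sentence_left_size, sentence_right_size = 20, 500
kernel_num = 8
pool_size = (6, 50)
pool_stride = (6, 50)

# The query/document interaction matrix is 20 x 500; the conv keeps that size.
conv_h = same_conv_size(sentence_left_size)
conv_w = same_conv_size(sentence_right_size)

# Max pooling then shrinks each of the 8 feature maps.
pool_h = valid_pool_size(conv_h, pool_size[0], pool_stride[0])
pool_w = valid_pool_size(conv_w, pool_size[1], pool_stride[1])

# Number of values fed to the first fully connected layer per sample.
fc_inputs = kernel_num * pool_h * pool_w
print(conv_h, conv_w, pool_h, pool_w, fc_inputs)  # prints: 20 500 3 10 240
```

Under these assumptions, each sample reaches the hidden layer as 8 feature maps of size 3 x 10, that is 240 values.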
--- a/models/match/match-pyramid/data/process.py
+++ b/models/match/match-pyramid/data/process.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import numpy as np
+import random
+
+
+# Read Word Dict and Inverse Word Dict
+def read_word_dict(filename):
+    word_dict = {}
+    for line in open(filename):
+        line = line.strip().split()
+        word_dict[int(line[1])] = line[0]
+    print('[%s]\n\tWord dict size: %d' % (filename, len(word_dict)))
+    return word_dict
+
+
+# Read Embedding File
+def read_embedding(filename):
+    embed = {}
+    for line in open(filename):
+        line = line.strip().split()
+        embed[int(line[0])] = list(map(float, line[1:]))
+    print('[%s]\n\tEmbedding size: %d' % (filename, len(embed)))
+    return embed
+
+
+# Convert Embedding Dict 2 numpy array
+def convert_embed_2_numpy(embed_dict, embed=None):
+    for k in embed_dict:
+        embed[k] = np.array(embed_dict[k])
+    print('Generate numpy embed:', embed.shape)
+    return embed
+
+
+# Read Data
+def read_data(filename):
+    data = {}
+    for line in open(filename):
+        line = line.strip().split()
+        data[line[0]] = list(map(int, line[2:]))
+    print('[%s]\n\tData size: %s' % (filename, len(data)))
+    return data
+
+
+# Read Relation Data
+def read_relation(filename):
+    data = []
+    for line in open(filename):
+        line = line.strip().split()
+        data.append((int(line[0]), line[1], line[2]))
+    print('[%s]\n\tInstance size: %s' % (filename, len(data)))
+    return data
+
+
+Letor07Path = "./data"
+word_dict = read_word_dict(filename=os.path.join(Letor07Path, 'word_dict.txt'))
+query_data = read_data(filename=os.path.join(Letor07Path, 'qid_query.txt'))
+doc_data = read_data(filename=os.path.join(Letor07Path, 'docid_doc.txt'))
+embed_dict = read_embedding(filename=os.path.join(Letor07Path,
+                                                  'embed_wiki-pdc_d50_norm'))
+
+_PAD_ = len(word_dict)  #193367
+embed_dict[_PAD_] = np.zeros((50, ), dtype=np.float32)
+word_dict[_PAD_] = '[PAD]'
+W_init_embed = np.float32(np.random.uniform(-0.02, 0.02, [len(word_dict), 50]))
+embedding = convert_embed_2_numpy(embed_dict, embed=W_init_embed)
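+# Save next to the other data files so that it matches emb_path in config.yaml.
+np.save(os.path.join(Letor07Path, "embedding.npy"), embedding)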
+
+batch_size = 64
+data1_maxlen = 20
+data2_maxlen = 500
+embed_size = 50
+train_iters = 2500
+
+
+def make_train():
+    rel_set = {}
+    pair_list = []
+    rel = read_relation(filename=os.path.join(Letor07Path,
+                                              'relation.train.fold1.txt'))
+    for label, d1, d2 in rel:
+        if d1 not in rel_set:
+            rel_set[d1] = {}
+        if label not in rel_set[d1]:
+            rel_set[d1][label] = []
+        rel_set[d1][label].append(d2)
+    for d1 in rel_set:
+        label_list = sorted(rel_set[d1].keys(), reverse=True)
+        for hidx, high_label in enumerate(label_list[:-1]):
+            for low_label in label_list[hidx + 1:]:
+                for high_d2 in rel_set[d1][high_label]:
+                    for low_d2 in rel_set[d1][low_label]:
+                        pair_list.append((d1, high_d2, low_d2))
+    print('Pair Instance Count:', len(pair_list))
+
+    f = open("./data/train/train.txt", "w")
+    for batch in range(800):
+        X1 = np.zeros((batch_size * 2, data1_maxlen), dtype=np.int32)
+        X2 = np.zeros((batch_size * 2, data2_maxlen), dtype=np.int32)
+        X1[:] = _PAD_
+        X2[:] = _PAD_
+        for i in range(batch_size):
+            d1, d2p, d2n = random.choice(pair_list)
+            d1_len = min(data1_maxlen, len(query_data[d1]))
+            d2p_len = min(data2_maxlen, len(doc_data[d2p]))
+            d2n_len = min(data2_maxlen, len(doc_data[d2n]))
+            X1[i, :d1_len] = query_data[d1][:d1_len]
+            X2[i, :d2p_len] = doc_data[d2p][:d2p_len]
+            X1[i + batch_size, :d1_len] = query_data[d1][:d1_len]
+            X2[i + batch_size, :d2n_len] = doc_data[d2n][:d2n_len]
+        for i in range(batch_size * 2):
+            q = [str(x) for x in list(X1[i])]
+            d = [str(x) for x in list(X2[i])]
+            f.write(",".join(q) + "\t" + ",".join(d) + "\n")
+    f.close()
+
+
+def make_test():
+    rel = read_relation(filename=os.path.join(Letor07Path,
+                                              'relation.test.fold1.txt'))
+    f = open("./data/test/test.txt", "w")
+    for label, d1, d2 in rel:
+        X1 = np.zeros(data1_maxlen, dtype=np.int32)
+        X2 = np.zeros(data2_maxlen, dtype=np.int32)
+        X1[:] = _PAD_
+        X2[:] = _PAD_
+        d1_len = min(data1_maxlen, len(query_data[d1]))
+        d2_len = min(data2_maxlen, len(doc_data[d2]))
+        X1[:d1_len] = query_data[d1][:d1_len]
+        X2[:d2_len] = doc_data[d2][:d2_len]
+        q = [str(x) for x in list(X1)]
+        d = [str(x) for x in list(X2)]
+        f.write(",".join(q) + "\t" + ",".join(d) + "\t" + str(label) + "\t" +
+                d1 + "\n")
+    f.close()
+
+
+make_train()
+make_test()
--- a/models/match/match-pyramid/data/relation.test.fold1.txt
+++ b/models/match/match-pyramid/data/relation.test.fold1.txt
+2 9639 GX099-60-3149248
+1 9639 GX028-47-6554966
+1 9639 GX031-84-2802741
+1 9639 GX031-86-1702683
+1 9639 GX031-89-11392170
+1 9639 GX035-46-10142187
+1 9639 GX039-07-1333080
+1 9639 GX040-05-15096071
+1 9639 GX045-35-10693225
+1 9639 GX045-74-6226888
+1 9639 GX046-31-8871083
+1 9639 GX046-56-6274894
+1 9639 GX050-09-14629105
+1 9639 GX097-05-12714275
+1 9639 GX101-06-7768196
+1 9639 GX124-50-4934142
+1 9639 GX259-01-13320140
+1 9639 GX259-50-8109630
+1 9639 GX259-72-16176934
+1 9639 GX259-98-7821925
+1 9639 GX260-27-13260880
+1 9639 GX260-54-6363694
+1 9639 GX260-78-6999656
+1 9639 GX261-04-0843988
+1 9639 GX261-23-4964814
+0 9639 GX021-75-7026755
+0 9639 GX021-80-16449591
+0 9639 GX025-40-7135810
+0 9639 GX031-89-9020252
+0 9639 GX037-45-0533209
+0 9639 GX038-17-11223353
+0 9639 GX057-07-13335832
+0 9639 GX081-50-12756687
+0 9639 GX124-43-2364716
+0 9639 GX129-60-0000000
+0 9639 GX219-07-7475581
+0 9639 GX233-90-7976935
+0 9639 GX267-49-2983064
+0 9639 GX267-74-2413254
+0 9639 GX270-05-13614294
+1 9329 GX234-05-0812081
+0 9329 GX000-00-0000000
+0 9329 GX008-50-3899336
+0 9329 GX011-75-8470249
+0 9329 GX020-42-13388867
+0 9329 GX024-91-8520306
+0 9329 GX026-88-6087429
+0 9329 GX027-22-1703847
+0 9329 GX034-11-2617393
+0 9329 GX036-02-7994497
+0 9329 GX046-08-13858054
+0 9329 GX059-85-11403109
+0 9329 GX099-37-0232298
+0 9329 GX099-46-11473306
+0 9329 GX108-04-9589788
+0 9329 GX110-50-11723940
+0 9329 GX124-11-4119164
+0 9329 GX149-82-15204191
+0 9329 GX165-95-6198495
+0 9329 GX225-56-4184936
+0 9329 GX229-57-4487470
+0 9329 GX230-37-4125963
+0 9329 GX231-40-14574318
+0 9329 GX238-44-10302536
+0 9329 GX239-85-8572461
+0 9329 GX244-17-10154048
+0 9329 GX245-16-4169590
+0 9329 GX245-46-6341859
+0 9329 GX246-91-8487173
+0 9329 GX262-88-13259441
+0 9329 GX263-41-4135561
+0 9329 GX264-07-6385713
+0 9329 GX264-38-12253757
+0 9329 GX264-90-15990025
+0 9329 GX265-89-6212449
+0 9329 GX268-41-12034794
+0 9329 GX268-83-5140660
+0 9329 GX270-46-0293828
+0 9329 GX270-64-11852140
+0 9329 GX271-10-12458597
+2 9326 GX272-03-6610348
+1 9326 GX011-12-0595978
+0 9326 GX000-00-0000000
+0 9326 GX000-38-9492606
+0 9326 GX000-84-4587136
+0 9326 GX002-41-5566464
+0 9326 GX002-51-2615036
+0 9326 GX004-56-12238694
+0 9326 GX004-72-2476906
+0 9326 GX008-13-1835206
+0 9326 GX008-64-7705528
+0 9326 GX009-87-0976731
+0 9326 GX012-24-7688369
+0 9326 GX012-96-8727608
+0 9326 GX023-87-16736657
+0 9326 GX025-21-11820239
+0 9326 GX025-22-15113698
+0 9326 GX025-51-13959128
+0 9326 GX025-57-11414648
+0 9326 GX025-64-7587631
+0 9326 GX027-62-4542881
+0 9326 GX031-25-4759403
+0 9326 GX036-10-7902858
+0 9326 GX047-04-9457544
+0 9326 GX047-06-4014803
+0 9326 GX048-00-15113058
+0 9326 GX048-02-12975919
+0 9326 GX048-78-3273874
+0 9326 GX235-35-0963257
+0 9326 GX235-98-3789570
+0 9326 GX236-51-15473637
+0 9326 GX237-96-0892713
+0 9326 GX239-35-7413891
+0 9326 GX239-95-0176537
+0 9326 GX251-34-10377030
+0 9326 GX254-19-11374782
+0 9326 GX260-63-10533444
+0 9326 GX265-94-14886230
+0 9326 GX269-78-1500497
+0 9326 GX270-59-10270517
+2 8946 GX046-79-6984659
+2 8946 GX148-33-1869479
+2 8946 GX252-36-12638222
+1 8946 GX017-47-13290921
+1 8946 GX030-69-3218092
+1 8946 GX034-82-4550348
+1 8946 GX044-01-9283107
+1 8946 GX047-98-6660623
+1 8946 GX057-96-12580825
+1 8946 GX059-94-12068143
+1 8946 GX060-13-13600036
+1 8946 GX060-74-6594973
+1 8946 GX093-08-1158999
+0 8946 GX000-00-0000000
+0 8946 GX000-42-15811803
+0 8946 GX000-81-16418910
+0 8946 GX008-38-10557859
+0 8946 GX011-01-10891808
+0 8946 GX013-71-5708874
+0 8946 GX015-72-4458924
+0 8946 GX023-91-9869060
+0 8946 GX027-56-6376748
+0 8946 GX037-11-10829529
+0 8946 GX038-55-0681330
+0 8946 GX043-86-4200105
+0 8946 GX047-52-3712485
+0 8946 GX053-77-4836617
+0 8946 GX070-62-1070063
+0 8946 GX105-53-13372327
+0 8946 GX218-61-6263172
+0 8946 GX223-72-13625320
+0 8946 GX230-68-14727182
+0 8946 GX235-34-7733230
+0 8946 GX251-73-0159347
+0 8946 GX254-47-1098586
+0 8946 GX263-76-6934681
+0 8946 GX263-84-8668756
+0 8946 GX264-70-14223639
+0 8946 GX269-12-5910753
+0 8946 GX271-93-9895614
+1 9747 GX006-77-1973537
+1 9747 GX244-83-8716953
+1 9747 GX269-92-7189826
+0 9747 GX000-00-0000000
+0 9747 GX001-51-8693413
+0 9747 GX003-10-2820641
+0 9747 GX003-74-0557776
+0 9747 GX003-79-13695689
+0 9747 GX009-57-0938999
+0 9747 GX009-59-8595527
+0 9747 GX009-80-10629348
+0 9747 GX010-37-0206372
+0 9747 GX013-46-2187318
+0 9747 GX014-58-4004859
+0 9747 GX015-79-5393654
+0 9747 GX032-50-7316370
+0 9747 GX049-33-2206612
+0 9747 GX050-34-0439256
+0 9747 GX062-76-0914936
+0 9747 GX065-73-7392661
+0 9747 GX148-27-15770966
+0 9747 GX155-71-0504939
+0 9747 GX229-75-14750078
+0 9747 GX231-01-0640962
+0 9747 GX236-45-15598812
+0 9747 GX247-19-9516715
+0 9747 GX247-34-4277646
+0 9747 GX247-63-10766287
+0 9747 GX248-23-15998266
+0 9747 GX249-85-9742193
+0 9747 GX250-31-7671617
+0 9747 GX252-56-2141580
+0 9747 GX253-15-3406713
+0 9747 GX264-07-15838087
+0 9747 GX264-43-6543997
+0 9747 GX266-18-14688076
+0 9747 GX267-50-2036010
+0 9747 GX268-28-0548507
+0 9747 GX269-49-14171555
+0 9747 GX269-63-15607386
+2 9740 GX005-94-14208849
+2 9740 GX008-51-5639660
+2 9740 GX012-37-2342061
+2 9740 GX019-75-13916532
+2 9740 GX074-76-16261807
+2 9740 GX077-07-2951943
+2 9740 GX229-28-11068981
+2 9740 GX237-80-7497206
+2 9740 GX257-53-10589749
+2 9740 GX258-06-0611419
+2 9740 GX268-55-9791226
+1 9740 GX007-62-1126118
+1 9740 GX015-78-0216468
+1 9740 GX038-65-1678199
+1 9740 GX041-25-14803324
+1 9740 GX063-71-0401425
+1 9740 GX077-08-15801730
+1 9740 GX098-07-2885671
+1 9740 GX135-28-6485892
+1 9740 GX228-85-10518518
+1 9740 GX231-93-11279468
+1 9740 GX234-70-15061254
+1 9740 GX236-31-11149347
+1 9740 GX240-68-1184464
+1 9740 GX248-03-7275316
+1 9740 GX253-11-9846012
+1 9740 GX255-05-10638500
+1 9740 GX267-73-4450097
+1 9740 GX269-19-0642640
+0 9740 GX001-74-5132048
+0 9740 GX001-88-2603815
+0 9740 GX004-83-7935833
+0 9740 GX007-01-16750210
+0 9740 GX040-11-5249209
+0 9740 GX042-38-2886005
+0 9740 GX052-20-4359789
+0 9740 GX067-74-3718011
+0 9740 GX077-01-13481396
+0 9740 GX242-92-8868913
+0 9740 GX262-74-4596688
+2 8835 GX010-99-5715419
+2 8835 GX049-99-2518724
+0 8835 GX000-00-0000000
+0 8835 GX007-91-6779497
+0 8835 GX008-14-0788708
+0 8835 GX008-15-13942125
+0 8835 GX011-58-14336551
+0 8835 GX012-79-10684001
+0 8835 GX013-00-10822427
+0 8835 GX013-03-5962783
+0 8835 GX015-54-0251701
+0 8835 GX017-36-5859317
+0 8835 GX017-60-0601078
+0 8835 GX027-24-16202205
+0 8835 GX030-11-15814183
+0 8835 GX030-76-11969233
--- a/models/match/match-pyramid/data/test/test.txt
+++ b/models/match/match-pyramid/data/test/test.txt
--- a/models/match/match-pyramid/data/train/train.txt
+++ b/models/match/match-pyramid/data/train/train.txt
--- a/models/match/match-pyramid/data_process.sh
+++ b/models/match/match-pyramid/data_process.sh
+#!/bin/bash
+
+echo "...........load  data................."
+wget --no-check-certificate 'https://paddlerec.bj.bcebos.com/match_pyramid/match_pyramid_data.tar.gz'
+mv ./match_pyramid_data.tar.gz ./data
+rm -rf ./data/relation.test.fold1.txt ./data/relation.train.fold1.txt
+tar -xvf ./data/match_pyramid_data.tar.gz
+echo "...........data process..............."
+python ./data/process.py
--- a/models/match/match-pyramid/eval.py
+++ b/models/match/match-pyramid/eval.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import random
+import numpy as np
+
+
+def eval_MAP(pred, gt):
+    map_value = 0.0
+    r = 0.0
+    c = list(zip(pred, gt))
+    random.shuffle(c)
+    c = sorted(c, key=lambda x: x[0], reverse=True)
+    for j, (p, g) in enumerate(c):
+        if g != 0:
+            r += 1
+            map_value += r / (j + 1.0)
+    if r == 0:
+        return 0.0
+    else:
+        return map_value / r
+
+
+filename = './data/relation.test.fold1.txt'
+gt = []
+qid = []
+f = open(filename, "r")
+f.readline()
+num = 0
+for line in f.readlines():
+    num = num + 1
+    line = line.strip().split()
+    gt.append(int(line[0]))
+    qid.append(line[1])
+f.close()
+print(num)
+filename = './result.txt'
+pred = []
+for line in open(filename):
+    line = line.strip().split(",")
+    line[1] = line[1].split(":")
+    line = line[1][1].strip(" ")
+    line = line.strip("[")
+    line = line.strip("]")
+    pred.append(float(line))
+
+result_dict = {}
+for i in range(len(qid)):
+    if qid[i] not in result_dict:
+        result_dict[qid[i]] = []
+    result_dict[qid[i]].append([gt[i], pred[i]])
+print(len(result_dict))
+
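+# Mean average precision over all queries. The names below avoid shadowing
+# the built-in `map` and the `qid`/`gt`/`pred` lists defined above.
+map_score = 0
+for query_id in result_dict:
+    query_gt = np.array(result_dict[query_id])[:, 0]
+    query_pred = np.array(result_dict[query_id])[:, 1]
+    map_score += eval_MAP(query_pred, query_gt)
+map_score = map_score / len(result_dict)
+
+print("map=", map_score)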
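To make the metric in eval.py concrete, here is a small deterministic sketch of average precision for a single query. It follows the same logic as `eval_MAP` but omits the random shuffle used there for tie-breaking; the function name `average_precision` and the sample scores are made up for illustration.

```python
def average_precision(pred, gt):
    # Rank documents by predicted score, highest first.
    ranked = sorted(zip(pred, gt), key=lambda x: x[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label != 0:  # any non-zero label counts as relevant
            hits += 1
            total += hits / float(rank)  # precision at this rank
    return total / hits if hits else 0.0


# Four documents for one query: relevant ones end up at ranks 1 and 4.
ap = average_precision([0.9, 0.2, 0.7, 0.1], [1, 0, 0, 2])
print(ap)  # prints: 0.75
```

The relevant documents sit at ranks 1 and 4, so the precisions are 1/1 and 2/4, and their mean is 0.75. MAP is the mean of this value over all queries.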
--- a/models/match/match-pyramid/model.py
+++ b/models/match/match-pyramid/model.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import sys
+import random
+import numpy as np
+import paddle
+import paddle.fluid as fluid
+from paddlerec.core.utils import envs
+from paddlerec.core.model import ModelBase
+
+
+class Model(ModelBase):
+    def __init__(self, config):
+        ModelBase.__init__(self, config)
+
+    def _init_hyper_parameters(self):
+        self.emb_path = envs.get_global_env("hyper_parameters.emb_path")
+        self.sentence_left_size = envs.get_global_env(
+            "hyper_parameters.sentence_left_size")
+        self.sentence_right_size = envs.get_global_env(
+            "hyper_parameters.sentence_right_size")
+        self.vocab_size = envs.get_global_env("hyper_parameters.vocab_size")
+        self.emb_size = envs.get_global_env("hyper_parameters.emb_size")
+        self.kernel_num = envs.get_global_env("hyper_parameters.kernel_num")
+        self.hidden_size = envs.get_global_env("hyper_parameters.hidden_size")
+        self.hidden_act = envs.get_global_env("hyper_parameters.hidden_act")
+        self.out_size = envs.get_global_env("hyper_parameters.out_size")
+        self.channels = envs.get_global_env("hyper_parameters.channels")
+        self.conv_filter = envs.get_global_env("hyper_parameters.conv_filter")
+        self.conv_act = envs.get_global_env("hyper_parameters.conv_act")
+        self.pool_size = envs.get_global_env("hyper_parameters.pool_size")
+        self.pool_stride = envs.get_global_env("hyper_parameters.pool_stride")
+        self.pool_type = envs.get_global_env("hyper_parameters.pool_type")
+        self.pool_padding = envs.get_global_env(
+            "hyper_parameters.pool_padding")
+
+    def input_data(self, is_infer=False, **kwargs):
+        sentence_left = fluid.data(
+            name="sentence_left",
+            shape=[-1, self.sentence_left_size, 1],
+            dtype='int64',
+            lod_level=0)
+        sentence_right = fluid.data(
+            name="sentence_right",
+            shape=[-1, self.sentence_right_size, 1],
+            dtype='int64',
+            lod_level=0)
+        return [sentence_left, sentence_right]
+
+    def embedding_layer(self, input):
+        """
+        embedding layer
+        """
+        if os.path.isfile(self.emb_path):
+            embedding_array = np.load(self.emb_path)
+            emb = fluid.layers.embedding(
+                input=input,
+                size=[self.vocab_size, self.emb_size],
+                padding_idx=0,
+                param_attr=fluid.ParamAttr(
+                    name="word_embedding",
+                    initializer=fluid.initializer.NumpyArrayInitializer(
+                        embedding_array)))
+        else:
+            emb = fluid.layers.embedding(
+                input=input,
+                size=[self.vocab_size, self.emb_size],
+                padding_idx=0,
+                param_attr=fluid.ParamAttr(
+                    name="word_embedding",
+                    initializer=fluid.initializer.Xavier()))
+
+        return emb
+
+    def conv_pool_layer(self, input):
+        """
+        convolution and pool layer
+        """
+        # data format NCHW
+        # same padding
+        conv = fluid.layers.conv2d(
+            input=input,
+            num_filters=self.kernel_num,
+            stride=1,
+            padding="SAME",
+            filter_size=self.conv_filter,
+            act=self.conv_act)
+        pool = fluid.layers.pool2d(
+            input=conv,
+            pool_size=self.pool_size,
+            pool_stride=self.pool_stride,
+            pool_type=self.pool_type,
+            pool_padding=self.pool_padding)
+        return pool
+
+    def net(self, inputs, is_infer=False):
+        left_emb = self.embedding_layer(inputs[0])
+        right_emb = self.embedding_layer(inputs[1])
+        cross = fluid.layers.matmul(left_emb, right_emb, transpose_y=True)
+        cross = fluid.layers.reshape(cross,
+                                     [-1, 1, cross.shape[1], cross.shape[2]])
+        conv_pool = self.conv_pool_layer(input=cross)
+        relu_hid = fluid.layers.fc(input=conv_pool,
+                                   size=self.hidden_size,
+                                   act=self.hidden_act)
+        prediction = fluid.layers.fc(
+            input=relu_hid,
+            size=self.out_size, )
+
+        if is_infer:
+            self._infer_results["prediction"] = prediction
+            return
+
+        pos = fluid.layers.slice(
+            prediction, axes=[0, 1], starts=[0, 0], ends=[64, 1])
+        neg = fluid.layers.slice(
+            prediction, axes=[0, 1], starts=[64, 0], ends=[128, 1])
+        loss_part1 = fluid.layers.elementwise_sub(
+            fluid.layers.fill_constant(
+                shape=[64, 1], value=1.0, dtype='float32'),
+            pos)
+        loss_part2 = fluid.layers.elementwise_add(loss_part1, neg)
+        loss_part3 = fluid.layers.elementwise_max(
+            fluid.layers.fill_constant(
+                shape=[64, 1], value=0.0, dtype='float32'),
+            loss_part2)
+
+        avg_cost = fluid.layers.mean(loss_part3)
+        self._cost = avg_cost
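The training loss at the end of `net` is a pairwise hinge loss: the first half of each batch holds scores for (query, positive document) pairs and the second half holds the matching (query, negative document) pairs. A minimal plain-Python sketch of the same computation, with a hypothetical helper name and made-up scores, might look like this:

```python
def pairwise_hinge_loss(scores, margin=1.0):
    # scores: flat list of predictions; first half positives, second half negatives
    half = len(scores) // 2
    pos, neg = scores[:half], scores[half:]
    # max(0, margin - pos + neg) for each pair, then averaged over pairs
    losses = [max(0.0, margin - p + n) for p, n in zip(pos, neg)]
    return sum(losses) / len(losses)


# Two pairs: (2.0 vs 0.0) is already separated by the margin, (0.5 vs 1.0) is not.
loss = pairwise_hinge_loss([2.0, 0.5, 0.0, 1.0])
print(loss)  # prints: 0.75
```

The model code hard-codes the split at index 64, which matches `batch_size: 128` in config.yaml and `batch_size = 64` pairs per batch in process.py; changing one of these values would presumably require changing the others.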
--- a/models/match/match-pyramid/readme.md
+++ b/models/match/match-pyramid/readme.md
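+# match-pyramid text matching model
+
+## Introduction
+Matching two texts is a fundamental problem in many natural language processing tasks. One effective approach is to extract meaningful matching patterns from words, phrases, and sentences to produce a matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons capture many complex patterns built from elementary visual patterns such as oriented edges and corners, this model treats text matching as an image recognition problem. The implementation is aligned with the TensorFlow code open-sourced by the original author, Liang Pang (https://github.com/pl8787/MatchPyramid-TensorFlow/blob/master/model/model_mp.py ), and implements the Match-Pyramid model proposed in the following paper: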
+
+```text
+@inproceedings{pang2016text,
+  title={Text Matching as Image Recognition},
+  author={Liang Pang and Yanyan Lan and Jiafeng Guo and Jun Xu and Shengxian Wan and Xueqi Cheng},
+  year={2016}
+}
+```
+
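+## Data preparation
+Training and test data come from the Letor07 dataset, and the embed_wiki-pdc_d50_norm word vectors are used to initialize the embedding layer.  
+The dataset includes:  
+1. Dictionary file: each word is mapped to a unique ID (wid), and the mapping is stored in the word dictionary file. Example: word_dict.txt  
+2. Corpus files: the string identifier at the start of each line is the ID of a sentence, and the second number is the sentence length. Examples: qid_query.txt and docid_doc.txt  
+3. Relation files: these store the relation between two sentences, such as between a query and a document. Examples: relation.train.fold1.txt, relation.test.fold1.txt  
+4. Embedding file: the pretrained word vectors are stored in the embedding file. Example: embed_wiki-pdc_d50_norm  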
+
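+## Data download and preprocessing
+A script is provided that downloads the dataset and generates the training and test data in one step: bash data_process.sh  
+The script downloads the Letor07 dataset from a server hosted in China, deletes the existing relation.test.fold1.txt and relation.train.fold1.txt in the data folder, and extracts the full dataset into the data folder. It then runs process.py, which writes the full training data to `./data/train` and the full test data to `./data/test`, and generates the embedding.npy file used to initialize the embedding layer.  
+Expected output of the script:  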
+```
+bash data_process.sh
+...........load  data...............
+--2020-07-13 13:24:50--  https://paddlerec.bj.bcebos.com/match_pyramid/match_pyramid_data.tar.gz
+Resolving paddlerec.bj.bcebos.com... 10.70.0.165
+Connecting to paddlerec.bj.bcebos.com|10.70.0.165|:443... connected.
+HTTP request sent, awaiting response... 200 OK
+Length: 214449643 (205M) [application/x-gzip]
+Saving to: “match_pyramid_data.tar.gz”
+
+100%[==========================================================================================================>] 214,449,643  114M/s   in 1.8s
+
+2020-07-13 13:24:52 (114 MB/s) - “match_pyramid_data.tar.gz” saved [214449643/214449643]
+
+data/
+data/relation.test.fold1.txt
+data/relation.test.fold2.txt
+data/relation.test.fold3.txt
+data/relation.test.fold4.txt
+data/relation.test.fold5.txt
+data/relation.train.fold1.txt
+data/relation.train.fold2.txt
+data/relation.train.fold3.txt
+data/relation.train.fold4.txt
+data/relation.train.fold5.txt
+data/relation.txt
+data/docid_doc.txt
+data/qid_query.txt
+data/word_dict.txt
+data/embed_wiki-pdc_d50_norm
+...........data process...............
+[./data/word_dict.txt]
+        Word dict size: 193367
+[./data/qid_query.txt]
+        Data size: 1692
+[./data/docid_doc.txt]
+        Data size: 65323
+[./data/embed_wiki-pdc_d50_norm]
+        Embedding size: 109282
+('Generate numpy embed:', (193368, 50))
+[./data/relation.train.fold1.txt]
+        Instance size: 47828
+('Pair Instance Count:', 325439)
+[./data/relation.test.fold1.txt]
+        Instance size: 13652
+```
+
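+## One-step training, testing, and evaluation
+A script is provided that runs training, testing, and evaluation in one step: bash run.sh  
+The script runs python -m paddlerec.run -m ./config.yaml to train and test the model, saves the test results to result.txt, and then runs eval.py to compute the MAP metric.  
+Expected output of the script:  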
+```
+..............test.................
+13651
+336
+('map=', 0.420878322843591)
+```
+
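+## What each file does
+PaddleRec:  
+takes the model parameters from config.yaml  
+takes the network definition from model.py  
+uses train_reader.py to read the training data  
+uses test_reader.py to read the test data.  
+This example additionally provides:  
+data_process.sh, which processes the data in one step  
+run.sh, which starts training in one step and produces the test results directly  
+eval.py, which computes the MAP metric from the saved test results  
+For details on using PaddleRec, see the tutorials at the bottom of this page: https://github.com/PaddlePaddle/PaddleRec/blob/master/README_CN.md    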
--- a/models/match/match-pyramid/run.sh
+++ b/models/match/match-pyramid/run.sh
+#!/bin/bash
+echo "................run................."
+python -m paddlerec.run -m ./config.yaml >result1.txt
+grep -A1 "prediction" ./result1.txt >./result.txt
+rm -f result1.txt
+python eval.py
--- a/models/match/match-pyramid/test_reader.py
+++ b/models/match/match-pyramid/test_reader.py
+#   Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import print_function
+
+from paddlerec.core.reader import ReaderBase
+
+
+class Reader(ReaderBase):
+    def init(self):
+        pass
+
+    def generate_sample(self, line):
+        """
+        Read the data line by line and process it as a dictionary
+        """
+
+        def reader():
+            """
+            This function needs to be implemented by the user, based on data format
+            """
+
+            features = line.strip('\n').split('\t')
+            doc1 = [int(word_id) for word_id in features[0].split(",")]
+            doc2 = [int(word_id) for word_id in features[1].split(",")]
+            features_name = ["doc1", "doc2"]
+            yield zip(features_name, [doc1] + [doc2])
+
+        return reader
--- a/models/match/match-pyramid/train_reader.py
+++ b/models/match/match-pyramid/train_reader.py
+#   Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import print_function
+
+from paddlerec.core.reader import ReaderBase
+
+
+class Reader(ReaderBase):
+    def init(self):
+        pass
+
+    def generate_sample(self, line):
+        """
+        Read the data line by line and process it as a dictionary
+        """
+
+        def reader():
+            """
+            This function needs to be implemented by the user, based on data format
+            """
+
+            features = line.strip('\n').split('\t')
+            doc1 = [int(word_id) for word_id in features[0].split(",")]
+            doc2 = [int(word_id) for word_id in features[1].split(",")]
+            features_name = ["doc1", "doc2"]
+            yield zip(features_name, [doc1] + [doc2])
+
+        return reader
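Both readers expect the line format written by process.py: two tab-separated fields, each a comma-separated list of word IDs padded to a fixed length (test lines carry two extra fields, the label and the query ID). The standalone sketch below parses such a line without PaddleRec; `parse_pair_line` and the sample line are illustrative only, and the sample is shortened rather than padded to 20 and 500 IDs.

```python
def parse_pair_line(line):
    # Fields are tab-separated; only the first two hold word IDs.
    fields = line.rstrip("\n").split("\t")
    left_ids = [int(word_id) for word_id in fields[0].split(",")]
    right_ids = [int(word_id) for word_id in fields[1].split(",")]
    return left_ids, right_ids


# 193367 is the [PAD] ID reported in process.py; trailing fields are label and query ID.
sample = "3,7,193367\t12,5,193367,193367\t1\t9639\n"
left_ids, right_ids = parse_pair_line(sample)
print(left_ids, right_ids)
```

Extra trailing fields are simply ignored, which is why the same parsing logic can serve both train.txt and test.txt.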
--- a/models/match/multiview-simnet/config.yaml
+++ b/models/match/multiview-simnet/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.

 # workspace
-workspace: "paddlerec.models.match.multiview-simnet"
+workspace: "models/match/multiview-simnet"

 # list of dataset
 dataset:

--- a/models/match/readme.md
+++ b/models/match/readme.md
@@ -34,8 +34,11 @@
 ## Tutorial (quick start)
 ### Training
 ```shell
-python -m paddlerec.run -m paddlerec.models.match.dssm # dssm
-python -m paddlerec.run -m paddlerec.models.match.multiview-simnet # multiview-simnet
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+
+python -m paddlerec.run -m models/match/dssm/config.yaml # dssm
+python -m paddlerec.run -m models/match/multiview-simnet/config.yaml # multiview-simnet
 ```

 ### Inference

--- a/models/multitask/esmm/README.md
+++ b/models/multitask/esmm/README.md
+# ESMM
+
+A brief overview of the directory structure for this example: 
+
+```
+├── data # data files
+	├── train # training data
+		├──small.txt
+	├── test  # test data
+		├── small.txt
+	├── run.sh
+├── __init__.py 
+├── config.yaml # configuration file
+├── esmm_reader.py # data reader
+├── model.py # model definition
+```
+
+Note: before reading this example, we recommend familiarizing yourself with the following:
+
+[PaddleRec getting-started tutorial](https://github.com/PaddlePaddle/PaddleRec/blob/master/README.md)
+
+## Contents
+
+- [Model introduction](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#模型简介)
+- [Data preparation](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#数据准备)
+- [Requirements](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#运行环境)
+- [Quick start](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#快速开始)
+- [Reproducing the paper](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#论文复现)
+- [Advanced usage](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#进阶使用)
+- [FAQ](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#FAQ)
+
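+## Model introduction
+
+Unlike CTR estimation, CVR estimation faces two key problems:
+
+1. **Sample Selection Bias (SSB)** A conversion is an action that can only happen after a click. Traditional CVR models are usually trained on click data, where clicked-but-not-converted samples are negatives and clicked-and-converted samples are positives. At serving time, however, the trained model makes predictions over the entire sample space, not only over clicked samples. In other words, the training data and the data to be predicted come from different distributions, and this bias poses a major challenge to the model's generalization.
+2. **Data Sparsity (DS)** The click samples used to train CVR are far fewer than the impression samples used to train CTR estimation.
+
+ESMM was published at SIGIR'2018 in the paper [Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931). Building on Multi-Task Learning, the paper proposes a new CVR estimation model, ESMM, which effectively addresses both data sparsity and sample selection bias in real-world CVR estimation.
+
+This project implements the ESMM network structure on PaddlePaddle and validates the model on the open dataset [Ali-CCP: Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408). The model configuration uses the demo dataset by default; to verify accuracy, see the [Reproducing the paper](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/esmm#论文复现) section.
+
+Supported features
+
+Training: single-machine CPU, single-machine single-GPU, single-machine multi-GPU, local simulated parameter-server training, and incremental training. For configuration, see [Start training](https://github.com/PaddlePaddle/PaddleRec/blob/master/doc/train.md)
+
+Inference: single-machine CPU and single-machine single-GPU. For configuration, see [PaddleRec offline inference](https://github.com/PaddlePaddle/PaddleRec/blob/master/doc/predict.md)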
+
+## Data Preparation
+
+Dataset: [Ali-CCP: Alibaba Click and Conversion Prediction](https://tianchi.aliyun.com/datalab/dataSet.html?dataId=408)
+
+```
+cd data 
+sh run.sh
+```
+
+For the data format, see the demo data: data/train
+
+
+## Requirements
+
+PaddlePaddle>=1.7.2
+
+python 2.7/3.5/3.6/3.7
+
+PaddleRec >=0.1
+
+os : windows/linux/macos
+
+## Quick Start
+
+### Single-Machine Training
+
+CPU environment
+
+Set the device, epochs, and other options in config.yaml.
+
+```
+dataset:
+  - name: dataset_train
+    batch_size: 5
+    type: QueueDataset
+    data_path: "{workspace}/data/train"
+    data_converter: "{workspace}/esmm_reader.py"
+  - name: dataset_infer
+    batch_size: 5
+    type: QueueDataset
+    data_path: "{workspace}/data/test"
+    data_converter: "{workspace}/esmm_reader.py"
+```
+
+### Single-Machine Prediction
+
+CPU environment
+
+Set epochs, device, and other parameters in config.yaml.
+
+```
+  - name: infer_runner
+    class: infer
+    init_model_path: "increment/1"
+    device: cpu
+    print_interval: 1
+    phases: [infer]
+```
+
+
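+## Paper Reproduction
+
+Reproducing the paper's results with the full original dataset requires setting batch_size=1000, thread_num=8, epoch_num=4 in config.yaml.
+
+
+To run after these changes: set 'workspace' in config.yaml to the directory that contains config.yaml, then execute
+
+```
+python -m paddlerec.run -m /home/your/dir/config.yaml # debug mode: pass the absolute path of a local config directly
+```
+
+## Advanced Usage
+
+## FAQ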
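
Note on the Model Introduction of the ESMM README above: the way ESMM sidesteps sample selection bias is to supervise pCTR and pCTCVR = pCTR × pCVR over all impressions, instead of supervising pCVR on clicked samples only. The following is a minimal sketch of that per-sample loss, following the formulation in the cited paper. The function name `esmm_loss` and the clamping constant are illustrative assumptions, not code from `model.py` in this PR.

```python
import math


def esmm_loss(p_ctr, p_cvr, click, conversion, eps=1e-7):
    """Per-sample ESMM loss: log loss on CTR plus log loss on CTCVR.

    p_ctr, p_cvr: outputs of the CTR and CVR towers, each in (0, 1).
    click, conversion: 0/1 labels; conversion can only be 1 when click is 1.
    esmm_loss and eps are hypothetical names chosen for this sketch.
    """
    # pCTCVR is the product of the two tower outputs and is defined
    # for every impression, clicked or not.
    p_ctcvr = p_ctr * p_cvr

    def log_loss(p, y):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

    # The CTCVR label is "clicked AND converted".
    return log_loss(p_ctr, click) + log_loss(p_ctcvr, click * conversion)
```

The CVR tower never receives a direct label here; it is trained only through the CTCVR term, which is why unclicked impressions can contribute to it.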
--- a/models/multitask/esmm/config.yaml
+++ b/models/multitask/esmm/config.yaml
@@ -13,7 +13,7 @@
 # limitations under the License.


-workspace: "paddlerec.models.multitask.esmm"
+workspace: "models/multitask/esmm"

 dataset:
  - name: dataset_train

--- a/models/multitask/esmm/data/run.sh
+++ b/models/multitask/esmm/data/run.sh
+mkdir train_data
+mkdir test_data
+mkdir vocab
+mkdir data
+train_source_path="./data/sample_train.tar.gz"
+train_target_path="train_data"
+test_source_path="./data/sample_test.tar.gz"
+test_target_path="test_data"
+cd data
+echo "downloading sample_train.tar.gz......"
+curl -# 'http://jupter-oss.oss-cn-hangzhou.aliyuncs.com/file/opensearch/documents/408/sample_train.tar.gz?Expires=1586435769&OSSAccessKeyId=LTAIGx40tjZWxj6q&Signature=ahUDqhvKT1cGjC4%2FIER2EWtq7o4%3D&response-content-disposition=attachment%3B%20' -H 'Proxy-Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' -H 'Accept-Language: zh-CN,zh;q=0.9' --compressed --insecure -o sample_train.tar.gz
+cd ..
+echo "unzipping sample_train.tar.gz......"
+tar -xzvf  ${train_source_path} -C ${train_target_path} && rm -rf ${train_source_path}
+cd data
+echo "downloading sample_test.tar.gz......"
+curl -# 'http://jupter-oss.oss-cn-hangzhou.aliyuncs.com/file/opensearch/documents/408/sample_test.tar.gz?Expires=1586435821&OSSAccessKeyId=LTAIGx40tjZWxj6q&Signature=OwLMPjt1agByQtRVi8pazsAliNk%3D&response-content-disposition=attachment%3B%20' -H 'Proxy-Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' -H 'Accept-Language: zh-CN,zh;q=0.9' --compressed --insecure -o sample_test.tar.gz
+cd ..
+echo "unzipping sample_test.tar.gz......"
+tar -xzvf  ${test_source_path} -C ${test_target_path} && rm -rf ${test_source_path}
+echo "preprocessing data......"
+python reader.py --train_data_path ${train_target_path} \
+                 --test_data_path ${test_target_path} \
+                 --vocab_path vocab/vocab_size.txt \
+                 --train_sample_size 6400 \
+                 --test_sample_size 6400 \
--- a/models/multitask/mmoe/README.md
+++ b/models/multitask/mmoe/README.md
+# MMOE
+
+The directory layout of this example is briefly described below:
+
+```
+├── data # data directory
+	├── train # training data
+		├── train_data.txt
+	├── test  # test data
+		├── test_data.txt
+	├── run.sh
+	├── data_preparation.py
+├── __init__.py 
+├── config.yaml # configuration file
+├── census_reader.py # data reader
+├── model.py # model definition
+```
+
+Note: before reading this example, we recommend that you first go through the following:
+
+[PaddleRec getting-started tutorial](https://github.com/PaddlePaddle/PaddleRec/blob/master/README.md)
+
+## Contents
+
+- [Model Introduction](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#模型简介)
+- [Data Preparation](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#数据准备)
+- [Requirements](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#运行环境)
+- [Quick Start](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#快速开始)
+- [Paper Reproduction](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#论文复现)
+- [Advanced Usage](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#进阶使用)
+- [FAQ](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#FAQ)
+
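+## Model Introduction
+
+Multi-task models can improve the learning efficiency and quality of each task by learning the connections and differences between tasks. Multi-task learning frameworks widely adopt a shared-bottom structure, in which different tasks share the bottom hidden layers. This structure inherently reduces the risk of overfitting, but its effectiveness can suffer from task differences and data distribution. The paper [Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://www.kdd.org/kdd2018/accepted-papers/view/modeling-task-relationships-in-multi-task-learning-with-multi-gate-mixture-) proposes a multi-task learning structure called Multi-gate Mixture-of-Experts (MMOE). MMOE models task relationships explicitly and learns task-specific functions on top of shared representations, without a significant increase in parameters.
+
+We define the MMOE network in PaddlePaddle and validate it on the open Census-income Data set. The AUC for the two tasks is: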
+
+1. income
+
+> max_mmoe_test_auc_income: 0.94937
+>
+> mean_mmoe_test_auc_income: 0.94465
+
+2. marital
+
+> max_mmoe_test_auc_marital: 0.99419
+>
+> mean_mmoe_test_auc_marital: 0.99324
+
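+For accuracy validation, see the [Paper Reproduction](https://github.com/PaddlePaddle/PaddleRec/tree/master/models/multitask/mmoe#论文复现) section.
+
+Supported features
+
+Training: single-machine CPU, single-machine single-GPU, single-machine multi-GPU, local simulated parameter-server training, and incremental training. For configuration, see [Launching Training](https://github.com/PaddlePaddle/PaddleRec/blob/master/doc/train.md)
+
+Prediction: single-machine CPU and single-machine single-GPU. For configuration, see [PaddleRec Offline Prediction](https://github.com/PaddlePaddle/PaddleRec/blob/master/doc/predict.md)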
+
+## Data Preparation
+
+Dataset: [Census-income Data](https://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/census.tar.gz)
+
+After extracting the data, add the file paths to the run.sh script and run it.
+
+```sh
+mkdir train_data
+mkdir test_data
+mkdir data
+train_path="data/census-income.data"
+test_path="data/census-income.test"
+train_data_path="train_data/"
+test_data_path="test_data/"
+pip install -r requirements.txt
+wget -P data/ https://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/census.tar.gz
+tar -zxvf data/census.tar.gz -C data/
+
+python data_preparation.py --train_path ${train_path} \
+                           --test_path ${test_path} \
+                           --train_data_path ${train_data_path}\
+                           --test_data_path ${test_data_path}
+
+```
+
+The generated files are comma-separated:
+
+```
+0,0,73,0,0,0,0,1700.09,0,0
+```
+
+
+## Requirements
+
+PaddlePaddle>=1.7.2
+
+python 2.7/3.5/3.6/3.7
+
+PaddleRec >=0.1
+
+os : windows/linux/macos
+
+## Quick Start
+
+### Single-Machine Training
+
+CPU environment
+
+Set the device, epochs, and other options in config.yaml.
+
+```
+dataset:
+- name: dataset_train
+  batch_size: 5
+  type: QueueDataset
+  data_path: "{workspace}/data/train"
+  data_converter: "{workspace}/census_reader.py"
+- name: dataset_infer
+  batch_size: 5
+  type: QueueDataset
+  data_path: "{workspace}/data/train"
+  data_converter: "{workspace}/census_reader.py"
+```
+
+### Single-Machine Prediction
+
+CPU environment
+
+Set epochs, device, and other parameters in config.yaml.
+
+```
+- name: infer_runner
+  class: infer
+  init_model_path: "increment/0"
+  device: cpu
+```
+
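+## Paper Reproduction
+
+Reproducing the paper's results with the full original dataset requires setting batch_size=1000, thread_num=8, epoch_num=4 in config.yaml.
+
+Training on a single P100 GPU takes 6.5 h; test AUC: best 0.9940, mean 0.9932.
+
+To run after these changes: set 'workspace' in config.yaml to the directory that contains config.yaml, then execute
+
+```
+python -m paddlerec.run -m /home/your/dir/config.yaml # debug mode: pass the absolute path of a local config directly
+```
+
+## Advanced Usage
+
+## FAQ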
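
Note on the Model Introduction of the MMOE README above: in the cited paper, all tasks share one set of experts, and each task owns a softmax gate that mixes the expert outputs into its own representation. The sketch below shows that mixing step with the standard library only. The helper names, the single-layer ReLU experts, and the plain-list weight layout are assumptions made for illustration; they are not taken from `model.py` in this PR.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mmoe_forward(x, expert_weights, gate_weights):
    """Return one mixed representation per task (hypothetical helper).

    expert_weights: list of n_experts matrices, shared by all tasks.
    gate_weights: list of n_tasks matrices, each with n_experts rows.
    """
    # Every expert sees the same input; ReLU experts are an assumption.
    expert_outs = [[max(0.0, v) for v in matvec(w, x)] for w in expert_weights]
    dim = len(expert_outs[0])
    task_inputs = []
    for gate_w in gate_weights:
        gate = softmax(matvec(gate_w, x))  # one weight per expert, sums to 1
        mixed = [
            sum(g * out[d] for g, out in zip(gate, expert_outs))
            for d in range(dim)
        ]
        task_inputs.append(mixed)
    return task_inputs
```

Each entry of the returned list would then feed that task's own tower. With a single task and a constant gate, this reduces to an average over experts.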
--- a/models/multitask/mmoe/config.yaml
+++ b/models/multitask/mmoe/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-workspace: "paddlerec.models.multitask.mmoe"
+workspace: "models/multitask/mmoe"

 dataset:
 - name: dataset_train

--- a/models/multitask/mmoe/data/data_preparation.py
+++ b/models/multitask/mmoe/data/data_preparation.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import pandas as pd
+import numpy as np
+import paddle.fluid as fluid
+from args import *
+
+
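+def fun1(x):
+    # Income label: 1 if the raw value is ' 50000+.' (the leading space
+    # is part of the raw field), otherwise 0.
+    return 1 if x == ' 50000+.' else 0
+
+
+def fun2(x):
+    # Marital-status label: 1 if the raw value is ' Never married'
+    # (leading space included), otherwise 0.
+    return 1 if x == ' Never married' else 0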
+
+
+def data_preparation(train_path, test_path, train_data_path, test_data_path):
+    # The column names are from
+    # https://www2.1010data.com/documentationcenter/prod/Tutorials/MachineLearningExamples/CensusIncomeDataSet.html
+    column_names = [
+        'age', 'class_worker', 'det_ind_code', 'det_occ_code', 'education',
+        'wage_per_hour', 'hs_college', 'marital_stat', 'major_ind_code',
+        'major_occ_code', 'race', 'hisp_origin', 'sex', 'union_member',
+        'unemp_reason', 'full_or_part_emp', 'capital_gains', 'capital_losses',
+        'stock_dividends', 'tax_filer_stat', 'region_prev_res',
+        'state_prev_res', 'det_hh_fam_stat', 'det_hh_summ', 'instance_weight',
+        'mig_chg_msa', 'mig_chg_reg', 'mig_move_reg', 'mig_same',
+        'mig_prev_sunbelt', 'num_emp', 'fam_under_18', 'country_father',
+        'country_mother', 'country_self', 'citizenship', 'own_or_self',
+        'vet_question', 'vet_benefits', 'weeks_worked', 'year', 'income_50k'
+    ]
+
+    # Load the dataset in Pandas
+    train_df = pd.read_csv(
+        train_path,
+        delimiter=',',
+        header=None,
+        index_col=None,
+        names=column_names)
+    other_df = pd.read_csv(
+        test_path,
+        delimiter=',',
+        header=None,
+        index_col=None,
+        names=column_names)
+
+    # First group of tasks according to the paper
+    label_columns = ['income_50k', 'marital_stat']
+
+    # One-hot encoding categorical columns
+    categorical_columns = [
+        'class_worker', 'det_ind_code', 'det_occ_code', 'education',
+        'hs_college', 'major_ind_code', 'major_occ_code', 'race',
+        'hisp_origin', 'sex', 'union_member', 'unemp_reason',
+        'full_or_part_emp', 'tax_filer_stat', 'region_prev_res',
+        'state_prev_res', 'det_hh_fam_stat', 'det_hh_summ', 'mig_chg_msa',
+        'mig_chg_reg', 'mig_move_reg', 'mig_same', 'mig_prev_sunbelt',
+        'fam_under_18', 'country_father', 'country_mother', 'country_self',
+        'citizenship', 'vet_question'
+    ]
+    train_raw_labels = train_df[label_columns]
+    other_raw_labels = other_df[label_columns]
+    transformed_train = pd.get_dummies(train_df, columns=categorical_columns)
+    transformed_other = pd.get_dummies(other_df, columns=categorical_columns)
+
+    # Filling the missing column in the other set
+    transformed_other[
+        'det_hh_fam_stat_ Grandchild <18 ever marr not in subfamily'] = 0
+    # Convert the two raw label columns to 0/1 in both data sets
+    for df in (transformed_train, transformed_other):
+        df['income_50k'] = df['income_50k'].apply(fun1)
+        df['marital_stat'] = df['marital_stat'].apply(fun2)
+    # Split the other dataset into 1:1 validation to test according to the paper
+    validation_indices = transformed_other.sample(
+        frac=0.5, replace=False, random_state=1).index
+    test_indices = list(set(transformed_other.index) - set(validation_indices))
+    validation_data = transformed_other.iloc[validation_indices]
+    test_data = transformed_other.iloc[test_indices]
+
+    cols = transformed_train.columns.tolist()
+    cols.insert(0, cols.pop(cols.index('income_50k')))
+    cols.insert(0, cols.pop(cols.index('marital_stat')))
+    transformed_train = transformed_train[cols]
+    test_data = test_data[cols]
+    validation_data = validation_data[cols]
+
+    print(transformed_train.shape, transformed_other.shape,
+          validation_data.shape, test_data.shape)
+    transformed_train.to_csv(train_data_path + 'train_data.csv', index=False)
+    test_data.to_csv(test_data_path + 'test_data.csv', index=False)
+
+
+args = data_preparation_args()
+data_preparation(args.train_path, args.test_path, args.train_data_path,
+                 args.test_data_path)
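
Note on `data_preparation.py` above: the script one-hot encodes the training and test files separately and then fills one missing dummy column in the test set by hand, because a category value that appears in only one file produces a column in only that file. A more general way to keep the two sets aligned is to fix the column list from the training data and encode every row against it. The sketch below shows the idea with the standard library only; `build_columns` and `one_hot` are hypothetical helpers, not part of this PR and not a drop-in replacement for `pd.get_dummies`.

```python
def build_columns(train_rows, categorical_keys):
    """Collect one '<key>_<value>' column per category value seen in training."""
    columns = []
    for key in categorical_keys:
        values = sorted({row[key] for row in train_rows})
        columns.extend("{}_{}".format(key, v) for v in values)
    return columns


def one_hot(row, categorical_keys, columns):
    """Encode a row against a fixed column list.

    A value that was never seen in training has no column, so it
    encodes as all zeros instead of creating a new column.
    """
    active = {"{}_{}".format(key, row[key]) for key in categorical_keys}
    return [1 if col in active else 0 for col in columns]


train = [{"sex": " Female"}, {"sex": " Male"}]
cols = build_columns(train, ["sex"])
encoded_test_row = one_hot({"sex": " Male"}, ["sex"], cols)
```

Because both splits are encoded against `cols`, their feature vectors always have the same length and order, whichever values happen to occur in the test file.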
--- a/models/multitask/readme.md
+++ b/models/multitask/readme.md
@@ -44,9 +44,12 @@

 ## Tutorial (Quick Start)
 ```shell
-python -m paddlerec.run -m paddlerec.models.multitask.mmoe # mmoe
-python -m paddlerec.run -m paddlerec.models.multitask.share-bottom # share-bottom
-python -m paddlerec.run -m paddlerec.models.multitask.esmm # esmm
+git clone https://github.com/PaddlePaddle/PaddleRec.git paddle-rec
+cd paddle-rec
+
+python -m paddlerec.run -m models/multitask/mmoe/config.yaml # mmoe
+python -m paddlerec.run -m models/multitask/share-bottom/config.yaml # share-bottom
+python -m paddlerec.run -m models/multitask/esmm/config.yaml # esmm
 ```

 ## Tutorial (Reproducing the Papers)

--- a/models/multitask/share-bottom/README.md
+++ b/models/multitask/share-bottom/README.md
--- a/models/multitask/share-bottom/config.yaml
+++ b/models/multitask/share-bottom/config.yaml
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-workspace: "paddlerec.models.multitask.share-bottom"
+workspace: "models/multitask/share-bottom"

 dataset:
 - name: dataset_train
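
Note on the `workspace` changes in the config.yaml hunks above: the value moves from a dotted module path (`paddlerec.models.multitask.…`) to a repository-relative directory, and dataset entries such as `data_path: "{workspace}/data/train"` refer to it through a placeholder. How PaddleRec expands that placeholder internally is not shown in this diff. The sketch below only illustrates one plausible reading, in which a relative workspace is resolved against the directory the command is run from (the `paddle-rec` checkout in the quick-start commands); `resolve_data_path` is a hypothetical helper.

```python
import os


def resolve_data_path(template, workspace, repo_root):
    """Expand "{workspace}" and return a normalized path (hypothetical helper).

    Assumption: a relative workspace is interpreted relative to repo_root,
    i.e. the directory from which `python -m paddlerec.run` is invoked.
    """
    relative = template.replace("{workspace}", workspace)
    return os.path.normpath(os.path.join(repo_root, relative))


train_dir = resolve_data_path(
    "{workspace}/data/train", "models/multitask/esmm", "/tmp/paddle-rec")
```

Under this reading, running the command from any other directory would point `data_path` at a location that does not exist, which is consistent with the READMEs asking users to set `workspace` to the directory containing config.yaml when using an absolute config path.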

--- a/models/multitask/share-bottom/data/data_preparation.py
+++ b/models/multitask/share-bottom/data/data_preparation.py
--- a/models/multitask/share-bottom/data/run.sh
+++ b/models/multitask/share-bottom/data/run.sh
+mkdir train_data
+mkdir test_data
+mkdir data
+train_path="data/census-income.data"
+test_path="data/census-income.test"
+train_data_path="train_data/"
+test_data_path="test_data/"
+pip install -r requirements.txt
+
+wget -P data/ https://archive.ics.uci.edu/ml/machine-learning-databases/census-income-mld/census.tar.gz
+tar -zxvf data/census.tar.gz -C data/
+
+python data_preparation.py --train_path ${train_path} \
+                           --test_path ${test_path} \
+                           --train_data_path ${train_data_path}\
+                           --test_data_path ${test_data_path}
--- a/models/rank/AutoInt/config.yaml
+++ b/models/rank/AutoInt/config.yaml
@@ -14,7 +14,7 @@

 # global settings 
 debug: false
-workspace: "paddlerec.models.rank.AutoInt"
+workspace: "models/rank/AutoInt"


 dataset:

--- a/models/rank/BST/config.yaml
+++ b/models/rank/BST/config.yaml
@@ -14,7 +14,7 @@

 # global settings 
 debug: false
-workspace: "paddlerec.models.rank.BST"
+workspace: "models/rank/BST"

 dataset:
 - name: sample_1

--- a/models/rank/afm/config.yaml
+++ b/models/rank/afm/config.yaml
--- a/models/rank/dcn/config.yaml
+++ b/models/rank/dcn/config.yaml
--- a/models/rank/deep_crossing/config.yaml
+++ b/models/rank/deep_crossing/config.yaml
--- a/models/rank/deepfm/config.yaml
+++ b/models/rank/deepfm/config.yaml
--- a/models/rank/dien/README.md
+++ b/models/rank/dien/README.md
--- a/models/rank/dien/config.yaml
+++ b/models/rank/dien/config.yaml
--- a/models/rank/dien/infer_reader.py
+++ b/models/rank/dien/infer_reader.py
--- a/models/rank/dien/reader.py
+++ b/models/rank/dien/reader.py
--- a/models/rank/din/README.md
+++ b/models/rank/din/README.md
--- a/models/rank/din/config.yaml
+++ b/models/rank/din/config.yaml
--- a/models/rank/din/infer_reader.py
+++ b/models/rank/din/infer_reader.py
--- a/models/rank/din/reader.py
+++ b/models/rank/din/reader.py
--- a/models/rank/dnn/backend.yaml
+++ b/models/rank/dnn/backend.yaml
--- a/models/rank/dnn/config.yaml
+++ b/models/rank/dnn/config.yaml
--- a/models/rank/ffm/config.yaml
+++ b/models/rank/ffm/config.yaml
--- a/models/rank/fgcnn/config.yaml
+++ b/models/rank/fgcnn/config.yaml
--- a/models/rank/fibinet/README.md
+++ b/models/rank/fibinet/README.md
--- a/models/rank/fibinet/config.yaml
+++ b/models/rank/fibinet/config.yaml
--- a/models/rank/flen/README.md
+++ b/models/rank/flen/README.md
--- a/models/rank/flen/config.yaml
+++ b/models/rank/flen/config.yaml
--- a/models/rank/flen/data/get_data.py
+++ b/models/rank/flen/data/get_data.py
--- a/models/rank/fm/config.yaml
+++ b/models/rank/fm/config.yaml
--- a/models/rank/fnn/config.yaml
+++ b/models/rank/fnn/config.yaml
--- a/models/rank/logistic_regression/config.yaml
+++ b/models/rank/logistic_regression/config.yaml
--- a/models/rank/nfm/config.yaml
+++ b/models/rank/nfm/config.yaml
--- a/models/rank/pnn/config.yaml
+++ b/models/rank/pnn/config.yaml
--- a/models/rank/readme.md
+++ b/models/rank/readme.md
--- a/models/rank/wide_deep/README.md
+++ b/models/rank/wide_deep/README.md
--- a/models/rank/wide_deep/config.yaml
+++ b/models/rank/wide_deep/config.yaml
--- a/models/rank/xdeepfm/config.yaml
+++ b/models/rank/xdeepfm/config.yaml
--- a/models/recall/fasttext/config.yaml
+++ b/models/recall/fasttext/config.yaml
--- a/models/recall/gnn/config.yaml
+++ b/models/recall/gnn/config.yaml
--- a/models/recall/gnn/model.py
+++ b/models/recall/gnn/model.py
--- a/models/recall/gnn/readme.md
+++ b/models/recall/gnn/readme.md
--- a/models/recall/gru4rec/config.yaml
+++ b/models/recall/gru4rec/config.yaml
--- a/models/recall/look-alike_recall/README.md
+++ b/models/recall/look-alike_recall/README.md
--- a/models/recall/look-alike_recall/__init__.py
+++ b/models/recall/look-alike_recall/__init__.py
--- a/models/recall/look-alike_recall/config.yaml
+++ b/models/recall/look-alike_recall/config.yaml
--- a/models/recall/look-alike_recall/data/build_dataset.py
+++ b/models/recall/look-alike_recall/data/build_dataset.py
--- a/models/recall/look-alike_recall/data/train_data/paddle_train.txt
+++ b/models/recall/look-alike_recall/data/train_data/paddle_train.txt
--- a/models/recall/look-alike_recall/model.py
+++ b/models/recall/look-alike_recall/model.py
--- a/models/recall/ncf/config.yaml
+++ b/models/recall/ncf/config.yaml
--- a/models/recall/readme.md
+++ b/models/recall/readme.md
--- a/models/recall/ssr/config.yaml
+++ b/models/recall/ssr/config.yaml
--- a/models/recall/word2vec/config.yaml
+++ b/models/recall/word2vec/config.yaml
--- a/models/recall/youtube_dnn/config.yaml
+++ b/models/recall/youtube_dnn/config.yaml
--- a/models/rerank/listwise/config.yaml
+++ b/models/rerank/listwise/config.yaml
--- a/models/rerank/readme.md
+++ b/models/rerank/readme.md
--- a/models/treebased/tdm/README.md
+++ b/models/treebased/tdm/README.md
--- a/models/treebased/tdm/config.yaml
+++ b/models/treebased/tdm/config.yaml
--- a/run.py
+++ b/run.py
--- a/setup.py
+++ b/setup.py
--- a/tests/test_auc_metrics.py
+++ b/tests/test_auc_metrics.py
--- a/tests/test_pairwise_pn.py
+++ b/tests/test_pairwise_pn.py
--- a/tests/test_precision_recall_metrics.py
+++ b/tests/test_precision_recall_metrics.py
--- a/tests/test_recall_k.py
+++ b/tests/test_recall_k.py