([简体中文](./README.md)|English)
<p align="center">
<img align="center" src="doc/imgs/logo.png">
</p>
<p align="center">
<img align="center" src="doc/imgs/overview_en.png">
</p>


<h2 align="center">What is a recommendation system?</h2>
<p align="center">
<img align="center" src="doc/imgs/rec-overview-en.png">
</p>

- Recommendation systems are the key to helping users efficiently find the information they care about in an era of explosive growth in online information.

- A recommendation system is also a silver bullet for helping a product attract users, retain them, increase user stickiness, and improve conversion.

- An excellent recommendation system helps a product build a good reputation and gain market share.

  > Whoever masters recommendation systems and puts them to good use gains a head start in the fierce competition over information distribution.
  >
  > At the same time, developers of recommendation systems face many problems, such as huge data volumes, complex model structures, inefficient distributed training environments, and demanding online deployment requirements.

<h2 align="center">What is PaddleRec?</h2>


- A quick-start tool for search and recommendation models, based on [PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/en/beginners_guide/index_en.html)
- An end-to-end recommendation system solution for beginners, developers, and researchers
- A complete recommendation algorithm library covering content understanding, matching, recall, ranking, multi-task learning, re-ranking, and more


    |         Type          |                                 Algorithm                                 |  CPU  |   GPU   | Parameter-Server | Multi-GPU | Paper                                                                                                                                                                                                       |
    | :-------------------: | :-----------------------------------------------------------------------: | :---: | :-----: | :--------------: | :-------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Content-Understanding | [Text-Classification](models/contentunderstanding/classification/model.py) |   ✓   |    ✓    |        ✓         |     x     | [EMNLP 2014][Convolutional neural networks for sentence classification](https://www.aclweb.org/anthology/D14-1181.pdf)                                                                                      |
    | Content-Understanding |         [TagSpace](models/contentunderstanding/tagspace/model.py)         |   ✓   |    ✓    |        ✓         |     x     | [EMNLP 2014][TagSpace: Semantic Embeddings from Hashtags](https://www.aclweb.org/anthology/D14-1194.pdf)                                                                                                    |
    |       Matching        |                    [DSSM](models/match/dssm/model.py)                     |   ✓   |    ✓    |        ✓         |     x     | [CIKM 2013][Learning Deep Structured Semantic Models for Web Search using Clickthrough Data](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cikm2013_DSSM_fullversion.pdf)             |
    |       Matching        |        [MultiView-Simnet](models/match/multiview-simnet/model.py)         |   ✓   |    ✓    |        ✓         |     x     | [WWW 2015][A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/frp1159-songA.pdf)             |
    |        Recall         |                   [TDM](models/treebased/tdm/model.py)                    |   ✓   | >=1.8.0 |        ✓         |  >=1.8.0  | [KDD 2018][Learning Tree-based Deep Model for Recommender Systems](https://arxiv.org/pdf/1801.02294.pdf)                                                                                                    |
    |        Recall         |                [fasttext](models/recall/fasttext/model.py)                |   ✓   |    ✓    |        x         |     x     | [EACL 2017][Bag of Tricks for Efficient Text Classification](https://www.aclweb.org/anthology/E17-2068.pdf)                                                                                                 |
    |        Recall         |                [Word2Vec](models/recall/word2vec/model.py)                |   ✓   |    ✓    |        ✓         |     x     | [NIPS 2013][Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf) |
    |        Recall         |                     [SSR](models/recall/ssr/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [SIGIR 2016][Multi-Rate Deep Learning for Temporal Recommendation](http://sonyis.me/paperpdf/spr209-song_sigir16.pdf)                                                                                       |
    |        Recall         |                 [Gru4Rec](models/recall/gru4rec/model.py)                 |   ✓   |    ✓    |        ✓         |     ✓     | [2015][Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939)                                                                                                      |
    |        Recall         |             [Youtube_dnn](models/recall/youtube_dnn/model.py)             |   ✓   |    ✓    |        ✓         |     ✓     | [RecSys 2016][Deep Neural Networks for YouTube Recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf)                                               |
    |        Recall         |                     [NCF](models/recall/ncf/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [WWW 2017][Neural Collaborative Filtering](https://arxiv.org/pdf/1708.05031.pdf)                                                                                                                            |
    |        Recall         |                     [GNN](models/recall/gnn/model.py)                     |   ✓   |    ✓    |        ✓         |     ✓     | [AAAI 2019][Session-based Recommendation with Graph Neural Networks](https://arxiv.org/abs/1811.00855)                                                                                                      |
    |        Ranking        |      [Logistic Regression](models/rank/logistic_regression/model.py)      |   ✓   |    x    |        ✓         |     x     | /                                                                                                                                                                                                           |
    |        Ranking        |                      [Dnn](models/rank/dnn/model.py)                      |   ✓   |    ✓    |        ✓         |     ✓     | /                                                                                                                                                                                                           |
    |        Ranking        |                       [FM](models/rank/fm/model.py)                       |   ✓   |    x    |        ✓         |     x     | [IEEE Data Mining 2010][Factorization machines](https://analyticsconsultores.com.mx/wp-content/uploads/2019/03/Factorization-Machines-Steffen-Rendle-Osaka-University-2010.pdf)                             |
    |        Ranking        |                      [FFM](models/rank/ffm/model.py)                      |   ✓   |    x    |        ✓         |     x     | [RECSYS 2016][Field-aware Factorization Machines for CTR Prediction](https://dl.acm.org/doi/pdf/10.1145/2959100.2959134)                                                                                    |
    |        Ranking        |                      [FNN](models/rank/fnn/model.py)                      |   ✓   |    x    |        ✓         |     x     | [ECIR 2016][Deep Learning over Multi-field Categorical Data](https://arxiv.org/pdf/1601.02376.pdf)                                                                                                          |
    |        Ranking        |            [Deep Crossing](models/rank/deep_crossing/model.py)            |   ✓   |    x    |        ✓         |     x     | [ACM 2016][Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features](https://www.kdd.org/kdd2016/papers/files/adf0975-shanA.pdf)                                                   |
    |        Ranking        |                      [Pnn](models/rank/pnn/model.py)                      |   ✓   |    x    |        ✓         |     x     | [ICDM 2016][Product-based Neural Networks for User Response Prediction](https://arxiv.org/pdf/1611.00144.pdf)                                                                                               |
    |        Ranking        |                      [DCN](models/rank/dcn/model.py)                      |   ✓   |    x    |        ✓         |     x     | [KDD 2017][Deep & Cross Network for Ad Click Predictions](https://dl.acm.org/doi/pdf/10.1145/3124749.3124754)                                                                                               |
    |        Ranking        |                      [NFM](models/rank/nfm/model.py)                      |   ✓   |    x    |        ✓         |     x     | [SIGIR 2017][Neural Factorization Machines for Sparse Predictive Analytics](https://dl.acm.org/doi/pdf/10.1145/3077136.3080777)                                                                             |
    |        Ranking        |                      [AFM](models/rank/afm/model.py)                      |   ✓   |    x    |        ✓         |     x     | [IJCAI 2017][Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks](https://arxiv.org/pdf/1708.04617.pdf)                                                  |
    |        Ranking        |                   [DeepFM](models/rank/deepfm/model.py)                   |   ✓   |    x    |        ✓         |     x     | [IJCAI 2017][DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/pdf/1703.04247.pdf)                                                                                 |
    |        Ranking        |                  [xDeepFM](models/rank/xdeepfm/model.py)                  |   ✓   |    x    |        ✓         |     x     | [KDD 2018][xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/3219819.3220023)                                                       |
    |        Ranking        |                      [DIN](models/rank/din/model.py)                      |   ✓   |    x    |        ✓         |     x     | [KDD 2018][Deep Interest Network for Click-Through Rate Prediction](https://dl.acm.org/doi/pdf/10.1145/3219819.3219823)                                                                                     |
    |        Ranking        |                [Wide&Deep](models/rank/wide_deep/model.py)                |   ✓   |    x    |        ✓         |     x     | [DLRS 2016][Wide & Deep Learning for Recommender Systems](https://dl.acm.org/doi/pdf/10.1145/2988450.2988454)                                                                                               |
    |        Ranking        |                    [FGCNN](models/rank/fgcnn/model.py)                    |   ✓   |    ✓    |        ✓         |     ✓     | [WWW 2019][Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction](https://arxiv.org/pdf/1904.04447.pdf)                                                                      |
    |        Ranking        |                  [Fibinet](models/rank/fibinet/model.py)                  |   ✓   |    ✓    |        ✓         |     ✓     | [RecSys19][FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction]( https://arxiv.org/pdf/1905.09433.pdf)                                                 |
    |      Multi-Task       |                  [ESMM](models/multitask/esmm/model.py)                   |   ✓   |    ✓    |        ✓         |     ✓     | [SIGIR 2018][Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate](https://arxiv.org/abs/1804.07931)                                                              |
    |      Multi-Task       |                  [MMOE](models/multitask/mmoe/model.py)                   |   ✓   |    ✓    |        ✓         |     ✓     | [KDD 2018][Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts](https://dl.acm.org/doi/abs/10.1145/3219819.3220007)                                                       |
    |      Multi-Task       |           [ShareBottom](models/multitask/share-bottom/model.py)           |   ✓   |    ✓    |        ✓         |     ✓     | [1998][Multitask learning](http://reports-archive.adm.cs.cmu.edu/anon/1997/CMU-CS-97-203.pdf)                                                                                                               |
    |        Re-Rank        |                [Listwise](models/rerank/listwise/model.py)                |   ✓   |    ✓    |        ✓         |     x     | [2019][Sequential Evaluation and Generation Framework for Combinatorial Recommender System](https://arxiv.org/pdf/1902.00245.pdf)                                                                           |
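
The support columns in the table can be read as a simple lookup. The sketch below copies four rows from the table into a dictionary and checks whether a model can run in a given mode. The names `SUPPORT`, `as_tuple`, and `is_supported` are hypothetical and are not part of PaddleRec. The sketch assumes that an entry such as `>=1.8.0` means the minimum PaddlePaddle version for that mode.

```python
# A few rows copied from the table above: model -> support per training mode.
# True means supported, False means unsupported, and a string is the minimum
# PaddlePaddle version that the table lists for that mode.
SUPPORT = {
    "DSSM":   {"cpu": True, "gpu": True,    "parameter_server": True, "multi_gpu": False},
    "TDM":    {"cpu": True, "gpu": "1.8.0", "parameter_server": True, "multi_gpu": "1.8.0"},
    "SSR":    {"cpu": True, "gpu": True,    "parameter_server": True, "multi_gpu": True},
    "DeepFM": {"cpu": True, "gpu": False,   "parameter_server": True, "multi_gpu": False},
}


def as_tuple(version):
    """Turn a plain dotted version such as '1.8.0' into (1, 8, 0)."""
    return tuple(int(part) for part in version.split("."))


def is_supported(model, mode, paddle_version):
    """Return True if the table lists `model` as usable in `mode`."""
    entry = SUPPORT[model][mode]
    if isinstance(entry, str):
        # Version-gated entry: compare against the installed version.
        return as_tuple(paddle_version) >= as_tuple(entry)
    return entry


if __name__ == "__main__":
    print(is_supported("TDM", "gpu", "1.7.2"))
    print(is_supported("TDM", "gpu", "1.8.0"))
```

With these rows, TDM on GPU is reported as unsupported for PaddlePaddle 1.7.2 and supported from 1.8.0 on.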





<h2 align="center">Getting Started</h2>

### Environment requirements
* Python 2.7 / 3.5 / 3.6 / 3.7
* PaddlePaddle >= 1.7.2
* Operating system: Windows / Mac / Linux

  > Linux is recommended for distributed training
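
The requirements above can be checked from Python before installing. The following is a minimal sketch and is not part of PaddleRec. The helper names `parse_version`, `python_is_supported`, and `paddle_is_supported` are hypothetical, and the sketch assumes the installed package reports its version through `paddle.__version__`.

```python
import sys

# Versions listed in the requirements above.
SUPPORTED_PYTHONS = {(2, 7), (3, 5), (3, 6), (3, 7)}
MIN_PADDLE = (1, 7, 2)


def parse_version(text):
    """Turn '1.7.2' into (1, 7, 2); non-numeric suffixes such as 'rc0' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def python_is_supported(version_info=sys.version_info):
    """Check the (major, minor) pair against the listed Python versions."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHONS


def paddle_is_supported(version_text):
    """Check a PaddlePaddle version string against the listed minimum."""
    return parse_version(version_text) >= MIN_PADDLE


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    try:
        import paddle
        print("PaddlePaddle supported:", paddle_is_supported(paddle.__version__))
    except ImportError:
        print("PaddlePaddle is not installed")
```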
  
### Installation

1. **Install with pip**
  ```bash
  python -m pip install paddle-rec
  ```
  > This method downloads and installs `paddlepaddle-v1.7.2-cpu`. If you are told that `PaddlePaddle` cannot be installed automatically, install `PaddlePaddle` manually and then install `PaddleRec` again:
  > - Download the PaddlePaddle whl from [PyPI](https://pypi.org/project/paddlepaddle/1.7.2/#files) and install it with pip.
  > - Or install `PaddlePaddle` directly with pip: `python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple`
  > - Other installation problems can be reported in [Paddle Issue](https://github.com/PaddlePaddle/Paddle/issues) or [PaddleRec Issue](https://github.com/PaddlePaddle/PaddleRec/issues)

2. **Install from source**
  
  - Install PaddlePaddle  

    ```shell
    python -m pip install paddlepaddle==1.7.2 -i https://mirror.baidu.com/pypi/simple
    ```

  - Install PaddleRec from source

    ```shell
    git clone https://github.com/PaddlePaddle/PaddleRec/
    cd PaddleRec
    python setup.py install
    ```

- Install PaddleRec-GPU  

  After installing `PaddleRec`, manually install `paddlepaddle-gpu`, selecting the version that matches your environment (CUDA / cuDNN). See the [Installation Manuals](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html).
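
After installing `paddlepaddle-gpu`, it can help to confirm that the installed build was compiled with CUDA. The following is a minimal sketch under stated assumptions: `paddle_has_cuda` is a hypothetical helper, and it relies on `paddle.fluid.is_compiled_with_cuda()` as exposed by PaddlePaddle 1.x. If that module or function is not available in your version, the helper returns `None` rather than guessing.

```python
def paddle_has_cuda():
    """Return True or False for the installed PaddlePaddle build,
    or None when PaddlePaddle (or the check itself) is unavailable."""
    try:
        import paddle.fluid as fluid
    except ImportError:
        return None
    # Assumption: the 1.x API exposes fluid.is_compiled_with_cuda().
    check = getattr(fluid, "is_compiled_with_cuda", None)
    return bool(check()) if callable(check) else None


if __name__ == "__main__":
    result = paddle_has_cuda()
    if result is None:
        print("Could not check: PaddlePaddle is missing or has a different API")
    elif result:
        print("This PaddlePaddle build supports CUDA")
    else:
        print("This PaddlePaddle build is CPU-only")
```

A result of `True` only says that the package was built with CUDA. It does not verify that the local driver, CUDA, and cuDNN versions match.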


<h2 align="center">Quick Start</h2>

We use the `dnn` algorithm as an example to show how to get started with `PaddleRec`. The example uses 100 training samples taken from the [Criteo Dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/):

```bash
# Train on CPU
python -m paddlerec.run -m paddlerec.models.rank.dnn  
```
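
The same command can be launched from a Python script, for example as part of a larger pipeline. The sketch below only wraps the command line shown above. `build_command` and `run_quick_start` are hypothetical helpers, and passing any module path other than `paddlerec.models.rank.dnn` is an assumption that this README does not confirm.

```python
import subprocess
import sys


def build_command(model="paddlerec.models.rank.dnn", python=sys.executable):
    """Build the same command line that the quick start runs in the shell."""
    return [python, "-m", "paddlerec.run", "-m", model]


def run_quick_start(model="paddlerec.models.rank.dnn"):
    """Run the training command and return the exit code of the process."""
    return subprocess.call(build_command(model))


if __name__ == "__main__":
    print(" ".join(build_command()))
    # Uncomment to start training; this requires PaddleRec to be installed.
    # sys.exit(run_quick_start())
```

Using `sys.executable` keeps the run inside the interpreter (and virtual environment) where PaddleRec was installed.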


<h2 align="center">Documentation</h2>

### Background
* [Recommendation System](doc/rec_background.md)
* [Distributed deep learning](doc/ps_background.md)

### Introductory Project
* [Ten minutes to learn PaddleRec](https://aistudio.baidu.com/aistudio/projectdetail/559336)

### Introductory tutorial
* [Prepare Data](doc/slot_reader.md)
* [Model hyperparameters](doc/model.md)
* [Start Training](doc/train.md)
* [Start Predicting](doc/predict.md)
* [Serving](doc/serving.md)


### Advanced tutorial
* [Custom Reader](doc/custom_reader.md)
* [Custom Model](doc/model_develop.md)
* [Custom Training Process](doc/trainer_develop.md)
* [Configuration description of yaml](doc/yaml.md)
* [Design document of PaddleRec](doc/design.md)

### Benchmark
* [Benchmark](doc/benchmark.md)

### FAQ
* [Frequently asked questions](doc/faq.md)


<h2 align="center">Community</h2>

<p align="center">
    <br>
    <img alt="Release" src="https://img.shields.io/badge/Release-0.1.0-yellowgreen">
    <img alt="License" src="https://img.shields.io/github/license/PaddlePaddle/PaddleRec">
    <img alt="Slack" src="https://img.shields.io/badge/Join-Slack-green">
    <br>
</p>

### Version history
- 2020.06.17 - PaddleRec v0.1.0
- 2020.06.03 - PaddleRec v0.0.2
- 2020.05.14 - PaddleRec v0.0.1
  
### License
[Apache 2.0 license](LICENSE)

### Contact us

To give feedback or report a bug, please open a [GitHub Issue](https://github.com/PaddlePaddle/PaddleRec/issues).

You can also communicate with us in the following ways:

- QQ group ID: `861717190`
- WeChat account: `paddlerec2020`

<p align="center"><img width="200" height="200" margin="500" src="./doc/imgs/QQ_group.png"/>&#8194;&#8194;&#8194;&#8194;&#8194;<img width="200" height="200"  src="doc/imgs/weixin_supporter.png"/></p>
<p align="center">PaddleRec QQ Group&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;PaddleRec WeChat account</p>