diff --git a/doc/imgs/overview.png b/doc/imgs/overview.png
new file mode 100644
index 0000000000000000000000000000000000000000..12b1209ff36beaf46ae41fe3c2168acb171fb761
Binary files /dev/null and b/doc/imgs/overview.png differ
diff --git a/doc/imgs/structure.png b/doc/imgs/structure.png
index d7fd2839dc02ce4ebc5b1fcf2b0f02bcca0d7ed9..34042c5a025130e21bd50c0646dd0fc4ba5cbb7b 100644
Binary files a/doc/imgs/structure.png and b/doc/imgs/structure.png differ
diff --git a/doc/model_list.md b/doc/model_list.md
new file mode 100644
index 0000000000000000000000000000000000000000..9e68d9f6d2f8e9361cc13b9e76f28426062943bc
--- /dev/null
+++ b/doc/model_list.md
@@ -0,0 +1,15 @@
+# Supported Models
+| Category | Model | Single-machine CPU training | Single-machine GPU training | Distributed CPU training | Large-scale sparse | Distributed GPU training | Custom dataset |
+| :------: | :--------------------: | :---------: | :---------: | :-----------: | :--------: | :-----------: | :----------: |
+| Content understanding | [Text-Classification]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Content understanding | [TagSpace]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Recall | [Word2Vec]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Recall | [TDM]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Ranking | [CTR-Dnn]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Ranking | [DeepFm]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Ranking | [ListWise]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Multi-task | [MMOE]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Multi-task | [ESMM]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Matching | [DSSM]() | ✓ | x | ✓ | x | ✓ | ✓ |
+| Matching | [Multiview-Simnet]() | ✓ | x | ✓ | x | ✓ | ✓ |
+
diff --git a/doc/predict.md b/doc/predict.md
index fed57802f2655594a74a24c495cfc92bd1bfac21..a33eda43ec6aed8ebe628f0540327b707055970d 100644
--- a/doc/predict.md
+++ b/doc/predict.md
@@ -1 +1 @@
-# PaddleRec Inference Deployment
\ No newline at end of file
+# PaddleRec Offline Prediction
\ No newline at end of file
diff --git a/models/rank/dnn/README.md b/models/rank/dnn/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2d335cdc8c8a5bcd1979d666cfb858c0acd8d94b
--- /dev/null
+++ b/models/rank/dnn/README.md
@@ -0,0 +1,270 @@
+# DNN-Based Click-Through Rate Prediction Model
+
+## Introduction
+`CTR (Click-Through Rate)` is a key metric in fields such as recommender systems and computational advertising, and estimating it underlies decisions such as product recommendation and ad placement. Put simply, CTR prediction estimates, for each ad impression, whether the user will click or not. A CTR prediction model weighs many factors and features, is trained on large amounts of historical data, and ultimately supports business decisions. This model implements the DNN model proposed in the following paper:
+
+```text
+@inproceedings{guo2017deepfm,
+  title={DeepFM: A Factorization-Machine based Neural Network for CTR Prediction},
+  author={Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li and Xiuqiang He},
+  booktitle={the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)},
+  pages={1725--1731},
+  year={2017}
+}
+```
+
+#
+## Data Preparation
+### Data Source
+The training and test data come from the Criteo dataset used in the [Display Advertising Challenge](https://www.kaggle.com/c/criteo-display-ad-challenge/). The dataset has two parts: a training set and a test set. The training set contains a portion of Criteo's traffic over a period of time, and the test set corresponds to the ad click traffic from the day after the training data.
+Each line of data has the following format:
+```bash
+