Merge pull request #319 from wangjiawei04/develop

Criteo CTR with Cube Doc

eced8957 · Dong Daxiang · GitHub · d873a398 · 4eac5f02 · eced8957
5 changed files
--- a/doc/CUBE_LOCAL.md
+++ b/doc/CUBE_LOCAL.md
+# Cube: Sparse Parameter Indexing Service (Local Mode)
+
+([简体中文](./CUBE_LOCAL_CN.md)|English)
+
+## Overview
+
+There are two CTR examples under python/examples: criteo_ctr and criteo_ctr_with_cube. The former saves the entire model during training, including the sparse parameters. The latter splits the sparse parameters out and saves the model in two parts: the sparse parameters and the dense parameters. In industrial cases the number of sparse parameters is very large, on the order of 10^9, so it is not practical to serve large-scale sparse parameter prediction on a single machine. We therefore introduce Cube, the industrial-grade product that Baidu has used for sparse parameter indexing for many years, to provide a distributed sparse parameter service.
+
+The local mode of Cube is a simplified version of distributed Cube, designed to be convenient for developers to use in experiments and demos. If you need a distributed sparse parameter service, please continue with the [Distributed Cube User Guide](./Distributed_Cube) (still under development) after reading this document.
+
+
+## Example
+In the directory python/examples/criteo_ctr_with_cube, run:
+
+```
+python local_train.py # train model
+cp ../../../build_server/core/predictor/seq_generator seq_generator # copy Sequence File generator
+cp ../../../build_server/output/bin/cube* ./cube/ # copy Cube tool kits
+cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy Cube Client
+sh cube_prepare.sh & # start the delivery script
+```
+This runs the whole flow, from training the model to delivering its sparse parameters to the Cube server. The sections below explain each step.
+
+## Components of Cube
+
+### cube-builder
+
+cube-builder is a tool for generating model shard files and managing versions. Because Cube is a distributed sparse parameter service, each node loads a different shard. The generated sparse parameter file is usually one large file, so it has to be split into shards with a hash function. In addition, industrial scenarios need to support regular model delivery and streaming training, which makes model version management very important; this is the part that is missing when a model is trained and saved. Therefore, cube-builder also lets you specify version information manually while it generates the shards.
+
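The hash-based split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document does not say which hash function cube-builder uses, so MD5 is a stand-in, and `shard_index`, `split_into_shards`, and the toy table are hypothetical names and data.

```python
import hashlib


def shard_index(key, shard_num):
    """Return the shard that a sparse-parameter key falls into.

    MD5 is a stand-in chosen for this sketch; it is NOT cube-builder's hash.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_num


def split_into_shards(params, shard_num):
    """Distribute a {key: value} table over shard_num dictionaries."""
    shards = [{} for _ in range(shard_num)]
    for key, value in params.items():
        shards[shard_index(key, shard_num)][key] = value
    return shards


# Toy sparse table: feature id -> embedding vector (made-up values).
toy_params = {feature_id: [0.1 * feature_id, 0.2] for feature_id in range(1000)}

local_shards = split_into_shards(toy_params, shard_num=1)    # local mode
cluster_shards = split_into_shards(toy_params, shard_num=4)  # hypothetical 4-node setup
print([len(shard) for shard in local_shards])
print([len(shard) for shard in cluster_shards])
```

With `shard_num=1`, as in local mode, every key lands in the single shard; with more shards, a node only needs the part of the table that hashes to its own index.
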
+### cube-server
+
+cube-server provides the sparse parameter service on top of Cube's key-value indexing. It offers a high-performance distributed query service through brpc and supports remote calls through a REST API.
+
+### cube-cli
+
+cube-cli is the client of cube-server. It has been integrated into Paddle Serving: once the cube.conf configuration file is prepared and the kv_infer related op is specified in the Paddle Serving server code, cube-cli is ready on the serving side.
+
+
+## Serving the Model Step by Step
+### Prerequisites
+
+We need a trained model and the tool kits copied from the build_server folder.
+```
+python local_train.py # train model
+cp ../../../build_server/core/predictor/seq_generator seq_generator  # copy Sequence File generator
+cp ../../../build_server/output/bin/cube* ./cube/ # copy Cube tool kits
+cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy Cube Client
+```
+
+### Generate Sequence File from Sparse Parameter
+
+To deliver the model parameters from training to prediction, we need to convert the trained model from the Paddle model save format to the Sequence File format.
+
+**Why Sequence File?**
+Sequence File is a common format of the Hadoop File System. As mentioned at the beginning of this document, distributed Cube can support ultra-large-scale sparse parameter services, and in real production environments large-scale sparse parameters are stored in distributed file systems. The Hadoop File System is one of the most stable open-source distributed file systems, so Sequence File became the file format from which Cube loads models.
+
+```
+mkdir -p cube_model
+mkdir -p cube/data
+./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature
+```
+
+### Generating Shards
+
+For the local version of Cube, the number of shards is 1. Run:
+
+```
+./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1 -only_build=false
+```
+
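When this step is scripted, it can help to assemble the cube-builder arguments from a dictionary so that the version and shard settings live in one place. The sketch below is a hypothetical helper, not part of Cube: it uses only the flags shown in the command above, and the binary path `./cube/cube-builder` is an assumption based on the earlier copy step.

```python
import shlex


def build_cube_builder_command(binary, options):
    """Turn {flag: value} into the argument list for cube-builder."""
    return [binary] + [f"-{name}={value}" for name, value in options.items()]


# Flag names and values are taken from the command in this document.
options = {
    "dict_name": "test_dict",
    "job_mode": "base",
    "last_version": 0,
    "cur_version": 0,
    "depend_version": 0,
    "input_path": "./cube_model",
    "output_path": "./cube/data",
    "shard_num": 1,          # local mode uses a single shard
    "only_build": "false",
}

# Assumed location of the binary after the copy step in the prerequisites.
command = build_cube_builder_command("./cube/cube-builder", options)
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the example directory once the binary is in place.
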
+
+### Deliver to Cube-Server
+
+Delivery in the local version of Cube is very simple: you only need to place the index files (the files with the `index.` prefix) in the ./data folder next to the cube binary.
+```
+mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
+cd cube && ./cube &
+```
+
+
+### Cube-Client Verification
+
+This step is optional, but it helps you verify that the model is ready.
+```
+./cube-cli -dict_name=test_dict -keys  keys -conf ./cube/cube.conf
+```
+If it succeeds, you will see output like this:
+<p align="center">
+    <img src="cube-cli.png" width="700">
+</p>
+
+If each key has a corresponding value in the output, the delivery was successful. This file can also be used by Serving to query Cube from the general kv infer op.
+
+
+## Appendix: Configuration
+The configuration file is cube.conf, located in python/examples/criteo_ctr_with_cube/cube/conf, and it is used by cube-cli. Users of Cube local mode can use it as is without understanding the details; it matters much more in distributed mode.
+
+```
+[{
+    "dict_name": "test_dict",  //table name
+    "shard": 1,  //shard num
+    "dup": 1,  //duplicates
+    "timeout": 200,
+    "retry": 3,
+    "backup_request": 100,
+    "type": "ipport_list",
+    "load_balancer": "rr",
+    "nodes": [{
+        "ipport_list": "list://127.0.0.1:8027" //IP list
+    }]
+}]
+```
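The configuration above carries `//` comments, which strict JSON parsers reject. As a rough sketch for inspecting it from Python (not how cube-cli itself reads the file, which this document does not describe), the comments can be stripped first; the helper name `strip_line_comments` is hypothetical, and it skips `//` inside quoted strings so that `list://127.0.0.1:8027` survives.

```python
import json

CONFIG_TEXT = r'''
[{
    "dict_name": "test_dict",  //table name
    "shard": 1,  //shard num
    "dup": 1,  //duplicates
    "timeout": 200,
    "retry": 3,
    "backup_request": 100,
    "type": "ipport_list",
    "load_balancer": "rr",
    "nodes": [{
        "ipport_list": "list://127.0.0.1:8027" //IP list
    }]
}]
'''


def strip_line_comments(text):
    """Remove // comments that appear outside double-quoted strings."""
    cleaned = []
    for line in text.splitlines():
        in_string = False
        cut = len(line)
        for i, ch in enumerate(line):
            if ch == '"' and (i == 0 or line[i - 1] != "\\"):
                in_string = not in_string
            elif not in_string and line.startswith("//", i):
                cut = i
                break
        cleaned.append(line[:cut])
    return "\n".join(cleaned)


config = json.loads(strip_line_comments(CONFIG_TEXT))
table = config[0]
# Drop the "list://" scheme to get the plain host:port of the cube server.
endpoint = table["nodes"][0]["ipport_list"].split("://", 1)[1]
print(table["dict_name"], table["shard"], endpoint)
```

For local mode this shows one table, one shard, and a single node on the local machine.
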
--- a/doc/CUBE_LOCAL_CN.md
+++ b/doc/CUBE_LOCAL_CN.md
+# Cube: Sparse Parameter Indexing Service (Local Mode) User Guide
+
+(简体中文|[English](./CUBE_LOCAL.md))
+
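+## Introduction
+
+There are two CTR examples under python/examples: criteo_ctr and criteo_ctr_with_cube. The former saves the entire model during training, including the sparse parameters. The latter splits the sparse parameters out and saves the model in two parts: the sparse parameters and the dense parameters. In industrial scenarios the number of sparse parameters is very large, on the order of 10^9, so it is not practical to run large-scale sparse parameter prediction on a single machine. We therefore introduce Cube, Baidu's industrial-grade product with years of use in sparse parameter indexing, to provide a distributed sparse parameter service.
+
+Local-mode Cube is a simplified version of distributed Cube, intended for developers running experiments and demos. If you need a distributed sparse parameter service, please continue with the [Cube Sparse Parameter Indexing Service User Guide](分布式Cube) (under construction) after reading this document.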
+
+
+## Example
+Run the following in python/examples/criteo_ctr_with_cube:
+```
+python local_train.py # train model
+cp ../../../build_server/core/predictor/seq_generator seq_generator # copy the Sequence File generator
+cp ../../../build_server/output/bin/cube* ./cube/ # copy the Cube tool kits
+cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy cube-cli
+sh cube_prepare.sh & # start the delivery script
+```
+This example covers the whole flow from model training to delivery to Cube; each step is described below.
+
+## Components of Local-Mode Cube
+
+
+### cube-builder
+
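+cube-builder is a tool for generating model shard files and managing versions. Because Cube is a distributed sparse parameter service, each node loads a different shard. The generated sparse parameter file is usually one large file, so it has to be split into shards with a hash function. In addition, industrial scenarios need to support regular model delivery and streaming training, which makes model version management very important; this is the part that is missing when a model is trained and saved. Therefore, cube-builder also lets you specify version information manually while it generates the shards.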
+
+### cube-server
+
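+cube-server provides the sparse parameter service on top of Cube's KV capability. It offers a high-performance distributed query service through brpc and supports remote calls through a REST API.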
+
+### cube-cli
+
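+cube-cli is the client of cube-server. It has been integrated into Paddle Serving: once the cube.conf configuration file is prepared and the kv_infer related op is specified in the Paddle Serving server code, cube-cli is ready on the serving side.
+
+## Model Delivery Step by Step
+### Prerequisites
+
+We need a trained model and the tool kits copied from the build_server folder.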
+```
+python local_train.py
+cp ../../../build_server/core/predictor/seq_generator seq_generator # copy the Sequence File generator
+cp ../../../build_server/output/bin/cube* ./cube/ # copy the Cube tool kits
+cp ../../../build_server/core/cube/cube-api/cube-cli ./cube/ # copy cube-cli
+```
+
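+### Generate Sequence File from the Model File
+
+To deliver the model parameters from training to prediction, we need to convert the trained model from the Paddle model save format to the Sequence File format.
+
+**Why Sequence File?**
+Sequence File is a common format of the Hadoop File System. As mentioned at the beginning of this document, distributed Cube can support ultra-large-scale sparse parameter services, and in real production environments large-scale sparse parameters are stored in a distributed file system. The Hadoop File System is one of the most stable open-source distributed file systems in the industry, so Sequence File became the file format from which Cube loads models.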
+
+```
+mkdir -p cube_model
+mkdir -p cube/data
+./seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature
+```
+
+### Generating Shards
+
+In local mode, the number of shards is 1. Run:
+
+```
+./cube/cube-builder -dict_name=test_dict -job_mode=base -last_version=0 -cur_version=0 -depend_version=0 -input_path=./cube_model -output_path=./cube/data -shard_num=1  -only_build=false
+
+```
+
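+### Deliver to Cube-Server
+
+
+Delivery in local mode is very simple: just place the files with the index. prefix in the data folder under the directory that contains the cube binary.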
+
+```
+mv ./cube/data/0_0/test_dict_part0/* ./cube/data/
+cd cube && ./cube &
+```
+
+### Cube-Client Verification
+This step is optional; it tests whether the delivery succeeded.
+```
+cd cube
+./cube-cli -dict_name=test_dict -keys  keys -conf ./cube/cube.conf
+```
+
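+If each key has a corresponding value in the output, the delivery succeeded. This file can also be used by Serving for cube queries in the general kv infer op.
+
+If the command succeeds, you will see the following result: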
+<p align="center">
+    <img src="cube-cli.png" width="700">
+</p>
+
+
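+## Note: Configuration File
+The example is cube.conf under python/examples/criteo_ctr_with_cube/cube/conf. This file is used by the cube-cli mentioned above. Local-mode users can use it as is and skip this section; it matters more in distributed deployments.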
+
+```
+[{
+    "dict_name": "test_dict",  //table name
+    "shard": 1,  //shard num
+    "dup": 1,  //duplicates
+    "timeout": 200,
+    "retry": 3,
+    "backup_request": 100,
+    "type": "ipport_list",
+    "load_balancer": "rr",
+    "nodes": [{
+        "ipport_list": "list://127.0.0.1:8027" //IP list
+    }]
+}]
+```
--- a/doc/cube-cli.png
+++ b/doc/cube-cli.png
--- a/python/examples/criteo_ctr_with_cube/README.md
+++ b/python/examples/criteo_ctr_with_cube/README.md
-## CTR Prediction Service with a Sparse Parameter Server
+## Criteo CTR with Sparse Parameter Indexing Service
+
+([简体中文](./README_CN.md)|English)
+
+### Get Sample Dataset

-### Get Sample Data
 ```
 sh get_data.sh
 ```

-### Save the Model and Configuration Files
+### Train and Save Model
 ```
 python local_train.py
 ```
-After the script runs, the folders ctr_server_model and ctr_client_config, as well as ctr_server_model_kv and ctr_client_conf_kv, are generated in the current directory.
+The trained model is saved in ./ctr_server_model and ./ctr_client_config, as well as in ctr_server_model_kv and ctr_client_conf_kv.

-### Start the Sparse Parameter Server
+### Start Sparse Parameter Indexing Service
 ```
 cp ../../../build_server/core/predictor/seq_generator seq_generator
 cp ../../../build_server/output/bin/cube* ./cube/
 sh cube_prepare.sh &
 ```

-### Start the RPC Prediction Service with 4 Server Threads (configurable in test_server.py)
+Here, the sparse parameters are loaded by the sparse parameter indexing service Cube. For more details, please read [Cube: Sparse Parameter Indexing Service (Local Mode)](../../../doc/CUBE_LOCAL.md).
+
+### Start RPC Predictor with 4 Serving Threads (configurable in test_server.py)

 ```
 python test_server.py ctr_serving_model_kv 
 ```

-### Run Prediction (previous Chinese heading)
+### Run Prediction

 ```
 python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
@@ -32,17 +37,17 @@ python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data

 ### Benchmark

-Device: Intel(R) Xeon(R) CPU 6148 @ 2.40GHz
+CPU: Intel(R) Xeon(R) CPU 6148 @ 2.40GHz

-Model (previous Chinese label): [Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/ctr_criteo_with_cube/network_conf.py)
+Model: [Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/ctr_criteo_with_cube/network_conf.py)

 server core/thread num ： 4/8

-Execute
+Run
 ```
 bash benchmark.sh
 ```
-Every client thread will send 1000 batches
+Each client thread sends 1000 batches.

 | client  thread num | prepro | client infer | op0    | op1   | op2    | postpro | avg_latency | qps   |
 | ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- | ----- |
@@ -52,10 +57,10 @@ bash benchmark.sh
 | 8                  | 0.044  | 8.230        | 0.028  | 0.464 | 0.0023 | 0.0034  | 14.191 | 563.8 |
 | 16                 | 0.048  | 21.037       | 0.028  | 0.455 | 0.0025 | 0.0041  | 27.236 | 587.5 |
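
The `avg_latency` and `qps` columns appear to be tied together by a simple relation: throughput is roughly the number of client threads divided by the average latency. The sketch below checks this against the five rows of the benchmark table. It assumes that `avg_latency` is in milliseconds, which the table does not state, and `estimated_qps` is a hypothetical helper name.

```python
# (client threads, avg_latency, reported qps) copied from the benchmark table.
ROWS = [
    (1, 6.774, 147.7),
    (2, 6.931, 288.3),
    (4, 8.378, 477.5),
    (8, 14.191, 563.8),
    (16, 27.236, 587.5),
]


def estimated_qps(threads, avg_latency_ms):
    """Requests per second if each thread issues one request per latency period."""
    # Assumption: avg_latency is measured in milliseconds.
    return threads * 1000.0 / avg_latency_ms


for threads, latency, reported in ROWS:
    estimate = estimated_qps(threads, latency)
    print(f"{threads:>2} threads: estimated {estimate:6.1f} qps, reported {reported}")
```

Under that assumption the estimates stay within about one percent of the reported values. The numbers also show throughput barely rising from 8 to 16 client threads while latency roughly doubles, which would be consistent with the server being saturated at that point.
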

-The average time cost per thread is plotted below
+The average latency per thread is shown below:

 ![avg cost](../../../doc/criteo-cube-benchmark-avgcost.png)

-The QPS of each thread is as follows
+The QPS is shown below:

 ![qps](../../../doc/criteo-cube-benchmark-qps.png)
--- a/python/examples/criteo_ctr_with_cube/README_CN.md
+++ b/python/examples/criteo_ctr_with_cube/README_CN.md
+## CTR Prediction Service with Sparse Parameter Indexing Service
+(简体中文|[English](./README.md))
+
+### Get Sample Data
+```
+sh get_data.sh
+```
+
+### Save the Model and Configuration Files
+```
+python local_train.py
+```
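+After the script runs, the folders ctr_server_model and ctr_client_config, as well as ctr_server_model_kv and ctr_client_conf_kv, are generated in the current directory.
+
+### Start the Sparse Parameter Indexing Service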
+```
+cp ../../../build_server/core/predictor/seq_generator seq_generator
+cp ../../../build_server/output/bin/cube* ./cube/
+sh cube_prepare.sh &
+```
+
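+Here, the sparse parameters of the model are stored in the sparse parameter indexing service Cube. For an introduction to Cube, please read [Cube: Sparse Parameter Indexing Service (Local Mode) User Guide](../../../doc/CUBE_LOCAL_CN.md).
+
+### Start the RPC Prediction Service with 4 Server Threads (configurable in test_server.py)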
+
+```
+python test_server.py ctr_serving_model_kv 
+```
+
+### Run Prediction
+
+```
+python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
+```
+
+### Benchmark
+
+Device: Intel(R) Xeon(R) CPU 6148 @ 2.40GHz
+
+Model: [Criteo CTR](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/ctr_criteo_with_cube/network_conf.py)
+
+server core/thread num: 4/8
+
+Run
+```
+bash benchmark.sh
+```
+Each client thread sends 1000 batches.
+
+| client  thread num | prepro | client infer | op0    | op1   | op2    | postpro | avg_latency | qps   |
+| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- | ----- |
+| 1                  | 0.035  | 1.596        | 0.021  | 0.518 | 0.0024 | 0.0025  | 6.774 | 147.7 |
+| 2                  | 0.034  | 1.780        | 0.027  | 0.463 | 0.0020 | 0.0023  | 6.931 | 288.3 |
+| 4                  | 0.038  | 2.954        | 0.025  | 0.455 | 0.0019 | 0.0027  | 8.378 | 477.5 |
+| 8                  | 0.044  | 8.230        | 0.028  | 0.464 | 0.0023 | 0.0034  | 14.191 | 563.8 |
+| 16                 | 0.048  | 21.037       | 0.028  | 0.455 | 0.0025 | 0.0041  | 27.236 | 587.5 |
+
+The average latency per thread is shown below:
+
+![avg cost](../../../doc/criteo-cube-benchmark-avgcost.png)
+
+The QPS for each thread count is shown below:
+
+![qps](../../../doc/criteo-cube-benchmark-qps.png)