Commit dcc22b3c authored by Dong Daxiang, committed by GitHub

Merge pull request #335 from wangjiawei04/jiawei/cube_quant_doc

add cube quantization doc
......@@ -8,6 +8,8 @@ There are two examples on CTR under python / examples, they are criteo_ctr, crit
The local mode of Cube is a simplified version of distributed Cube, designed to be convenient for developers to use in experiments and demos. If you need a distributed sparse parameter service, please continue with the [Distributed Cube User Guide](./Distributed_Cube) after reading this document (still under development).
This document uses the original model without any compression algorithm. If you need to deploy a quantized model, please read the [Quantization Storage on Cube Sparse Parameter Indexing](./CUBE_QUANT.md)
## Example
In the directory python/example/criteo_ctr_with_cube, run
......
......@@ -8,6 +8,8 @@
The local mode of Cube is a simplified version of distributed Cube, designed to be convenient for developers to use in experiments and demos. If you need a distributed sparse parameter service, please continue with the [Cube Sparse Parameter Indexing Service User Guide](分布式Cube) after reading this document (still under development).
This document uses the original model without any compression algorithm. If you need to deploy a quantized model, please read the [Quantization Storage on Cube Sparse Parameter Indexing](./CUBE_QUANT_CN.md)
## Example
In the directory python/example/criteo_ctr_with_cube, run
......
# Quantization Storage on Cube Sparse Parameter Indexing
([简体中文](./CUBE_QUANT_CN.md)|English)
## Overview
As described in the previous article, sparse parameters are high-dimensional series of floating-point numbers, and each floating-point number takes 4 bytes of storage. In fact, we do not need very high floating-point precision to achieve comparable model accuracy, and giving it up buys substantial space savings and faster model loading and lookup.
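A rough, back-of-envelope estimate makes this trade-off concrete. The row count and embedding dimension below are made up for illustration (they are not taken from the demo model); the per-row min/max metadata corresponds to the [min, max, bytes] storage format described later in this document.

```python
# Back-of-envelope storage comparison for one sparse-parameter table.
# Row count and dimension are hypothetical, chosen only for illustration.
rows, dim = 1_000_000, 9
fp32_bytes = rows * dim * 4        # every value stored as a 4-byte float
int8_bytes = rows * (dim + 2 * 4)  # 1 byte per value + per-row min/max floats
print(fp32_bytes, int8_bytes, round(fp32_bytes / int8_bytes, 2))
```

Note that for small embedding dimensions the 8 bytes of per-row min/max metadata noticeably reduce the compression ratio below the ideal 4x.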
## Prerequisite
Please read [Cube: Sparse Parameter Indexing Service (Local Mode)](./CUBE_LOCAL_CN.md) first.
## Components
### seq_generator:
This tool converts a Paddle model into a Sequence File. Two modes are provided: the first is the normal mode, in which the Value in the generated KV sequence is stored as uncompressed floating-point numbers; the second is the quantization mode, in which the Value is stored as [min, max, bytes]. For the underlying principle, see [Post-Training 4-bit Quantization on Embedding Tables](https://arxiv.org/abs/1911.02079).
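The quantization-mode value layout can be sketched as follows. This is a hypothetical illustration of the [min, max, bytes] idea, not the actual seq_generator implementation: each row stores its two float32 bounds followed by one uint8 code per dimension, and dequantization linearly rescales the codes back into [min, max].

```python
import struct

import numpy as np

# Hypothetical sketch of the quantization-mode value layout described above:
# [min, max, bytes] = two float32 bounds + one uint8 code per dimension.
# Not the actual seq_generator code.

def quant_encode(row: np.ndarray) -> bytes:
    lo, hi = float(row.min()), float(row.max())
    scale = (hi - lo) / 255.0 or 1.0  # avoid division by zero for flat rows
    codes = np.round((row - lo) / scale).astype(np.uint8)
    return struct.pack("<ff", lo, hi) + codes.tobytes()

def quant_decode(blob: bytes) -> np.ndarray:
    lo, hi = struct.unpack("<ff", blob[:8])
    codes = np.frombuffer(blob[8:], dtype=np.uint8).astype(np.float32)
    return lo + codes * ((hi - lo) / 255.0 or 1.0)

row = np.array([-0.5, 0.0, 0.25, 1.5], dtype=np.float32)
blob = quant_encode(row)
assert len(blob) == 8 + len(row)  # 2 header floats + 1 byte per value
# Round-trip error is bounded by half a quantization step
assert np.allclose(quant_decode(blob), row, atol=(1.5 + 0.5) / 255 / 2 + 1e-6)
```

The maximum reconstruction error is half a quantization step, i.e. (max - min) / 255 / 2 per row, which is why rows with a narrow value range lose very little precision.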
## Usage
From the Serving root directory, go to the criteo_ctr_with_cube directory and train the model:
```
cd python/examples/criteo_ctr_with_cube
python local_train.py # save model
```
Next, you can generate Sequence Files for Cube sparse parameter indexing with or without quantization.
```
seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature   # naive mode
seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8 # quantization mode
```
This command converts the sparse parameter file SparseFeatFactors in the ctr_serving_model directory into a feature file (Sequence File format) in the cube_model directory. Currently the quantization tool supports only 8-bit quantization; higher compression ratios and more kinds of quantization methods will be supported in the future.
## Launch Serving with a Quantized Model
In Serving, predictions with a quantized model use the general_dist_kv_quant_infer op. See python/examples/criteo_ctr_with_cube/test_server_quant.py for details. No changes are required on the client side.
To make the demo easy to follow, the following script trains the quantized criteo ctr model and launches Serving with it:
```
cd python/examples/criteo_ctr_with_cube
python local_train.py
cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare_quant.sh &
python test_server_quant.py ctr_serving_model_kv &
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
```
Users can compare the AUC after quantization with the AUC before quantization.
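Such a comparison can be sketched as below. The label/score lists are made-up stand-ins for predictions collected from test_client.py runs against the original and quantized servers; the pairwise AUC helper is a self-contained illustration, not part of Serving.

```python
# Hypothetical sketch: compare ranking quality before and after quantization.
# Labels and scores are fabricated stand-ins for real prediction outputs.

def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pairs = wins = 0.0
    for lp, sp in zip(labels, scores):
        for ln, sn in zip(labels, scores):
            if lp == 1 and ln == 0:
                pairs += 1
                wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / pairs

labels      = [0, 1, 1, 0, 1, 0]
scores_fp32 = [0.10, 0.80, 0.70, 0.30, 0.90, 0.20]  # original model
scores_int8 = [0.12, 0.79, 0.30, 0.33, 0.88, 0.21]  # quantized model
print(auc(labels, scores_fp32), auc(labels, scores_int8))
```

A small AUC drop after quantization is expected; a large one suggests the value range of some rows is too wide for 8-bit codes.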
# Quantization Storage on Cube Sparse Parameter Indexing
(Simplified Chinese|[English](./CUBE_QUANT.md))
## Overview
As described in the previous article, sparse parameters are high-dimensional series of floating-point numbers, and each floating-point number takes 4 bytes of storage. In fact, we do not need very high floating-point precision to achieve comparable model accuracy, and giving it up buys substantial space savings and faster model loading and lookup.
## Prerequisite
Please read [Cube: Sparse Parameter Indexing Service (Local Mode)](./CUBE_LOCAL_CN.md) first.
## Components
### seq_generator:
This tool converts a Paddle model into a Sequence File. Two modes are provided: the first is the normal mode, in which the Value in the generated KV sequence is stored as uncompressed floating-point numbers; the second is the quantization mode, in which the Value is stored as [min, max, bytes]. For the underlying principle, see [Post-Training 4-bit Quantization on Embedding Tables](https://arxiv.org/abs/1911.02079).
## Usage
From the Serving root directory, go to the criteo_ctr_with_cube directory and train the model:
```
cd python/examples/criteo_ctr_with_cube
python local_train.py # save model
```
Next, you can generate Sequence Files for Cube sparse parameter indexing with or without quantization.
```
seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature   # naive mode
seq_generator ctr_serving_model/SparseFeatFactors ./cube_model/feature 8 # quantization mode
```
This command converts the sparse parameter file SparseFeatFactors in the ctr_serving_model directory into a feature file (Sequence File format) in the cube_model directory. Currently the quantization tool supports only 8-bit quantization; higher compression ratios and more kinds of quantization methods will be supported in the future.
## Launch Serving with a Quantized Model
In Serving, predictions with a quantized model use the general_dist_kv_quant_infer op. See python/examples/criteo_ctr_with_cube/test_server_quant.py for details. No changes are required on the client side.
To make the demo easy to follow, here are the complete steps to launch Serving with a quantized model from scratch:
```
cd python/examples/criteo_ctr_with_cube
python local_train.py
cp ../../../build_server/core/predictor/seq_generator seq_generator
cp ../../../build_server/output/bin/cube* ./cube/
sh cube_prepare_quant.sh &
python test_server_quant.py ctr_serving_model_kv &
python test_client.py ctr_client_conf/serving_client_conf.prototxt ./raw_data
```
Users can compare the AUC after quantization with the AUC before quantization.