fix doc

6673bb65 · xulongteng · e19aafca · 6673bb65

Showing 1 changed file with 34 additions and 24 deletions

doc/DEPLOY.md +34 -24

--- a/doc/DEPLOY.md
+++ b/doc/DEPLOY.md
@@ -336,7 +336,7 @@ gflags.conf  transfer.conf

 Copy bin/cube, bin/cube-agent, and conf/gflags.conf to multiple physical machines. Assume the copied files are laid out as follows:

-```
+```bash
 $ tree
 .
 |-- bin
@@ -385,7 +385,7 @@ Flags from /home/work/dangyifei/open-builder/src/main.cpp:
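 If the builder tool is used only to build indexes, it has no special placement requirement; if it is part of the delivery pipeline, it must be deployed on the same machine as cube-transfer.  
 Assuming the builder tool is used standalone, the file layout is as follows:  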

-```
+```bash
 $ tree
 `-- cube-builder
 |-- source
@@ -395,13 +395,19 @@ $ tree

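 #### <span id="head24">3.3 Start cube-builder</span>
 ##### 3.3.1 Integrating with the delivery pipeline
-Copy bin/cube-builder to the same machine as the cube-transfer program.  
-The relevant parameters are already wrapped; you only need to configure the cube-builder path, the source data path, and the output path for built data in cube-transfer's conf/transfer.conf.  
-##### 3.3.2 Standalone builder, assuming 2 shards and a dictionary named test
-######base mode  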
-```
-Command to start cube-builder
-./open_builder -input_path=./source -output_path=./output -shard_num=2 -dict_name=test
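+Copy bin/cube-builder and the cube-transfer program to the same machine.  
+The relevant parameters are already wrapped; you only need to configure the cube-builder path, the source data path, and the output path for built data in cube-transfer's conf/transfer.conf. When cube-transfer runs, it invokes cube-builder via the path in the configuration file, so you usually do not need to run cube-builder manually.
+
+##### 3.3.2 Standalone builder
+
+**Assume 2 shards and a dictionary named test**
+
+###### base mode  
+
+Command to start cube-builder; paths in the arguments must be absolute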
+
+```bash
+./cube-builder -input_path=${source} -output_path=${output} -shard_num=2 -dict_name=test
 ```
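 After running, an index folder named after the current timestamp, 1565323045_1565323045, and a meta information folder, meta_info, are generated automatically, with the following structure: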
 ```
@@ -430,11 +436,13 @@ $ tree
        `-- 1565323045_1565323045_1_0.json
 ```
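 test_part0.tar and test_part0.tar.md5 are the data and MD5 checksum for shard 0; 1565323045_1565323045_0_0.json records the index length and count for shard 0 and is required when building a delta for the corresponding version.  
-######delta mode
+###### delta mode
 Delta mode depends on the id and key of the previous base or delta. In 1565323045_1565323045_0_0.json, the first timestamp is the id and the second is the key (the reverse of the key_id order used in the shard data directory name); they correspond to the cube-builder input parameters -last_version and -depend_version. Keep output and dict_name unchanged (the builder looks up the index meta information from the previous round).  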
-```
-Command to start cube-builder
-input_path=./source -output_path=./output -shard_num=2 -depend_version=1565323045 -last_version=1565323045 -job_mode=delta -dict_name=test
+
+Command to start cube-builder; paths in the arguments must be absolute
+
+```bash
+./cube-builder -input_path=${source} -output_path=${output} -shard_num=2 -depend_version=1565323045 -last_version=1565323045 -job_mode=delta -dict_name=test
 ```
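 After running, a delta index folder named after the current timestamp, 1565323045_1565326078, and a meta information folder, meta_info, are generated automatically, with the following structure: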
 ```
@@ -480,7 +488,7 @@ $ tree
        `-- 1565326078_1565323045_1_0.json
 ```
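 #### <span id="head241">3.4 seqfile tools</span>
-The builder's input source data must be in seqfile format: the key is a uint64 (the input must be 8 binary bytes), and the value is serialized binary.  
+The builder's input source data must be in seqfile format: the key is a uint64 (the input must be 8 binary bytes), and the value is serialized binary.   
 A seqfile reader and a plaintext-to-seqfile converter are provided as kvtool.py and kv_to_seqfile.py in output/tool.  
 kvtool.py is the seqfile reader; it prints the key-value pairs it reads. Its argument is the file path. Assuming the seqfile is under /home/work/test, run it as follows: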
 ```
@@ -506,7 +514,7 @@ SOURCE_FILE = './source/file.txt' #plaintext source data path

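 #### <span id="head26">4.1 Modifying the cube-transfer configuration</span>

-The cube-transfer configuration file is conf/transfer.conf. The configuration is fairly complex; the meaning of each option is as follows:
+The cube-transfer configuration file is conf/transfer.conf. The configuration is fairly complex, and paths in the configuration file must be absolute. The meaning of each option is as follows: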
 ```
 [default]
 dict_name: test_dict                                    //dictionary name
@@ -523,15 +531,16 @@ transfer_address: 10.10.10.5                             //cube-transfer local machine

 [cube_agent]
 agent0_0: 10.10.220.15:8001                        //agent ip:port for replica 0 of shard 0
-cube0_0: 10.10.220.15:8000:/ssd2/cube_open                //cube ip:port:deploy_path for replica 0 of shard 0
+cube0_0: 10.10.220.15:8000:/ssd2/cube_open                //cube ip:port:deploy_path for replica 0 of shard 0; delivered data is stored under this path
 agent0_1: 10.10.180.40:8001                        //agent ip:port for replica 1 of shard 0
-cube0_1: 10.10.180.40:8000:/home/disk1/cube_open             //cube ip:port:deploy_path for replica 1 of shard 0
+cube0_1: 10.10.180.40:8000:/home/disk1/cube_open             //cube ip:port:deploy_path for replica 1 of shard 0; delivered data is stored under this path
 ```

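 #### <span id="head27">4.2 Copy cube-transfer to the physical machines</span>

 Copy bin/cube-transfer and conf/transfer.conf to multiple physical machines, and create output and tmp folders to hold the intermediate files used during delivery.  
 Assume the copied files are laid out as follows: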
+
 ```
 $ tree
 .
@@ -542,24 +551,25 @@ $ tree
    |-- transfer.conf
 ```
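 #### <span id="head28">4.3 Start cube-transfer</span>
-Assume the service is started on port 8099; -l sets the log level and --config gives the configuration file location
-```
+Assume the service is started on port 8099; -l sets the log level and --config gives the configuration file location. The cube-transfer logs can be viewed in the ./log folder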
+```bash
 ./cube-transfer -p 8099 -l 4 --config conf/transfer.conf
 ```
 #### <span id="head281">4.4 Query interfaces supported by cube-transfer</span>
 > Get the current dictionary status  
->http://10.10.10.5:8099/dict/info  
+> http://10.10.10.5:8099/dict/info  

 > Get the current instance status  
->http://10.10.10.5:8099/instance/status  
+> http://10.10.10.5:8099/instance/status  

 > Get the delivery history, from the most recent base to the delta currently being delivered  
->http://10.10.10.5:8099/dict/deploy/history 
+> http://10.10.10.5:8099/dict/deploy/history 

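 #### <span id="head29">4.5 donefile format protocol</span>

 Once cube-transfer is deployed, it continuously watches the configured donefile data location. When it detects a data update, it starts downloading the data and then notifies cube-builder to run the build and delivery process, which delivers the new data to the cube-server of each shard.  
 The id should preferably be the timestamp at which the version was produced. Each time a base or patch is produced, simply append one line to the end of the donefile. The file names are fixed: base.txt and patch.txt  
+
 >base.txt has one entry per line, and id and key are identical; the directory may contain multiple files but no subfolders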
 >```
 >{"id":"1562000400","key":"1562000400","input":"/home/work/test_data/input/seqfile"}
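@@ -747,7 +757,7 @@ The data used by the CTR prediction task example comes from [the original model](https://github.com/P

 #### <span id="head42">2.2 Client compilation and deployment</span>

-After compiling as described in [1.2 Serving compilation](#1.2 Serving编译), the client files are under output/demo/client/ctr_prediction.
+After compiling as described in section 1.2 Serving compilation, the client files are under output/demo/client/ctr_prediction.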

 ##### <span id="head43">2.2.1 Configuration changes</span>

@@ -774,4 +784,4 @@ cluster: "list://127.0.0.1:8010"

 ##### <span id="head44">2.2.2 Running the service</span>

-Run `./bin/ctr_predictoin` to start the client; the client's execution logs can be found under ./log.
+Run `./bin/ctr_predictoin` to start the client; the client's execution logs can be found under ./log.
\ No newline at end of file
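
Note on section 4.5: the donefile rule described there (append one JSON line per produced base, with identical id and key) can be sketched in Python, the language of the bundled kvtool.py. This is only a minimal sketch under stated assumptions: the helper names `make_base_entry` and `append_base_entry` are hypothetical and not part of cube-transfer, the field layout is copied from the single base.txt sample line in the doc, and falling back to the current Unix timestamp as the id follows the doc's recommendation rather than any enforced rule.

```python
import json
import os
import time


def make_base_entry(input_path, version_id=None):
    """Build one base.txt line (hypothetical helper, not a cube API).

    For a base version the doc states that id and key are identical.
    """
    if version_id is None:
        # Assumption: use the production time as the id, as the doc recommends.
        version_id = str(int(time.time()))
    entry = {"id": version_id, "key": version_id, "input": input_path}
    # Compact separators reproduce the spacing of the sample line in the doc.
    return json.dumps(entry, separators=(",", ":"))


def append_base_entry(donefile_dir, input_path, version_id=None):
    """Append one entry to the end of base.txt and return the written line."""
    line = make_base_entry(input_path, version_id)
    path = os.path.join(donefile_dir, "base.txt")
    # Append mode keeps earlier versions; each entry occupies exactly one line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return line
```

Calling `append_base_entry(donefile_dir, "/home/work/test_data/input/seqfile", "1562000400")` writes the same line as the base.txt sample in the doc. The patch.txt format is not fully visible in this diff, so it is deliberately left out of the sketch.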