Unverified · Commit 50ed7e38 · authored by Jiawei Wang · committed by GitHub

Merge pull request #6 from PaddlePaddle/master

Sync with remote
@@ -32,6 +32,8 @@ ELASTIC CTR
 - cube-builder: converts the model files produced by training jobs (Hadoop SequenceFile format) into dictionary files that cube-server can load. The dictionary files use a dedicated data structure, heavily optimized for size and for in-memory access
 - cube-server: service nodes that provide sharded KV read/write capability
 - cube-agent: deployed on the same machines as cube-server; receives dictionary-file update commands from cube-transfer, pulls the data to the local machine, and notifies cube-server to apply the update
+- Paddle Serving: loads the ProgramDesc and dense parameters of the CTR prediction model and provides the prediction service
+- Client: demo client for the CTR prediction task
 The components above chain together the entire pipeline from training to serving deployment. The one-click deployment script [paddle-suite.sh](https://github.com/PaddlePaddle/Serving/blob/master/doc/resource/paddle-suite.sh) provided with this document deploys all of the components above in one step.
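To make the division of labor concrete, here is a minimal sketch of the sharded KV lookup pattern that cube-server provides. This is a hypothetical illustration only: `kNumShards`, `ShardOf`, and `MGet` are invented names, and cube's real dictionary format and RPC layer are far more involved.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

constexpr int kNumShards = 4;  // assumption: four cube-server shards

// Stand-in for each shard's in-memory dictionary file.
std::vector<std::unordered_map<uint64_t, std::string>> shards(kNumShards);

// Route a key to the shard that owns it.
int ShardOf(uint64_t key) { return std::hash<uint64_t>{}(key) % kNumShards; }

// Batched sharded read: resolve every key against its owning shard.
std::vector<std::string> MGet(const std::vector<uint64_t> &keys) {
  std::vector<std::string> values;
  values.reserve(keys.size());
  for (uint64_t k : keys) {
    const auto &shard = shards[ShardOf(k)];
    auto it = shard.find(k);
    values.push_back(it == shard.end() ? "" : it->second);
  }
  return values;
}

int main() {
  shards[ShardOf(42)][42] = "sparse_embedding_42";
  for (const std::string &v : MGet({42, 7}))
    std::cout << "[" << v << "] ";  // prints [sparse_embedding_42] []
  std::cout << std::endl;
  return 0;
}
```

Sharding along these lines lets the dictionary grow beyond one machine's memory while each lookup stays O(1) within its shard.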
@@ -121,7 +123,7 @@ $ kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/i
 ## 3.1 Download the deployment scripts
-Please download [all the script files required by this solution](https://github.com/PaddlePaddle/edl/tree/develop/example/ctr/script) to your local machine
+Please download [all the script files required by this solution](https://github.com/PaddlePaddle/Serving/tree/master/doc/resource) to your local machine
 ## 3.2 One-click deployment
@@ -133,9 +135,7 @@ $ bash paddle-suite.sh
 Please refer to **Sections 3.3-3.8** to verify that each installation step succeeded, and to **Section 4** to verify the training process and the prediction results.
-All script files for this task can be obtained [here](https://github.com/PaddlePaddle/edl/tree/develop/example/ctr/script).
-**Note**: everything described in **Sections 3.3-3.8 below is already covered by the one-click deployment script and does not need to be run manually**. Each step of the script is explained to aid understanding.
+**[Note!!!]**: everything described in **Sections 3.3-3.8 below is already covered by the one-click deployment script and does not need to be run manually**. Each step of the script is explained to aid understanding.
 ## 3.3 Pick a node as the output node
@@ -405,16 +405,16 @@ $ ./get_values -h 192.168.1.1 -t 3 -r 10000 -b 1000
 Concurrency (benchmark threads) | batch size | avg response time (us) | total qps
 -------|------------|-------------|---------------------------
-1 | 1000 | 1159 | 862
-4 | 1000 | 3537 | 1079
-8 | 1000 | 7726 | 1073
-16 | 1000 | 15440 | 1034
-24 | 1000 | 24279 | 1004
-32 | 1000 | 32570 | 996
+1 | 1000 | 1643 | 608
+4 | 1000 | 4878 | 819
+8 | 1000 | 9870 | 810
+16 | 1000 | 22177 | 721
+24 | 1000 | 30620 | 783
+32 | 1000 | 37668 | 849
 ### Test conclusions
-Thanks to Redis's efficient event-driven model and fully in-memory operation, at single concurrency Redis's average response time is nearly 50% lower than cube's (1100us vs. 1680us)
+Thanks to Redis's efficient event-driven model and fully in-memory operation, at single concurrency Redis's average response time is close to cube's (1643us vs. 1312us)
 On scalability, Redis is constrained by its single-threaded model: as concurrency grows, response time grows in step while total throughput stops rising at around 1000 qps. Cube's total qps, by contrast, keeps climbing as benchmark concurrency increases, showing that cube handles concurrent requests well and has good scaling behavior.
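As a sanity check on the Redis numbers above: because each benchmark thread issues requests back to back, total qps, thread count, and average latency are linked by total qps ≈ threads × 1e6 / avg_us. A small verification against the updated table rows:

```cpp
#include <cstdio>

int main() {
  // Redis rows from the table above: threads and average response time (us).
  const int threads[] = {1, 4, 8, 16, 24, 32};
  const double avg_us[] = {1643, 4878, 9870, 22177, 30620, 37668};
  for (int i = 0; i < 6; ++i) {
    // Each thread completes ~1e6 / avg_us requests per second.
    double qps = threads[i] * 1e6 / avg_us[i];
    std::printf("%2d threads -> %.0f qps\n", threads[i], qps);
  }
  // Output: 609, 820, 810, 721, 784, 850 -- matching the measured
  // total qps column (608, 819, 810, 721, 783, 849) within rounding.
  return 0;
}
```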
@@ -20,6 +20,7 @@ int batch_size = 100;
 int key_size = 10000000; // keys in redis server
 std::vector<uint64_t> times_us;
+std::vector<uint64_t> average_time_us;
 sw::redis::Redis *redis;
@@ -94,7 +95,7 @@ void thread_worker(int thread_id)
         std::vector<std::string> get_kvs_res;
         for (int j = i * batch_size; j < (i + 1) * batch_size; j++) {
-            get_kvs.push_back(std::to_string(i % key_size));
+            get_kvs.push_back(std::to_string(j % key_size));
         }
         auto start2 = std::chrono::steady_clock::now();
         redis->mget(get_kvs.begin(), get_kvs.end(), std::back_inserter(get_kvs_res));
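The one-character change in this hunk fixes a key-generation bug: the old code pushed the request index `i` for every element of the batch, so each mget fetched the same key `batch_size` times; with `j`, each batch fetches `batch_size` distinct keys. A standalone illustration (not part of the benchmark source):

```cpp
#include <iostream>
#include <string>
#include <vector>

int main() {
  const int batch_size = 3, key_size = 10000000;
  const int i = 5;  // request index, as in the benchmark's outer loop
  std::vector<std::string> old_keys, new_keys;
  for (int j = i * batch_size; j < (i + 1) * batch_size; j++) {
    old_keys.push_back(std::to_string(i % key_size));  // old: "5" every time
    new_keys.push_back(std::to_string(j % key_size));  // new: "15","16","17"
  }
  // old_keys = {5, 5, 5}: every mget fetched one key repeatedly.
  // new_keys = {15, 16, 17}: each mget now touches batch_size distinct keys.
  for (const auto &k : new_keys) std::cout << k << " ";
  std::cout << std::endl;
  return 0;
}
```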
@@ -102,10 +103,11 @@ void thread_worker(int thread_id)
         times_us[thread_id] += std::chrono::duration_cast<std::chrono::microseconds>(stop2 - start2).count();
     }
-    // Per-thread statistics
-    std::cout << total_request_num << " requests, " << batch_size << " keys per req, total time us = " << times_us[thread_id] << std::endl;
-    std::cout << "Average " << times_us[thread_id] / total_request_num << "us per req" << std::endl;
-    std::cout << "qps: " << (double)total_request_num / times_us[thread_id] * 1000000 << std::endl;
+    average_time_us[thread_id] = times_us[thread_id] / total_request_num;
+
+    // std::cout << total_request_num << " requests, " << batch_size << " keys per req, total time us = " << times_us[thread_id] << std::endl;
+    // std::cout << "Average " << average_time_us[thread_id] << "us per req" << std::endl;
+    // std::cout << "qps: " << (double)total_request_num / times_us[thread_id] * 1000000 << std::endl;
 }
 int main(int argc, char **argv)
@@ -117,6 +119,7 @@ int main(int argc, char **argv)
     std::vector<std::thread> workers;
     times_us.reserve(thread_num);
+    average_time_us.reserve(thread_num);
     for (int i = 0; i < thread_num; ++i) {
         times_us[i] = 0;
@@ -127,18 +130,19 @@ int main(int argc, char **argv)
         workers[i].join();
     }
+    // times_total_us is average running time of each thread
     uint64_t times_total_us = 0;
+    uint64_t average_time_total_us = 0;
     for (int i = 0; i < thread_num; ++i) {
         times_total_us += times_us[i];
+        average_time_total_us += average_time_us[i];
     }
     times_total_us /= thread_num;
-    // Total requests should be sum of requests sent by each thread
     total_request_num *= thread_num;
     std::cout << total_request_num << " requests, " << batch_size << " keys per req, total time us = " << times_total_us << std::endl;
-    std::cout << "Average " << times_total_us / total_request_num << "us per req" << std::endl;
+    std::cout << "Average " << average_time_total_us / thread_num << "us per req" << std::endl;
     std::cout << "qps: " << (double)total_request_num / times_total_us * 1000000 << std::endl;
     return 0;
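Taken together, the statistics changes in this commit compute a per-thread average latency, then report the mean of those averages rather than deriving the average from the aggregated totals. A condensed, self-contained sketch of the new flow (a sleep stub stands in for the redis `mget` so it builds without redis-plus-plus, and `assign` replaces the original's `reserve` so the vectors are actually sized):

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

int thread_num = 4;
int total_request_num = 1000;           // requests per thread
std::vector<uint64_t> times_us;         // per-thread total latency
std::vector<uint64_t> average_time_us;  // per-thread average latency

void thread_worker(int thread_id) {
  for (int i = 0; i < total_request_num; ++i) {
    auto start = std::chrono::steady_clock::now();
    // Stub standing in for redis->mget(...) in the real benchmark.
    std::this_thread::sleep_for(std::chrono::microseconds(100));
    auto stop = std::chrono::steady_clock::now();
    times_us[thread_id] +=
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
  }
  // New in this commit: each thread records its own average latency.
  average_time_us[thread_id] = times_us[thread_id] / total_request_num;
}

int main() {
  times_us.assign(thread_num, 0);
  average_time_us.assign(thread_num, 0);
  std::vector<std::thread> workers;
  for (int i = 0; i < thread_num; ++i) workers.emplace_back(thread_worker, i);
  for (auto &w : workers) w.join();

  uint64_t times_total_us = 0, average_time_total_us = 0;
  for (int i = 0; i < thread_num; ++i) {
    times_total_us += times_us[i];
    average_time_total_us += average_time_us[i];
  }
  times_total_us /= thread_num;     // mean wall time per thread
  total_request_num *= thread_num;  // total requests across all threads

  std::cout << "Average " << average_time_total_us / thread_num
            << "us per req" << std::endl;
  std::cout << "qps: "
            << (double)total_request_num / times_total_us * 1000000 << std::endl;
  return 0;
}
```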