diff --git a/README.md b/README.md
index 7b4a4f385f76b00f1307bdc7c0eedf177116901f..e6f49526aae36941de5ae4d545bd9f5dfe7fd4d9 100755
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@
-
+
@@ -58,9 +58,9 @@ This chapter guides you through the installation and deployment steps. It is str
- [Install Paddle Serving using docker](doc/Install_EN.md)
- [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_CN.md)
-- [Deploy Paddle Serving with Security gateway](doc/Serving_Auth_Docker_CN.md)
+- [Deploy Paddle Serving with Security gateway(Chinese)](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving on more hardware](doc/Run_On_XPU_EN.md)
-- [Latest Wheel packages](doc/Latest_Packages_CN.md)(Update everyday on branch develop)
+- [Latest Wheel packages(Update everyday on branch develop)](doc/Latest_Packages_CN.md)
> Use
@@ -69,23 +69,23 @@ The first step is to call the model save interface to generate a model parameter
- [Quick Start](doc/Quick_Start_EN.md)
- [Save a servable model](doc/Save_EN.md)
- [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
-- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Introduction_CN.md)
-- [Infer on quantizative models](doc/Low_Precision_CN.md)
-- [Data format of classic models](doc/Process_data_CN.md)
-- [C++ Serving](doc/C++_Serving/Introduction_CN.md)
- - [protocols](doc/C++_Serving/Inference_Protocols_CN.md)
+- [Guide for RESTful/gRPC/bRPC APIs(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
+- [Inference on quantized models](doc/Low_Precision_EN.md)
+- [Data format of classic models(Chinese)](doc/Process_data_CN.md)
+- [C++ Serving(Chinese)](doc/C++_Serving/Introduction_CN.md)
+ - [Protocols(Chinese)](doc/C++_Serving/Inference_Protocols_CN.md)
- [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
- [A/B Test](doc/C++_Serving/ABTest_EN.md)
- [Encryption](doc/C++_Serving/Encryption_EN.md)
- [Analyze and optimize performance(Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
- [Benchmark(Chinese)](doc/C++_Serving/Benchmark_CN.md)
- [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
- - [Analyze and optimize performance](doc/Python_Pipeline/Pipeline_Design_EN.md)
+ - [Analyze and optimize performance](doc/Python_Pipeline/Performance_Tuning_EN.md)
- [Benchmark(Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
- Client SDK
- - [Python SDK(Chinese)](doc/C++_Serving/Http_Service_CN.md)
+ - [Python SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [JAVA SDK](doc/Java_SDK_EN.md)
- - [C++ SDK(Chinese)](doc/C++_Serving/Creat_C++Serving_CN.md)
+ - [C++ SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Large-scale sparse parameter server](doc/Cube_Local_EN.md)
@@ -94,7 +94,7 @@ The first step is to call the model save interface to generate a model parameter
For Paddle Serving developers, we provide extended documents such as custom OP and level-of-detail (LoD) processing.
- [Custom Operators](doc/C++_Serving/OP_EN.md)
-- [Processing LOD Data](doc/LOD_EN.md)
+- [Processing LoD Data](doc/LOD_EN.md)
- [FAQ(Chinese)](doc/FAQ_CN.md)
Model Zoo
@@ -112,10 +112,10 @@ Paddle Serving works closely with the Paddle model suite, and implements a large
For more model examples, read [Model zoo](doc/Model_Zoo_EN.md)
-
+
-
+
Community
@@ -124,19 +124,19 @@ If you want to communicate with developers and other users? Welcome to join us,
### Wechat
- Scan the QR code below with WeChat
-
+
+
+
-
+
### QQ
-- 飞桨推理部署交流群(Group No.:697765514)
-
-
-
+- QQ Group(Group No.:697765514)
-### Slack
+
+
+
-- [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
> Contribution
diff --git a/README_CN.md b/README_CN.md
index c5c336274a5aa0c7ba49619c9a457acce6401699..a1ba46a958a0146b1195e7c9f3af17ab6b263cfd 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -17,7 +17,7 @@
-
+
@@ -27,7 +27,7 @@
Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发者和企业提供高性能、灵活易用的工业级在线推理服务。Paddle Serving支持RESTful、gRPC、bRPC等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案,和多种经典预训练模型示例。核心特性如下:
- 集成高性能服务端推理引擎paddle Inference和移动端引擎paddle Lite,其他机器学习平台(Caffe/TensorFlow/ONNX/PyTorch)可通过[x2paddle](https://github.com/PaddlePaddle/X2Paddle)工具迁移模型
-- 具有高性能C++和高易用Python 2套框架。C++框架基于高性能bRPC网络框架打造高吞吐、低延迟的推理服务,性能领先竞品。Python框架基于gRPC/gRPC-Gateway网络框架和Python语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md)
+- 具有高性能C++和高易用Python 2套框架。C++框架基于高性能bRPC网络框架打造高吞吐、低延迟的推理服务,性能领先竞品。Python框架基于gRPC/gRPC-Gateway网络框架和Python语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md#21-设计选型)
- 支持HTTP、gRPC、bRPC等多种[协议](doc/C++_Serving/Inference_Protocols_CN.md);提供C++、Python、Java语言SDK
- 设计并实现基于有向无环图(DAG)的异步流水线高性能推理框架,具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理等特性
- 适配x86(Intel) CPU、ARM CPU、Nvidia GPU、昆仑XPU等多种硬件;集成Intel MKLDNN、Nvidia TensorRT加速库,以及低精度和量化推理
@@ -48,17 +48,15 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
文档
-***
-
> 部署
-此章节引导您完成安装和部署步骤,强烈推荐使用Docker部署Paddle Serving,如您不使用docker,省略docker相关步骤。在云服务器上可以使用Kubernetes部署Paddle Serving。在异构硬件如ARM CPU、昆仑XPU上编译或使用Paddle Serving可以下面的文档。每天编译生成develop分支的最新开发包供开发者使用。
+此章节引导您完成安装和部署步骤,强烈推荐使用Docker部署Paddle Serving,如您不使用docker,省略docker相关步骤。在云服务器上可以使用Kubernetes部署Paddle Serving。在异构硬件如ARM CPU、昆仑XPU上编译或使用Paddle Serving可阅读以下文档。每天编译生成develop分支的最新开发包供开发者使用。
- [使用docker安装Paddle Serving](doc/Install_CN.md)
- [源码编译安装Paddle Serving](doc/Compile_CN.md)
- [在Kubernetes集群上部署Paddle Serving](doc/Run_On_Kubernetes_CN.md)
- [部署Paddle Serving安全网关](doc/Serving_Auth_Docker_CN.md)
- [在异构硬件部署Paddle Serving](doc/Run_On_XPU_CN.md)
-- [最新Wheel开发包](doc/Latest_Packages_CN.md)(develop分支每日更新)
+- [最新Wheel开发包(develop分支每日更新)](doc/Latest_Packages_CN.md)
> 使用
@@ -66,7 +64,7 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
- [快速开始](doc/Quick_Start_CN.md)
- [保存用于Paddle Serving的模型和配置](doc/Save_CN.md)
- [配置和启动参数的说明](doc/Serving_Configure_CN.md)
-- [RESTful/gRPC/bRPC API指南](doc/C++_Serving/Introduction_CN.md#4.Client端特性)
+- [RESTful/gRPC/bRPC API指南](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [低精度推理](doc/Low_Precision_CN.md)
- [常见模型数据处理](doc/Process_data_CN.md)
- [C++ Serving简介](doc/C++_Serving/Introduction_CN.md)
@@ -76,20 +74,20 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
- [加密模型推理服务](doc/C++_Serving/Encryption_CN.md)
- [性能优化指南](doc/C++_Serving/Performance_Tuning_CN.md)
- [性能指标](doc/C++_Serving/Benchmark_CN.md)
-- [Python Pipeline简介](doc/Python_Pipeline/Pipeline_Design_CN.md)
- - [性能优化指南](doc/Python_Pipeline/Pipeline_Design_CN.md)
+- [Python Pipeline设计](doc/Python_Pipeline/Pipeline_Design_CN.md)
+ - [性能优化指南](doc/Python_Pipeline/Performance_Tuning_CN.md)
- [性能指标](doc/Python_Pipeline/Benchmark_CN.md)
- 客户端SDK
- - [Python SDK](doc/C++_Serving/Http_Service_CN.md)
+ - [Python SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [JAVA SDK](doc/Java_SDK_CN.md)
- - [C++ SDK](doc/C++_Serving/Creat_C++Serving_CN.md)
+ - [C++ SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [大规模稀疏参数索引服务](doc/Cube_Local_CN.md)
> 开发者
为Paddle Serving开发者,提供自定义OP,变长数据处理。
- [自定义OP](doc/C++_Serving/OP_CN.md)
-- [变长数据(LOD)处理](doc/LOD_CN.md)
+- [变长数据(LoD)处理](doc/LOD_CN.md)
- [常见问答](doc/FAQ_CN.md)
模型库
@@ -104,7 +102,7 @@ Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,
-更多模型示例参考Repo,可进入[模型库](doc/Model_Zoo_CN.md)
+更多模型示例进入[模型库](doc/Model_Zoo_CN.md)
@@ -130,13 +128,10 @@ Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,
-### Slack
-- [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
-
> 贡献代码
-如果您想为Paddle Serving贡献代码,请参考 [Contribution Guidelines](doc/Contribute_EN.md)
+如果您想为Paddle Serving贡献代码,请参考 [Contribution Guidelines(English)](doc/Contribute_EN.md)
- 感谢 [@loveululu](https://github.com/loveululu) 提供 Cube python API
- 感谢 [@EtachGu](https://github.com/EtachGu) 更新 docker 使用命令
- 感谢 [@BeyondYourself](https://github.com/BeyondYourself) 提供grpc教程,更新FAQ教程,整理文件目录。
diff --git a/doc/C++_Serving/Introduction_CN.md b/doc/C++_Serving/Introduction_CN.md
index 17c3f096f6eba10e6abe2c2fcab8e5ee17b83ad4..3f864ab04628b6e707463625d9aafb472917e9db 100755
--- a/doc/C++_Serving/Introduction_CN.md
+++ b/doc/C++_Serving/Introduction_CN.md
@@ -76,7 +76,7 @@ C++ Serving采用对称加密算法对模型进行加密,在服务加载模型
### 4.2 多语言多协议Client
-BRPC网络框架支持[多种底层通信协议](#1.网络框架(BRPC)),即使用目前的C++ Serving框架的Server端,各种语言的Client端,甚至使用curl的方式,只要按照上述协议(具体支持的协议见[brpc官网](https://github.com/apache/incubator-brpc))封装数据并发送,Server端就能够接收、处理和返回结果。
+BRPC网络框架支持[多种底层通信协议](#1网络框架BRPC),即使用目前的C++ Serving框架的Server端,各种语言的Client端,甚至使用curl的方式,只要按照上述协议(具体支持的协议见[brpc官网](https://github.com/apache/incubator-brpc))封装数据并发送,Server端就能够接收、处理和返回结果。
对于支持的各种协议我们提供了部分的Client SDK示例供用户参考和使用,用户也可以根据自己的需求去开发新的Client SDK,也欢迎用户添加其他语言/协议(例如GRPC-Go、GRPC-C++ HTTP2-Go、HTTP2-Java等)Client SDK到我们的仓库供其他开发者借鉴和参考。
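The multi-protocol point above can be exercised without any language-specific SDK. Below is a minimal sketch of building an HTTP/JSON prediction request for a C++ Serving endpoint; the port, the `/uci/prediction` path, and the `feed`/`fetch` names are assumptions taken from the uci_housing demo, not fixed by the framework.

```python
import json

# Hypothetical request for the uci_housing demo server; the feed/fetch
# names and URL path come from that demo and differ for other models.
payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0062, 0.0747, -0.0244,
                    -0.1094, 0.0162, -0.0034, 0.0567, -0.0188, 0.0482,
                    -0.0876]}],
    "fetch": ["price"],
}
body = json.dumps(payload)

# Because bRPC also speaks plain HTTP, any HTTP client can send this, e.g.:
#   curl -H "Content-Type: application/json" -X POST \
#        -d "$BODY" http://127.0.0.1:9292/uci/prediction
print(body)
```

The same JSON body works from curl, Python, Java, or Go clients, which is exactly the interoperability the section describes.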
diff --git a/doc/FAQ_CN.md b/doc/FAQ_CN.md
index c1f2359abf5aca4fdf40e9f7fb3abccb6adce62a..94020bcc2477151839c521474cde6d73ea388ad2 100644
--- a/doc/FAQ_CN.md
+++ b/doc/FAQ_CN.md
@@ -332,7 +332,7 @@ GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999
使用gdb调试core文件的方法为:gdb <可执行文件> ,进入后输入bt指令,一般即可显示出错在哪一行。
-注意:可执行文件路径是C++ bin文件的路径,而不是python命令,一般为类似下面的这种/usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.6.2/serving
+注意:可执行文件路径是C++ bin文件的路径,而不是python命令,一般为类似下面的这种/usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.7.0/serving
## 性能优化
diff --git a/doc/Run_On_Kubernetes_CN.md b/doc/Run_On_Kubernetes_CN.md
index 4482c216b318fac3294c08309e8635c87c88a4b8..951fda78dd0c04d2faa7db5b84cfa845235fbaa5 100644
--- a/doc/Run_On_Kubernetes_CN.md
+++ b/doc/Run_On_Kubernetes_CN.md
@@ -20,7 +20,7 @@ kubectl apply -f https://bit.ly/kong-ingress-dbless
### 制作Serving运行镜像(可选):
-首先您需要确定运行镜像的具体环境。和[DOCKER开发镜像列表]()文档相比,开发镜像用于调试、编译代码,携带了大量的开发工具,因此镜像体积较大。运行镜像通常要求缩小容器体积以提高部署的灵活性。如果您不太需要轻量级的运行容器,请直接跳过这一部分。
+首先您需要确定运行镜像的具体环境。和[DOCKER开发镜像列表](./Docker_Images_CN.md)文档相比,开发镜像用于调试、编译代码,携带了大量的开发工具,因此镜像体积较大。运行镜像通常要求缩小容器体积以提高部署的灵活性。如果您不太需要轻量级的运行容器,请直接跳过这一部分。
在`tools/generate_runtime_docker.sh`文件下,它的使用方式如下
diff --git a/doc/Serving_Design_CN.md b/doc/Serving_Design_CN.md
index 1c407292dbe96b3bf0d3bb33433155e19bd9e7ee..c9d6b25bdd4520a1cf2d09e4e9713a9acfb01487 100644
--- a/doc/Serving_Design_CN.md
+++ b/doc/Serving_Design_CN.md
@@ -27,9 +27,9 @@ Paddle Serving提供很多大规模场景需要的部署功能:1)模型管
| 响应时间 | 吞吐 | 开发效率 | 资源利用率 | 选型 | 应用场景|
|-----|------|-----|-----|------|------|
-| 低 | 高 | 低 | 高 |C++ Serving | 高性能场景,大型在线推荐系统召回、排序服务|
-| 高 | 高 | 较高 |高|Python Pipeline Serving| 兼顾吞吐和效率,单算子多模型组合场景,异步模式|
-| 高 | 低 | 高| 低 |Python webservice| 高迭代效率场景,小型服务或需要快速迭代,模型效果验证|
+| 低 | 最高 | 低 | 最高 |C++ Serving | 高性能场景,大型在线推荐系统召回、排序服务|
+| 最高 | 较高 | 较高 |较高|Python Pipeline Serving| 兼顾吞吐和效率,单算子多模型组合场景,异步模式|
+| 较高 | 低 | 较高| 低 |Python webservice| 高迭代效率场景,小型服务或需要快速迭代,模型效果验证|
性能指标说明:
@@ -197,12 +197,4 @@ Pipeline Serving核心设计是图执行引擎,基本处理单元是OP和Chann
-----
-
-## 6. 未来计划
-
-### 6.1 向量检索、树结构检索
-在推荐与广告场景的召回系统中,通常需要采用基于向量的快速检索或者基于树结构的快速检索,Paddle Serving会对这方面的检索引擎进行集成或扩展。
-### 6.2 服务监控
-集成普罗米修斯监控,一套开源的监控&报警&时间序列数据库的组合,适合k8s和docker的监控系统。
diff --git a/doc/Serving_Design_EN.md b/doc/Serving_Design_EN.md
index 8a77d7552a33327d172e64e2bf67d604ec42aed9..3028f8f6109fc6c8327f8d4ea3ed1e6a8c1dff57 100644
--- a/doc/Serving_Design_EN.md
+++ b/doc/Serving_Design_EN.md
@@ -25,9 +25,9 @@ In order to meet the needs of users in different scenarios, Paddle Serving's pro
| Response time | throughput | development efficiency | Resource utilization | selection | Applications|
|-----|------|-----|-----|------|------|
-| LOW | HIGH | LOW | HIGH |C++ Serving | High-performance,recall and ranking services of large-scale online recommendation systems|
-| HIGH | HIGH | HIGH | HIGH |Python Pipeline Serving| High-throughput, high-efficiency, asynchronous mode, fitting for single operator multi-model combination scenarios|
-| HIGH | LOW | HIGH| LOW |Python webservice| High-throughput,Low-traffic services or projects that require rapid iteration, model effect verification|
+| Low | Highest | Low | Highest |C++ Serving | High-performance scenarios, recall and ranking services of large-scale online recommendation systems|
+| Highest | Higher | Higher | Higher |Python Pipeline Serving| Balances throughput and efficiency, asynchronous mode, fitting for single-operator multi-model combination scenarios|
+| Higher | Low | Higher | Low |Python webservice| Rapid-iteration scenarios, small services or projects that require quick iteration and model effect verification|
Performance index description:
1. Response time (ms): Average response time of a single request, calculate the response time of 50, 90, 95, 99 quantiles, the lower the better.
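As a worked illustration of the quantile metric defined above (not code from Paddle Serving), the following sketch computes p50/p90/p95/p99 response times from a made-up list of per-request latencies using a simple nearest-rank rule:

```python
# Illustrative only: sample latencies are invented for the example.
latencies_ms = sorted([12.1, 13.4, 11.8, 25.0, 14.2,
                       13.9, 12.7, 40.3, 13.1, 12.9])

def quantile(sorted_vals, q):
    """Nearest-rank quantile: the value at rank ceil-ish q% of the samples."""
    idx = max(0, int(round(q / 100.0 * len(sorted_vals))) - 1)
    return sorted_vals[idx]

for q in (50, 90, 95, 99):
    print("p%d = %.1f ms" % (q, quantile(latencies_ms, q)))
```

In practice a benchmark tool collects thousands of samples, but the ranking step is the same: sort once, then read off each quantile position.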
@@ -199,16 +199,4 @@ The core design of Pipeline Serving is a graph execution engine, and the basic p
-----
-
-
-## 6. Future Plan
-
-### 5.1 Auto Deployment on Cloud
-In order to make deployment more easily on public cloud, Paddle Serving considers to provides Operators on Kubernetes in submitting a service job.
-
-### 6.2 Vector Indexing and Tree based Indexing
-In recommendation and advertisement systems, it is commonly seen to use vector based index or tree based indexing service to do candidate retrievals. These retrieval tasks will be built-in services of Paddle Serving.
-### 6.3 Service Monitoring
-Paddle Serving will integrate Prometheus monitoring, which is a set of open source monitoring & alarm & time series database combination, suitable for k8s and docker monitoring systems.