@@ -28,7 +28,7 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
- Integrate the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite. Models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
- There are two frameworks: high-performance C++ Serving and easy-to-use Python Pipeline. C++ Serving is built on the bRPC network framework to create a high-throughput, low-latency inference service, and its performance indicators lead competing products. Python Pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language for an easy-to-use, high-throughput inference service. For guidance on choosing between them, see [Technical Selection](doc/Serving_Design_EN.md#21-design-selection).
- Support multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC, and bRPC, and provide SDKs for the C++, Python, and Java languages.
- Design and implement a high-performance asynchronous pipeline inference service framework based on a directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference.
- Adapt to a variety of commonly used computing hardware, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, and Kunlun XPU; integrate acceleration libraries such as Intel MKL-DNN and Nvidia TensorRT, as well as low-precision and quantized inference.
- Provide a secure model deployment solution, including encrypted model deployment, an authentication mechanism, and an HTTPS security gateway, which is used in practice.
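For a concrete taste of the Python client SDK mentioned above, here is a minimal sketch adapted from the Quick Start's uci_housing example; the model directory, port, and feed/fetch names (`x`, `price`) come from that example, not from the framework itself:

```python
# Minimal client sketch (assumes: pip3 install paddle-serving-client, and a
# server started with the Quick Start model, e.g.
#   python3 -m paddle_serving_server.serve --model uci_housing_model --port 9393).
from paddle_serving_client import Client

client = Client()
# The client-side prototxt is generated when the servable model is saved.
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])

# One normalized 13-feature sample from the uci_housing example.
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501, -0.2015,
        -0.0480, 0.1624, 0.0678, -0.0332, 0.1619, -0.0819]
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)  # e.g. {'price': array([[...]], dtype=float32)}
```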
...
...
@@ -57,9 +57,9 @@ This chapter guides you through the installation and deployment steps. It is str
- [Install Paddle Serving using Docker](doc/Install_EN.md)
- [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_CN.md)
- [Deploy Paddle Serving with Security gateway (Chinese)](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving on more hardware](doc/Run_On_XPU_EN.md)
- [Latest Wheel packages (updated daily on the develop branch)](doc/Latest_Packages_CN.md)
> Use
...
...
@@ -68,23 +68,23 @@ The first step is to call the model save interface to generate a model parameter
- [Quick Start](doc/Quick_Start_EN.md)
- [Save a servable model](doc/Save_EN.md) (a short conversion sketch follows this list)
- [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Introduction_CN.md)
- [Inference on quantized models](doc/Low_Precision_CN.md)
- [Data format of classic models](doc/Process_data_CN.md)
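To make the first step (saving a servable model) concrete, below is a hedged sketch that converts an already-exported Paddle inference model into Serving's servable format; the directory names are placeholders, and the exact interface may differ across Serving versions, so treat the save document above as authoritative:

```python
# Sketch: convert an exported Paddle inference model into a servable model.
# "inference_model_dir" is a placeholder for your own exported model path.
import paddle_serving_client.io as serving_io

serving_io.inference_model_to_serving(
    dirname="inference_model_dir",    # exported Paddle inference model
    serving_server="serving_server",  # output: server-side model and config
    serving_client="serving_client",  # output: client-side prototxt config
    model_filename=None,              # set these if model/params are stored
    params_filename=None)             # as single combined files
```

The generated `serving_server` directory is what `--model` points at when starting a server, and `serving_client` holds the prototxt the client loads.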
@@ -25,9 +25,9 @@ In order to meet the needs of users in different scenarios, Paddle Serving's pro
| Response time | Throughput | Development efficiency | Resource utilization | Selection | Applications |
|-----|------|-----|-----|------|------|
| Low | Highest | Low | Highest | C++ Serving | High-performance recall and ranking services of large-scale online recommendation systems |
| Higher | Higher | Higher | Higher | Python Pipeline Serving | High-throughput, high-efficiency, asynchronous-mode services; fits single-operator, multi-model combination scenarios |
| Higher | Low | Higher | Low | Python webservice | Low-traffic services or projects that require rapid iteration and model effect verification |
Performance index description:
1. Response time (ms): the average response time of a single request, reported at the 50th, 90th, 95th, and 99th percentiles; lower is better.
...
...
@@ -53,7 +53,7 @@ Paddle Serving takes into account a series of issues such as different operating
Cross-platform means being independent of both the operating system and the hardware environment: an application developed on one operating system can still run on another. The design must therefore consider not only the development language and cross-platform components, but also differences in how compilers interpret code on different systems.
Docker is an open-source application container engine that allows developers to package their applications and dependencies into a portable container and then publish it to any popular Linux or Windows machine. We have packaged a variety of Docker images for the Paddle Serving framework; refer to the image list《[Docker Images](Docker_Images_EN.md)》and select an image according to your needs. We also provide Docker usage documentation《[How to run PaddleServing in Docker](Install_EN.md)》. Currently, the Python webservice mode can be deployed and run natively on both Linux and Windows; see《[Paddle Serving for Windows Users](Windows_Tutorial_EN.md)》.
> Support client SDKs in multiple development languages
...
...
@@ -199,16 +199,3 @@ The core design of Pipeline Serving is a graph execution engine, and the basic p
To make deployment on public clouds easier, Paddle Serving plans to provide Kubernetes Operators for submitting a service job.
### 6.2 Vector Indexing and Tree-based Indexing
In recommendation and advertising systems, vector-based or tree-based indexing services are commonly used for candidate retrieval. These retrieval tasks will become built-in services of Paddle Serving.
### 6.3 Service Monitoring
Paddle Serving will integrate Prometheus, an open-source combination of monitoring, alerting, and a time-series database that is well suited to monitoring Kubernetes and Docker systems.
2. Serving has launched the Pipeline mode (see [Pipeline Serving](../doc/Python_Pipeline/Pipeline_Design_EN.md) for details). The Pipeline Serving Client for Java has been released.
3. The parameters `ip` and `port` in PipelineClientExample.java (path: java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)) must match the corresponding pipeline server parameters `ip` and `port` defined in config.yaml (taking the IMDB model ensemble as an example, path: python/examples/pipeline/imdb_model_ensemble/[config.yaml](../examples/Pipeline/imdb_model_ensemble/config.yml)).
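The same ip/port pairing can be illustrated with the Python pipeline client, whose connect-then-predict flow the Java client mirrors; a hedged sketch, assuming the IMDB example's config.yaml still exposes its RPC service on port 18070 (check your copy of the file):

```python
# Sketch of a pipeline client, mirroring PipelineClientExample.java.
# The endpoint must match the rpc_port (assumed 18070 here) defined in the
# imdb_model_ensemble example's config.yaml.
from paddle_serving_server.pipeline import PipelineClient

client = PipelineClient()
client.connect(['127.0.0.1:18070'])
# Feed/fetch names come from the IMDB example's operator definitions.
ret = client.predict(feed_dict={"words": "i am very sad | 0"},
                     fetch=["prediction"])
print(ret)
```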