未验证 提交 f4c6d6cb 编写于 作者: D Dong Daxiang 提交者: GitHub

Update DESIGN_DOC_EN.md

上级 6d67da5a
...@@ -120,40 +120,42 @@ server.prepare_server(port=9292, device="cpu") ...@@ -120,40 +120,42 @@ server.prepare_server(port=9292, device="cpu")
server.run_server() server.run_server()
``` ```
#### 2.1.3 客户端访问API #### 2.1.3 Paddle Serving Client API
Paddle Serving支持远程服务访问的协议一种是基于RPC,另一种是HTTP。用户通过RPC访问,可以使用Paddle Serving提供的Python Client API,通过定制输入数据的格式来实现服务访问。下面的例子解释Paddle Serving Client如何定义输入数据。保存可部署模型时需要指定每个输入的别名,例如`sparse``dense`,对应的数据可以是离散的ID序列`[1, 1001, 100001]`,也可以是稠密的向量`[0.2, 0.5, 0.1, 0.4, 0.11, 0.22]`。当前Client的设计,对于离散的ID序列,支持Paddle中的`lod_level=0``lod_level=1`的情况,即张量以及一维变长张量。对于稠密的向量,支持`N-D Tensor`。用户不想要显式指定输入数据的形状,Paddle Serving的Client API会通过保存配置时记录的输入形状进行对应的检查。 Paddle Serving supports remote service access through RPC(remote procedure call) and HTTP. RPC access of remote service can be called through Client API of Paddle Serving. A user can define data preprocess function before calling Paddle Serving's client API. The example below explains how to define the input data of Paddle Serving Client. The servable model has two inputs with alias name of `sparse` and `dense`. `sparse` corresponds to sparse sequence ids such as `[1, 1001, 100001]` and `dense` corresponds to dense vector such as `[0.2, 0.5, 0.1, 0.4, 0.11, 0.22]`. For sparse sequence data, current design supports `lod_level=0` and `lod_level=1` of Paddle, that corresponds to `Tensor` and `LodTensor`. For dense vector, current design supports any `N-D Tensor`. Users do not need to assign the shape of inference model input. The Paddle Serving Client API will check the input data's shape with servable configurations.
``` python ``` python
feed_dict["sparse"] = [1, 1001, 100001] feed_dict["sparse"] = [1, 1001, 100001]
feed_dict["dense"] = [0.2, 0.5, 0.1, 0.4, 0.11, 0.22] feed_dict["dense"] = [0.2, 0.5, 0.1, 0.4, 0.11, 0.22]
fetch_map = client.predict(feed=feed_dict, fetch=["prob"]) fetch_map = client.predict(feed=feed_dict, fetch=["prob"])
``` ```
Client链接Server的代码,通常只需要加载保存模型时保存的Client端配置,以及指定要去访问的服务端点即可。为了保持内部访问进行数据并行的扩展能力,Paddle Serving Client允许定义多个服务端点。
The following code sample shows that Paddle Serving Client API connects to Server API with endpoint of the servers. To use the data parallelism ability during prediction, Paddle Serving Client allows users to define multiple server endpoints.
``` python ``` python
client = Client() client = Client()
client.load_client_config('servable_client_configs') client.load_client_config('servable_client_configs')
client.connect(["127.0.0.1:9292"]) client.connect(["127.0.0.1:9292"])
``` ```
### 2.2 Underlying Communication Mechanism
Paddle Serving adopts [baidu-rpc](https://github.com/apache/incubator-brpc) as underlying communication layer. baidu-rpc is an open-source RPC communication library with high concurrency and low latency advantages compared with other open source RPC library. Millions of instances and thousands of services are using baidu-rpc within Baidu.
### 2.2 底层通信机制 ### 2.3 Core Execution Engine
Paddle Serving采用[baidu-rpc](https://github.com/apache/incubator-brpc)进行底层的通信。baidu-rpc是百度开源的一款PRC通信库,具有高并发、低延时等特点,已经支持了包括百度在内上百万在线预估实例、上千个在线预估服务,稳定可靠。Paddle Serving底层采用baidu-rpc的另一个原因是深度学习模型的远程调用服务通常对延时比较敏感,需要采用一款延时较低的rpc。 The core execution engine of Paddle Serving is a Directed acyclic graph(DAG). In the DAG, each node represents a phase of inference service, such as paddle inference prediction, data preprocessing and data postprocessing. DAG can fully parallelize the computation efficiency and can fully utilize the computation resources. For example, when a user has input data that needs to be feed into two models, and combine the scores of the two models, the computation of model scoring is parallelized through DAG.
### 2.3 核心执行引擎
Paddle Serving的核心执行引擎是一个有向无环图,图中的每个节点代表预估服务的一个环节,例如计算模型预测打分就是其中一个环节。有向无环图有利于可并发节点充分利用部署实例内的计算资源,缩短延时。一个例子,当同一份输入需要送入两个不同的模型进行预估,并将两个模型预估的打分进行加权求和时,两个模型的打分过程即可以通过有向无环图的拓扑关系并发。
<p align="center"> <p align="center">
<br> <br>
<img src='design_doc.png'"> <img src='design_doc.png'">
<br> <br>
<p> <p>
### 2.4 微服务插件模式 ### 2.4 Micro service plugin
由于Paddle Serving底层采用基于C++的通信组件,并且核心框架也是基于C/C++编写,当用户想要在服务端定义复杂的前处理与后处理逻辑时,一种办法是修改Paddle Serving底层框架,重新编译源码。另一种方式可以通过在服务端嵌入轻量级的Web服务,通过在Web服务中实现更复杂的预处理逻辑,从而搭建一套逻辑完整的服务。当访问量超过了Web服务能够接受的范围,开发者有足够的理由开发一些高性能的C++预处理逻辑,并嵌入到Serving的原生服务库中。Web服务和RPC服务的关系以及他们的组合方式可以参考下文`用户类型`中的说明。 The underlying communication of Paddle Serving is implemented with C++ as well as the core framework, it is hard for users who do not familiar with C++ to implement new Paddle Serving Server Operators. Another approach is to use the light-weighted Web Service in Paddle Serving Server that can be viewed as a plugin. A user can implement complex data preprocessing and postprocessing logics to build a complex AI service. If access of the AI service has a large volumn, it is worth to implement the service with high performance Paddle Serving Server operators. The relationship between Web Service and RPC Service can be referenced in `User Type`.
## 3. 工业级特性 ## 3. Industrial Features
### 3.1 分布式稀疏参数索引 ### 3.1 Distributed Sparse Parameter Indexing
分布式稀疏参数索引通常在广告推荐中出现,并与分布式训练配合形成完整的离线-在线一体化部署。下图解释了其中的流程,产品的在线服务接受用户请求后将请求发送给预估服务,同时系统会记录用户的请求以进行相应的训练日志处理和拼接。离线分布式训练系统会针对流式产出的训练日志进行模型增量训练,而增量产生的模型会配送至分布式稀疏参数索引服务,同时对应的稠密的模型参数也会配送至在线的预估服务。在线服务由两部分组成,一部分是针对用户的请求提取特征后,将需要进行模型的稀疏参数索引的特征发送请求给分布式稀疏参数索引服务,针对分布式稀疏参数索引服务返回的稀疏参数再进行后续深度学习模型的计算流程,从而完成预估。 Distributed Sparse Parameter Indexing is commonly seen in advertising and recommendation scenarios, and is often used coupled with distributed training. The figure below explains a commonly seen architecture for online recommendation. When the recommendation service receives a request from a user, the system will automatically collects training log for the offline distributed online training. Mean while, the request is sent to Paddle Serving Server. For sparse features, distributed sparse parameter index service is called so that sparse parameters can be looked up. The dense input features together with the looked up sparse model parameters are fed into the Paddle Inference Node of the DAG in Paddle Serving Server. Then the score can be responsed through RPC to product service for item ranking.
<p align="center"> <p align="center">
<br> <br>
...@@ -161,14 +163,14 @@ Paddle Serving的核心执行引擎是一个有向无环图,图中的每个节 ...@@ -161,14 +163,14 @@ Paddle Serving的核心执行引擎是一个有向无环图,图中的每个节
<br> <br>
<p> <p>
为什么要使用Paddle Serving提供的分布式稀疏参数索引服务?1)在一些推荐场景中,模型的输入特征规模通常可以达到上千亿,单台机器无法支撑T级别模型在内存的保存,因此需要进行分布式存储。2)Paddle Serving提供的分布式稀疏参数索引服务,具有并发请求多个节点的能力,从而以较低的延时完成预估服务。 Why do we need to support distributed sparse parameter indexing in Paddle Serving? 1) In some recommendation scenarios, the number of features can be up to hundreds of billions that a single node can not hold the parameters within random access memory. 2) Paddle Serving supports distributed sparse parameter indexing that can couple with paddle inference. Users do not need to do extra work to have a low latency inference engine with hundreds of billions of parameters.
### 3.2 模型管理、在线A/B流量测试、模型热加载 ### 3.2 Model Management, online A/B test, Model Online Reloading
Paddle Serving的C++引擎支持模型管理、在线A/B流量测试、模型热加载等功能,当前在Python API还有没完全开放这部分功能的配置,敬请期待。 Paddle Serving's C++ engine supports model management, online A/B test and model online reloading. Currently, python API is not released yet, please wait for the next release.
## 4. 用户类型 ## 4. User Types
Paddle Serving面向的用户提供RPC和HTTP两种访问协议。对于HTTP协议,我们更倾向于流量中小型的服务使用,并且对延时没有严格要求的AI服务开发者。对于RPC协议,我们面向流量较大,对延时要求更高的用户,此外RPC的客户端可能也处在一个大系统的服务中,这种情况下非常适合使用Paddle Serving提供的RPC服务。对于使用分布式稀疏参数索引服务而言,Paddle Serving的用户不需要关心底层的细节,其调用本质也是通过RPC服务再调用RPC服务。下图给出了当前设计的Paddle Serving可能会使用Serving服务的几种场景。 Paddle Serving provides RPC and HTTP protocol for users. For HTTP service, we recommend users with median or small traffic services to use, and the latency is not a strict requirement. For RPC protocol, we recommend high traffic services and low latency required services to use. For users who use distributed sparse parameter indexing built-in service, it is not necessary to care about the underlying details of communication. The following figure gives out several scenarios that user may want to use Paddle Serving.
<p align="center"> <p align="center">
<br> <br>
...@@ -176,11 +178,11 @@ Paddle Serving面向的用户提供RPC和HTTP两种访问协议。对于HTTP协 ...@@ -176,11 +178,11 @@ Paddle Serving面向的用户提供RPC和HTTP两种访问协议。对于HTTP协
<br> <br>
<p> <p>
对于普通的模型而言(具体指通过Serving提供的IO保存的模型,并且没有对模型进行后处理),用户使用RPC服务不需要额外的开发即可实现服务启动,但需要开发一些Client端的代码来使用服务。对于Web服务的开发,需要用户现在Paddle Serving提供的Web Service框架中进行前后处理的开发,从而实现整个HTTP服务。 For servable models saved from Paddle Serving IO API, users do not need to do extra coding work to startup a service, but may need some coding work on the client side. For development of Web Service plugin, a user needs to provide implementation of Web Service's preprocessing and postprocessing work if needed to get a HTTP service.
### 4.1 Web服务开发 ### 4.1 Web Service Development
Web服务有很多开源的框架,Paddle Serving当前集成了Flask框架,但这部分对用户不可见,在未来可能会提供性能更好的Web框架作为底层HTTP服务集成引擎。用户需要继承WebService,从而实现对rpc服务的输入输出进行加工的目的。 Web Service has lots of open sourced framework. Currently Paddle Serving uses Flask as built-in service framework, and users are not aware of this. More efficient web service will be integrated in the furture if needed.
``` python ``` python
from paddle_serving_server.web_service import WebService from paddle_serving_server.web_service import WebService
...@@ -211,15 +213,15 @@ imdb_service.prepare_dict({"dict_file_path": sys.argv[4]}) ...@@ -211,15 +213,15 @@ imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server() imdb_service.run_server()
``` ```
`WebService`作为基类,提供将用户接受的HTTP请求转化为RPC输入的接口`preprocess`,同时提供对RPC请求返回的结果进行后处理的接口`postprocess`,继承`WebService`的子类,可以定义各种类型的成员函数。`WebService`的启动命令和普通RPC服务提供的启动API一致。 `WebService` is a Base Class, providing inheritable interfaces such `preprocess` and `postprocess` for users to implement. In the inherited class of `WebService` class, users can define any functions they want and the startup function interface is the same as RPC service.
## 5. 未来计划 ## 5. Future Plan
### 5.1 有向无环图结构定义开放 ### 5.1 Open DAG definition API
当前版本开放的python API仅支持用户定义Sequential类型的执行流,如果想要进行Server进程内复杂的计算,需要增加对应的用户API。 Current version of Paddle Serving Server supports sequential type of execution flow. DAG definition API can be more helpful to users on complex tasks.
### 5.2 云端自动部署能力 ### 5.2 Auto Deployment on Cloud
为了方便用户更容易将Paddle的预测模型部署到线上,Paddle Serving在接下来的版本会提供Kubernetes生态下任务编排的工具。 In order to make deployment more easily on public cloud, Paddle Serving considers to provides Operators on Kubernetes in submitting a service job.
### 5.3 向量检索、树结构检索 ### 5.3 Vector Indexing and Tree based Indexing
在推荐与广告场景的召回系统中,通常需要采用基于向量的快速检索或者基于树结构的快速检索,Paddle Serving会对这方面的检索引擎进行集成或扩展。 In recommendation and advertisement systems, it is commonly seen to use vector based index or tree based indexing service to do candidate retrievals. These retrieval tasks will be built-in services of Paddle Serving.
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册