Unverified · Commit 32719d2d · authored by Jiawei Wang, committed by GitHub

Merge branch 'v0.7.0' into v0.7.0

......@@ -17,7 +17,7 @@
<img alt="Forks" src="https://img.shields.io/github/forks/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Issues" src="https://img.shields.io/github/issues/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Contributors" src="https://img.shields.io/github/contributors/PaddlePaddle/Serving?color=orange&style=flat-square">
<img alt="Community" src="https://img.shields.io/badge/join-Wechat,QQ,Slack-orange?style=flat-square">
<img alt="Community" src="https://img.shields.io/badge/join-Wechat,QQ-orange?style=flat-square">
</a>
<br>
<p>
......@@ -58,9 +58,9 @@ This chapter guides you through the installation and deployment steps. It is str
- [Install Paddle Serving using docker](doc/Install_EN.md)
- [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_CN.md)
- [Deploy Paddle Serving with Security gateway](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving with Security gateway(Chinese)](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving on more hardwares](doc/Run_On_XPU_EN.md)
- [Latest Wheel packages](doc/Latest_Packages_CN.md)(Update everyday on branch develop)
- [Latest Wheel packages(Update everyday on branch develop)](doc/Latest_Packages_CN.md)
> Use
......@@ -69,23 +69,23 @@ The first step is to call the model save interface to generate a model parameter
- [Quick Start](doc/Quick_Start_EN.md)
- [Save a servable model](doc/Save_EN.md)
- [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Introduction_CN.md)
- [Infer on quantized models](doc/Low_Precision_CN.md)
- [Data format of classic models](doc/Process_data_CN.md)
- [C++ Serving](doc/C++_Serving/Introduction_CN.md)
- [protocols](doc/C++_Serving/Inference_Protocols_CN.md)
- [Guide for RESTful/gRPC/bRPC APIs(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Infer on quantized models](doc/Low_Precision_EN.md)
- [Data format of classic models(Chinese)](doc/Process_data_CN.md)
- [C++ Serving(Chinese)](doc/C++_Serving/Introduction_CN.md)
- [Protocols(Chinese)](doc/C++_Serving/Inference_Protocols_CN.md)
- [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
- [A/B Test](doc/C++_Serving/ABTest_EN.md)
- [Encryption](doc/C++_Serving/Encryption_EN.md)
- [Analyze and optimize performance(Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
- [Benchmark(Chinese)](doc/C++_Serving/Benchmark_CN.md)
- [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
- [Analyze and optimize performance](doc/Python_Pipeline/Pipeline_Design_EN.md)
- [Analyze and optimize performance](doc/Python_Pipeline/Performance_Tuning_EN.md)
- [Benchmark(Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
- Client SDK
- [Python SDK(Chinese)](doc/C++_Serving/Http_Service_CN.md)
- [Python SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [JAVA SDK](doc/Java_SDK_EN.md)
- [C++ SDK(Chinese)](doc/C++_Serving/Creat_C++Serving_CN.md)
- [C++ SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Large-scale sparse parameter server](doc/Cube_Local_EN.md)
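A hedged sketch of the model-saving step that this chapter starts from (see [Save a servable model](doc/Save_EN.md) for the authoritative options; all paths and output directory names below are placeholders):

```shell
# Convert an exported Paddle inference model into Serving's servable format;
# the two output directories receive the generated configuration files
# (.prototxt) and model parameters for the server and client sides.
python -m paddle_serving_client.convert \
    --dirname ./inference_model \
    --serving_server ./serving_server \
    --serving_client ./serving_client

# A server can then be started from the saved server-side configuration:
python -m paddle_serving_server.serve --model serving_server --port 9292
```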
<br>
......@@ -94,7 +94,7 @@ The first step is to call the model save interface to generate a model parameter
For Paddle Serving developers, we provide extended documents such as custom OPs and level-of-detail (LoD) processing.
- [Custom Operators](doc/C++_Serving/OP_EN.md)
- [Processing LOD Data](doc/LOD_EN.md)
- [Processing LoD Data](doc/LOD_EN.md)
- [FAQ(Chinese)](doc/FAQ_CN.md)
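The LoD documents above describe Paddle's level-of-detail encoding for variable-length data. A minimal sketch of the idea, assuming the `"<name>.lod"` feed-key convention used in the Serving examples (all names here are illustrative):

```python
import numpy as np

# Two variable-length token sequences flattened into one batch tensor.
# The LoD offsets mark the sequence boundaries:
# sequence 0 = rows [0, 3), sequence 1 = rows [3, 5).
words = np.array([[1], [2], [3], [10], [11]], dtype=np.int64)
lod = [0, 3, 5]

# With the Paddle Serving Python client, the LoD information accompanies
# the flattened tensor under a "<name>.lod" key in the feed dict:
feed = {"words": words, "words.lod": lod}
```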
<h2 align="center">Model Zoo</h2>
......@@ -112,10 +112,10 @@ Paddle Serving works closely with the Paddle model suite, and implements a large
For more model examples, read [Model zoo](doc/Model_Zoo_EN.md)
<center class="half">
<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg?raw=true" width="345"/>
<img src="doc/images/detection.png" width="350">
</center>
</p>
<h2 align="center">Community</h2>
......@@ -124,19 +124,19 @@ If you want to communicate with developers and other users, welcome to join us:
### Wechat
- Scan the WeChat QR code
<center class="half">
<p align="center">
<img src="doc/images/wechat_group_1.jpeg" width="250">
</center>
</p>
### QQ
- PaddlePaddle Inference Deployment Group (Group No.: 697765514)
<center class="half">
<img src="doc/images/qq_group_1.png" width="200">
</center>
- QQ Group (Group No.: 697765514)
### Slack
<p align="center">
<img src="doc/images/qq_group_1.png" width="200">
</p>
- [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
> Contribution
......
......@@ -17,7 +17,7 @@
<img alt="Forks" src="https://img.shields.io/github/forks/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Issues" src="https://img.shields.io/github/issues/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Contributors" src="https://img.shields.io/github/contributors/PaddlePaddle/Serving?color=orange&style=flat-square">
<img alt="Community" src="https://img.shields.io/badge/join-Wechat,QQ,Slack-orange?style=flat-square">
<img alt="Community" src="https://img.shields.io/badge/join-Wechat,QQ-orange?style=flat-square">
</a>
<br>
<p>
......@@ -27,7 +27,7 @@
Relying on the PaddlePaddle deep learning framework, Paddle Serving aims to help deep learning developers and enterprises provide high-performance, flexible, easy-to-use industrial-grade online inference services. Paddle Serving supports multiple protocols such as RESTful, gRPC, and bRPC, provides inference solutions for a variety of heterogeneous hardware and operating system environments, and ships examples for many classic pre-trained models. Its core features are:
- Integrates the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite; models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated with the [x2paddle](https://github.com/PaddlePaddle/X2Paddle) tool
- Offers two frameworks: high-performance C++ and easy-to-use Python. The C++ framework builds high-throughput, low-latency inference services on the high-performance bRPC network framework and leads comparable products in performance; the Python framework builds an easy-to-use, high-throughput inference service framework on gRPC/gRPC-Gateway and Python. For the technology choices, see [technology selection](doc/Serving_Design_CN.md)
- Offers two frameworks: high-performance C++ and easy-to-use Python. The C++ framework builds high-throughput, low-latency inference services on the high-performance bRPC network framework and leads comparable products in performance; the Python framework builds an easy-to-use, high-throughput inference service framework on gRPC/gRPC-Gateway and Python. For the technology choices, see [technology selection](doc/Serving_Design_CN.md#21-设计选型)
- Supports multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC, and bRPC; provides C++, Python, and Java SDKs
- Designs and implements an asynchronous pipelined high-performance inference framework based on a directed acyclic graph (DAG), featuring multi-model composition, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference
- Adapts to a variety of hardware such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, and Kunlun XPU; integrates the Intel MKL-DNN and Nvidia TensorRT acceleration libraries as well as low-precision and quantized inference
......@@ -48,17 +48,15 @@ Relying on the PaddlePaddle deep learning framework, Paddle Serving aims to help deep learning developers
<h2 align="center">Documentation</h2>
***
> Deployment
This chapter guides you through installation and deployment. Docker is strongly recommended for deploying Paddle Serving; if you do not use Docker, skip the Docker-related steps. On cloud servers, Paddle Serving can be deployed with Kubernetes. To compile or use Paddle Serving on heterogeneous hardware such as ARM CPU or Kunlun XPU, refer to the documents below. The latest development packages from the develop branch are built daily.
This chapter guides you through installation and deployment. Docker is strongly recommended for deploying Paddle Serving; if you do not use Docker, skip the Docker-related steps. On cloud servers, Paddle Serving can be deployed with Kubernetes. To compile or use Paddle Serving on heterogeneous hardware such as ARM CPU or Kunlun XPU, read the documents below. The latest development packages from the develop branch are built daily.
- [Install Paddle Serving with Docker](doc/Install_CN.md)
- [Build and install Paddle Serving from source](doc/Compile_CN.md)
- [Deploy Paddle Serving on a Kubernetes cluster](doc/Run_On_Kubernetes_CN.md)
- [Deploy Paddle Serving with a security gateway](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving on heterogeneous hardware](doc/Run_On_XPU_CN.md)
- [Latest wheel packages](doc/Latest_Packages_CN.md) (updated daily on the develop branch)
- [Latest wheel packages (updated daily on the develop branch)](doc/Latest_Packages_CN.md)
> Usage
......@@ -66,7 +64,7 @@ Relying on the PaddlePaddle deep learning framework, Paddle Serving aims to help deep learning developers
- [Quick start](doc/Quick_Start_CN.md)
- [Save models and configurations for Paddle Serving](doc/Save_CN.md)
- [Configuration and startup parameters](doc/Serving_Configure_CN.md)
- [RESTful/gRPC/bRPC API guide](doc/C++_Serving/Introduction_CN.md#4.Client端特性)
- [RESTful/gRPC/bRPC API guide](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Low-precision inference](doc/Low_Precision_CN.md)
- [Data processing for common models](doc/Process_data_CN.md)
- [Introduction to C++ Serving](doc/C++_Serving/Introduction_CN.md)
......@@ -76,20 +74,20 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
- [Encrypted model inference service](doc/C++_Serving/Encryption_CN.md)
- [Performance tuning guide](doc/C++_Serving/Performance_Tuning_CN.md)
- [Benchmarks](doc/C++_Serving/Benchmark_CN.md)
- [Introduction to Python Pipeline](doc/Python_Pipeline/Pipeline_Design_CN.md)
- [Performance tuning guide](doc/Python_Pipeline/Pipeline_Design_CN.md)
- [Python Pipeline design](doc/Python_Pipeline/Pipeline_Design_CN.md)
- [Performance tuning guide](doc/Python_Pipeline/Performance_Tuning_CN.md)
- [Benchmarks](doc/Python_Pipeline/Benchmark_CN.md)
- Client SDKs
- [Python SDK](doc/C++_Serving/Http_Service_CN.md)
- [Python SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [JAVA SDK](doc/Java_SDK_CN.md)
- [C++ SDK](doc/C++_Serving/Creat_C++Serving_CN.md)
- [C++ SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Large-scale sparse parameter indexing service](doc/Cube_Local_CN.md)
> Developers
For Paddle Serving developers, we provide documents on custom OPs and variable-length (LoD) data processing.
- [Custom OPs](doc/C++_Serving/OP_CN.md)
- [Variable-length data (LOD) processing](doc/LOD_CN.md)
- [Variable-length data (LoD) processing](doc/LOD_CN.md)
- [FAQ](doc/FAQ_CN.md)
<h2 align="center">Model Zoo</h2>
......@@ -104,7 +102,7 @@ Paddle Serving works closely with the Paddle model suites and implements a large number of serving deployments,
</p>
For more model examples, see the repos and visit the [Model Zoo](doc/Model_Zoo_CN.md)
For more model examples, visit the [Model Zoo](doc/Model_Zoo_CN.md)
<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg?raw=true" width="345"/>
......@@ -130,13 +128,10 @@ Paddle Serving works closely with the Paddle model suites and implements a large number of serving deployments,
<img src="doc/images/qq_group_1.png" width="200">
</p>
### Slack
- [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
> Contributing
If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/Contribute_EN.md)
If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines (English)](doc/Contribute_EN.md)
- Thanks to [@loveululu](https://github.com/loveululu) for providing the Cube Python API
- Thanks to [@EtachGu](https://github.com/EtachGu) for updating the docker usage commands
- Thanks to [@BeyondYourself](https://github.com/BeyondYourself) for the gRPC tutorial, the FAQ updates, and reorganizing the file structure.
......
......@@ -76,7 +76,7 @@ C++ Serving encrypts the model with a symmetric encryption algorithm; when the service loads the model
<p>
### 4.2 Multi-language, multi-protocol Client
The BRPC network framework supports [multiple underlying communication protocols](#1.网络框架(BRPC)). That is, with the current C++ Serving server, clients in any language, and even plain curl, only need to pack and send data according to one of these protocols (see the [brpc project](https://github.com/apache/incubator-brpc) for the supported list) for the server to receive, process, and return results.
The BRPC network framework supports [multiple underlying communication protocols](#1网络框架BRPC). That is, with the current C++ Serving server, clients in any language, and even plain curl, only need to pack and send data according to one of these protocols (see the [brpc project](https://github.com/apache/incubator-brpc) for the supported list) for the server to receive, process, and return results.
For the supported protocols we provide some Client SDK examples for reference; users can also develop new Client SDKs for their own needs, and contributions of Client SDKs in other languages/protocols (e.g. GRPC-Go, GRPC-C++, HTTP2-Go, HTTP2-Java) are welcome in our repository for other developers to learn from.
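As a hedged illustration of the curl path described above (the service name, port, and feed/fetch keys follow the style of the repo's fit_a_line quickstart and are placeholders, not fixed by this document):

```shell
# Send a JSON prediction request to a Paddle Serving HTTP endpoint;
# replace the service name, port, and feed/fetch keys with your own.
curl -H "Content-Type: application/json" -X POST \
     -d '{"feed": [{"x": [0.0137, -0.1136, 0.25, -0.0039, 0.0682, -0.0345, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}], "fetch": ["price"]}' \
     http://127.0.0.1:9292/uci/prediction
```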
......
......@@ -332,7 +332,7 @@ GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999
To debug a core file with gdb, run gdb <executable> <core file> and then enter the bt command; the backtrace usually shows which line failed.
Note: the executable path is the path of the C++ serving binary, not the python command; it usually looks like /usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.6.2/serving
Note: the executable path is the path of the C++ serving binary, not the python command; it usually looks like /usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.7.0/serving
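A minimal sketch of this debugging workflow (the binary path mirrors the note above and varies with your Python and Serving versions; the core file name is a placeholder):

```shell
# 1. Re-run with verbose glog output to narrow down the failing request
GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999

# 2. Open the core dump against the C++ serving binary (not `python`)
#    and print a backtrace non-interactively
gdb /usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.7.0/serving \
    core.12345 -ex bt -batch
```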
## Performance Optimization
......@@ -20,7 +20,7 @@ kubectl apply -f https://bit.ly/kong-ingress-dbless
### Build a Serving runtime image (optional):
First, determine the exact environment of the runtime image. Compared with the [Docker development image list]() document, development images are used for debugging and compiling code and carry many development tools, so they are fairly large; runtime images usually need a smaller footprint for more flexible deployment. If you do not need a lightweight runtime container, skip this section.
First, determine the exact environment of the runtime image. Compared with the [Docker development image list](./Docker_Images_CN.md) document, development images are used for debugging and compiling code and carry many development tools, so they are fairly large; runtime images usually need a smaller footprint for more flexible deployment. If you do not need a lightweight runtime container, skip this section.
The script is `tools/generate_runtime_docker.sh`; its usage is as follows
......
......@@ -27,9 +27,9 @@ Paddle Serving provides many deployment features needed in large-scale scenarios: 1) model management
| Response time | Throughput | Development efficiency | Resource utilization | Selection | Applications |
|-----|------|-----|-----|------|------|
| Low | High | Low | High | C++ Serving | High-performance scenarios; recall and ranking services of large online recommendation systems |
| High | High | Higher | High | Python Pipeline Serving | Balances throughput and efficiency; single-operator multi-model composition scenarios; asynchronous mode |
| High | Low | High | Low | Python webservice | High iteration-efficiency scenarios; small services or fast iteration needs; model effect verification |
| Low | Highest | Low | Highest | C++ Serving | High-performance scenarios; recall and ranking services of large online recommendation systems |
| Highest | Higher | Higher | Higher | Python Pipeline Serving | Balances throughput and efficiency; single-operator multi-model composition scenarios; asynchronous mode |
| Higher | Low | Higher | Low | Python webservice | High iteration-efficiency scenarios; small services or fast iteration needs; model effect verification |
Performance index description:
......@@ -197,12 +197,4 @@ The core design of Pipeline Serving is a graph execution engine whose basic processing units are OPs and Channels
<center>
<img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
----
## 6. Future Plans
### 6.1 Vector indexing and tree-based indexing
Recall systems in recommendation and advertising scenarios usually rely on fast vector-based or tree-structured retrieval; Paddle Serving will integrate or extend retrieval engines for this purpose.
### 6.2 Service monitoring
Integrate Prometheus monitoring, an open-source combination of monitoring, alerting, and a time-series database, well suited to monitoring k8s and docker.
......@@ -25,9 +25,9 @@ In order to meet the needs of users in different scenarios, Paddle Serving's pro
| Response time | Throughput | Development efficiency | Resource utilization | Selection | Applications |
|-----|------|-----|-----|------|------|
| Low | High | Low | High | C++ Serving | High-performance scenarios; recall and ranking services of large-scale online recommendation systems |
| High | High | High | High | Python Pipeline Serving | High throughput and efficiency, asynchronous mode; fits single-operator multi-model combination scenarios |
| High | Low | High | Low | Python webservice | High iteration efficiency; low-traffic services or projects that require rapid iteration; model effect verification |
| Low | Highest | Low | Highest | C++ Serving | High-performance scenarios; recall and ranking services of large-scale online recommendation systems |
| Highest | Higher | Higher | Higher | Python Pipeline Serving | High throughput and efficiency, asynchronous mode; fits single-operator multi-model combination scenarios |
| Higher | Low | Higher | Low | Python webservice | High iteration efficiency; low-traffic services or projects that require rapid iteration; model effect verification |
Performance index description:
1. Response time (ms): average response time of a single request, reported at the 50th, 90th, 95th, and 99th percentiles; lower is better.
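A minimal sketch of computing these percentile metrics from measured latencies (the sample numbers below are made up):

```python
import numpy as np

# Per-request latencies in milliseconds collected during a benchmark run
latencies_ms = np.array([8.2, 9.1, 10.5, 11.0, 12.3, 15.8, 21.4, 35.0])

print(f"avg: {latencies_ms.mean():.1f} ms")
for q in (50, 90, 95, 99):
    print(f"p{q}: {np.percentile(latencies_ms, q):.1f} ms")
```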
......@@ -199,16 +199,4 @@ The core design of Pipeline Serving is a graph execution engine, and the basic processing units are OP and Channel
<img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
----
## 6. Future Plan
### 6.1 Auto Deployment on Cloud
To make deployment on public clouds easier, Paddle Serving is considering providing Kubernetes Operators for submitting service jobs.
### 6.2 Vector Indexing and Tree-based Indexing
In recommendation and advertising systems, vector-based or tree-based indexing services are commonly used for candidate retrieval. These retrieval tasks will become built-in services of Paddle Serving.
### 6.3 Service Monitoring
Paddle Serving will integrate Prometheus monitoring, an open-source combination of monitoring, alerting, and a time-series database, suitable for k8s and docker monitoring systems.