diff --git a/README.md b/README.md
index 7b4a4f385f76b00f1307bdc7c0eedf177116901f..e6f49526aae36941de5ae4d545bd9f5dfe7fd4d9 100755
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@
     Forks
     Issues
     Contributors
-    Community
+    Community

@@ -58,9 +58,9 @@ This chapter guides you through the installation and deployment steps. It is str
 - [Install Paddle Serving using docker](doc/Install_EN.md)
 - [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
 - [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_CN.md)
-- [Deploy Paddle Serving with Security gateway](doc/Serving_Auth_Docker_CN.md)
+- [Deploy Paddle Serving with Security gateway(Chinese)](doc/Serving_Auth_Docker_CN.md)
 - [Deploy Paddle Serving on more hardwares](doc/Run_On_XPU_EN.md)
-- [Latest Wheel packages](doc/Latest_Packages_CN.md)(Update everyday on branch develop)
+- [Latest Wheel packages (updated daily on the develop branch)](doc/Latest_Packages_CN.md)

 > Use

@@ -69,23 +69,23 @@ The first step is to call the model save interface to generate a model parameter
 - [Quick Start](doc/Quick_Start_EN.md)
 - [Save a servable model](doc/Save_EN.md)
 - [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
-- [Guide for RESTful/gRPC/bRPC APIs](doc/C++_Serving/Introduction_CN.md)
-- [Infer on quantizative models](doc/Low_Precision_CN.md)
-- [Data format of classic models](doc/Process_data_CN.md)
-- [C++ Serving](doc/C++_Serving/Introduction_CN.md)
-  - [protocols](doc/C++_Serving/Inference_Protocols_CN.md)
+- [Guide for RESTful/gRPC/bRPC APIs(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
+- [Infer on quantized models](doc/Low_Precision_EN.md)
+- [Data format of classic models(Chinese)](doc/Process_data_CN.md)
+- [C++ Serving(Chinese)](doc/C++_Serving/Introduction_CN.md)
+  - [Protocols(Chinese)](doc/C++_Serving/Inference_Protocols_CN.md)
 - [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
 - [A/B Test](doc/C++_Serving/ABTest_EN.md)
 - [Encryption](doc/C++_Serving/Encryption_EN.md)
 - [Analyze and optimize performance(Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
 - [Benchmark(Chinese)](doc/C++_Serving/Benchmark_CN.md)
 - [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
-  - [Analyze and optimize performance](doc/Python_Pipeline/Pipeline_Design_EN.md)
+  - [Analyze and optimize performance](doc/Python_Pipeline/Performance_Tuning_EN.md)
   - [Benchmark(Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
 - Client SDK
-  - [Python SDK(Chinese)](doc/C++_Serving/Http_Service_CN.md)
+  - [Python SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
   - [JAVA SDK](doc/Java_SDK_EN.md)
-  - [C++ SDK(Chinese)](doc/C++_Serving/Creat_C++Serving_CN.md)
+  - [C++ SDK(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
 - [Large-scale sparse parameter server](doc/Cube_Local_EN.md)
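The hunk above keeps the note that serving always starts by calling the model save interface to generate the servable model plus client/server configurations. A minimal sketch of that step follows; the `save_model` signature is the one documented for the 0.6/0.7-era `paddle_serving_client.io` module, and the directory names and feed/fetch aliases are illustrative — check doc/Save_EN.md for the exact API of your installed version.

```python
# Minimal sketch of the "save a servable model" step (see doc/Save_EN.md).
# save_model() signature assumed from the 0.6/0.7-era Python API; directory
# names and feed/fetch aliases are illustrative.
import paddle
import paddle_serving_client.io as serving_io

paddle.enable_static()

# A toy static network standing in for a trained model: 13 features -> 1 price.
x = paddle.static.data(name="x", shape=[None, 13], dtype="float32")
price = paddle.static.nn.fc(x, size=1)

exe = paddle.static.Executor(paddle.CPUPlace())
exe.run(paddle.static.default_startup_program())

# Writes the server-side model/config and the client-side config directories.
serving_io.save_model(
    "uci_housing_model",   # server-side directory
    "uci_housing_client",  # client-side directory
    {"x": x},              # feed alias -> variable
    {"price": price},      # fetch alias -> variable
    paddle.static.default_main_program())
```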
@@ -94,7 +94,7 @@ The first step is to call the model save interface to generate a model parameter
 For Paddle Serving developers, we provide extended documents such as custom OP, level of detail(LOD) processing.
 - [Custom Operators](doc/C++_Serving/OP_EN.md)
-- [Processing LOD Data](doc/LOD_EN.md)
+- [Processing LoD Data](doc/LOD_EN.md)
 - [FAQ(Chinese)](doc/FAQ_CN.md)
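Since the hunk above normalizes the spelling of the variable-length data doc to LoD, a short reminder of what feeding LoD data looks like from the Python client may help. This is a minimal sketch, assuming a text-classification-style service: the aliases (`words`, `prediction`), the client config path, and the `<feed_name>.lod` convention of recent `paddle_serving_client` releases are assumptions here — see doc/LOD_EN.md for the authoritative description.

```python
# Minimal sketch of feeding variable-length (LoD) input from the Python client.
# Aliases ("words", "prediction"), the config path, and the "<feed_name>.lod"
# convention are assumptions; see doc/LOD_EN.md.
import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client_conf/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

word_ids = np.array([8, 233, 52, 601], dtype="int64").reshape(-1, 1)
fetch_map = client.predict(
    feed={"words": word_ids,
          "words.lod": [0, word_ids.shape[0]]},  # one sample spanning rows [0, 4)
    fetch=["prediction"],
    batch=True)
print(fetch_map["prediction"])
```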

Model Zoo

@@ -112,10 +112,10 @@ Paddle Serving works closely with the Paddle model suite, and implements a large
 For more model examples, read [Model zoo](doc/Model_Zoo_EN.md)
-
+

-

+

Community

@@ -124,19 +124,19 @@ If you want to communicate with developers and other users? Welcome to join us, ### Wechat - WeChat scavenging -
+ + +

-

+

 ### QQ
-- 飞桨推理部署交流群(Group No.:697765514)
-
-
-
+- QQ Group(Group No.:697765514)
-### Slack
+

+
+

-- [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)

 > Contribution
diff --git a/README_CN.md b/README_CN.md
index c5c336274a5aa0c7ba49619c9a457acce6401699..a1ba46a958a0146b1195e7c9f3af17ab6b263cfd 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -17,7 +17,7 @@
     Forks
     Issues
     Contributors
-    Community
+    Community

@@ -27,7 +27,7 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发者和企业提供高性能、灵活易用的工业级在线推理服务。Paddle Serving支持RESTful、gRPC、bRPC等多种协议,提供多种异构硬件和多种操作系统环境下推理解决方案,和多种经典预训练模型示例。核心特性如下:

 - 集成高性能服务端推理引擎paddle Inference和移动端引擎paddle Lite,其他机器学习平台(Caffe/TensorFlow/ONNX/PyTorch)可通过[x2paddle](https://github.com/PaddlePaddle/X2Paddle)工具迁移模型
-- 具有高性能C++和高易用Python 2套框架。C++框架基于高性能bRPC网络框架打造高吞吐、低延迟的推理服务,性能领先竞品。Python框架基于gRPC/gRPC-Gateway网络框架和Python语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md)
+- 具有高性能C++和高易用Python 2套框架。C++框架基于高性能bRPC网络框架打造高吞吐、低延迟的推理服务,性能领先竞品。Python框架基于gRPC/gRPC-Gateway网络框架和Python语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md#21-设计选型)
 - 支持HTTP、gRPC、bRPC等多种[协议](doc/C++_Serving/Inference_Protocols_CN.md);提供C++、Python、Java语言SDK
 - 设计并实现基于有向无环图(DAG)的异步流水线高性能推理框架,具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理等特性
 - 适配x86(Intel) CPU、ARM CPU、Nvidia GPU、昆仑XPU等多种硬件;集成Intel MKLDNN、Nvidia TensorRT加速库,以及低精度和量化推理

@@ -48,17 +48,15 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发

文档

-***
-
 > 部署

-此章节引导您完成安装和部署步骤,强烈推荐使用Docker部署Paddle Serving,如您不使用docker,省略docker相关步骤。在云服务器上可以使用Kubernetes部署Paddle Serving。在异构硬件如ARM CPU、昆仑XPU上编译或使用Paddle Serving可以下面的文档。每天编译生成develop分支的最新开发包供开发者使用。
+此章节引导您完成安装和部署步骤,强烈推荐使用Docker部署Paddle Serving,如您不使用docker,省略docker相关步骤。在云服务器上可以使用Kubernetes部署Paddle Serving。在异构硬件如ARM CPU、昆仑XPU上编译或使用Paddle Serving可阅读以下文档。每天编译生成develop分支的最新开发包供开发者使用。

 - [使用docker安装Paddle Serving](doc/Install_CN.md)
 - [源码编译安装Paddle Serving](doc/Compile_CN.md)
 - [在Kuberntes集群上部署Paddle Serving](doc/Run_On_Kubernetes_CN.md)
 - [部署Paddle Serving安全网关](doc/Serving_Auth_Docker_CN.md)
 - [在异构硬件部署Paddle Serving](doc/Run_On_XPU_CN.md)
-- [最新Wheel开发包](doc/Latest_Packages_CN.md)(develop分支每日更新)
+- [最新Wheel开发包(develop分支每日更新)](doc/Latest_Packages_CN.md)

 > 使用

@@ -66,7 +64,7 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
 - [快速开始](doc/Quick_Start_CN.md)
 - [保存用于Paddle Serving的模型和配置](doc/Save_CN.md)
 - [配置和启动参数的说明](doc/Serving_Configure_CN.md)
-- [RESTful/gRPC/bRPC API指南](doc/C++_Serving/Introduction_CN.md#4.Client端特性)
+- [RESTful/gRPC/bRPC API指南](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
 - [低精度推理](doc/Low_Precision_CN.md)
 - [常见模型数据处理](doc/Process_data_CN.md)
 - [C++ Serving简介](doc/C++_Serving/Introduction_CN.md)
@@ -76,20 +74,20 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
 - [加密模型推理服务](doc/C++_Serving/Encryption_CN.md)
 - [性能优化指南](doc/C++_Serving/Performance_Tuning_CN.md)
 - [性能指标](doc/C++_Serving/Benchmark_CN.md)
-- [Python Pipeline简介](doc/Python_Pipeline/Pipeline_Design_CN.md)
-  - [性能优化指南](doc/Python_Pipeline/Pipeline_Design_CN.md)
+- [Python Pipeline设计](doc/Python_Pipeline/Pipeline_Design_CN.md)
+  - [性能优化指南](doc/Python_Pipeline/Performance_Tuning_CN.md)
   - [性能指标](doc/Python_Pipeline/Benchmark_CN.md)
 - 客户端SDK
-  - [Python SDK](doc/C++_Serving/Http_Service_CN.md)
+  - [Python SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
   - [JAVA SDK](doc/Java_SDK_CN.md)
-  - [C++ SDK](doc/C++_Serving/Creat_C++Serving_CN.md)
+  - [C++ SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
 - [大规模稀疏参数索引服务](doc/Cube_Local_CN.md)

 > 开发者

 为Paddle Serving开发者,提供自定义OP,变长数据处理。
 - [自定义OP](doc/C++_Serving/OP_CN.md)
-- [变长数据(LOD)处理](doc/LOD_CN.md)
+- [变长数据(LoD)处理](doc/LOD_CN.md)
 - [常见问答](doc/FAQ_CN.md)

模型库

@@ -104,7 +102,7 @@ Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,

-更多模型示例参考Repo,可进入[模型库](doc/Model_Zoo_CN.md)
+更多模型示例进入[模型库](doc/Model_Zoo_CN.md)

@@ -130,13 +128,10 @@ Paddle Serving与Paddle模型套件紧密配合,实现大量服务化部署,

-### Slack
-- [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
-
 > 贡献代码

-如果您想为Paddle Serving贡献代码,请参考 [Contribution Guidelines](doc/Contribute_EN.md)
+如果您想为Paddle Serving贡献代码,请参考 [Contribution Guidelines(English)](doc/Contribute_EN.md)
 - 感谢 [@loveululu](https://github.com/loveululu) 提供 Cube python API
 - 感谢 [@EtachGu](https://github.com/EtachGu) 更新 docker 使用命令
 - 感谢 [@BeyondYourself](https://github.com/BeyondYourself) 提供grpc教程,更新FAQ教程,整理文件目录。
diff --git a/doc/C++_Serving/Introduction_CN.md b/doc/C++_Serving/Introduction_CN.md
index 17c3f096f6eba10e6abe2c2fcab8e5ee17b83ad4..3f864ab04628b6e707463625d9aafb472917e9db 100755
--- a/doc/C++_Serving/Introduction_CN.md
+++ b/doc/C++_Serving/Introduction_CN.md
@@ -76,7 +76,7 @@ C++ Serving采用对称加密算法对模型进行加密,在服务加载模型

 ### 4.2 多语言多协议Client
-BRPC网络框架支持[多种底层通信协议](#1.网络框架(BRPC)),即使用目前的C++ Serving框架的Server端,各种语言的Client端,甚至使用curl的方式,只要按照上述协议(具体支持的协议见[brpc官网](https://github.com/apache/incubator-brpc))封装数据并发送,Server端就能够接收、处理和返回结果。
+BRPC网络框架支持[多种底层通信协议](#1网络框架BRPC),即使用目前的C++ Serving框架的Server端,各种语言的Client端,甚至使用curl的方式,只要按照上述协议(具体支持的协议见[brpc官网](https://github.com/apache/incubator-brpc))封装数据并发送,Server端就能够接收、处理和返回结果。

 对于支持的各种协议我们提供了部分的Client SDK示例供用户参考和使用,用户也可以根据自己的需求去开发新的Client SDK,也欢迎用户添加其他语言/协议(例如GRPC-Go、GRPC-C++ HTTP2-Go、HTTP2-Java等)Client SDK到我们的仓库供其他开发者借鉴和参考。
diff --git a/doc/FAQ_CN.md b/doc/FAQ_CN.md
index c1f2359abf5aca4fdf40e9f7fb3abccb6adce62a..94020bcc2477151839c521474cde6d73ea388ad2 100644
--- a/doc/FAQ_CN.md
+++ b/doc/FAQ_CN.md
@@ -332,7 +332,7 @@ GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999

 使用gdb调试core文件的方法为:gdb <可执行文件> ,进入后输入bt指令,一般即可显示出错在哪一行。

-注意:可执行文件路径是C++ bin文件的路径,而不是python命令,一般为类似下面的这种/usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.6.2/serving
+注意:可执行文件路径是C++ bin文件的路径,而不是python命令,一般为类似下面的这种/usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.7.0/serving

 ## 性能优化
diff --git a/doc/Run_On_Kubernetes_CN.md b/doc/Run_On_Kubernetes_CN.md
index 4482c216b318fac3294c08309e8635c87c88a4b8..951fda78dd0c04d2faa7db5b84cfa845235fbaa5 100644
--- a/doc/Run_On_Kubernetes_CN.md
+++ b/doc/Run_On_Kubernetes_CN.md
@@ -20,7 +20,7 @@ kubectl apply -f https://bit.ly/kong-ingress-dbless

 ### 制作Serving运行镜像(可选):

-首先您需要确定运行镜像的具体环境。和[DOCKER开发镜像列表]()文档相比,开发镜像用于调试、编译代码,携带了大量的开发工具,因此镜像体积较大。运行镜像通常要求缩小容器体积以提高部署的灵活性。如果您不太需要轻量级的运行容器,请直接跳过这一部分。
+首先您需要确定运行镜像的具体环境。和[DOCKER开发镜像列表](./Docker_Images_CN.md)文档相比,开发镜像用于调试、编译代码,携带了大量的开发工具,因此镜像体积较大。运行镜像通常要求缩小容器体积以提高部署的灵活性。如果您不太需要轻量级的运行容器,请直接跳过这一部分。

 在`tools/generate_runtime_docker.sh`文件下,它的使用方式如下
diff --git a/doc/Serving_Design_CN.md b/doc/Serving_Design_CN.md
index 1c407292dbe96b3bf0d3bb33433155e19bd9e7ee..c9d6b25bdd4520a1cf2d09e4e9713a9acfb01487 100644
--- a/doc/Serving_Design_CN.md
+++ b/doc/Serving_Design_CN.md
@@ -27,9 +27,9 @@ Paddle Serving提供很多大规模场景需要的部署功能:1)模型管

 | 响应时间 | 吞吐 | 开发效率 | 资源利用率 | 选型 | 应用场景|
 |-----|------|-----|-----|------|------|
-| 低 | 高 | 低 | 高 |C++ Serving | 高性能场景,大型在线推荐系统召回、排序服务|
-| 高 | 高 | 较高 |高|Python Pipeline Serving| 兼顾吞吐和效率,单算子多模型组合场景,异步模式|
-| 高 | 低 | 高| 低 |Python webservice| 高迭代效率场景,小型服务或需要快速迭代,模型效果验证|
+| 低 | 最高 | 低 | 最高 |C++ Serving | 高性能场景,大型在线推荐系统召回、排序服务|
+| 最高 | 较高 | 较高 |较高|Python Pipeline Serving| 兼顾吞吐和效率,单算子多模型组合场景,异步模式|
+| 较高 | 低 | 较高| 低 |Python webservice| 高迭代效率场景,小型服务或需要快速迭代,模型效果验证|

 性能指标说明:
@@ -197,12 +197,4 @@ Pipeline Serving核心设计是图执行引擎,基本处理单元是OP和Chann

 -----
-
-## 6. 未来计划
-
-### 6.1 向量检索、树结构检索
-在推荐与广告场景的召回系统中,通常需要采用基于向量的快速检索或者基于树结构的快速检索,Paddle Serving会对这方面的检索引擎进行集成或扩展。
-### 6.2 服务监控
-集成普罗米修斯监控,一套开源的监控&报警&时间序列数据库的组合,适合k8s和docker的监控系统。
diff --git a/doc/Serving_Design_EN.md b/doc/Serving_Design_EN.md
index 8a77d7552a33327d172e64e2bf67d604ec42aed9..3028f8f6109fc6c8327f8d4ea3ed1e6a8c1dff57 100644
--- a/doc/Serving_Design_EN.md
+++ b/doc/Serving_Design_EN.md
@@ -25,9 +25,9 @@ In order to meet the needs of users in different scenarios, Paddle Serving's pro

 | Response time | throughput | development efficiency | Resource utilization | selection | Applications|
 |-----|------|-----|-----|------|------|
-| LOW | HIGH | LOW | HIGH |C++ Serving | High-performance,recall and ranking services of large-scale online recommendation systems|
-| HIGH | HIGH | HIGH | HIGH |Python Pipeline Serving| High-throughput, high-efficiency, asynchronous mode, fitting for single operator multi-model combination scenarios|
-| HIGH | LOW | HIGH| LOW |Python webservice| High-throughput,Low-traffic services or projects that require rapid iteration, model effect verification|
+| Low | Highest | Low | Highest |C++ Serving | High-performance, recall and ranking services of large-scale online recommendation systems|
+| Highest | Higher | Higher | Higher |Python Pipeline Serving| High throughput and efficiency, asynchronous mode, suitable for single-operator multi-model combination scenarios|
+| Higher | Low | Higher | Low |Python webservice| Low-traffic services or projects that require rapid iteration and model effect verification|

 Performance index description:
 1. Response time (ms): Average response time of a single request, calculate the response time of 50, 90, 95, 99 quantiles, the lower the better.
@@ -199,16 +199,4 @@ The core design of Pipeline Serving is a graph execution engine, and the basic p
 -----
-
-
-## 6. Future Plan
-
-### 5.1 Auto Deployment on Cloud
-In order to make deployment more easily on public cloud, Paddle Serving considers to provides Operators on Kubernetes in submitting a service job.
-
-### 6.2 Vector Indexing and Tree based Indexing
-In recommendation and advertisement systems, it is commonly seen to use vector based index or tree based indexing service to do candidate retrievals. These retrieval tasks will be built-in services of Paddle Serving.
-### 6.3 Service Monitoring
-Paddle Serving will integrate Prometheus monitoring, which is a set of open source monitoring & alarm & time series database combination, suitable for k8s and docker monitoring systems.
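One practical consequence of the multi-language/multi-protocol Client passage patched above (doc/C++_Serving/Introduction_CN.md, section 4.2) is that any client able to package data for a supported protocol can talk to the Server, curl included. A minimal HTTP sketch in Python follows, assuming a Quick Start-style fit_a_line web service running locally on port 9292; the `uci/prediction` endpoint and the `x`/`price` feed/fetch names follow that example and are otherwise assumptions.

```python
# Minimal HTTP client sketch for a Paddle Serving web service; endpoint, port,
# and feed/fetch names are assumed from the fit_a_line Quick Start example.
import requests

payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
                    -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post("http://127.0.0.1:9292/uci/prediction", json=payload)
resp.raise_for_status()
print(resp.json())  # expected shape: {"result": {"price": [[...]]}}
```

The same request works verbatim with `curl -H "Content-Type:application/json" -X POST -d '<payload>'`, which is the point of the 4.2 section: the transport contract, not the client language, is what the Server cares about.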