Commit 7dd623f2 authored by HexToString

Merge branch 'develop' of https://github.com/PaddlePaddle/Serving into fix_dead_link

@@ -17,20 +17,21 @@
<img alt="Forks" src="https://img.shields.io/github/forks/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Issues" src="https://img.shields.io/github/issues/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Contributors" src="https://img.shields.io/github/contributors/PaddlePaddle/Serving?color=orange&style=flat-square">
<img alt="Community" src="https://img.shields.io/badge/join-Wechat,QQ-orange?style=flat-square">
</a>
<br>
<p>

***
The goal of Paddle Serving is to provide high-performance, flexible and easy-to-use industrial-grade online inference services for machine learning developers and enterprises. Paddle Serving supports multiple protocols such as RESTful, gRPC and bRPC, provides inference solutions for a variety of hardware and operating system environments, and ships many famous pre-trained model examples. The core features are as follows:

- Integrates the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite. Models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
- Offers two frameworks: the high-performance C++ Serving and the easy-to-use Python pipeline. C++ Serving builds a high-throughput, low-latency inference service on the bRPC network framework, with performance indicators ahead of competing products. The Python pipeline builds a highly usable, high-throughput inference service on the gRPC/gRPC-Gateway network framework and the Python language. For guidance on choosing between them, see [Technical Selection](doc/Serving_Design_EN.md#21-design-selection).
- Supports multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC and bRPC, and provides C++, Python and Java language SDKs.
- Designs and implements a high-performance asynchronous pipeline inference service framework based on a directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference.
- Adapts to a variety of commonly used computing hardware, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU and Kunlun XPU; integrates the Intel MKLDNN and Nvidia TensorRT acceleration libraries, as well as low-precision and quantized inference.
- Provides a model security deployment solution, including encrypted model deployment, an authentication mechanism, and an HTTPS security gateway, which are used in practice.
- Supports cloud deployment, providing a deployment case for Baidu Intelligent Cloud Kubernetes clusters.
- Provides more than 40 classic pre-trained model deployment examples, such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP and PaddleRec, with more models continuously being added.
@@ -57,9 +58,9 @@ This chapter guides you through the installation and deployment steps. It is str
- [Install Paddle Serving using docker](doc/Install_EN.md)
- [Build Paddle Serving from Source with Docker](doc/Compile_EN.md)
- [Deploy Paddle Serving on Kubernetes](doc/Run_On_Kubernetes_CN.md)
- [Deploy Paddle Serving with Security gateway (Chinese)](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving on more hardware](doc/Run_On_XPU_EN.md)
- [Latest Wheel packages](doc/Latest_Packages_CN.md)

> Use
@@ -68,23 +69,23 @@ The first step is to call the model save interface to generate a model parameter
- [Quick Start](doc/Quick_Start_EN.md)
- [Save a servable model](doc/Save_EN.md)
- [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
- [Guide for RESTful/gRPC/bRPC APIs (Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Infer on quantized models](doc/Low_Precision_EN.md)
- [Data format of classic models (Chinese)](doc/Process_data_CN.md)
- [C++ Serving (Chinese)](doc/C++_Serving/Introduction_CN.md)
- [Protocols (Chinese)](doc/C++_Serving/Inference_Protocols_CN.md)
- [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
- [A/B Test](doc/C++_Serving/ABTest_EN.md)
- [Encryption](doc/C++_Serving/Encryption_EN.md)
- [Analyze and optimize performance (Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
- [Benchmark (Chinese)](doc/C++_Serving/Benchmark_CN.md)
- [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
- [Analyze and optimize performance](doc/Python_Pipeline/Performance_Tuning_EN.md)
- [Benchmark (Chinese)](doc/Python_Pipeline/Benchmark_CN.md)
- Client SDK
- [Python SDK (Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Java SDK](doc/Java_SDK_EN.md)
- [C++ SDK (Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Large-scale sparse parameter server](doc/Cube_Local_EN.md)
<br>

@@ -93,7 +94,7 @@ The first step is to call the model save interface to generate a model parameter
For Paddle Serving developers, we provide extended documents such as custom OP and level-of-detail (LoD) processing.
- [Custom Operators](doc/C++_Serving/OP_EN.md)
- [Processing LoD Data](doc/LOD_EN.md)
- [FAQ (Chinese)](doc/FAQ_CN.md)

<h2 align="center">Model Zoo</h2>
@@ -130,15 +131,12 @@ Want to communicate with developers and other users? Welcome to join us,
</p>

### QQ
- QQ Group (Group No.: 697765514)

<p align="center">
<img src="doc/images/qq_group_1.png" width="200">
</p>
> Contribution
@@ -17,7 +17,7 @@
<img alt="Forks" src="https://img.shields.io/github/forks/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Issues" src="https://img.shields.io/github/issues/PaddlePaddle/Serving?color=yellow&style=flat-square">
<img alt="Contributors" src="https://img.shields.io/github/contributors/PaddlePaddle/Serving?color=orange&style=flat-square">
<img alt="Community" src="https://img.shields.io/badge/join-Wechat,QQ-orange?style=flat-square">
</a>
<br>
<p>
@@ -27,7 +27,7 @@
Paddle Serving, built on the PaddlePaddle deep learning framework, aims to help deep learning developers and enterprises provide high-performance, flexible and easy-to-use industrial-grade online inference services. Paddle Serving supports multiple protocols such as RESTful, gRPC and bRPC, provides inference solutions for a variety of heterogeneous hardware and operating system environments, and ships many classic pre-trained model examples. The core features are as follows:

- Integrates the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite; models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated via the [x2paddle](https://github.com/PaddlePaddle/X2Paddle) tool
- Offers two frameworks: the high-performance C++ one and the easy-to-use Python one. The C++ framework builds a high-throughput, low-latency inference service on the high-performance bRPC network framework, with performance ahead of competing products. The Python framework builds a highly usable, high-throughput inference service framework on the gRPC/gRPC-Gateway network framework and the Python language. For guidance, see [Technical Selection](doc/Serving_Design_CN.md#21-设计选型)
- Supports multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC and bRPC; provides C++, Python and Java SDKs
- Designs and implements a high-performance asynchronous pipeline inference framework based on a directed acyclic graph (DAG), with multi-model combination, asynchronous scheduling, concurrent inference, dynamic batching, and multi-card multi-stream inference
- Adapts to x86 (Intel) CPU, ARM CPU, Nvidia GPU, Kunlun XPU and other hardware; integrates the Intel MKLDNN and Nvidia TensorRT acceleration libraries, as well as low-precision and quantized inference
@@ -48,17 +48,15 @@ Paddle Serving, built on the PaddlePaddle deep learning framework, aims to help deep learning developers...
<h2 align="center">Documentation</h2>

> Deployment

This chapter guides you through the installation and deployment steps. We strongly recommend deploying Paddle Serving with Docker; if you do not use Docker, skip the Docker-related steps. Paddle Serving can be deployed with Kubernetes on cloud servers. To compile or use Paddle Serving on heterogeneous hardware such as ARM CPU or Kunlun XPU, read the following documents. The latest development packages of the develop branch are built daily for developers.
- [Install Paddle Serving using Docker](doc/Install_CN.md)
- [Build and install Paddle Serving from source](doc/Compile_CN.md)
- [Deploy Paddle Serving on Kubernetes clusters](doc/Run_On_Kubernetes_CN.md)
- [Deploy the Paddle Serving security gateway](doc/Serving_Auth_Docker_CN.md)
- [Deploy Paddle Serving on heterogeneous hardware](doc/Run_On_XPU_CN.md)
- [Latest Wheel development packages](doc/Latest_Packages_CN.md)
> Usage

@@ -66,7 +64,7 @@ Paddle Serving, built on the PaddlePaddle deep learning framework, aims to help deep learning developers...
- [Quick start](doc/Quick_Start_CN.md)
- [Save models and configurations for Paddle Serving](doc/Save_CN.md)
- [Description of configuration and startup parameters](doc/Serving_Configure_CN.md)
- [RESTful/gRPC/bRPC API guide](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Low-precision inference](doc/Low_Precision_CN.md)
- [Data processing for common models](doc/Process_data_CN.md)
- [Introduction to C++ Serving](doc/C++_Serving/Introduction_CN.md)
@@ -76,20 +74,20 @@ Paddle Serving, built on the PaddlePaddle deep learning framework, aims to help deep learning developers...
- [Encrypted model inference service](doc/C++_Serving/Encryption_CN.md)
- [Performance tuning guide](doc/C++_Serving/Performance_Tuning_CN.md)
- [Benchmarks](doc/C++_Serving/Benchmark_CN.md)
- [Python Pipeline design](doc/Python_Pipeline/Pipeline_Design_CN.md)
- [Performance tuning guide](doc/Python_Pipeline/Performance_Tuning_CN.md)
- [Benchmarks](doc/Python_Pipeline/Benchmark_CN.md)
- Client SDK
- [Python SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Java SDK](doc/Java_SDK_CN.md)
- [C++ SDK](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Large-scale sparse parameter indexing service](doc/Cube_Local_CN.md)
> Developers

For Paddle Serving developers, we provide documents on custom OPs and variable-length (LoD) data processing.
- [Custom OPs](doc/C++_Serving/OP_CN.md)
- [Variable-length (LoD) data processing](doc/LOD_CN.md)
- [FAQ](doc/FAQ_CN.md)

<h2 align="center">Model Zoo</h2>

@@ -104,7 +102,7 @@ Paddle Serving works closely with the Paddle model suites to deliver a large number of service deployments,
</p>

For more model examples, see the [Model Zoo](doc/Model_Zoo_CN.md).

<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg?raw=true" width="345"/>
@@ -130,13 +128,10 @@ Paddle Serving works closely with the Paddle model suites to deliver a large number of service deployments,
<img src="doc/images/qq_group_1.png" width="200">
</p>
> Contributing

If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines (English)](doc/Contribute_EN.md).
- Thanks to [@loveululu](https://github.com/loveululu) for providing the Cube Python API
- Thanks to [@EtachGu](https://github.com/EtachGu) for updating the Docker usage commands
- Thanks to [@BeyondYourself](https://github.com/BeyondYourself) for the gRPC tutorial, updating the FAQ, and organizing the file directories
@@ -31,7 +31,7 @@ The Python code in the file will process the data `test_data/part-0` and write t
### Start Server

Here, we [use docker](../Docker_Images_EN.md) to start the server-side service.

First, start the BOW server, which listens on port `8000`:
@@ -76,7 +76,7 @@ C++ Serving encrypts the model with a symmetric encryption algorithm; when the service loads the model...
<p>

### 4.2 Multi-language and multi-protocol Client

The bRPC network framework supports [multiple underlying communication protocols](#1网络框架BRPC). That is, with the current C++ Serving server, clients in any language, or even plain curl, only need to package and send data according to one of these protocols (see the [brpc website](https://github.com/apache/incubator-brpc) for the specific protocols supported), and the server can receive, process, and return the results.

For the supported protocols we provide some Client SDK examples for users to reference and use; users can also develop new Client SDKs according to their own needs. We also welcome contributions of Client SDKs for other languages/protocols (such as GRPC-Go, GRPC-C++, HTTP2-Go, HTTP2-Java) to our repository for other developers to learn from.
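To illustrate how light such a client can be over HTTP, here is a minimal Python sketch. It is an assumption-laden example: it follows the classic uci-housing web-service demo (server started with `python3 -m paddle_serving_server.serve --model uci_housing_model --port 9292 --name uci`) and its pre-0.7 JSON convention; the endpoint name, feed key `x` and fetch key `price` must be replaced to match your own model.

```python
import requests

# Hypothetical endpoint following the classic uci-housing demo; replace the
# URL, the feed variable "x" and the fetch variable "price" with your model's.
payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
                    -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post("http://127.0.0.1:9292/uci/prediction", json=payload)
print(resp.json())  # the server returns the fetch variables as JSON
```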
@@ -51,9 +51,9 @@ The Serving development images are provided by the Serving suite for compiling and debugging the inference service in each environment
| :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
| CPU | 0.7.0-devel | Ubuntu 16.04 | 2.2.0 | Ubuntu 18.04 |
| Cuda10.1+Cudnn7 | 0.7.0-cuda10.1-cudnn7-devel | Ubuntu 16.04 | None | None |
| Cuda10.2+Cudnn7 | 0.7.0-cuda10.2-cudnn7-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda10.2-cudnn7 | Ubuntu 16.04 |
| Cuda10.2+Cudnn8 | 0.7.0-cuda10.2-cudnn8-devel | Ubuntu 16.04 | None | None |
| Cuda11.2+Cudnn8 | 0.7.0-cuda11.2-cudnn8-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04 |

First pull the image for the environment you need. In the **Environment** column of the table above, everything except CPU (the Cuda**+Cudnn** rows) is a GPU environment.

You can use the Serving development images.
@@ -106,9 +106,10 @@ find / -name Python.h
2) Set `PYTHON_LIBRARIES`

Search for libpython3.7.so or libpython3.7m.so
```
find / -name libpython3.7.so
find / -name libpython3.7m.so
```
You will usually find something like `**/lib/libpython3.7.so` or `**/lib/x86_64-linux-gnu/libpython3.7.so`; only its directory is needed. For example, if you find `/usr/local/lib/libpython3.7.so`, just `export PYTHON_LIBRARIES=/usr/local/lib`.

If it is not found, it means either 1) Python was statically compiled and you need to reinstall a dynamically compiled Python, or 2) you do not have sufficient permissions to view the relevant system directories.
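As a cross-check that does not rely on `find`, the target interpreter can report its own library location. A minimal sketch, assuming you run it with the same python3.7 you intend to compile against (on statically built Pythons, `LDLIBRARY` points at a `.a` archive instead of a `.so`):

```python
import sysconfig

# LIBDIR is the directory that should contain libpython3.7(m).so;
# LDLIBRARY is the library file name this interpreter was built with.
print(sysconfig.get_config_var("LIBDIR"))     # e.g. /usr/local/lib
print(sysconfig.get_config_var("LDLIBRARY"))  # e.g. libpython3.7m.so
```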
@@ -132,7 +133,7 @@ export PYTHON_EXECUTABLE=/usr/bin/python3.7
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

python3.7 -m pip install -r python/requirements.txt

go env -w GO111MODULE=on
go env -w GOPROXY=https://goproxy.cn,direct
@@ -47,9 +47,9 @@ Serving development image is the image used to compile and debug prediction se
| :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
| CPU | 0.7.0-devel | Ubuntu 16.04 | 2.2.0 | Ubuntu 18.04 |
| Cuda10.1+Cudnn7 | 0.7.0-cuda10.1-cudnn7-devel | Ubuntu 16.04 | None | None |
| Cuda10.2+Cudnn7 | 0.7.0-cuda10.2-cudnn7-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda10.2-cudnn7 | Ubuntu 16.04 |
| Cuda10.2+Cudnn8 | 0.7.0-cuda10.2-cudnn8-devel | Ubuntu 16.04 | None | None |
| Cuda11.2+Cudnn8 | 0.7.0-cuda11.2-cudnn8-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04 |

We first need to pull the related image for the environment we need. In the **Environment** column of the above table, everything except CPU (the Cuda**+Cudnn** rows) belongs to the GPU environment.
@@ -93,9 +93,10 @@ If not found. Explanation 1) The development version of Python is not installed
2) Set `PYTHON_LIBRARIES`

Search for libpython3.7.so or libpython3.7m.so
```
find / -name libpython3.7.so
find / -name libpython3.7m.so
```
Usually there will be something similar to `**/lib/libpython3.7.so` or `**/lib/x86_64-linux-gnu/libpython3.7.so`; we only need its directory. For example, if you find `/usr/local/lib/libpython3.7.so`, then we only need `export PYTHON_LIBRARIES=/usr/local/lib`.

If it is not found, it means either 1) Python was statically compiled and you need to reinstall a dynamically compiled Python, or 2) you do not have sufficient permissions to view the relevant system directories.
@@ -119,7 +120,7 @@ export PYTHON_EXECUTABLE=/usr/bin/python3.7
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

python3.7 -m pip install -r python/requirements.txt

go env -w GO111MODULE=on
go env -w GOPROXY=https://goproxy.cn,direct
# FAQ

## Version upgrade issues

#### Q: After upgrading from v0.6.x to v0.7.0, running a Python Pipeline program reports the following error:
```
Failed to predict: (data_id=1 log_id=0) [det|0] Failed to postprocess: postprocess() takes 4 positional arguments but 5 were given
```
**A:** Add the parameter data_id to the postprocess function definition in the server-side program (e.g. web_service.py), i.e. change it to def postprocess(self, input_dicts, fetch_dict, **data_id**, log_id).
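A minimal sketch of an updated Op, assuming the standard pipeline `Op` base class; the class name `DetOp` and the result dict are illustrative, the point is the new signature:

```python
from paddle_serving_server.web_service import Op


class DetOp(Op):  # illustrative Op; keep your own init_op/preprocess logic
    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        # v0.7.0 passes data_id in addition to log_id; accepting both fixes
        # "postprocess() takes 4 positional arguments but 5 were given".
        res = {"out": str(fetch_dict)}
        # Triple return as used in the 0.7.0 pipeline examples:
        # (result dict, product error code with None meaning OK, error message)
        return res, None, ""
```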
## Basics

@@ -10,7 +16,7 @@
#### Q: Does paddle-serving support Int32?

**A:** The feed_type and fetch_type numbers defined in protobuf map to data types as follows; see [Serving configuration and startup parameters](./Serving_Configure_CN.md#模型配置文件) for the complete description:

0-int64

@@ -332,7 +338,7 @@ GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999
To debug a core file with gdb, run gdb <executable> <core file>, then enter the bt command; this usually shows the line where the error occurred.

Note: the executable path is the path of the C++ bin file, not the python command. It is usually something like /usr/local/lib/python3.6/site-packages/paddle_serving_server/serving-gpu-102-0.7.0/serving.
## Performance tuning

@@ -43,15 +43,15 @@ bash Serving/tools/paddle_env_install.sh
**GPU:**
```
# Start the GPU Docker container
docker pull paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7 bash
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving

# The Paddle development image needs the following script to add the dependencies required by Serving
bash Serving/tools/paddle_env_install.sh
```
## 2. Install the stable Paddle Serving wheel packages

Install the required pip dependencies
```

@@ -77,6 +77,8 @@ The paddle-serving-server and paddle-serving-server-gpu packages support Centos 6/7, Ubun
The paddle-serving-client and paddle-serving-app packages support Linux and Windows; paddle-serving-client only supports python3.6/3.7/3.8.

**If you previously used the Cuda10.2 environment of paddle serving 0.5.X/0.6.X, note that in version 0.7.0, paddle-serving-server-gpu==0.7.0.post102 uses Cudnn7 and TensorRT6, while 0.6.0.post102 used cudnn8 and TensorRT7. If a 0.6.0 cuda10.2 user needs to upgrade, please use paddle-serving-server-gpu==0.7.0.post1028.**
## 3. Install the Paddle-related Python libraries

**You only need to install them when you use the `paddle_serving_client.convert` command or the `Python Pipeline framework`.**
```

@@ -97,8 +99,8 @@ pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.0/python/Linux/GPU/x
| :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
| CPU | 0.7.0-devel | Ubuntu 16.04 | 2.2.0 | Ubuntu 18.04 |
| Cuda10.1+Cudnn7 | 0.7.0-cuda10.1-cudnn7-devel | Ubuntu 16.04 | None | None |
| Cuda10.2+Cudnn7 | 0.7.0-cuda10.2-cudnn7-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda10.2-cudnn7 | Ubuntu 16.04 |
| Cuda10.2+Cudnn8 | 0.7.0-cuda10.2-cudnn8-devel | Ubuntu 16.04 | None | None |
| Cuda11.2+Cudnn8 | 0.7.0-cuda11.2-cudnn8-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04 |

For **Windows 10 users**, please refer to the [Paddle Serving guide for the Windows platform](Windows_Tutorial_CN.md).
@@ -43,8 +43,8 @@ bash Serving/tools/paddle_env_install.sh
**GPU:**
```
# Start GPU Docker
docker pull paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/paddle:2.2.0-gpu-cuda10.2-cudnn7 bash
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving

@@ -52,7 +52,7 @@ git clone https://github.com/PaddlePaddle/Serving
bash Serving/tools/paddle_env_install.sh
```
## 2. Install Paddle Serving stable wheel packages

Install the required pip dependencies
```

@@ -78,6 +78,8 @@ The paddle-serving-server and paddle-serving-server-gpu installation packages su
The paddle-serving-client and paddle-serving-app installation packages support Linux and Windows, and paddle-serving-client only supports python3.6/3.7/3.8.

**If you used the Cuda10.2 environment of paddle serving 0.5.X/0.6.X before, note that in version 0.7.0, paddle-serving-server-gpu==0.7.0.post102 uses Cudnn7 and TensorRT6, while 0.6.0.post102 used cudnn8 and TensorRT7. If a 0.6.0 cuda10.2 user needs to upgrade, please use paddle-serving-server-gpu==0.7.0.post1028.**
## 3. Install Paddle related Python libraries

**You only need to install them when you use the `paddle_serving_client.convert` command or the `Python Pipeline framework`.**
```

@@ -100,8 +102,8 @@ pip3 install https://paddle-inference-lib.bj.bcebos.com/2.2.0/python/Linux/GPU/x
| :--------------------------: | :-------------------------------: | :-------------: | :-------------------: | :----------------: |
| CPU | 0.7.0-devel | Ubuntu 16.04 | 2.2.0 | Ubuntu 18.04 |
| Cuda10.1+Cudnn7 | 0.7.0-cuda10.1-cudnn7-devel | Ubuntu 16.04 | None | None |
| Cuda10.2+Cudnn7 | 0.7.0-cuda10.2-cudnn7-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda10.2-cudnn7 | Ubuntu 16.04 |
| Cuda10.2+Cudnn8 | 0.7.0-cuda10.2-cudnn8-devel | Ubuntu 16.04 | None | None |
| Cuda11.2+Cudnn8 | 0.7.0-cuda11.2-cudnn8-devel | Ubuntu 16.04 | 2.2.0-gpu-cuda11.2-cudnn8 | Ubuntu 18.04 |

For **Windows 10 users**, please refer to the document [Paddle Serving Guide for Windows Platform](Windows_Tutorial_CN.md).
# Latest Wheel Packages

## Paddle-Serving-Server (x86 CPU/GPU)

Check the following table and copy the hyperlink address, then run `pip3 install`. For example, to install `paddle_serving_server-0.0.0-py3-none-any.whl`, right-click the hyperlink, copy the link address, and the final command is `pip3 install https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl`.

| | develop whl | develop bin | stable whl | stable bin |
|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| cpu-avx-mkl | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl) | [serving-cpu-avx-mkl-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.0.0.tar.gz) | [paddle_serving_server-0.7.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.7.0-py3-none-any.whl) | [serving-cpu-avx-mkl-0.7.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-0.7.0.tar.gz) |
| cpu-avx-openblas | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl) | [serving-cpu-avx-openblas-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.0.0.tar.gz) | [paddle_serving_server-0.7.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.7.0-py3-none-any.whl) | [serving-cpu-avx-openblas-0.7.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-openblas-0.7.0.tar.gz) |
| cpu-noavx-openblas | [paddle_serving_server-0.0.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.0.0-py3-none-any.whl) | [ serving-cpu-noavx-openblas-0.0.0.tar.gz ]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz) | [paddle_serving_server-0.7.0-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-0.7.0-py3-none-any.whl) | [ serving-cpu-noavx-openblas-0.7.0.tar.gz ]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-noavx-openblas-0.7.0.tar.gz) |
| cuda10.1-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl) | [serving-gpu-101-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.0.0.tar.gz) | [paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post101-py3-none-any.whl) | [serving-gpu-101-0.7.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-0.7.0.tar.gz) |
| cuda10.2-cudnn7-TensorRT6 | [paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl) | [serving-gpu-102-0.0.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.0.0.tar.gz) | [paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl) | [serving-gpu-102-0.7.0.tar.gz](https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-102-0.7.0.tar.gz) |
| cuda10.2-cudnn8-TensorRT7 | [paddle_serving_server_gpu-0.0.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl) | [ serving-gpu-1028-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.0.0.tar.gz ) | [paddle_serving_server_gpu-0.7.0.post1028-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post102-py3-none-any.whl) | [ serving-gpu-1028-0.7.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-0.7.0.tar.gz ) |
| cuda11.2-cudnn8-TensorRT8 | [paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.0.0.post112-py3-none-any.whl) | [ serving-gpu-112-0.0.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.0.0.tar.gz ) | [paddle_serving_server_gpu-0.7.0.post112-py3-none-any.whl ](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-0.7.0.post112-py3-none-any.whl) | [ serving-gpu-112-0.7.0.tar.gz]( https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-0.7.0.tar.gz ) |
### Binary Package

For most users, there is no need to read this section. But if you deploy Paddle Serving on a machine without network access, you will not be able to download the binary executable tar file, so all the download links for the various environments are given here.

### How to set up SERVING_BIN offline?

- Download the serving server whl package and bin package, and make sure they are for the same environment.
- Download the serving client whl and serving app whl, paying attention to the Python version.
- `pip install` the serving packages and `tar xf` the binary package, then `export SERVING_BIN=$PWD/serving-gpu-cuda11-0.0.0/serving` (taking Cuda 11 as the example). A Python sketch of the same setup follows this list.
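A minimal sketch of that offline setup driven from Python; the archive name is illustrative and must match the environment of the installed server wheel:

```python
import os
import tarfile

# Illustrative file name: a bin package downloaded on a networked machine and
# copied here, matching the installed server wheel's environment (Cuda 11).
BIN_TGZ = "serving-gpu-cuda11-0.0.0.tar.gz"

with tarfile.open(BIN_TGZ) as tgz:
    tgz.extractall(".")  # unpacks the serving-gpu-cuda11-0.0.0/ directory

# SERVING_BIN must point at the `serving` executable inside the unpacked
# package before the server process is started.
os.environ["SERVING_BIN"] = os.path.join(
    os.getcwd(), "serving-gpu-cuda11-0.0.0", "serving")
```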
## paddle-serving-client

| | develop whl | stable whl |
|-----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| Python3.6 | [paddle_serving_client-0.0.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp36-none-any.whl) | [paddle_serving_client-0.7.0-cp36-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.7.0-cp36-none-any.whl) |
| Python3.7 | [paddle_serving_client-0.0.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl) | [paddle_serving_client-0.7.0-cp37-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.7.0-cp37-none-any.whl) |
| Python3.8 | [paddle_serving_client-0.0.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp38-none-any.whl) | [paddle_serving_client-0.7.0-cp38-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.7.0-cp38-none-any.whl) |

## paddle-serving-app

| | develop whl | stable whl |
|---------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|
| Python3 | [paddle_serving_app-0.0.0-py3-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.0.0-py3-none-any.whl) | [paddle_serving_app-0.7.0-py3-none-any.whl](https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-0.7.0-py3-none-any.whl) |
## Baidu Kunlun user

Kunlun users on arm-xpu or x86-xpu can download the wheel packages as follows. Users should use the xpu-beta docker images, see [DOCKER IMAGES](./Docker_Images_CN.md).

@@ -49,9 +46,15 @@ for kunlun user who uses arm-xpu or x86-xpu can download the wheel packages as f
For arm kunlun users
```
# paddle-serving-server
https://paddle-serving.bj.bcebos.com/whl/xpu/arm/paddle_serving_server_xpu-0.0.0.post2-py3-none-any.whl
# paddle-serving-client
https://paddle-serving.bj.bcebos.com/whl/xpu/arm/paddle_serving_client-0.0.0-cp36-none-any.whl
# paddle-serving-app
https://paddle-serving.bj.bcebos.com/whl/xpu/arm/paddle_serving_app-0.0.0-py3-none-any.whl
# SERVING BIN
https://paddle-serving.bj.bcebos.com/bin/serving-xpu-aarch64-0.0.0.tar.gz
```

For x86 kunlun users
@@ -62,30 +65,3 @@ https://paddle-serving.bj.bcebos.com/whl/xpu/0.7.0/paddle_serving_app-0.7.0-cp36
```
@@ -20,15 +20,16 @@ kubectl apply -f https://bit.ly/kong-ingress-dbless
### Build a Serving runtime image (optional):

First determine the exact environment of your runtime image. Compared with the [Docker development image list](./Docker_Images_CN.md), development images are meant for debugging and compiling code and carry many development tools, so they are large; runtime images usually need a smaller container footprint to make deployment more flexible. If you do not need a lightweight runtime container, skip this part.

The script is `tools/generate_runtime_docker.sh`, and it is used as follows
```bash
bash tools/generate_runtime_docker.sh --env cuda10.1 --python 3.6 --name serving_runtime:cuda10.1-py36
```

This generates a runtime image with cuda10.1, python 3.6, Serving 0.7.0 and Paddle 2.2.0. If you have other questions, run the following command for help information.

If you need a runtime image for an older Serving version, please check out the corresponding older branch.
```
bash tools/generate_runtime_docker.sh --help
@@ -83,8 +84,8 @@ python3.6 web_service.py
The web service mode is essentially similar to the pipeline mode, so we take `Serving/examples/C++/PaddleNLP/bert` as an example
```bash
# Assume you already have a Serving runtime image named registry.baidubce.com/paddlepaddle/serving:0.7.0-cpu-py36
docker run --rm -dit --name webservice_serving_demo registry.baidubce.com/paddlepaddle/serving:0.7.0-cpu-py36 bash
cd Serving/examples/C++/PaddleNLP/bert
### download model
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
@@ -91,7 +91,7 @@ workdir_9393
| `model` | str[]| `""` | Path of paddle model directory to be served |
| `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
| `ir_optim` | bool | False | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL; requires ir_optim to be enabled |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT; requires ir_optim to be enabled |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | - | - | Run PaddleLite inference; requires ir_optim to be enabled |
| `use_xpu` | - | - | Run PaddleLite inference with Baidu Kunlun XPU; requires ir_optim to be enabled |
@@ -357,17 +357,22 @@ op:
#Fetch result list, based on the alias_name of fetch_var in client_config
fetch_list: ["concat_1.tmp_0"]

# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 0

#Device IDs; when devices is "" or unset, inference runs on CPU; "0" or "0,1,2" selects the GPU cards to use
devices: ""

#use_mkldnn; when enabling mkldnn, ir_optim=True must also be set, otherwise it has no effect
#use_mkldnn: True

#ir_optim; when enabling TensorRT, ir_optim=True must also be set, otherwise it has no effect
ir_optim: True

#precision; lowering the inference precision can speed up inference
#GPU supports: "fp32"(default), "fp16", "int8"
#CPU supports: "fp32"(default), "fp16", "bf16"(mkldnn); "int8" is not supported
precision: "fp32"

rec:
#Concurrency; when is_thread_op=True this is thread concurrency, otherwise process concurrency
concurrency: 3
@@ -390,17 +395,22 @@ op:
#Fetch result list, based on the alias_name of fetch_var in client_config
fetch_list: ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]

# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 0

#Device IDs; when devices is "" or unset, inference runs on CPU; "0" or "0,1,2" selects the GPU cards to use
devices: ""

#use_mkldnn; when enabling mkldnn, ir_optim=True must also be set, otherwise it has no effect
#use_mkldnn: True

#ir_optim; when enabling TensorRT, ir_optim=True must also be set, otherwise it has no effect
ir_optim: True

#precision; lowering the inference precision can speed up inference
#GPU supports: "fp32"(default), "fp16", "int8"
#CPU supports: "fp32"(default), "fp16", "bf16"(mkldnn); "int8" is not supported
precision: "fp32"
```
### Single-machine multi-card inference

@@ -454,4 +464,4 @@ Python Pipeline supports low-precision inference; the precision types supported by CPU, GPU and TensorRT
#GPU supports: "fp32"(default), "fp16"(TensorRT), "int8"
#CPU supports: "fp32"(default), "fp16", "bf16"(mkldnn); "int8" is not supported
precision: "fp32"
```
@@ -91,7 +91,7 @@ More flags:
| `model` | str[]| `""` | Path of paddle model directory to be served |
| `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
| `ir_optim` | bool | False | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL; requires ir_optim to be enabled |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT; requires ir_optim to be enabled |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | - | - | Run PaddleLite inference; requires ir_optim to be enabled |
| `use_xpu` | - | - | Run PaddleLite inference with Baidu Kunlun XPU; requires ir_optim to be enabled |
@@ -380,17 +380,22 @@ op:
#Fetch data list
fetch_list: ["concat_1.tmp_0"]

# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 0

#Device IDs; "" or unset means CPU inference, "0" or "0,1,2" selects the GPU cards to use
devices: ""

#use_mkldnn; when running on mkldnn, ir_optim=True must also be set
#use_mkldnn: True

#ir_optim; when running on TensorRT, ir_optim=True must also be set
ir_optim: True

#precision; decreasing the precision can increase speed
#GPU supports: "fp32"(default), "fp16", "int8"
#CPU supports: "fp32"(default), "fp16", "bf16"(mkldnn); "int8" is not supported
precision: "fp32"

rec:
#concurrency; threads if is_thread_op=True, otherwise processes
concurrency: 3
@@ -413,17 +418,22 @@ op:
#Fetch data list
fetch_list: ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]

# device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
device_type: 0

#Device IDs; "" or unset means CPU inference, "0" or "0,1,2" selects the GPU cards to use
devices: ""

#use_mkldnn; when running on mkldnn, ir_optim=True must also be set
#use_mkldnn: True

#ir_optim; when running on TensorRT, ir_optim=True must also be set
ir_optim: True

#precision; decreasing the precision can increase speed
#GPU supports: "fp32"(default), "fp16", "int8"
#CPU supports: "fp32"(default), "fp16", "bf16"(mkldnn); "int8" is not supported
precision: "fp32"
```
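For context on where this YAML plugs in, here is a minimal sketch of a pipeline service that consumes such a config, following the pattern of the OCR pipeline example; the class name, op name and file name are illustrative:

```python
from paddle_serving_server.web_service import WebService, Op


class RecOp(Op):  # illustrative Op; a real service adds preprocess/postprocess
    pass


class RecService(WebService):
    def get_pipeline_response(self, read_op):
        # "rec" must match the op name used in the YAML config above
        return RecOp(name="rec", input_ops=[read_op])


service = RecService(name="ocr")               # illustrative service name
service.prepare_pipeline_config("config.yml")  # the YAML shown above
service.run_service()
```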
### Single-machine and multi-card inference
@@ -27,9 +27,9 @@ Paddle Serving provides many deployment capabilities needed in large-scale scenarios: 1) model man
| Response time | Throughput | Development efficiency | Resource utilization | Selection | Application scenarios |
|-----|------|-----|-----|------|------|
| Low | Highest | Low | Highest | C++ Serving | High-performance scenarios: recall and ranking services of large online recommendation systems |
| Highest | Higher | Higher | Higher | Python Pipeline Serving | Balances throughput and efficiency; single-operator multi-model combination scenarios; asynchronous mode |
| Higher | Low | Higher | Low | Python webservice | High iteration efficiency scenarios: small services or rapid iteration, model effect verification |

Performance index description:
@@ -197,12 +197,4 @@ The core design of Pipeline Serving is a graph execution engine; the basic processing units are OPs and Chann
<center>
<img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
...@@ -25,9 +25,9 @@ In order to meet the needs of users in different scenarios, Paddle Serving's pro ...@@ -25,9 +25,9 @@ In order to meet the needs of users in different scenarios, Paddle Serving's pro
| Response time | Throughput | Development efficiency | Resource utilization | Selection | Applications |
|-----|------|-----|-----|------|------|
| Low | Highest | Low | Highest | C++ Serving | High-performance scenarios: recall and ranking services of large-scale online recommendation systems |
| Highest | Higher | Higher | Higher | Python Pipeline Serving | Balances throughput and efficiency; single-operator multi-model combination scenarios, asynchronous mode |
| Higher | Low | Higher | Low | Python webservice | High iteration-efficiency scenarios: low-traffic services or projects that require rapid iteration, model effect verification |
Performance index description:
1. Response time (ms): the average response time of a single request; report the 50th, 90th, 95th, and 99th percentiles; the lower, the better.
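For instance, given a hypothetical `latency_ms.txt` holding one per-request latency (in ms) per line, these quantiles can be sketched in shell:

```bash
# Hedged sketch: compute P50/P90/P95/P99 from per-request latencies in ms.
# latency_ms.txt is a hypothetical log file, one latency value per line.
sort -n latency_ms.txt | awk '
  { a[NR] = $1 }
  END {
    n = split("50 90 95 99", ps, " ")
    for (i = 1; i <= n; i++) {
      idx = int(NR * ps[i] / 100); if (idx < 1) idx = 1
      printf "P%s: %s ms\n", ps[i], a[idx]
    }
  }'
```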
@@ -199,16 +199,3 @@ The core design of Pipeline Serving is a graph execution engine, and the basic p…
<img src='images/pipeline_serving-image2.png' height = "300" align="middle"/>
</center>
doc/images/wechat_group_1.jpeg (binary image updated: 91.8 KB → 111.0 KB)
@@ -75,7 +75,7 @@ java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar Pipeli…
2. Currently Serving has launched the Pipeline mode (see [Pipeline Serving](../doc/Python_Pipeline/Pipeline_Design_EN.md) for details). The Pipeline Serving Client for Java is released.
3. The parameters `ip` and `port` in PipelineClientExample.java (path: java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)) must match the corresponding pipeline server parameters `ip` and `port` defined in the config.yml (taking the IMDB model ensemble as an example, path: examples/Pipeline/imdb_model_ensemble/[config.yml](../examples/Pipeline/imdb_model_ensemble/config.yml)).
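A quick way to confirm the server side before pointing the Java client at it; the `rpc_port`/`http_port` key names follow the pipeline config convention and are assumptions here.

```bash
# Hedged sketch: print the ports the pipeline server exposes; key names
# may differ per example.
grep -E "rpc_port|http_port" ../examples/Pipeline/imdb_model_ensemble/config.yml
```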
### Customization guidance
@@ -24,7 +24,7 @@ if [[ "$VERSION" == "cuda10.1" ]];then
    rm TensorRT6-cuda10.1-cudnn7.tar.gz
elif [[ "$VERSION" == "cuda11.2" ]];then
    wget https://paddle-ci.gz.bcebos.com/TRT/TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz --no-check-certificate
    tar -zxf TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz -C /usr/local
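    # Extracting with -C /usr/local places TensorRT-8.0.3.4 where the cp
    # commands below expect to find it.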
    cp -rf /usr/local/TensorRT-8.0.3.4/include/* /usr/include/ && cp -rf /usr/local/TensorRT-8.0.3.4/lib/* /usr/lib/
    rm -rf TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
elif [[ "$VERSION" == "cuda10.2" ]];then
@@ -20,7 +20,7 @@ RUN_ENV=$3 # cpu/10.1 10.2
PYTHON_VERSION=$4
serving_release=
client_release="paddle-serving-client==$SERVING_VERSION"
app_release="paddle-serving-app==$SERVING_VERSION"
if [[ $PYTHON_VERSION == "3.6" ]];then
    CPYTHON="36"
@@ -33,48 +33,28 @@ elif [[ $PYTHON_VERSION == "3.8" ]];then
    CPYTHON_PADDLE="38"
fi
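# CPYTHON and CPYTHON_PADDLE select the CPython ABI tag (e.g. cp38) that is
# substituted into the wheel file names assembled below.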
if [[ "$RUN_ENV" == "cpu" ]];then
    server_release="https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server-$SERVING_VERSION-py3-none-any.whl"
    serving_bin="https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-cpu-avx-mkl-$SERVING_VERSION.tar.gz"
    paddle_whl="paddlepaddle==$PADDLE_VERSION"
elif [[ "$RUN_ENV" == "cuda10.1" ]];then
    server_release="https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-$SERVING_VERSION.post101-py3-none-any.whl"
    serving_bin="https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-101-$SERVING_VERSION.tar.gz"
    paddle_whl="https://paddle-inference-lib.bj.bcebos.com/$PADDLE_VERSION/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-$PADDLE_VERSION.post101-cp$CPYTHON-cp$CPYTHON_PADDLE-linux_x86_64.whl"
elif [[ "$RUN_ENV" == "cuda10.2" ]];then
    server_release="https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-$SERVING_VERSION.post1028-py3-none-any.whl"
    serving_bin="https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-1028-$SERVING_VERSION.tar.gz"
    paddle_whl="https://paddle-inference-lib.bj.bcebos.com/$PADDLE_VERSION/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.2_cudnn8.1.1_trt7.2.3.4/paddlepaddle_gpu-$PADDLE_VERSION-cp$CPYTHON-cp$CPYTHON_PADDLE-linux_x86_64.whl"
elif [[ "$RUN_ENV" == "cuda11.2" ]];then
    server_release="https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_server_gpu-$SERVING_VERSION.post112-py3-none-any.whl"
    serving_bin="https://paddle-serving.bj.bcebos.com/test-dev/bin/serving-gpu-112-$SERVING_VERSION.tar.gz"
    paddle_whl="https://paddle-inference-lib.bj.bcebos.com/$PADDLE_VERSION/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.2_cudnn8.2.1_trt8.0.3.4/paddlepaddle_gpu-$PADDLE_VERSION.post112-cp$CPYTHON-cp$CPYTHON_PADDLE-linux_x86_64.whl"
fi
client_release="https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-$SERVING_VERSION-cp$CPYTHON-none-any.whl"
app_release="https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_app-$SERVING_VERSION-py3-none-any.whl"
if [[ "$RUN_ENV" == "cpu" ]];then if [[ "$RUN_ENV" == "cpu" ]];then
python$PYTHON_VERSION -m pip install $client_release $app_release $server_release python$PYTHON_VERSION -m pip install $client_release $app_release $server_release
python$PYTHON_VERSION -m pip install $paddle_whl python$PYTHON_VERSION -m pip install $paddle_whl
@@ -105,15 +85,15 @@ elif [[ "$RUN_ENV" == "cuda10.2" ]];then
echo "export SERVING_BIN=$PWD/serving_bin/serving">>/root/.bashrc echo "export SERVING_BIN=$PWD/serving_bin/serving">>/root/.bashrc
rm -rf serving-gpu-102-${SERVING_VERSION}.tar.gz rm -rf serving-gpu-102-${SERVING_VERSION}.tar.gz
cd - cd -
elif [[ "$RUN_ENV" == "cuda11" ]];then elif [[ "$RUN_ENV" == "cuda11.2" ]];then
python$PYTHON_VERSION -m pip install $client_release $app_release $server_release python$PYTHON_VERSION -m pip install $client_release $app_release $server_release
python$PYTHON_VERSION -m pip install $paddle_whl python$PYTHON_VERSION -m pip install $paddle_whl
cd /usr/local/ cd /usr/local/
wget $serving_bin wget $serving_bin
tar xf serving-gpu-cuda11-${SERVING_VERSION}.tar.gz tar xf serving-gpu-112-${SERVING_VERSION}.tar.gz
mv $PWD/serving-gpu-cuda11-${SERVING_VERSION} $PWD/serving_bin mv $PWD/serving-gpu-112-${SERVING_VERSION} $PWD/serving_bin
echo "export SERVING_BIN=$PWD/serving_bin/serving">>/root/.bashrc echo "export SERVING_BIN=$PWD/serving_bin/serving">>/root/.bashrc
rm -rf serving-gpu-cuda11-${SERVING_VERSION}.tar.gz rm -rf serving-gpu-112-${SERVING_VERSION}.tar.gz
cd - cd -
fi fi
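# Hedged usage sketch: the positional-argument order for SERVING_VERSION and
# PADDLE_VERSION is an assumption (only RUN_ENV=$3 and PYTHON_VERSION=$4 are
# visible above), and the script path is illustrative:
#   bash tools/install_serving.sh 0.7.0 2.2.0 cuda11.2 3.8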
@@ -7,10 +7,10 @@ function usage
{
    echo "usage: sh tools/generate_runtime_docker.sh --SOME_ARG ARG_VALUE"
    echo " ";
    echo " --env : running env, cpu/cuda10.1/cuda10.2/cuda11.2";
    echo " --python : python version, 3.6/3.7/3.8 ";
    #echo " --serving : serving version(0.6.0/0.6.2)";
    #echo " --paddle : paddle version(2.1.0/2.2.0)"
    echo " --image_name : image name(default serving_runtime:env-python)"
    echo " -h | --help : helper";
}
@@ -25,8 +25,8 @@ function parse_args
case "$1" in case "$1" in
--env ) env="$2"; shift;; --env ) env="$2"; shift;;
--python ) python="$2"; shift;; --python ) python="$2"; shift;;
--serving ) serving="$2"; shift;; #--serving ) serving="$2"; shift;;
--paddle ) paddle="$2"; shift;; #--paddle ) paddle="$2"; shift;;
--image_name ) image_name="$2"; shift;; --image_name ) image_name="$2"; shift;;
-h | --help ) usage; exit;; # quit and show usage -h | --help ) usage; exit;; # quit and show usage
* ) args+=("$1") # if no match, add it to the positional args * ) args+=("$1") # if no match, add it to the positional args
@@ -66,9 +66,11 @@ function run
base_image="nvidia\/cuda:10.1-cudnn7-runtime-ubuntu16.04" base_image="nvidia\/cuda:10.1-cudnn7-runtime-ubuntu16.04"
elif [ $env == "cuda10.2" ]; then elif [ $env == "cuda10.2" ]; then
base_image="nvidia\/cuda:10.2-cudnn8-runtime-ubuntu16.04" base_image="nvidia\/cuda:10.2-cudnn8-runtime-ubuntu16.04"
elif [ $env == "cuda11" ]; then elif [ $env == "cuda11.2" ]; then
base_image="nvidia\/cuda:11.0.3-cudnn8-runtime-ubuntu16.04" base_image="nvidia\/cuda:11.2.0-cudnn8-runtime-ubuntu16.04"
fi fi
python="2.2.0"
serving="0.7.0"
echo "base image: $base_image" echo "base image: $base_image"
echo "named arg: python: $python" echo "named arg: python: $python"
echo "named arg: serving: $serving" echo "named arg: serving: $serving"