@@ -31,7 +31,8 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
- There are two frameworks, namely high-performance C++ Serving and high-easy-to-use Python pipeline. The C++ Serving is based on the bRPC network framework to create a high-throughput, low-latency inference service, and its performance indicators are ahead of competing products. The Python pipeline is based on the gRPC/gRPC-Gateway network framework and the Python language to build a highly easy-to-use and high-throughput inference service. How to choose which one please see [Techinical Selection](doc/Serving_Design_EN.md#21-design-selection).
- Support multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC, bRPC, and provide C++, Python, Java language SDK.
- Design and implement a high-performance inference service framework for asynchronous pipelines based on directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batch, multi-card multi-stream inference, etc.
- Adapt to a variety of commonly used computing hardwares, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, Kunlun XPU, etc.; Integrate acceleration libraries of Intel MKLDNN and Nvidia TensorRT, and low-precision and quantitative inference.
- Adapt to a variety of commonly used computing hardwares, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, Kunlun XPU, HUAWEI Ascend 310/910, HYGON DCU、Nvidia Jetson etc.
- Integrate acceleration libraries of Intel MKLDNN and Nvidia TensorRT, and low-precision and quantitative inference.
- Provide a model security deployment solution, including encryption model deployment, and authentication mechanism, HTTPs security gateway, which is used in practice.
- Support cloud deployment, provide a deployment case of Baidu Cloud Intelligent Cloud kubernetes cluster.
- Provide more than 40 classic pre-model deployment examples, such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP, PaddleRec and other suites, and more models continue to expand.
...
...
@@ -43,7 +44,7 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
- Video tutorial(Chinese) : [深度学习服务化部署-以互联网应用为例](https://aistudio.baidu.com/aistudio/course/introduce/19084)
- Edge AI solution based on Paddle Serving & Baidu Intelligent Edge(Chinese) : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
- Edge AI solution(Chinese) : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
<palign="center">
<imgsrc="doc/images/demo.gif"width="700">
...
...
@@ -128,7 +129,6 @@ If you want to communicate with developers and other users? Welcome to join us,
-Edge AI solution based on Paddle Serving & Baidu Intelligent Edge(Chinese) : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
@@ -37,7 +37,7 @@ In addition, for some C++ secondary development scenarios, we also provide OPENC
Docker compilation is recommended. We have prepared the Paddle Serving compilation environment for you and configured the above compilation dependencies. For details, please refer to [this document](DOCKER_IMAGES_CN.md).
We provide five environment development images, namely CPU, Cuda10.1+Cudnn7, Cuda10.2+Cudnn7, Cuda10.2+Cudnn8, Cuda11.2+Cudnn8. We provide a Serving development image to cover the above environment. At the same time, we also support Paddle development mirroring.
We provide five environment development images, namely CPU, CUDA10.1 + CUDNN7, CUDA10.2 + CUDNN7, CUDA10.2 + CUDNN8, CUDA11.2 + CUDNN8. We provide a Serving development image to cover the above environment. At the same time, we also support Paddle development mirroring.
The Serving image name is **paddlepaddle/serving:${Serving development image Tag}** (If the network is not good, you can visit **registry.baidubce.com/paddlepaddle/serving:${Serving development image Tag}**), The name of the Paddle development image is **paddlepaddle/paddle:${Paddle Development Image Tag}**. In order to prevent users from confusing the two sets of mirroring, we explain the origin of the two sets of mirroring separately.
...
...
@@ -45,11 +45,11 @@ Serving development mirror is the mirror used to compile and debug prediction se
| Environment | Serving Dev Image Tag | OS | Paddle Dev Image Tag | OS |
We first need to pull related images for the environment we need. Under the **Environment** column in the above table, except for the CPU, the rest (Cuda**+Cudnn**) belong to the GPU environment.
**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path, see the following [Notes](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE_CN.md#Notes) ).
**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path.
...
...
@@ -257,7 +257,7 @@ make -j10
| WITH_GPU | Compile Paddle Serving with NVIDIA GPU | OFF |
| WITH_TRT | Compile Paddle Serving with TensorRT | OFF |
| WITH_OPENCV | Compile Paddle Serving with OPENCV | OFF |
@@ -272,26 +272,26 @@ Paddle Serving supports prediction on the GPU through the PaddlePaddle predictio
To compile the Paddle Serving GPU version on bare metal, you need to install these basic libraries:
-CUDA
-CuDNN
-CUDNN
To compile the TensorRT version, you need to install the TensorRT library.
The things to note here are:
1. Compile the basic library versions such as CUDA/CUDNN installed on the system where Serving is located, and need to be compatible with the actual GPU device. For example, Tesla V100 card requires at least CUDA 9.0. If the version of basic libraries such as CUDA used during compilation is too low, the Serving process cannot be started due to the incompatibility between the generated GPU code and the actual hardware device, or serious problems such as coredump may occur.
2. Install the CUDA driver compatible with the actual GPU device on the system running Paddle Serving, and install the basic library compatible with the CUDA/CuDNN version used during compilation. If the version of CUDA/CuDNN installed on the system running Paddle Serving is lower than the version used during compilation, it may cause strange cuda function call failures and other problems.
2. Install the CUDA driver compatible with the actual GPU device on the system running Paddle Serving, and install the basic library compatible with the CUDA/CUDNN version used during compilation. If the version of CUDA/CUDNN installed on the system running Paddle Serving is lower than the version used during compilation, it may cause strange cuda function call failures and other problems.
The following is the matching relationship between PaddleServing mirrored Cuda, Cudnn, and TensorRT for reference:
| | CUDA | CuDNN | TensorRT |
| | CUDA | CUDNN | TensorRT |
| :----: | :-----: | :----------: | :----: |
| post101 | 10.1 | CuDNN 7.6.5 | 6.0.1 |
| post102 | 10.2 | CuDNN 8.0.5 | 7.1.3 |
| post11 | 11.0 | CuDNN 8.0.4 | 7.1.3 |
| post101 | 10.1 | CUDNN 7.6.5 | 6.0.1 |
| post102 | 10.2 | CUDNN 8.0.5 | 7.1.3 |
| post11 | 11.0 | CUDNN 8.0.4 | 7.1.3 |
### Attachment: How to make the Paddle Serving compilation system detect the CuDNN library
### Attachment: How to make the Paddle Serving compilation system detect the CUDNN library
After downloading the corresponding version of CuDNN from the official website of NVIDIA developer and decompressing it locally, add the `-DCUDNN_LIBRARY` parameter to the cmake compilation command and specify the path of the CuDNN library.
After downloading the corresponding version of CUDNN from the official website of NVIDIA developer and decompressing it locally, add the `-DCUDNN_LIBRARY` parameter to the cmake compilation command and specify the path of the CUDNN library.
## Attachment: Compile and install OpenCV library
**Note:** You only need to do this when you need to include the OpenCV library in your C++ code.
@@ -31,20 +31,20 @@ If you want to customize your Serving based on source code, use the version with
**cuda10.1-cudnn7-gcc54 image is not ready, you should run from dockerfile if you need it.**
If you need to develop and compile based on the source code, please use the version with the suffix -devel.
**In the TAG column, 0.7.0 can also be replaced with the corresponding version number, such as 0.5.0/0.4.1, etc., but it should be noted that some development environments only increase with a certain version iteration, so not all environments All have the corresponding version number can be used.**
**In the TAG column, 0.8.0 can also be replaced with the corresponding version number, such as 0.5.0/0.4.1, etc., but it should be noted that some development environments only increase with a certain version iteration, so not all environments All have the corresponding version number can be used.**
Running Images is lighter than Develop Images, and Running Images are made up with serving whl and bin, but without develop tools like cmake because of lower image size. If you want to know about it, plese check the document [Paddle Serving on Kubernetes.](./Run_On_Kubernetes_CN.md).
Running Images is lighter than Develop Images, and Running Images are made up with serving whl and bin, but without develop tools like cmake because of lower image size. If you want to know about it, plese check the document [Paddle Serving on Kubernetes](./Run_On_Kubernetes_CN.md).
**Strongly recommend** you build **Paddle Serving** in Docker. For more images, please refer to [Docker Image List](Docker_Images_CN.md).
**Tip-1**: This project only supports <mark>**Python3.6/3.7/3.8**</mark>, all subsequent operations related to Python/Pip need to select the correct Python version.
**Tip-1**: This project only supports <mark>**Python3.6/3.7/3.8/3.9**</mark>, all subsequent operations related to Python/Pip need to select the correct Python version.
**Tip-2**: The GPU environments in the following examples are all cuda10.2-cudnn7. If you use Python Pipeline to deploy and need Nvidia TensorRT to optimize prediction performance, please refer to [Supported Mirroring Environment and Instructions](#4.-Supported-Docker-Images-and-Instruction) to choose other versions.
...
...
@@ -15,16 +15,16 @@
**CPU:**
```
# Start CPU Docker Container
docker pull paddlepaddle/serving:0.7.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-devel bash
docker pull paddlepaddle/serving:0.8.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.8.0-devel bash
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/paddle:2.2.2-gpu-cuda10.2-cudnn7 bash
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
...
...
@@ -60,49 +60,54 @@ cd Serving
pip3 install -r python/requirements.txt
```
Install the service whl package. There are three types of client, app and server. The server is divided into CPU and GPU. Choose one installation according to the environment.
- GPU with CUDA10.2 + Cudnn7 + TensorRT6(Recommended)
```shell
pip3 install paddle-serving-client==0.7.0
pip3 install paddle-serving-server==0.7.0 # CPU
pip3 install paddle-serving-app==0.7.0
pip3 install paddle-serving-server-gpu==0.7.0.post102 #GPU with CUDA10.2 + Cudnn7 + TensorRT6
# Other GPU environments need to confirm the environment before choosing which one to execute
pip3 install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
If you are in China, You may need to use a chinese mirror source (such as Tsinghua source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
By default, the domestic Tsinghua mirror source is turned on to speed up the download. If you use a proxy, you can turn it off(`-i https://pypi.tuna.tsinghua.edu.cn/simple`).
If you need to use the installation package compiled by the develop branch, please download the download address from [Latest installation package list](./Latest_Packages_CN.md), and use the `pip install` command to install. If you want to compile by yourself, please refer to [Paddle Serving Compilation Document](./Compile_CN.md).
The paddle-serving-server and paddle-serving-server-gpu installation packages support Centos 6/7, Ubuntu 16/18 and Windows 10.
The paddle-serving-client and paddle-serving-app installation packages support Linux and Windows, and paddle-serving-client only supports python3.6/3.7/3.8.
The paddle-serving-client and paddle-serving-app installation packages support Linux and Windows, and paddle-serving-client only supports python3.6/3.7/3.8/3.9.
**If you used the Cuda10.2 environment of paddle serving 0.5.X 0.6.X before, you need to pay attention to version 0.7.0, paddle-serving-server-gpu==0.7.0.post102 uses Cudnn7 and TensorRT6, and 0.6.0.post102 uses cudnn8 and TensorRT7. If 0.6.0 cuda10.2 users need to upgrade, please use paddle-serving-server-gpu==0.7.0.post1028**
**If you used the CUDA10.2 environment of paddle serving 0.5.X 0.6.X before, you need to pay attention to version 0.8.0, paddle-serving-server-gpu==0.8.0.post102 uses Cudnn7 and TensorRT6, and 0.6.0.post102 uses cudnn8 and TensorRT7. If 0.6.0 cuda10.2 users need to upgrade, please use paddle-serving-server-gpu==0.8.0.post1028**
## 3. Install Paddle related Python libraries
**You only need to install it when you use the `paddle_serving_client.convert` command or the `Python Pipeline framework`. **
```
# CPU environment please execute
pip3 install paddlepaddle==2.2.0
pip3 install paddlepaddle==2.2.2
# GPU Cuda10.2 environment please execute
pip3 install paddlepaddle-gpu==2.2.0
# GPU CUDA 10.2 environment please execute
pip3 install paddlepaddle-gpu==2.2.2
```
**Note**: If your Cuda version is not 10.2 or if you want to use TensorRT(Cuda10.2 included), please do not execute the above commands directly, you need to refer to [Paddle-Inference official document-download and install the Linux prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python) Select the URL link of the corresponding GPU environment and install it. Assuming that you use Python3.6, please follow the codeblock.
**Note**: If your CUDA version is not 10.2 or if you want to use TensorRT(CUDA10.2 included), please do not execute the above commands directly, you need to refer to [Paddle-Inference official document-download and install the Linux prediction library](https://paddleinference.paddlepaddle.org.cn/master/user_guides/download_lib.html#python) Select the URL link of the corresponding GPU environment and install it. Assuming that you use Python3.6, please follow the codeblock.
for most users, we do not need to read this section. But if you deploy your Paddle Serving on a machine without network, you will encounter a problem that the binary executable tar file cannot be downloaded. Therefore, here we give you all the download links for various environment.
...
...
@@ -27,15 +27,15 @@ for most users, we do not need to read this section. But if you deploy your Padd