diff --git a/README.md b/README.md
index d3c17ead4f1b8d6263bb6b144aeb4f0628daa8da..85215d234abb7d069660cb261185abf78cf9a3f6 100644
--- a/README.md
+++ b/README.md
@@ -124,7 +124,7 @@ Recommended to install paddle >= 2.0.0
 pip install paddlepaddle==2.0.0
 
 # GPU Cuda10.2 please run
-pip install paddlepaddle-gpu==2.0.0
+pip install paddlepaddle-gpu==2.0.0
 ```
 
 **Note**: If your Cuda version is not 10.2, please do not execute the above commands directly, you need to refer to [Paddle official documentation-multi-version whl package list
@@ -135,8 +135,12 @@ The url corresponding to `cuda9.0_cudnn7-mkl`, copy it and run
 ```
 pip install https://paddle-wheel.bj.bcebos.com/2.0.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post90-cp27-cp27mu-linux_x86_64.whl
 ```
+
+The default `paddlepaddle-gpu==2.0.0` package targets Cuda 10.2 and is built without TensorRT. To install PaddlePaddle with TensorRT, look up the keyword `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above. For more information, see [Paddle Serving uses TensorRT](./doc/TENSOR_RT.md).
+
 If it is other environment and Python version, please find the corresponding link in the table and install it with pip.
+
 For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md)

Quick Start Example

@@ -219,6 +223,7 @@ the response is
 - [Develop Pipeline Serving](doc/PIPELINE_SERVING.md)
 - [Deploy Web Service with uWSGI](doc/UWSGI_DEPLOY.md)
 - [Hot loading for model file](doc/HOT_LOADING_IN_SERVING.md)
+- [Paddle Serving uses TensorRT](doc/TENSOR_RT.md)
 
 ### About Efficiency
 - [How to profile Paddle Serving latency?](python/examples/util)
diff --git a/README_CN.md b/README_CN.md
index b658d6194785bbf2dd14c7ebd21d11048261dbe1..2cae7b525833f9c60411ea6c7f48f3860e22a10b 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -112,7 +112,7 @@ pip install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA10.1 + Tensor
 
 You may need to use a domestic mirror source (for example the Tsinghua source; add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
 
-If you need packages built from the develop branch, get the download address from the [latest package list](./doc/LATEST_PACKAGES.md) and install it with `pip install`. If you want to compile it yourself, refer to the [Paddle Serving compilation documentation](./doc/COMPILE_CN.md)
+If you need packages built from the develop branch, get the download address from the [latest package list](./doc/LATEST_PACKAGES.md) and install it with `pip install`. If you want to compile it yourself, refer to the [Paddle Serving compilation documentation](./doc/COMPILE_CN.md).
 
 The paddle-serving-server and paddle-serving-server-gpu packages support Centos 6/7, Ubuntu 16/18 and Windows 10.
 
@@ -134,6 +134,8 @@ pip install paddlepaddle-gpu==2.0.0
 ```
 pip install https://paddle-wheel.bj.bcebos.com/2.0.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post90-cp27-cp27mu-linux_x86_64.whl
 ```
+Since the default `paddlepaddle-gpu==2.0.0` package targets Cuda 10.2 and is built without TensorRT, to use TensorRT with `paddlepaddle-gpu` you need to find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the build for your Python version. For more information, see [How to use TensorRT?](doc/TENSOR_RT_CN.md).
+
 For other environments and Python versions, find the corresponding link in the table and install it with pip.
 
 For **Windows 10 users**, please refer to the document [Guide to using Paddle Serving on Windows](./doc/WINDOWS_TUTORIAL_CN.md).
@@ -220,6 +222,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1
 - [How to develop Pipeline?](doc/PIPELINE_SERVING_CN.md)
 - [How to deploy Web Service with uWSGI](doc/UWSGI_DEPLOY_CN.md)
 - [How to hot-load model files](doc/HOT_LOADING_IN_SERVING_CN.md)
+- [How to use TensorRT?](doc/TENSOR_RT_CN.md)
 
 ### About Paddle Serving performance
 - [How to test Paddle Serving performance?](python/examples/util/)
diff --git a/doc/TENSOR_RT.md b/doc/TENSOR_RT.md
new file mode 100644
index 0000000000000000000000000000000000000000..6e53a6ff029df6a46080d656a6dc9db95a9633e3
--- /dev/null
+++ b/doc/TENSOR_RT.md
@@ -0,0 +1,65 @@
+## Paddle Serving uses TensorRT
+
+(English|[简体中文](./TENSOR_RT_CN.md))
+
+### Background
+
+Deploying models trained with mainstream frameworks through TensorRT, the inference tool released by Nvidia, can greatly increase inference speed, often at least doubling it compared with the original framework, while also using less device memory. Mastering TensorRT deployment is therefore very useful for anyone who needs to deploy deep learning models. Paddle Serving provides comprehensive TensorRT ecosystem support.
+
+### Environment
+
+The Cuda10.1, Cuda10.2 and Cuda11 versions of Serving support TensorRT.
+
+#### Install Paddle
+
+In [Development using Docker environment](./RUN_IN_DOCKER.md) and [Docker image list](./DOCKER_IMAGES.md), we provide development images for TensorRT. After starting a container from the image, you need to install a Paddle whl package that supports TensorRT; refer to the documentation on the home page.
+
+```
+# For a GPU Cuda10.2 environment, run
+pip install paddlepaddle-gpu==2.0.0
+```
+
+**Note**: If your Cuda version is not 10.2, please do not run the above command directly; instead refer to the [Paddle official documentation-multi-version whl package list
+](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables_en.html#multi-version-whl-package-list-release)
+
+Select the URL of the corresponding GPU environment and install it. For example, Python2.7 users on Cuda 10.1 should select the url corresponding to `cp27-cp27mu` and
+`cuda10.1-cudnn7.6-trt6.0.1.5`, copy it and run
+```
+pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.0.0-gpu-cuda10.1-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post101-cp27-cp27mu-linux_x86_64.whl
+```
+Since the default `paddlepaddle-gpu==2.0.0` package targets Cuda 10.2 and is built without TensorRT, to use TensorRT with `paddlepaddle-gpu` you need to find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the build for your Python version.
+
+
+#### Install Paddle Serving
+```
+# Cuda 10.2
+pip install paddle-serving-server-gpu==${VERSION}.post102
+# Cuda 10.1
+pip install paddle-serving-server-gpu==${VERSION}.post101
+# Cuda 11
+pip install paddle-serving-server-gpu==${VERSION}.post11
+```
+
+### Use TensorRT
+
+#### RPC mode
+
+In [Serving model example](../python/examples), we provide models that can be accelerated with TensorRT, such as the [Faster_RCNN model](../python/examples/detection/faster_rcnn_r50_fpn_1x_coco) under detection.
+
+We just need to run
+```
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
+tar xf faster_rcnn_r50_fpn_1x_coco.tar
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
+```
+This starts the TensorRT version of the faster_rcnn model server.
+
+
+#### Local Predictor mode
+
+In [local_predictor](../python/paddle_serving_app/local_predict.py#L52), users can explicitly specify `use_trt=True` and pass it to `load_model_config`.
+Everything else works the same as any other Local Predictor usage; just pay attention to the model's compatibility with TensorRT.
+
+#### Pipeline mode
+
+In [Pipeline mode](./PIPELINE_SERVING.md), our [imagenet example](../python/examples/pipeline/imagenet/config.yml#L23) shows how to enable TensorRT.
diff --git a/doc/TENSOR_RT_CN.md b/doc/TENSOR_RT_CN.md
new file mode 100644
index 0000000000000000000000000000000000000000..40d525d59b5a21ca22f7a1e4274009bf9ceba987
--- /dev/null
+++ b/doc/TENSOR_RT_CN.md
@@ -0,0 +1,67 @@
+## Paddle Serving uses TensorRT
+
+([English](./TENSOR_RT.md)|简体中文)
+
+### Background
+
+Deploying models trained with mainstream frameworks through TensorRT, the inference tool released by Nvidia, can greatly increase inference speed, often at least doubling it compared with the original framework, while also using less device memory. Mastering TensorRT deployment is therefore very useful for anyone who needs to deploy deep learning models. Paddle Serving provides comprehensive TensorRT ecosystem support.
+
+### Environment
+
+The Cuda10.1, Cuda10.2 and Cuda11 versions of Serving support TensorRT.
+
+#### Install Paddle
+
+In [Development using Docker environment](./RUN_IN_DOCKER_CN.md) and [Docker image list](./DOCKER_IMAGES_CN.md), we provide development images for TensorRT. After starting a container from the image, you need to install a Paddle whl package that supports TensorRT; refer to the documentation on the home page.
+
+```
+# For a GPU Cuda10.2 environment, run
+pip install paddlepaddle-gpu==2.0.0
+```
+
+**Note**: If your Cuda version is not 10.2, please do not run the above command directly; instead refer to the [Paddle official documentation-multi-version whl package list
+](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release)
+
+Select the URL of the corresponding GPU environment and install it. For example, Python2.7 users on Cuda 10.1 should select the url in the table corresponding to `cp27-cp27mu` and
+`cuda10.1-cudnn7.6-trt6.0.1.5`, copy it and run
+```
+pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.0.0-gpu-cuda10.1-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post101-cp27-cp27mu-linux_x86_64.whl
+```
+Since the default `paddlepaddle-gpu==2.0.0` package targets Cuda 10.2 and is built without TensorRT, to use TensorRT with `paddlepaddle-gpu` you need to find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the build for your Python version.
+
+
+#### Install Paddle Serving
+```
+# Cuda 10.2
+pip install paddle-serving-server-gpu==${VERSION}.post102
+# Cuda 10.1
+pip install paddle-serving-server-gpu==${VERSION}.post101
+# Cuda 11
+pip install paddle-serving-server-gpu==${VERSION}.post11
+```
+
+### Use TensorRT
+
+#### RPC mode
+
+In [Serving model example](../python/examples), we provide models that can be accelerated with TensorRT, such as the [Faster_RCNN model](../python/examples/detection/faster_rcnn_r50_fpn_1x_coco) under detection.
+
+We just need to run
+```
+wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
+tar xf faster_rcnn_r50_fpn_1x_coco.tar
+python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
+```
+This starts the TensorRT version of the faster_rcnn model server.
+
+
+#### Local Predictor mode
+
+In [local_predictor](../python/paddle_serving_app/local_predict.py#L52), users can explicitly specify `use_trt=True` and pass it to `load_model_config`.
+Everything else works the same as any other Local Predictor usage; just pay attention to the model's compatibility with TensorRT.
+
+#### Pipeline mode
+
+In [Pipeline mode](./PIPELINE_SERVING_CN.md), our [imagenet example](../python/examples/pipeline/imagenet/config.yml#L23) shows how to enable TensorRT.
+
+
diff --git a/python/examples/detection/README.md b/python/examples/detection/README.md
index 99ea3dd1fff9bdf88e90192e864e72b40b70421b..83f6157c0d29fc3b1c672c06473487e9e4efe3f0 100644
--- a/python/examples/detection/README.md
+++ b/python/examples/detection/README.md
@@ -14,7 +14,7 @@ Paddle Detection provides a large number of [Model Zoo](https://github.com/Paddl
 Several examples of PaddleDetection models used in Serving are given in this folder
 
 All examples support TensorRT.
--[Faster RCNN](./faster_rcnn_r50_fpn_1x_coco)
--[PPYOLO](./ppyolo_r50vd_dcn_1x_coco)
--[TTFNet](./ttfnet_darknet53_1x_coco)
--[YOLOv3](./yolov3_darknet53_270e_coco)
+- [Faster RCNN](./faster_rcnn_r50_fpn_1x_coco)
+- [PPYOLO](./ppyolo_r50vd_dcn_1x_coco)
+- [TTFNet](./ttfnet_darknet53_1x_coco)
+- [YOLOv3](./yolov3_darknet53_270e_coco)
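
Reviewer note: the install and launch steps that `doc/TENSOR_RT.md` describes can be sketched as a small Python helper. This is only an illustration under stated assumptions, not part of the change set. The function names `serving_install_command` and `serve_command` are hypothetical, the default package name `paddle-serving-server-gpu` is assumed from the install line quoted in the README_CN hunk, and the version `0.5.0` is just an example value. The CUDA suffixes and the `--model`, `--port`, `--gpu_ids` and `--use_trt` flags are the ones shown in the doc.

```python
import shlex

# Post-release suffixes as listed in the "Install Paddle Serving" section.
CUDA_TO_POST_SUFFIX = {"10.1": "post101", "10.2": "post102", "11": "post11"}


def serving_install_command(cuda_version, serving_version,
                            package="paddle-serving-server-gpu"):
    """Build the pip command for the Serving build matching a CUDA version.

    The default package name is an assumption; pass another one if needed.
    """
    try:
        suffix = CUDA_TO_POST_SUFFIX[cuda_version]
    except KeyError:
        raise ValueError(
            f"no TensorRT-capable build listed for CUDA {cuda_version}"
        ) from None
    return f"pip install {package}=={serving_version}.{suffix}"


def serve_command(model_dir, port, gpu_ids="0", use_trt=True):
    """Build the argv list for the RPC server shown in the doc."""
    argv = [
        "python", "-m", "paddle_serving_server_gpu.serve",
        "--model", model_dir,
        "--port", str(port),
        "--gpu_ids", gpu_ids,
    ]
    if use_trt:
        # The flag takes no value; its presence enables TensorRT.
        argv.append("--use_trt")
    return argv


if __name__ == "__main__":
    print(serving_install_command("10.2", "0.5.0"))
    # shlex.join requires Python 3.8 or newer.
    print(shlex.join(serve_command("serving_server", 9494)))
```

Returning the server command as an argument list rather than a single string means it can be passed straight to `subprocess.run` without shell quoting, and rejecting unknown CUDA versions early avoids installing a build that does not match the environment.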