Unverified commit c46ea3d4 authored by: J Jiawei Wang, committed by: GitHub

Merge branch 'develop' into newtest

......@@ -166,16 +166,18 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
<center>
| Argument | Type | Default | Description |
| ---------------------------------------------- | ---- | ------- | ----------------------------------------------------- |
| `thread` | int | `4` | Concurrency of current service |
| `port` | int | `9292` | Exposed port of current service to users |
| `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
| `ir_optim` | bool | False | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | - | - | Run PaddleLite inference |
| `use_xpu` | - | - | Run PaddleLite inference with Baidu Kunlun XPU |
| `precision` | str | FP32 | Precision Mode, support FP32, FP16, INT8 |
| `use_calib` | bool | False | Only for deployment with TensorRT |
</center>
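To make the table concrete, here is a minimal sketch of a launch command that combines several of these arguments with the uci_housing demo model used above; the particular flag combination is illustrative, not a recommended configuration.

```
# illustrative only: serve the uci_housing demo model on port 9292 with 10 worker threads,
# with graph optimization (--ir_optim) and MKL acceleration (--use_mkl, CPU build only)
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --ir_optim --use_mkl
```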
......
......@@ -165,17 +165,18 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
<center>
| Argument | Type | Default | Description |
| ---------------------------------------------- | ---- | ------- | ------------------------------------------------------ |
| `thread` | int | `4` | Concurrency of current service |
| `port` | int | `9292` | Exposed port of current service to users |
| `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim_off` | - | - | Disable memory optimization |
| `ir_optim` | bool | False | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
| `use_trt` (Only for Cuda>=10.1 version) | - | - | Run inference with TensorRT |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | - | - | Run PaddleLite inference |
| `use_xpu` | - | - | Run PaddleLite inference with Baidu Kunlun XPU |
| `precision` | str | FP32 | Precision Mode, support FP32, FP16, INT8 |
</center>
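Since `name` is what turns the RPC server into an HTTP web service, a minimal sketch (assuming the uci_housing demo model and the service name `uci`) looks like the following; the endpoint follows the `http://<host>:<port>/<name>/prediction` pattern.

```
# illustrative: start a web service named "uci" on port 9292
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
# in another shell, send a JSON prediction request to the generated endpoint
curl -H "Content-Type:application/json" -X POST \
     -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' \
     http://127.0.0.1:9292/uci/prediction
```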
......
# Paddle Serving Using Baidu Kunlun Chips
(English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))
Paddle Serving supports deployment using Baidu Kunlun chips. Currently, it supports deployment on ARM CPU servers with Baidu Kunlun chips (such as Phytium FT-2000+/64), or on Intel CPU servers with Baidu Kunlun chips. We will improve
the deployment capability on various heterogeneous hardware servers in the future.
# Compilation and installation
Refer to the [compile](COMPILE.md) document to set up the compilation environment. The following is based on the Phytium FT-2000+/64 platform.
## Compilation
* Compile the Serving Server
```
......@@ -54,11 +54,11 @@ make -j10
```
## Install the wheel package
After the compilation stages above, the whl package will be generated in ```python/dist/``` under the corresponding build directory.
For example, after the server compilation step, the whl package will be produced under the server-build-arm/python/dist directory, and you can run ```pip install -U python/dist/*.whl``` to install the package.
# Request parameters description
In order to deploy the serving service on the ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
| Parameter | Description | Notes |
| :------- | :------------------------------- | :----------------------------------------------------------------- |
| use_lite | Use the Paddle-Lite engine | Uses the CPU inference capability of Paddle-Lite |
......@@ -72,23 +72,23 @@ tar -xzf uci_housing.tar.gz
```
## Start RPC service
There are mainly three deployment methods:
* deploy on the CPU server with Baidu Kunlun XPU, using the acceleration capability of Paddle-Lite and the XPU;
* deploy on the CPU server standalone with Paddle-Lite;
* deploy on the CPU server standalone without Paddle-Lite.

The first two deployment methods are recommended.
Start the RPC service, deploying on the CPU server with Baidu Kunlun chips and accelerating with Paddle-Lite and the Baidu Kunlun XPU:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying on the CPU server and accelerating with Paddle-Lite:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying on the CPU server without Paddle-Lite acceleration:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```
## Client prediction
```
......@@ -102,8 +102,17 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
# Others
## Model examples and explanation
Some examples are provided below, and other models can be modified with reference to these examples.
| sample name | sample links |
| :---------- | :---------------------------------------------------------- |
| fit_a_line | [fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu) |
| resnet | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu) |
Note: the list of supported models is available in the [doc](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs across models and some cases may not be supported. If you have any problem, please submit a [Github issue](https://github.com/PaddlePaddle/Serving/issues) and we will follow up promptly.
## Kunlun chip related reference materials
* [PaddlePaddle on Baidu Kunlun xpu chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
* [Deployment on Baidu Kunlun xpu chips using PaddleLite](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
# Paddle Serving Deployment Using Baidu Kunlun Chips
(Simplified Chinese|[English](./BAIDU_KUNLUN_XPU_SERVING.md))
Paddle Serving supports inference deployment using Baidu Kunlun chips. It currently supports deployment on ARM servers with Baidu Kunlun chips (such as Phytium FT-2000+/64) or on Intel CPU servers with Baidu Kunlun chips; deployment support for other heterogeneous hardware servers will be improved in the future.
# Compilation and installation
Refer to [this document](COMPILE_CN.md) for basic environment configuration. The following uses a Phytium FT-2000+/64 machine as an example.
## Compilation
* Compile the server part
```
......@@ -20,7 +20,7 @@ cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DSERVER=ON ..
make -j10
```
You can run `make install` to place the build outputs in the `./output` directory; add the `-DCMAKE_INSTALL_PREFIX=./output` option at the cmake stage to specify the output path. On Intel CPU platforms that support the AVX2 instruction set, please specify the ```-DWITH_MKL=ON``` compile option.
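As a sketch of where the option fits, the commands below follow the server build pattern shown above with ```-DWITH_MKL=ON``` added for an AVX2-capable Intel CPU host; the build directory name is hypothetical, and the Python paths plus any Lite/XPU-related options from the full compile document still have to match your own environment.

```
# hypothetical build directory; reuse the full cmake option set from the compile document for your setup
mkdir -p server-build-intel-xpu && cd server-build-intel-xpu
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
      -DWITH_MKL=ON \
      -DSERVER=ON ..
make -j10
```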
* Compile the client part
```
mkdir -p client-build-arm && cd client-build-arm
......@@ -55,11 +55,11 @@ make -j10
# Request parameters description
To support arm+xpu service deployment and use the acceleration capability of Paddle-Lite, the following parameters are required in the request.
| Parameter | Description | Notes |
| :------- | :-------------------------- | :--------------------------------------------------------------- |
| use_lite | Use the Paddle-Lite engine | Uses the CPU inference capability of Paddle-Lite |
| use_xpu | Run inference with Baidu Kunlun | This option must be used together with use_lite |
| ir_optim | Enable Paddle-Lite subgraph optimization | See [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for details |
# Deployment example
## Download the model
```
......@@ -68,23 +68,23 @@ tar -xzf uci_housing.tar.gz
```
## Start the RPC service
There are three main launch configurations:
* deploy with CPU + XPU, using Paddle-Lite XPU optimization and acceleration;
* deploy with CPU only, using Paddle-Lite optimization and acceleration;
* deploy with CPU, without Paddle-Lite acceleration.

The first two deployment methods are recommended.
Start the RPC service, deploying with CPU + XPU and using Paddle-Lite XPU optimization and acceleration:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying with CPU and using Paddle-Lite acceleration:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying with CPU, without Paddle-Lite acceleration:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```
## Client invocation
```
......@@ -98,8 +98,16 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
# Other notes
## Model examples and explanation
Some examples are provided below; other models can be adapted with reference to these examples.
| Example name | Example link |
| :--------- | :---------------------------------------------------------- |
| fit_a_line | [fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu) |
| resnet | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu) |
Note: the list of models supported on Kunlun chips is available at [this link](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs across models and some cases may not be supported. If you encounter problems during deployment, please submit a [Github issue](https://github.com/PaddlePaddle/Serving/issues) and we will follow up promptly.
## Reference materials for Kunlun chip support
* [Running PaddlePaddle on Baidu Kunlun XPU chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
* [Inference deployment on Baidu Kunlun XPU using PaddleLite](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
......@@ -52,7 +52,7 @@ python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #c
```
Or, to start the GPU inference service, run
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
```
| Parameters | Meaning |
| ---------- | ---------------------------------------- |
......
......@@ -50,7 +50,7 @@ python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #
```
Or, to start the GPU inference service, run
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service on GPU 0
```
......
......@@ -117,13 +117,13 @@ Compared with CPU environment, GPU environment needs to refer to the following t
| CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | The upper-level directory of the directory where libnvinfer.so.* is located, depending on the TensorRT installation directory | Not required for Cuda 9.0/10.0; required otherwise | No (/usr) |
If not in a Docker environment, users can refer to the following commands. The specific paths are subject to the current environment, and the code is only for reference. TENSORRT_LIBRARY_PATH is related to the TensorRT version and should be set according to the actual situation. For example, in the CUDA 10.1 environment the TensorRT version is 6.0 (/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/); in the CUDA 10.2 and CUDA 11.0 environments the TensorRT version is 7.1 (/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/).
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
......
......@@ -116,13 +116,13 @@ make -j10
| CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | The upper-level directory of the directory where libnvinfer.so.* is located, depending on the TensorRT installation directory | Not required for Cuda 9.0/10.0; required otherwise | No (/usr) |
Outside of a Docker environment, users can refer to the following commands. The specific paths depend on the current environment, and the code is only for reference. TENSORRT_LIBRARY_PATH is related to the TensorRT version and should be set according to the actual situation. For example, in a CUDA 10.1 environment the TensorRT version is 6.0 (/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/); in CUDA 10.2 and CUDA 11.0 environments the TensorRT version is 7.1 (/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/).
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
......
......@@ -25,7 +25,7 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_
```
GPU Service
```
python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
```
At this point, the server does not really start, but waits for the key.
......
......@@ -25,7 +25,7 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_
```
GPU Service
```
python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
```
At this point, the server does not really start, but waits for the key.
......
......@@ -5,8 +5,8 @@
For example:
```shell
python -m paddle_serving_server.serve --model bert_seq128_model --port 9292 --gpu_ids 0
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
```
On card 0, both the bert example and the imagenet example are deployed.
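A hedged way to run both services concurrently from one shell is to put each in the background; the log file names below are illustrative.

```shell
# run both services in the background on GPU 0, each on its own port (illustrative)
nohup python -m paddle_serving_server.serve --model bert_seq128_model --port 9292 --gpu_ids 0 > bert.log 2>&1 &
nohup python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 > resnet.log 2>&1 &
```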
......
......@@ -38,7 +38,7 @@ We can see that the `serving_server` and `serving_client` folders hold the serve
Start the server (GPU)
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_id 0
```
Client (`test_client.py`)
......
......@@ -37,7 +37,7 @@ python -m paddle_serving_client.convert --dirname . --model_filename dygraph_mod
Start the server (GPU)
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_id 0
```
Client code, saved as `test_client.py`
......
......@@ -50,7 +50,7 @@ We just need
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
The TensorRT version of the faster_rcnn model server is now started.
......
......@@ -50,7 +50,7 @@ pip install paddle-server-server==${VERSION}.post11
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
The TensorRT version of the faster_rcnn model server is now started.
......
......@@ -54,7 +54,7 @@ Currently Windows supports the Local Predictor of the Web Service framework. The
```
# filename:your_webservice.py
from paddle_serving_server.web_service import WebService
# If it is the GPU version, please use from paddle_serving_server.web_service import WebService
class YourWebService(WebService):
def preprocess(self, feed=[], fetch=[]):
#Implement pre-processing here
......
......@@ -54,7 +54,7 @@ python ocr_web_client.py
```
# filename:your_webservice.py
from paddle_serving_server.web_service import WebService
# If it is the GPU version, please use from paddle_serving_server.web_service import WebService
class YourWebService(WebService):
def preprocess(self, feed=[], fetch=[]):
#Implement pre-processing here
......