Unverified · Commit c46ea3d4, authored by Jiawei Wang, committed by GitHub

Merge branch 'develop' into newtest

@@ -166,16 +166,18 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
<center>

| Argument                                       | Type | Default | Description                                            |
| ---------------------------------------------- | ---- | ------- | ------------------------------------------------------ |
| `thread`                                       | int  | `4`     | Concurrency of current service                         |
| `port`                                         | int  | `9292`  | Exposed port of current service to users               |
| `model`                                        | str  | `""`    | Path of paddle model directory to be served            |
| `mem_optim_off`                                | -    | -       | Disable memory / graphic memory optimization           |
| `ir_optim`                                     | bool | False   | Enable analysis and optimization of the computation graph |
| `use_mkl` (Only for cpu version)               | -    | -       | Run inference with MKL                                 |
| `use_trt` (Only for trt version)               | -    | -       | Run inference with TensorRT                            |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | -    | -       | Run PaddleLite inference                               |
| `use_xpu`                                      | -    | -       | Run PaddleLite inference with Baidu Kunlun XPU         |
| `precision`                                    | str  | FP32    | Precision mode, supports FP32, FP16 and INT8           |
| `use_calib`                                    | bool | False   | Only for deployment with TensorRT                      |

</center>
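For instance, on a TensorRT-enabled build the reduced-precision options could be combined roughly as in the sketch below (illustrative only; the accepted spelling of the `--precision` values may differ between versions):
```
# Sketch: serve the uci_housing model with TensorRT, FP16 precision and calibration enabled
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 \
    --use_trt --precision FP16 --use_calib
```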
......
@@ -165,17 +165,18 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
<center>

| Argument                                       | Type | Default | Description                                             |
| ---------------------------------------------- | ---- | ------- | ------------------------------------------------------- |
| `thread`                                       | int  | `4`     | Concurrency of current service                          |
| `port`                                         | int  | `9292`  | Exposed port of current service to users                |
| `name`                                         | str  | `""`    | Service name, can be used to generate HTTP request url  |
| `model`                                        | str  | `""`    | Path of paddle model directory to be served             |
| `mem_optim_off`                                | -    | -       | Disable memory optimization                             |
| `ir_optim`                                     | bool | False   | Enable analysis and optimization of the computation graph |
| `use_mkl` (Only for cpu version)               | -    | -       | Run inference with MKL                                  |
| `use_trt` (Only for Cuda>=10.1 version)        | -    | -       | Run inference with TensorRT                             |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | -    | -       | Run PaddleLite inference                                |
| `use_xpu`                                      | -    | -       | Run PaddleLite inference with Baidu Kunlun XPU          |
| `precision`                                    | str  | FP32    | Precision mode, supports FP32, FP16 and INT8            |

</center>
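As a usage sketch for the `name` argument, the service below would be reachable over HTTP at `http://127.0.0.1:9292/uci/prediction` (the service name `uci` and the feed/fetch names follow the uci_housing example; adapt them to your own model):
```
# Start an HTTP service whose request URL is derived from --name
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
# Send a prediction request
curl -H "Content-Type:application/json" -X POST \
    -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' \
    http://127.0.0.1:9292/uci/prediction
```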
......
# Paddle Serving Using Baidu Kunlun Chips
(English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))

Paddle Serving supports deployment with Baidu Kunlun chips. Currently, it supports deployment on ARM CPU servers with Baidu Kunlun chips (such as Phytium FT-2000+/64) or on Intel CPU servers with Baidu Kunlun chips. Support for deployment on other heterogeneous hardware servers will be improved in the future.
# Compilation and installation
Refer to the [compile](COMPILE.md) document to set up the compilation environment. The following is based on the Phytium FT-2000+/64 platform.
## Compilation
* Compile the Serving Server
```
@@ -54,11 +54,11 @@ make -j10
```
## Install the wheel package
After the compilation stages above, the whl packages will be generated in ```python/dist/``` under the corresponding build directories.
For example, after the Server compilation step, the whl package will be produced under the server-build-arm/python/dist directory, and you can run ```pip install -U python/dist/*.whl``` to install the package.
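A minimal sketch of this step (the exact wheel file names depend on the build and Python version, so the globs below are placeholders):
```
# Install the server wheel from the server build directory
pip3 install -U server-build-arm/python/dist/paddle_serving_server-*.whl
# Install the client wheel from the client build directory
pip3 install -U client-build-arm/python/dist/paddle_serving_client-*.whl
```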
# Request parameters description
In order to deploy the serving service on the ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.

| Parameter | Description                | Notes                                         |
| :-------- | :------------------------- | :-------------------------------------------- |
| use_lite  | Use the Paddle-Lite engine | Uses the inference capability of Paddle-Lite  |
@@ -72,23 +72,23 @@ tar -xzf uci_housing.tar.gz
```
## Start RPC service
There are mainly three deployment methods:
* deploy on the CPU server with Baidu Kunlun XPU, using the acceleration capability of both Paddle-Lite and the XPU;
* deploy on the CPU server standalone with Paddle-Lite;
* deploy on the CPU server standalone without Paddle-Lite.

The first two deployment methods are recommended.

Start the RPC service, deploying on a CPU server with Baidu Kunlun chips and accelerating with Paddle-Lite and the Baidu Kunlun XPU:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying on a CPU server and accelerating with Paddle-Lite:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying on a CPU server without Paddle-Lite:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```
## Client call
```
@@ -102,8 +102,17 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
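The snippet above shows only the tail of the client script. A minimal self-contained sketch is given below; the client config path and the feed/fetch names follow the uci_housing example and should be adapted to your own model:
```
# Minimal RPC client sketch for the uci_housing example
from paddle_serving_client import Client
import numpy as np

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])  # same port as passed to --port above

data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1, 13, 1)}, fetch=["price"])
print(fetch_map)
```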
# Others
## Model examples and explanation
Some examples are provided below; other models can be adapted with reference to these examples.

| sample name | sample links                                                 |
| :---------- | :----------------------------------------------------------- |
| fit_a_line  | [fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)      |
| resnet      | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)  |

Note: the list of models supported on Kunlun chips is given in [this doc](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs from model to model and some cases may not be supported yet. If you run into any problem, please file a [Github issue](https://github.com/PaddlePaddle/Serving/issues) and we will follow up promptly.
## Kunlun chip related reference materials
* [Running PaddlePaddle on Baidu Kunlun XPU chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
* [Deploying with PaddleLite on Baidu Kunlun XPU](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
# Deploying Paddle Serving with Baidu Kunlun Chips
(Simplified Chinese|[English](./BAIDU_KUNLUN_XPU_SERVING.md))

Paddle Serving supports inference deployment with Baidu Kunlun chips. Deployment is currently supported on ARM servers with Baidu Kunlun chips (such as Phytium FT-2000+/64) or on Intel CPU servers with Baidu Kunlun chips; support for other heterogeneous hardware servers will be improved later.
# Compilation and installation
The basic environment can be configured by referring to [this document](COMPILE_CN.md). The following uses a Phytium FT-2000+/64 machine as an example.
## Compilation
* Compile the server part
```
@@ -20,7 +20,7 @@ cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
    -DSERVER=ON ..
make -j10
```
You can run `make install` to place the build outputs in the `./output` directory; add the `-DCMAKE_INSTALL_PREFIX=./output` option at the cmake stage to specify this path. On Intel CPU platforms that support the AVX2 instruction set, please add the ```-DWITH_MKL=ON``` compile option.
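For example, a sketch of the cmake stage with these options added (keep any other -D options from the compile block above unchanged; `-DWITH_MKL=ON` applies only to Intel CPUs with AVX2 support):
```
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
      -DCMAKE_INSTALL_PREFIX=./output \
      -DWITH_MKL=ON \
      -DSERVER=ON ..
make -j10
make install    # the build outputs are now placed under ./output
```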
* Compile the client part
```
mkdir -p client-build-arm && cd client-build-arm
@@ -55,11 +55,11 @@ make -j10
# Request parameters description
To deploy the service on ARM + XPU and use the acceleration capability of Paddle-Lite, the following parameters are required at request time.

| Parameter | Description                                        | Notes                                                                      |
| :-------- | :------------------------------------------------- | :-------------------------------------------------------------------------- |
| use_lite  | Use the Paddle-Lite engine                          | Uses the Paddle-Lite CPU inference capability                                |
| use_xpu   | Run inference on Baidu Kunlun                       | This option must be used together with use_lite                              |
| ir_optim  | Enable Paddle-Lite computation graph optimization   | See [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for details   |
# Deployment example
## Download the model
```
@@ -68,23 +68,23 @@ tar -xzf uci_housing.tar.gz
```
## Start the RPC service
There are mainly three startup configurations:
* deploy with CPU + XPU, using Paddle-Lite's XPU optimization and acceleration;
* deploy with CPU only, using Paddle-Lite's optimization and acceleration;
* deploy with CPU only, without Paddle-Lite acceleration.

The first two deployment methods are recommended.

Start the RPC service, deploying with CPU + XPU and using Paddle-Lite's XPU optimization and acceleration:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying with CPU only and using Paddle-Lite acceleration:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying with CPU only, without Paddle-Lite acceleration:
```
python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
```
## Client call
```
@@ -98,8 +98,16 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
# Others
## Model examples and explanation
Some examples are provided below; other models can be adapted with reference to these examples.

| sample name | sample links                                                 |
| :---------- | :----------------------------------------------------------- |
| fit_a_line  | [fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)      |
| resnet      | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)  |

Note: the list of models supported on Kunlun chips is given in [this link](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs from model to model and some cases may not be supported yet. If you run into problems, please file a [Github issue](https://github.com/PaddlePaddle/Serving/issues) and we will follow up promptly.
## Kunlun chip related reference materials
* [Running PaddlePaddle on Baidu Kunlun XPU chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
* [Deploying with PaddleLite on Baidu Kunlun XPU](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
@@ -52,7 +52,7 @@ python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #c
```
Or, to start the GPU inference service, run
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 # launch the gpu inference service on GPU 0
```
| Parameters | Meaning |
| ---------- | ---------------------------------------- |
......
@@ -50,7 +50,7 @@ python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #
```
Or, to start the GPU inference service, run
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 # launch the gpu inference service on GPU 0
```
......
@@ -117,13 +117,13 @@ Compared with CPU environment, GPU environment needs to refer to the following t
| CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | The parent directory of the directory where libnvinfer.so.* is located, depends on the TensorRT installation directory | Not required for CUDA 9.0/10.0; required otherwise | No (/usr) |

If you are not in a Docker environment, you can refer to the following commands. The specific paths depend on your environment and the code is only for reference. TENSORRT_LIBRARY_PATH depends on the TensorRT version and should be set according to the actual installation. For example, in the CUDA 10.1 environment the TensorRT version is 6.0 (/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/); in the CUDA 10.2 and CUDA 11.0 environments the TensorRT version is 7.1 (/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/).
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
......
@@ -116,13 +116,13 @@ make -j10
| CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | The parent directory of the directory where libnvinfer.so.* is located, depends on the TensorRT installation directory | Not required for CUDA 9.0/10.0; required otherwise | No (/usr) |

If you are not in a Docker environment, you can refer to the following commands. The specific paths depend on your environment and the code is only for reference. TENSORRT_LIBRARY_PATH depends on the TensorRT version and should be set according to the actual installation. For example, in the CUDA 10.1 environment the TensorRT version is 6.0 (/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/); in the CUDA 10.2 and CUDA 11.0 environments the TensorRT version is 7.1 (/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/).
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
......
@@ -25,7 +25,7 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_
```
GPU Service
```
python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
```
At this point, the server does not really start; it waits for the key.
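For reference, a client-side sketch of supplying the key is given below; the `use_key` call, the `encryption=True` connect flag, and the config path follow the encryption example in this repository and should be treated as assumptions that may vary between versions:
```
# Hypothetical client sketch for the encrypted model service
from paddle_serving_client import Client

client = Client()
client.load_client_config("encrypt_client/serving_client_conf.prototxt")  # illustrative path
client.use_key("./key")                              # assumed API: send the local key file
client.connect(["127.0.0.1:9300"], encryption=True)  # the server starts the real service after receiving the key
# ... then call client.predict(...) as usual
```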
......
@@ -25,7 +25,7 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_
```
GPU Service
```
python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
```
At this point, the server does not really start; it waits for the key.
......
@@ -5,8 +5,8 @@
For example:
```shell
python -m paddle_serving_server.serve --model bert_seq128_model --port 9292 --gpu_ids 0
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
```
Both the bert example and the imagenet example are deployed on card 0.
......
@@ -38,7 +38,7 @@ We can see that the `serving_server` and `serving_client` folders hold the serve
Start the server (GPU)
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_id 0
```
Client (`test_client.py`)
......
@@ -37,7 +37,7 @@ python -m paddle_serving_client.convert --dirname . --model_filename dygraph_mod
Start the server (GPU)
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_id 0
```
Client code, saved as `test_client.py`
......
@@ -50,7 +50,7 @@ We just need
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
The TensorRT version of the faster_rcnn model server is now started.
......
@@ -50,7 +50,7 @@ pip install paddle-server-server==${VERSION}.post11
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
The TensorRT version of the faster_rcnn model server is now started.
......
@@ -54,7 +54,7 @@ Currently Windows supports the Local Predictor of the Web Service framework. The
```
# filename: your_webservice.py
from paddle_serving_server.web_service import WebService
# If it is the GPU version, please use from paddle_serving_server.web_service import WebService
class YourWebService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # Implement pre-processing here
......
@@ -54,7 +54,7 @@ python ocr_web_client.py
```
# filename: your_webservice.py
from paddle_serving_server.web_service import WebService
# If it is the GPU version, please use from paddle_serving_server.web_service import WebService
class YourWebService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        # Implement pre-processing here
......