diff --git a/README.md b/README.md
index 4b63ce9c8aead852a76404f4268fc8131c060729..05147c5c8d78325895b1b5fc9a9dfa528e3c7c0c 100644
--- a/README.md
+++ b/README.md
@@ -165,17 +165,19 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 ```
-| Argument | Type | Default | Description |
-|--------------|------|-----------|--------------------------------|
-| `thread` | int | `4` | Concurrency of current service |
-| `port` | int | `9292` | Exposed port of current service to users|
-| `model` | str | `""` | Path of paddle model directory to be served |
-| `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
-| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
-| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
-| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
-| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
-| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
+| Argument | Type | Default | Description |
+| ---------------------------------------------- | ---- | ------- | ----------------------------------------------------- |
+| `thread` | int | `4` | Concurrency of current service |
+| `port` | int | `9292` | Exposed port of current service to users |
+| `model` | str | `""` | Path of paddle model directory to be served |
+| `mem_optim_off` | - | - | Disable memory / graphic memory optimization |
+| `ir_optim` | bool | False | Enable analysis and optimization of calculation graph |
+| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
+| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
+| `use_lite` (Only for Intel x86 CPU or ARM CPU) | - | - | Run PaddleLite inference |
+| `use_xpu` | - | - | Run PaddleLite inference with Baidu Kunlun XPU |
+| `precision` | str | FP32 | Precision mode, supports FP32, FP16 and INT8 |
+| `use_calib` | bool | False | Enable TensorRT calibration (only for TensorRT deployment) |
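For readers skimming the updated table, a start command that combines the newly documented arguments might look like the sketch below. It is only an illustration: it assumes a TensorRT-enabled build of the server and that `--precision` and `--use_calib` map one-to-one onto the argument names listed above; the model directory and port are reused from the surrounding `uci_housing_model` example.

```shell
# Hypothetical invocation of a TensorRT build of the server, running the demo
# model in FP16; adjust --model, --port and --thread to your own deployment.
python -m paddle_serving_server.serve \
    --model uci_housing_model \
    --port 9292 \
    --thread 10 \
    --use_trt \
    --precision FP16
```

For INT8 one would presumably pair `--precision INT8` with `--use_calib`, since the table documents calibration as applying only to TensorRT deployments.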
diff --git a/README_CN.md b/README_CN.md
index 4ee2c9863dfbc6f4531d0ec00ca92aacc19e769e..a3c8d3061232ceee95ae4c1347823899545afbc0 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -164,18 +164,19 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
 ```
-| Argument | Type | Default | Description |
-|--------------|------|-----------|--------------------------------|
-| `thread` | int | `4` | Concurrency of current service |
-| `port` | int | `9292` | Exposed port of current service to users|
-| `name` | str | `""` | Service name, can be used to generate HTTP request url |
-| `model` | str | `""` | Path of paddle model directory to be served |
-| `mem_optim_off` | - | - | Disable memory optimization |
-| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
-| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
-| `use_trt` (Only for Cuda>=10.1 version) | - | - | Run inference with TensorRT |
-| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
-| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
+| Argument | Type | Default | Description |
+| ---------------------------------------------- | ---- | ------- | ------------------------------------------------------ |
+| `thread` | int | `4` | Concurrency of current service |
+| `port` | int | `9292` | Exposed port of current service to users |
+| `name` | str | `""` | Service name, can be used to generate HTTP request url |
+| `model` | str | `""` | Path of paddle model directory to be served |
+| `mem_optim_off` | - | - | Disable memory optimization |
+| `ir_optim` | bool | False | Enable analysis and optimization of calculation graph |
+| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
+| `use_trt` (Only for Cuda>=10.1 version) | - | - | Run inference with TensorRT |
+| `use_lite` (Only for Intel x86 CPU or ARM CPU) | - | - | Run PaddleLite inference |
+| `use_xpu` | - | - | Run PaddleLite inference with Baidu Kunlun XPU |
+| `precision` | str | FP32 | Precision mode, supports FP32, FP16 and INT8 |
diff --git a/doc/BAIDU_KUNLUN_XPU_SERVING.md b/doc/BAIDU_KUNLUN_XPU_SERVING.md
index c57ce515096253678c9222c96a3e57fcd9dd91e7..02568642bad6aafd147b628a1c6607fd8af9fed3 100644
--- a/doc/BAIDU_KUNLUN_XPU_SERVING.md
+++ b/doc/BAIDU_KUNLUN_XPU_SERVING.md
@@ -1,12 +1,12 @@
 # Paddle Serving Using Baidu Kunlun Chips
 (English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))
-Paddle serving supports deployment using Baidu Kunlun chips. At present, the pilot support is deployed on the ARM server with Baidu Kunlun chips
- (such as Phytium FT-2000+/64). We will improve
+Paddle Serving supports deployment using Baidu Kunlun chips. Currently, it supports deployment on ARM CPU servers with Baidu Kunlun chips
+ (such as Phytium FT-2000+/64) or Intel CPU servers with Baidu Kunlun chips. We will improve
 the deployment capability on various heterogeneous hardware servers in the future.
 # Compilation and installation
-Refer to [compile](COMPILE.md) document to setup the compilation environment。
+Refer to the [compile](COMPILE.md) document to set up the compilation environment. The following is based on the Phytium FT-2000+/64 platform.
 ## Compilatiton
 * Compile the Serving Server
 ```
@@ -54,11 +54,11 @@ make -j10
 ```
 ## Install the wheel package
 After the compilations stages above, the whl package will be generated in ```python/dist/``` under the specific temporary directories.
-For example, after the Server Compiation step,the whl package will be produced under the server-build-arm/python/dist directory, and you can run ```pip install -u python/dist/*.whl``` to install the package。
+For example, after the Server Compilation step, the whl package will be produced under the server-build-arm/python/dist directory, and you can run ```pip install -U python/dist/*.whl``` to install the package.
 # Request parameters description
 In order to deploy serving
- service on the arm server with Baidu Kunlun xpu chips and use the acceleration capability of Paddle-Lite,please specify the following parameters during deployment。
+ service on the ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
 | param | param description | about |
 | :------- | :------------------------------- | :----------------------------------------------------------------- |
 | use_lite | using Paddle-Lite Engine | use the inference capability of Paddle-Lite |
@@ -72,23 +72,23 @@ tar -xzf uci_housing.tar.gz
 ```
 ## Start RPC service
 There are mainly three deployment methods:
-* deploy on the ARM server with Baidu xpu using the acceleration capability of Paddle-Lite and xpu;
-* deploy on the ARM server standalone with Paddle-Lite;
-* deploy on the ARM server standalone without Paddle-Lite。
+* deploy on the CPU server with Baidu Kunlun XPU, using the acceleration capability of Paddle-Lite and the XPU;
+* deploy on the CPU server standalone with Paddle-Lite;
+* deploy on the CPU server standalone without Paddle-Lite.

-The first two deployment methods are recommended。
+The first two deployment methods are recommended.

-Start the rpc service, deploying on ARM server with Baidu Kunlun chips,and accelerate with Paddle-Lite and Baidu Kunlun xpu.
+Start the RPC service, deploying on the CPU server with Baidu Kunlun chips, and accelerate with Paddle-Lite and the Baidu Kunlun XPU.
 ```
-python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
 ```
-Start the rpc service, deploying on ARM server,and accelerate with Paddle-Lite.
+Start the RPC service, deploying on the CPU server, and accelerate with Paddle-Lite.
 ```
-python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
 ```
-Start the rpc service, deploying on ARM server.
+Start the RPC service, deploying on the CPU server.
 ```
-python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
 ```
 ##
 ```
@@ -102,8 +102,17 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
 fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
 print(fetch_map)
 ```
-Some examples are provided below, and other models can be modifed with reference to these examples。
+# Others
+## Model example and explanation
+
+Some examples are provided below, and other models can be modified with reference to these examples.
 | sample name | sample links |
 | :---------- | :---------------------------------------------------------- |
 | fit_a_line | [fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu) |
 | resnet | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu) |
+
+Note: For the list of supported models, refer to [doc](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html). Adaptation differs from model to model, so some cases may not be supported. If you run into any problem, please submit a [Github issue](https://github.com/PaddlePaddle/Serving/issues), and we will follow up promptly.
+
+## Kunlun chip related reference materials
+* [PaddlePaddle on Baidu Kunlun xpu chips](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
+* [Deployment on Baidu Kunlun xpu chips using PaddleLite](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
diff --git a/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md b/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md
index 6640533cafee67360e3f8a12b87816f5aad97aa0..fb7de26e016388dbcc3e5db23d8232743fdd792e 100644
--- a/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md
+++ b/doc/BAIDU_KUNLUN_XPU_SERVING_CN.md
@@ -1,10 +1,10 @@
 # Paddle Serving使用百度昆仑芯片部署
 (简体中文|[English](./BAIDU_KUNLUN_XPU_SERVING.md))
-Paddle Serving支持使用百度昆仑芯片进行预测部署。目前试验性支持在百度昆仑芯片和arm服务器(如飞腾 FT-2000+/64)上进行部署,后续完善对其他异构硬件服务器部署能力。
+Paddle Serving支持使用百度昆仑芯片进行预测部署。目前支持在百度昆仑芯片和arm服务器(如飞腾 FT-2000+/64)或者百度昆仑芯片和Intel CPU服务器上进行部署,后续完善对其他异构硬件服务器部署能力。
 # 编译、安装
-基本环境配置可参考[该文档](COMPILE_CN.md)进行配置。
+基本环境配置可参考[该文档](COMPILE_CN.md)进行配置。下面以飞腾FT-2000+/64机器为例进行介绍。
 ## 编译
 * 编译server部分
 ```
@@ -20,7 +20,7 @@ cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
 -DSERVER=ON ..
 make -j10
 ```
-可以执行`make install`把目标产出放在`./output`目录下,cmake阶段需添加`-DCMAKE_INSTALL_PREFIX=./output`选项来指定存放路径。
+可以执行`make install`把目标产出放在`./output`目录下,cmake阶段需添加`-DCMAKE_INSTALL_PREFIX=./output`选项来指定存放路径。在支持AVX2指令集的Intel CPU平台上请指定```-DWITH_MKL=ON```编译选项。
 * 编译client部分
 ```
 mkdir -p client-build-arm && cd client-build-arm
@@ -55,11 +55,11 @@ make -j10
 # 请求参数说明
 为了支持arm+xpu服务部署,使用Paddle-Lite加速能力,请求时需使用以下参数。
-|参数|参数说明|备注|
-|:--|:--|:--|
-|use_lite|使用Paddle-Lite Engine|使用Paddle-Lite cpu预测能力|
-|use_xpu|使用Baidu Kunlun进行预测|该选项需要与use_lite配合使用|
-|ir_optim|开启Paddle-Lite计算子图优化|详细见[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)|
+| 参数 | 参数说明 | 备注 |
+| :------- | :-------------------------- | :---------------------------------------------------------------- |
+| use_lite | 使用Paddle-Lite Engine | 使用Paddle-Lite cpu预测能力 |
+| use_xpu | 使用Baidu Kunlun进行预测 | 该选项需要与use_lite配合使用 |
+| ir_optim | 开启Paddle-Lite计算子图优化 | 详细见[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) |
 # 部署使用示例
 ## 下载模型
 ```
@@ -68,23 +68,23 @@ tar -xzf uci_housing.tar.gz
 ```
 ## 启动rpc服务
 主要有三种启动配置:
-* 使用arm cpu+xpu部署,使用Paddle-Lite xpu优化加速能力;
-* 单独使用arm cpu部署,使用Paddle-Lite优化加速能力;
-* 使用arm cpu部署,不使用Paddle-Lite加速。
+* 使用cpu+xpu部署,使用Paddle-Lite xpu优化加速能力;
+* 单独使用cpu部署,使用Paddle-Lite优化加速能力;
+* 使用cpu部署,不使用Paddle-Lite加速。

 推荐使用前两种部署方式。

-启动rpc服务,使用arm cpu+xpu部署,使用Paddle-Lite xpu优化加速能力
+启动rpc服务,使用cpu+xpu部署,使用Paddle-Lite xpu优化加速能力
 ```
-python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
 ```
-启动rpc服务,使用arm cpu部署, 使用Paddle-Lite加速能力
+启动rpc服务,使用cpu部署, 使用Paddle-Lite加速能力
 ```
-python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
 ```
-启动rpc服务,使用arm cpu部署, 不使用Paddle-Lite加速能力
+启动rpc服务,使用cpu部署, 不使用Paddle-Lite加速能力
 ```
-python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
+python3 -m paddle_serving_server.serve --model uci_housing_model --thread 6 --port 9292
 ```
 ## client调用
 ```
@@ -98,8 +98,16 @@ data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
 fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
 print(fetch_map)
 ```
+# 其他说明
+
+## 模型实例及说明
 以下提供部分样例,其他模型可参照进行修改。
-|示例名称|示例链接|
-|:-----|:--|
-|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
-|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
+| 示例名称 | 示例链接 |
+| :--------- | :---------------------------------------------------------- |
+| fit_a_line | [fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu) |
+| resnet | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu) |
+
+注:支持昆仑芯片部署的模型列表见[链接](https://paddlelite.paddlepaddle.org.cn/introduction/support_model_list.html)。不同模型适配上存在差异,可能存在不支持的情况,部署使用存在问题时,欢迎提交[Github issue](https://github.com/PaddlePaddle/Serving/issues),我们会实时跟进。
+## 昆仑芯片支持相关参考资料
+* [昆仑XPU芯片运行飞桨](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/xpu_docs/index_cn.html)
+* [PaddleLite使用百度XPU预测部署](https://paddlelite.paddlepaddle.org.cn/demo_guides/baidu_xpu.html)
diff --git a/doc/BERT_10_MINS.md b/doc/BERT_10_MINS.md
index 7f2aef671cfca910c4fb07de288fb6ba28bcd451..3857bc555dcd69be96d961f2acc363bac6575c50 100644
--- a/doc/BERT_10_MINS.md
+++ b/doc/BERT_10_MINS.md
@@ -52,7 +52,7 @@ python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #c
 ```
 Or,start gpu inference service,Run
 ```
-python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
 ```
 | Parameters | Meaning |
 | ---------- | ---------------------------------------- |
diff --git a/doc/BERT_10_MINS_CN.md b/doc/BERT_10_MINS_CN.md
index df4e8eb32614df0c8b0c2edeeb47fd1516a70710..3a480b6efc919f9a8af97e537db47ab3eafcbf14 100644
--- a/doc/BERT_10_MINS_CN.md
+++ b/doc/BERT_10_MINS_CN.md
@@ -50,7 +50,7 @@ python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #
 ```
 或者,启动gpu预测服务,执行
 ```
-python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #在gpu 0上启动gpu预测服务
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #在gpu 0上启动gpu预测服务
 ```
diff --git a/doc/COMPILE.md b/doc/COMPILE.md
index 79a26f857c5b06dd3da39988879b297ce35db167..a681d7ddc0ce954a4c766ba4fbcf4201f907193b 100755
--- a/doc/COMPILE.md
+++ b/doc/COMPILE.md
@@ -117,13 +117,13 @@ Compared with CPU environment, GPU environment needs to refer to the following t
 | CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
 | TENSORRT_ROOT | The upper level directory of the directory where libnvinfer.so.* is located, depends on the TensorRT installation directory | Cuda 9.0/10.0 does not need, other needs | No (/usr) |
-If not in Docker environment, users can refer to the following execution methods. The specific path is subject to the current environment, and the code is only for reference.TENSORRT_LIBRARY_PATH is related to the TensorRT version and should be set according to the actual situation。For example, in the cuda10.1 environment, the TensorRT version is 6.0 (/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/),In the cuda10.2 environment, the TensorRT version is 7.1 (/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/).
+If not in a Docker environment, users can refer to the following execution methods. The specific path is subject to the current environment, and the code is only for reference. TENSORRT_LIBRARY_PATH is related to the TensorRT version and should be set according to the actual situation. For example, in the cuda10.1 environment, the TensorRT version is 6.0 (/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/); in the cuda10.2 and cuda11.0 environments, the TensorRT version is 7.1 (/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/).
 ``` shell
 export CUDA_PATH='/usr/local/cuda'
 export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
 export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
-export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
+export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/"

 mkdir server-build-gpu && cd server-build-gpu
 cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
diff --git a/doc/COMPILE_CN.md b/doc/COMPILE_CN.md
index 3d04f4c873cb6b6c004c1377cbedaf30c89d9d5e..736597a2777e3245ecdcf85ce1aab26f11ed92e0 100755
--- a/doc/COMPILE_CN.md
+++ b/doc/COMPILE_CN.md
@@ -116,13 +116,13 @@ make -j10
 | CUDA_CUDART_LIBRARY | libcudart.so.*所在目录,通常为/usr/local/cuda/lib64/ | 全部环境都需要 | 否(/usr/local/cuda/lib64/) |
 | TENSORRT_ROOT | libnvinfer.so.*所在目录的上一级目录,取决于TensorRT安装目录 | Cuda 9.0/10.0不需要,其他需要 | 否(/usr) |
-非Docker环境下,用户可以参考如下执行方式,具体的路径以当时环境为准,代码仅作为参考。TENSORRT_LIBRARY_PATH和TensorRT版本有关,要根据实际情况设置。例如在cuda10.1环境下TensorRT版本是6.0(/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/),在cuda10.2环境下TensorRT版本是7.1(/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/)。
+非Docker环境下,用户可以参考如下执行方式,具体的路径以当时环境为准,代码仅作为参考。TENSORRT_LIBRARY_PATH和TensorRT版本有关,要根据实际情况设置。例如在cuda10.1环境下TensorRT版本是6.0(/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/),在cuda10.2和cuda11.0环境下TensorRT版本是7.1(/usr/local/TensorRT-7.1.3.4/targets/x86_64-linux-gnu/)。
 ``` shell
 export CUDA_PATH='/usr/local/cuda'
 export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
 export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
-export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
+export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT6-cuda10.1-cudnn7/targets/x86_64-linux-gnu/"

 mkdir server-build-gpu && cd server-build-gpu
 cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
diff --git a/doc/ENCRYPTION.md b/doc/ENCRYPTION.md
index b3639bbc6572623f4f0b7af28f44effd665d9f4e..89b2c5f8ed35d2a69cfdb38e2c1c18af22463226 100644
--- a/doc/ENCRYPTION.md
+++ b/doc/ENCRYPTION.md
@@ -25,7 +25,7 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_
 ```
 GPU Service
 ```
-python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
+python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
 ```
 At this point, the server does not really start, but waits for the key。
diff --git a/doc/ENCRYPTION_CN.md b/doc/ENCRYPTION_CN.md
index 87452ea365f2cf3b05a0b356a3e709f882568b88..41713e8aa87229dabb039aae084e2207a27977fc 100644
--- a/doc/ENCRYPTION_CN.md
+++ b/doc/ENCRYPTION_CN.md
@@ -25,7 +25,7 @@ python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_
 ```
 GPU Service
 ```
-python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
+python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
 ```
 此时,服务器不会真正启动,而是等待密钥。
diff --git a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
index 7554e6658e1fdc4b7bd6eb7f110b0d67c118e254..1de36af8120c0547d4a5cfd4d939e45b47984886 100644
--- a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
+++ b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
@@ -5,8 +5,8 @@
 例如:
 ```shell
-python -m paddle_serving_server_gpu.serve --model bert_seq128_model --port 9292 --gpu_ids 0
-python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
+python -m paddle_serving_server.serve --model bert_seq128_model --port 9292 --gpu_ids 0
+python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
 ```
 在卡0上,同时部署了bert示例和iamgenet示例。
diff --git a/doc/SAVE.md b/doc/SAVE.md
index 32562fa55af253bdaa6328c9bd02f5d54328161b..9da923bf6df1437923539aba6da99a429082da29 100644
--- a/doc/SAVE.md
+++ b/doc/SAVE.md
@@ -38,7 +38,7 @@ We can see that the `serving_server` and `serving_client` folders hold the serve
 Start the server (GPU)
 ```
-python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_id 0
+python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_id 0
 ```
 Client (`test_client.py`)
diff --git a/doc/SAVE_CN.md b/doc/SAVE_CN.md
index 1bb3df108275c2587ffc0979beca89d4d0ada4ea..42606372a06bc26591b70d1ae6db119cd5a8749d 100644
--- a/doc/SAVE_CN.md
+++ b/doc/SAVE_CN.md
@@ -37,7 +37,7 @@ python -m paddle_serving_client.convert --dirname . --model_filename dygraph_mod
 启动服务端(GPU)
 ```
-python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_id 0
+python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_id 0
 ```
 客户端写法,保存为`test_client.py`
diff --git a/doc/TENSOR_RT.md b/doc/TENSOR_RT.md
index 7504646fea750572cde472ebfb6178989b542ec1..a18bc0b0c7c9fb61d57d1d532a719170b79d8047 100644
--- a/doc/TENSOR_RT.md
+++ b/doc/TENSOR_RT.md
@@ -50,7 +50,7 @@ We just need
 ```
 wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
 tar xf faster_rcnn_r50_fpn_1x_coco.tar
-python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
+python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
 ```
 The TensorRT version of the faster_rcnn model server is started
diff --git a/doc/TENSOR_RT_CN.md b/doc/TENSOR_RT_CN.md
index 40d525d59b5a21ca22f7a1e4274009bf9ceba987..453a08379196df94a348a13746ed288632d44486 100644
--- a/doc/TENSOR_RT_CN.md
+++ b/doc/TENSOR_RT_CN.md
@@ -50,7 +50,7 @@ pip install paddle-server-server==${VERSION}.post11
 ```
 wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
 tar xf faster_rcnn_r50_fpn_1x_coco.tar
-python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
+python -m paddle_serving_server.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
 ```
 TensorRT版本的faster_rcnn模型服务端就启动了
diff --git a/doc/WINDOWS_TUTORIAL.md b/doc/WINDOWS_TUTORIAL.md
index 73cf52bb4fab14c213a13f358ed84f1e643b0734..2c1e787a7fe8d640609f344a6dde73fe1f4a42d8 100644
--- a/doc/WINDOWS_TUTORIAL.md
+++ b/doc/WINDOWS_TUTORIAL.md
@@ -54,7 +54,7 @@ Currently Windows supports the Local Predictor of the Web Service framework. The
 ```
 # filename:your_webservice.py
 from paddle_serving_server.web_service import WebService
-# If it is the GPU version, please use from paddle_serving_server_gpu.web_service import WebService
+# If it is the GPU version, please use from paddle_serving_server.web_service import WebService
 class YourWebService(WebService):
     def preprocess(self, feed=[], fetch=[]):
         #Implement pre-processing here
diff --git a/doc/WINDOWS_TUTORIAL_CN.md b/doc/WINDOWS_TUTORIAL_CN.md
index 143d3b22ff0d2a6c9b35542ac301fd2a906a0962..e68373e3e2306761e45102fe67ff15fc089df87d 100644
--- a/doc/WINDOWS_TUTORIAL_CN.md
+++ b/doc/WINDOWS_TUTORIAL_CN.md
@@ -54,7 +54,7 @@ python ocr_web_client.py
 ```
 # filename:your_webservice.py
 from paddle_serving_server.web_service import WebService
-# 如果是GPU版本,请使用 from paddle_serving_server_gpu.web_service import WebService
+# 如果是GPU版本,请使用 from paddle_serving_server.web_service import WebService
 class YourWebService(WebService):
     def preprocess(self, feed=[], fetch=[]):
         #在这里实现前处理
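Since every command above now goes through `paddle_serving_server.serve` rather than `paddle_serving_server_gpu.serve`, a quick check of the locally installed packages and of the unified entry point can save confusion. This is only a sketch and assumes the usual pip distribution names (`paddle-serving-server` for the CPU build, `paddle-serving-server-gpu` for the GPU build).

```shell
# Show which serving server build is installed (CPU or GPU pip package).
pip list | grep -i paddle-serving

# Smoke-test the unified entry point; argparse's --help should list the
# arguments documented in the tables above (thread, port, model, use_trt, ...).
python -m paddle_serving_server.serve --help
```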