# Model Service Deployment

## Catalogue

- [1. Introduction](#1)
- [2. Installation of Serving](#2)
- [3. Service Deployment for Image Classification](#3)
  - [3.1 Model conversion](#3.1)
  - [3.2 Service Deployment and Request](#3.2)
- [4. Service Deployment for Image Recognition](#4)
  - [4.1 Model conversion](#4.1)
  - [4.2 Service Deployment and Request](#4.2)
- [5. FAQ](#5)

<a name="1"></a>
## 1. Introduction

[Paddle Serving](https://github.com/PaddlePaddle/Serving) is designed to provide easy deployment of online prediction services for deep learning developers. It supports one-click deployment of industrial-grade services, highly concurrent and efficient communication between client and server, and multiple programming languages for client development.

This section, taking the HTTP prediction service as an example, describes how to deploy model services in PaddleClas with PaddleServing. Currently, deployment is only supported on the Linux platform; the Windows platform is not supported.

<a name="2"></a>
## 2. Installation of Serving

It is officially recommended to use docker for the installation and environment deployment of Serving. First, pull the docker image and create a container from it.

```shell
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
```
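
Before installing the packages, you can confirm that the GPU is visible inside the container (a quick sanity check, assuming the NVIDIA driver and nvidia-docker are correctly set up on the host):

```shell
# Should list the host GPUs; if it fails, fix the nvidia-docker setup first
nvidia-smi
```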

Once inside the docker container, install the Serving-related Python packages.

```shell
python3.7 -m pip install paddle-serving-client==0.7.0
python3.7 -m pip install paddle-serving-server==0.7.0 # CPU
python3.7 -m pip install paddle-serving-app==0.7.0
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post102 #GPU with CUDA10.2 + TensorRT6
# For other GPU environments, confirm the environment before choosing which one to execute
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
```

- Speed up the installation process by switching to the Tsinghua mirror with `-i https://pypi.tuna.tsinghua.edu.cn/simple` (see the example after this list).
- For other environment configuration and installation, please refer to [Install Paddle Serving using docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
- To deploy CPU services, please install the CPU version of serving-server with the following command.

```shell
python3.7 -m pip install paddle-serving-server
```
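
For example, any of the installation commands above can be pointed at the mirror:

```shell
# Install the CPU Serving server package from the Tsinghua PyPI mirror
python3.7 -m pip install paddle-serving-server==0.7.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```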

<a name="3"></a>
## 3. Service Deployment for Image Classification

The following takes the classic ResNet50_vd model as an example to introduce how to deploy the image classification service.

<a name="3.1"></a>
### 3.1 Model conversion

When using PaddleServing for service deployment, you need to convert the saved inference model into a Serving model.
- Go to the working directory:
  ```shell
  cd deploy/paddleserving
  ```
- Download and unzip the inference model for ResNet50_vd:
  ```shell
  # Download ResNet50_vd inference model
  wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar
  # Decompress the ResNet50_vd inference model
  tar xf ResNet50_vd_infer.tar
  ```
- Use the paddle_serving_client command to convert the downloaded inference model into a model format for easy server deployment:
  ```shell
  # Convert ResNet50_vd model
  python3.7 -m paddle_serving_client.convert \
  --dirname ./ResNet50_vd_infer/ \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --serving_server ./ResNet50_vd_serving/ \
  --serving_client ./ResNet50_vd_client/
  ```
  The specific meaning of the parameters in the above command is shown in the following table:

  | parameter | type | default value | description |
  | --------- | ---- | ------------- | ----------- |
  | `dirname` | str | - | The storage path of the model file to be converted. The program structure file and parameter file are saved in this directory. |
  | `model_filename` | str | None | The name of the file storing the model Inference Program structure to be converted. If set to None, `__model__` is used as the default filename. |
  | `params_filename` | str | None | The name of the file storing all parameters of the model to be converted. It needs to be specified if and only if all model parameters are stored in a single binary file; if the model parameters are stored in separate files, set it to None. |
  | `serving_server` | str | `"serving_server"` | The storage path of the converted model files and configuration files. Default is `serving_server`. |
  | `serving_client` | str | `"serving_client"` | The storage path of the converted client configuration files. Default is `serving_client`. |

  After the ResNet50_vd inference model conversion is completed, there will be additional `ResNet50_vd_serving` and `ResNet50_vd_client` folders in the current directory, with the following structure:
  ```shell
  ├── ResNet50_vd_serving/
  │   ├── inference.pdiparams
  │   ├── inference.pdmodel
  │   ├── serving_server_conf.prototxt
  │   └── serving_server_conf.stream.prototxt

  └── ResNet50_vd_client/
      ├── serving_client_conf.prototxt
      └── serving_client_conf.stream.prototxt
  ```

- Serving provides input/output renaming in order to be compatible with the deployment of different models. When a different model is deployed, you only need to modify the `alias_name` in the configuration files; the inference deployment is completed without modifying the code. Therefore, after the conversion, modify the alias names in the `serving_server_conf.prototxt` files under `ResNet50_vd_serving` and `ResNet50_vd_client` respectively, changing the `alias_name` in `fetch_var` to `prediction` (a scripted way to make this edit is sketched after the example below). The modified `serving_server_conf.prototxt` looks like this:
  ```log
  feed_var {
    name: "inputs"
    alias_name: "inputs"
    is_lod_tensor: false
    feed_type: 1
    shape: 3
    shape: 224
    shape: 224
  }
  fetch_var {
    name: "save_infer_model/scale_0.tmp_1"
    alias_name: "prediction"
    is_lod_tensor: false
    fetch_type: 1
    shape: 1000
  }
  ```
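
  As a convenience, the rename can also be scripted; a minimal `sed` sketch, assuming the generated `alias_name` initially equals the variable name `save_infer_model/scale_0.tmp_1` shown above (repeat for the file under `ResNet50_vd_client` as described in this step):

  ```shell
  # Rename the fetch_var alias to "prediction" in the generated server config
  sed -i 's|alias_name: "save_infer_model/scale_0.tmp_1"|alias_name: "prediction"|' \
    ResNet50_vd_serving/serving_server_conf.prototxt
  ```
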
<a name="3.2"></a>
### 3.2 Service Deployment and Request

The paddleserving directory contains the scripts for starting the pipeline service, starting the C++ Serving service, and sending prediction requests, mainly including:
```shell
__init__.py
classification_web_service.py # Script to start the pipeline server
config.yml # Configuration file to start the pipeline service
pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
readme.md # Classification model service deployment document
run_cpp_serving.sh # Script to start the C++ Serving deployment
test_cpp_serving_client.py # Script for sending C++ serving prediction requests in rpc mode
```
<a name="3.2.1"></a>
#### 3.2.1 Python Serving

- Start the service:
  ```shell
  # Start the service and save the running log in log.txt
  python3.7 classification_web_service.py &>log.txt &
  ```

- Send a request:
  ```shell
  # send service request
  python3.7 pipeline_http_client.py
  ```
  After a successful run, the results of the model prediction will be printed in the terminal window, as shown below:
  ```log
  {'err_no': 0, 'err_msg': '', 'key': ['label', 'prob'], 'value': ["['daisy']", '[0.9341402053833008]'], 'tensors': []}
  ```

- Close the service:
  If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program was started to terminate it:
  ```bash
  python3.7 -m paddle_serving_server.serve stop
  ```
  After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.
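
  You can verify that no server processes remain (a quick check for the pipeline service started above):

  ```shell
  # Should print nothing once the pipeline server has stopped
  ps -ef | grep classification_web_service | grep -v grep
  ```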

<a name="3.2.2"></a>
#### 3.2.2 C++ Serving

Different from Python Serving, the C++ Serving client performs prediction by calling C++ OPs, so before starting the service you need to compile and install the Serving server package and set `SERVING_BIN`.

- Compile and install the Serving server package
  ```shell
  # Enter the working directory
  cd PaddleClas/deploy/paddleserving
  # One-click compile and install Serving server, set SERVING_BIN
  source ./build_server.sh python3.7
  ```
  **Note:** The paths set in [build_server.sh](../../../deploy/paddleserving/build_server.sh#L55-L62) may need to be modified according to the actual machine environment (CUDA, Python version, etc.) before compiling; if you encounter a non-network error while executing build_server.sh, you can manually copy the commands in the script to the terminal and execute them.
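
  If you have already compiled Serving yourself, you can skip `build_server.sh` and point `SERVING_BIN` at your own build instead. A sketch, assuming a standard Serving build layout (the path below is hypothetical; adjust it to your actual build directory):

  ```shell
  # The serving binary sits under core/general-server in the build directory
  export SERVING_BIN=/path/to/Serving/server-build-gpu-opencv/core/general-server/serving
  ```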

- Modify the client file `ResNet50_client/serving_client_conf.prototxt`: change the field after `feed_type:` to 20, change the field after the first `shape:` to 1, and delete the rest of the `shape` fields.
  ```log
  feed_var {
    name: "inputs"
    alias_name: "inputs"
    is_lod_tensor: false
    feed_type: 20
    shape: 1
  }
  ```

- Modify part of the code in [`test_cpp_serving_client.py`](../../../deploy/paddleserving/test_cpp_serving_client.py):
  1. Modify the [`load_client_config`](../../../deploy/paddleserving/test_cpp_serving_client.py#L28) part of the code: change the path after `load_client_config` to `ResNet50_client/serving_client_conf.prototxt`.
  2. Modify the [`feed={"inputs": image}`](../../../deploy/paddleserving/test_cpp_serving_client.py#L45) part of the code: change `inputs` to match the `name` of the `feed_var` field in `ResNet50_client/serving_client_conf.prototxt`. Since `name` in some model client files is `x` rather than `inputs`, pay attention to this when using those models for C++ Serving deployment.

- Start the service:
  ```shell
  # Start the service, the service runs in the background, and the running log is saved in nohup.txt
  # CPU deployment
  sh run_cpp_serving.sh
  # GPU deployment and specify card 0
  sh run_cpp_serving.sh 0
  ```

- Send a request:
  ```shell
  # send service request
  python3.7 test_cpp_serving_client.py
  ```
  After a successful run, the results of the model prediction will be printed in the terminal window, as shown below:
  ```log
  prediction: daisy, probability: 0.9341399073600769
  ```
- Close the service:
  If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
  ```bash
  python3.7 -m paddle_serving_server.serve stop
  ```
  After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.


<a name="4"></a>
## 4. Service Deployment for Image Recognition

When using PaddleServing for image recognition service deployment, **you need to convert multiple saved inference models into Serving models**. The following takes the ultra-lightweight image recognition model in PP-ShiTu as an example to introduce the deployment of the image recognition service.
<a name="4.1"></a>
### 4.1 Model conversion

- Go to the working directory:
  ```shell
  cd deploy/
  ```
- Download the generic detection inference model and the generic recognition inference model:
  ```shell
  # Create and enter the models folder
  mkdir models
  cd models
  # Download and unzip the generic recognition model
  wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
  tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
  # Download and unzip the generic detection model
  wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
  tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
  ```
- Convert the generic recognition inference model to the Serving model:
  ```shell
  # Convert the generic recognition model
  python3.7 -m paddle_serving_client.convert \
  --dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \
  --serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/
  ```
  The meaning of the parameters in the above command is the same as in [3.1 Model conversion](#3.1).

  After the recognition inference model is converted, there will be additional folders `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` in the current folder. Modify the alias name in `serving_server_conf.prototxt` under both directories: change `alias_name` in `fetch_var` to `features`. The content of the modified `serving_server_conf.prototxt` is as follows:

  ```log
  feed_var {
    name: "x"
    alias_name: "x"
    is_lod_tensor: false
    feed_type: 1
    shape: 3
    shape: 224
    shape: 224
  }
  fetch_var {
    name: "save_infer_model/scale_0.tmp_1"
    alias_name: "features"
    is_lod_tensor: false
    fetch_type: 1
    shape: 512
  }
  ```

  After the conversion of the general recognition inference model is completed, there will be additional `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` folders in the current folder, with the following structure:
    ```shell
    ├── general_PPLCNet_x2_5_lite_v1.0_serving/
    │   ├── inference.pdiparams
    │   ├── inference.pdmodel
    │   ├── serving_server_conf.prototxt
    │   └── serving_server_conf.stream.prototxt

    └── general_PPLCNet_x2_5_lite_v1.0_client/
        ├── serving_client_conf.prototxt
        └── serving_client_conf.stream.prototxt
    ```
- Convert the generic detection inference model to a Serving model:
  ```shell
  # Convert generic detection model
  python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
  --model_filename inference.pdmodel \
  --params_filename inference.pdiparams \
  --serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
  --serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
  ```
  The meaning of the parameters in the above command is the same as in [3.1 Model conversion](#3.1).

  After the conversion of the general detection inference model is completed, there will be additional folders `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` in the current folder, with the following structure:
    ```shell
    ├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
    │   ├── inference.pdiparams
    │   ├── inference.pdmodel
    │   ├── serving_server_conf.prototxt
    │   └── serving_server_conf.stream.prototxt

    └── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
        ├── serving_client_conf.prototxt
        └── serving_client_conf.stream.prototxt
    ```

- Download and unzip the pre-built index of the retrieval library:
    ```shell
    # Go back to the deploy directory
    cd ../
    # Download the built retrieval library index
    wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar
    # Decompress the built retrieval library index
    tar -xf drink_dataset_v1.0.tar
    ```
<a name="4.2"></a>
### 4.2 Service Deployment and Request

**Note:** The recognition service involves multiple models, so the Pipeline deployment method is used for performance reasons. The Pipeline deployment method currently does not support the Windows platform.
- Go to the working directory:
  ```shell
  cd ./deploy/paddleserving/recognition
  ```
  The paddleserving directory contains the scripts for starting the Python pipeline service, starting the C++ Serving service, and sending prediction requests, including:
  ```shell
  __init__.py
  config.yml # The configuration file to start the python pipeline service
  pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
  pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
  recognition_web_service.py # Script to start the pipeline server
  readme.md # Recognition model service deployment documents
  run_cpp_serving.sh # Script to start C++ Pipeline Serving deployment
  test_cpp_serving_client.py # Script for sending C++ Pipeline serving prediction requests by rpc
  ```

<a name="4.2.1"></a>
#### 4.2.1 Python Serving

- Start the service:
  ```shell
  # Start the service and save the running log in log.txt
  python3.7 recognition_web_service.py &>log.txt &
  ```

- Send a request:
  ```shell
  python3.7 pipeline_http_client.py
  ```
  After a successful run, the results of the model prediction will be printed in the terminal window, as shown below:
  ```log
  {'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [345, 95, 524, 576], 'rec_docs': '红牛-强化型', 'rec_scores': 0.79903316}]"], 'tensors': []}
  ```

<a name="4.2.2"></a>
#### 4.2.2 C++ Serving

Different from Python Serving, the C++ Serving client performs prediction by calling C++ OPs, so before starting the service you need to compile and install the Serving server package and set `SERVING_BIN`.
- Compile and install the Serving server package
  ```shell
  # Enter the working directory
  cd PaddleClas/deploy/paddleserving
  # One-click compile and install Serving server, set SERVING_BIN
  source ./build_server.sh python3.7
  ```
  **Note:** The paths set in [build_server.sh](../../../deploy/paddleserving/build_server.sh#L55-L62) may need to be modified according to the actual machine environment (CUDA, Python version, etc.) before compiling.

- The input and output formats used by C++ Serving differ from those used by Python Serving, so you need to execute the following commands to copy the 4 provided prototxt files and overwrite the corresponding files generated in [4.1 Model conversion](#4.1):
  ```shell
  # Enter PaddleClas/deploy directory
  cd PaddleClas/deploy/

  # Overwrite prototxt file
  \cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_serving/
  \cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_client/
  \cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
  \cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
  ```
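
  You can quickly confirm that the overwrite took effect by inspecting one of the copied configs (a sanity check; the exact field values depend on the provided prototxt files):

  ```shell
  # The fetch_var section should now reflect the C++ Serving input/output format
  grep -A 2 "fetch_var" ./models/general_PPLCNet_x2_5_lite_v1.0_serving/serving_server_conf.prototxt
  ```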

- Start the service:
  ```shell
  # Enter the working directory
  cd PaddleClas/deploy/paddleserving/recognition

  # The default port number is 9400; the running log is saved in log_PPShiTu.txt by default
  # CPU deployment
  sh run_cpp_serving.sh
  # GPU deployment, and specify card 0
  sh run_cpp_serving.sh 0
  ```

- Send a request:
  ```shell
  # send service request
  python3.7 test_cpp_serving_client.py
  ```
  After a successful run, the results of the model predictions are printed in the client's terminal window as follows:
  ```log
  WARNING: Logging before InitGoogleLogging() is written to STDERR
  I0614 03:01:36.273097 6084 naming_service_thread.cpp:202] brpc::policy::ListNamingService("127.0.0.1:9400"): added 1
  I0614 03:01:37.393564 6084 general_model.cpp:490] [client]logid=0,client_cost=1107.82ms,server_cost=1101.75ms.
  [{'bbox': [345, 95, 524, 585], 'rec_docs': '红牛-强化型', 'rec_scores': 0.8073724}]
  ```

- Close the service:
  If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
  ```bash
  python3.7 -m paddle_serving_server.serve stop
  ```
  After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.


<a name="5"></a>
## 5. FAQ

**Q1**: After sending a request, no result is returned or the output is prompted with a decoding error.

**A1**: Please turn off the proxy before starting the service and sending requests; you can try the following commands:

```shell
unset https_proxy
unset http_proxy
```
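
To confirm the proxy variables are really cleared before retrying, you can check the environment:

```shell
# Should print nothing once the proxies are unset
env | grep -i proxy
```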

For more types of service deployment, such as `RPC prediction services`, you can refer to Serving's [official examples on GitHub](https://github.com/PaddlePaddle/Serving/tree/v0.7.0/examples).