PaddleClas supports rapid service deployment through PaddleHub. Currently, only image classification deployment is supported; image recognition deployment will be added in the future.
## Catalogue
- [1. Introduction](#1-introduction)
- [2. Prepare the environment](#2-prepare-the-environment)
- [3. Download the inference model](#3-download-the-inference-model)
- [4. Install the service module](#4-install-the-service-module)
- [5. Start service](#5-start-service)
  - [5.1 Start with command line parameters](#51-start-with-command-line-parameters)
  - [5.2 Start with configuration file](#52-start-with-configuration-file)
- [6. Send prediction requests](#6-send-prediction-requests)
- [7. User defined service module modification](#7-user-defined-service-module-modification)
<a name="3"></a>
## 3. Download the inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. The default model path is:
* Classification inference model structure file: `PaddleClas/inference/inference.pdmodel`
* Classification inference model weight file: `PaddleClas/inference/inference.pdiparams`
**Notice**:
* Model file paths can be viewed and modified in `PaddleClas/deploy/hubserving/clas/params.py`:
```python
"inference_model_dir":"../inference/"
```
* Model files (including `.pdmodel` and `.pdiparams`) must be named `inference`.
* We provide a large number of pre-trained models based on the ImageNet-1k dataset. For the model list and download address, see [Model Library Overview](../../docs/en/algorithm_introduction/ImageNet_models_en.md), or you can use your own trained and converted models.
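For example, to serve the `ResNet50_vd` model you can download its inference model and place it under the default path. The download URL below is the one used by the Paddle Serving examples later in this document; renaming the extracted folder to `inference` is an assumption that matches the default `inference_model_dir` above:
```shell
cd PaddleClas
# Download and decompress the ResNet50_vd inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar
tar -xf ResNet50_vd_infer.tar
# Rename the extracted folder so that inference.pdmodel / inference.pdiparams
# end up in the default model directory PaddleClas/inference/
mv ResNet50_vd_infer inference
```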
<a name="4"></a>
## 4. Install the service module
* In the Linux environment, the installation example is as follows:
```shell
cd PaddleClas/deploy
# Install the service module:
hub install hubserving/clas/
```
* In the Windows environment (the folder separator is `\`), the installation example is as follows:
```shell
cd PaddleClas\deploy
# Install the service module:
hub install hubserving\clas\
```
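If you want to double-check the result, the commands below are an optional sanity check; `hub list` is PaddleHub's command for listing locally installed modules, and the module self-test entry is the one described in Section 7 (Linux example):
```shell
# The installed module should appear as clas_system
hub list
# Quickly test the installed module before starting the service
cd PaddleClas/deploy
python3.7 hubserving/clas/module.py
```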
<a name="5"></a>
## 5. Start service
<a name="5.1"></a>
### 5.1 Start with command line parameters
This method only supports prediction using CPU. Start command:
```shell
hub serving start \
--modules clas_system \
--port 8866
```
This completes the deployment of a service API, using the default port number 8866.
**Parameter Description**:
| parameters | uses |
| ------------------ | ------------------- |
| --modules/-m | [**required**] PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs<br>*`When no Version is specified, the latest version is selected by default`* |
| --port/-p | [**OPTIONAL**] Service port, default is 8866 |
| --use_multiprocess | [**Optional**] Whether to enable the concurrent mode, the default is single-process mode, it is recommended to use this mode for multi-core CPU machines<br>*`Windows operating system only supports single-process mode`* |
| --workers | [**Optional**] The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores |
For more deployment details, see [PaddleHub Serving Model One-Click Service Deployment](https://paddlehub.readthedocs.io/zh_CN/release-v2.1/tutorial/serving.html)
<a name="5.2"></a>
### 5.2 Start with configuration file
This method supports prediction using either CPU or GPU. Start command:
```shell
hub serving start --config/-c config.json
```
The format of `config.json` is as follows:
```json
{
"modules_info": {
...
...
"workers": 2
}
```
**Parameter Description**:
* The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them:
  - When `use_gpu` is `true`, the GPU is used to start the service.
  - When `enable_mkldnn` is `true`, MKL-DNN acceleration is used.
* The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.

**Notice**:
* When using the configuration file to start the service, the parameter settings in the configuration file are used and other command line parameters are ignored;
* If you use GPU prediction (i.e., `use_gpu` is set to `true`), you need to set the `CUDA_VISIBLE_DEVICES` environment variable to specify the GPU card number before starting the service, such as: `export CUDA_VISIBLE_DEVICES=0`;
* **`use_gpu` and `use_multiprocess` cannot be `true` at the same time;**
* **When both `use_gpu` and `enable_mkldnn` are `true`, `enable_mkldnn` will be ignored and the GPU will be used.**
For example, to start the service using GPU card No. 3:
```shell
cd PaddleClas/deploy
export CUDA_VISIBLE_DEVICES=3
hub serving start -c hubserving/clas/config.json
```
<a name="6"></a>
## 6. Send prediction requests
After configuring the server, you can use the following command to send a prediction request to get the prediction result:
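A typical request uses the test script shipped in `deploy/hubserving`; the image path below is only an example and should point to your own test image or image directory:
```shell
cd PaddleClas/deploy
# server_url format: http://[ip_address]:[port]/predict/[module_name]
python3.7 hubserving/test_hubserving.py \
--server_url http://127.0.0.1:8866/predict/clas_system \
--image_path ./hubserving/ILSVRC2012_val_00006666.JPEG \
--batch_size 8
```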
In addition to the per-image results, `test_hubserving.py` prints summary statistics over the processed images, for example:
```
The average time of prediction cost: 2.970 s/image
The average time cost: 3.014 s/image
The average top-1 score: 0.110
```
**Script parameter description**:
* **server_url**: Service address, the format is `http://[ip_address]:[port]/predict/[module_name]`.
* **image_path**: The test image path, which can be a single image path or an image collection directory path.
* **batch_size**: [**Optional**] Make predictions in `batch_size` size, default is `1`.
* **resize_short**: [**Optional**] When preprocessing, resize by short edge, default is `256`.
* **crop_size**: [**Optional**] The size of the center crop during preprocessing, the default is `224`.
* **normalize**: [**Optional**] Whether to perform `normalize` during preprocessing, the default is `True`.
* **to_chw**: [**Optional**] Whether to adjust to `CHW` order when preprocessing, the default is `True`.
**Note**: If you use `Transformer` series models, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input data size of the model: you need to specify `--resize_short=384 --crop_size=384`.
**Return result format description**:
The returned result is a list (list), including the top-k classification results, the corresponding scores, and the time-consuming prediction of this image, as follows:
```shell
list: return result
└── list: result of the first image
    ├── list: the top-k classification results, sorted in descending order of score
    ├── list: the scores corresponding to the top-k classification results, sorted in descending order of score
    └── float: the prediction time cost for the image, in seconds
```
**Note:** If you need to add, delete or modify the returned fields, you can modify the corresponding module. For details, refer to the next section on user-defined service module modification.
<a name="7"></a>
## 7. User defined service module modification
If you need to modify the service logic, you need to do the following:
1. Stop the service
```shell
hub serving stop --port/-p XXXX
```
2. Modify the code in the corresponding files, such as `module.py` and `params.py`, according to actual needs. After modifying `module.py`, you need to reinstall it (`hub install hubserving/clas/`) and redeploy. Before deploying, you can use the `python3.7 hubserving/clas/module.py` command to quickly test the code to be deployed.
3. Uninstall the old service pack
```shell
hub uninstall clas_system
```
4. Install the new modified service pack
```shell
hub install hubserving/clas/
```
5. Restart the service
```shell
hub serving start -m clas_system
```
**Notice**:
Common parameters can be modified in `PaddleClas/deploy/hubserving/clas/params.py`:
* To replace the model, you need to modify the model file path parameters:
```python
"inference_model_dir":
```
* To change the number of `top-k` results returned during post-processing:
```python
'topk':
```
* To change the mapping file between labels and class IDs used during post-processing:
```python
'class_id_map_file':
```
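For reference, the three settings above might look like the sketch below when filled in. This is only an illustration: the `topk` value and the label-map path are assumptions, and the real `params.py` wraps these settings in its own structure.
```python
# Illustrative values only -- adapt them to your own model and label map.
config = {
    "inference_model_dir": "../inference/",  # directory holding inference.pdmodel / inference.pdiparams
    "topk": 5,                               # number of top-k results returned by post-processing
    "class_id_map_file": "../ppcls/utils/imagenet1k_label_list.txt",  # label <-> class-id mapping
}
```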
In order to avoid unnecessary delay and to support batch prediction, the data preprocessing logic (including `resize`, `crop` and other operations) is completed on the client side, so the related code in [PaddleClas/deploy/hubserving/test_hubserving.py#L41-L47](./test_hubserving.py#L41-L47) and [PaddleClas/deploy/hubserving/test_hubserving.py#L51-L76](./test_hubserving.py#L51-L76) needs to be modified accordingly.
- [3. Image Classification Service Deployment](#3-image-classification-service-deployment)
  - [3.1 Model conversion](#31-model-conversion)
  - [3.2 Service deployment and request](#32-service-deployment-and-request)
    - [3.2.1 Python Serving](#321-python-serving)
    - [3.2.2 C++ Serving](#322-c-serving)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers easily deploy online prediction services. It supports one-click deployment of industrial-grade services, highly concurrent and efficient communication between client and server, and client development in multiple programming languages.
This section takes the HTTP prediction service deployment as an example to introduce how to use PaddleServing to deploy the model service in PaddleClas. Currently, only deployment on the Linux platform is supported; the Windows platform is not supported.
<a name="2"></a>
## 2. Serving installation
The Serving official website recommends using Docker to install and deploy the Serving environment. First, you need to pull the Docker image and create a Serving-based container.
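A typical setup is sketched below; it mirrors the Serving 0.7.0 GPU example that appears later in this document, so adjust the image tag, package versions and CUDA variant to your own environment:
```shell
# Pull the Serving development image and create a container (GPU example)
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash

# Inside the container, install the Serving-related python packages
pip3 install paddle-serving-client==0.7.0
pip3 install paddle-serving-app==0.7.0
pip3 install paddle-serving-server-gpu==0.7.0.post102  # GPU with CUDA10.2 + TensorRT6
# For CPU-only deployment, install the CPU server package instead:
# pip3 install paddle-serving-server==0.7.0
```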
* If the installation speed is too slow, you can change the source through `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed up the installation process.
* For other environment configuration installation, please refer to: [Install Paddle Serving with Docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
<a name="3"></a>
## 3. Image Classification Service Deployment
The following takes the classic ResNet50_vd model as an example to introduce how to deploy the image classification service.
<a name="3.1"></a>
### 3.1 Model conversion
When using PaddleServing for service deployment, you need to convert the saved inference model into a Serving model.
- Go to the working directory:
```shell
cd deploy/paddleserving
```
- Download and unzip the inference model for ResNet50_vd:
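A typical sequence is sketched below: the download URL is the one used in the older Serving guide later in this document, and the conversion command is a sketch of `paddle_serving_client.convert` whose flags follow the parameter table below:
```shell
# Download and decompress the ResNet50_vd inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar

# Convert the inference model into a Serving model
python3.7 -m paddle_serving_client.convert \
--dirname ./ResNet50_vd_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ResNet50_vd_serving/ \
--serving_client ./ResNet50_vd_client/
```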
| parameters | type | default value | description |
| --- | --- | --- | --- |
| `dirname` | str | - | The storage path of the model file to be converted. The program structure file and parameter file are saved in this directory. |
| `model_filename` | str | None | The name of the file storing the model Inference Program structure that needs to be converted. If set to None, use `__model__` as the default filename |
| `params_filename` | str | None | File name where all parameters of the model to be converted are stored. It needs to be specified if and only if all model parameters are stored in a single binary file. If the model parameters are stored in separate files, set it to None |
| `serving_server` | str | `"serving_server"` | The storage path of the converted model files and configuration files. Default is serving_server |
| `serving_client` | str | `"serving_client"` | The converted client configuration file storage path. Default is serving_client |
After the ResNet50_vd inference model conversion is completed, there will be additional `ResNet50_vd_serving` and `ResNet50_vd_client` folders in the current folder, with the following structure:
```shell
├── ResNet50_vd_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
│
└── ResNet50_vd_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
```
- Serving provides input and output renaming in order to be compatible with different models. When a different model is deployed, you only need to modify the `alias_name` in the configuration file and the inference deployment can be completed without modifying the code. Therefore, after the conversion, modify the alias names in the `serving_server_conf.prototxt` files under both `ResNet50_vd_serving` and `ResNet50_vd_client`: change the `alias_name` in `fetch_var` to `prediction`. The modified `serving_server_conf.prototxt` is as follows:
```log
feed_var {
name: "inputs"
alias_name: "inputs"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "prediction"
is_lod_tensor: false
fetch_type: 1
shape: 1000
}
```
<a name="3.2"></a>
### 3.2 Service deployment and request
The paddleserving directory contains the code for starting the pipeline service, the C++ serving service and sending the prediction request, mainly including:
```shell
__init__.py
classification_web_service.py # Script to start the pipeline server
config.yml # Configuration file to start the pipeline service
pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
readme.md # Classification model service deployment document
run_cpp_serving.sh # Script to start the C++ Serving deployment
test_cpp_serving_client.py # Script for sending C++ serving prediction requests in rpc mode
```
<a name="3.2.1"></a>
#### 3.2.1 Python Serving
- Start the service:
```shell
# Start the service and save the running log in log.txt
python3.7 classification_web_service.py &>log.txt &
```
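After the service starts, a prediction request can be sent with the HTTP client script listed above:
```shell
# Send service request
python3.7 pipeline_http_client.py
```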
If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
```bash
python3.7 -m paddle_serving_server.serve stop
```
After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.
<a name="3.2.2"></a>
#### 3.2.2 C++ Serving
Different from Python Serving, the C++ Serving client calls C++ OP to predict, so before starting the service, you need to compile and install the serving server package, and set `SERVING_BIN`.
- Compile and install the Serving server package
```shell
# Enter the working directory
cd PaddleClas/deploy/paddleserving
# One-click compile and install Serving server, set SERVING_BIN
source ./build_server.sh python3.7
```
**Note:** The path set by [build_server.sh](./build_server.sh#L55-L62) may need to be modified according to the actual machine environment, such as the CUDA and Python versions, before compiling.
- Modify the client file `ResNet50_client/serving_client_conf.prototxt`: change the value after `feed_type:` to 20, change the value after the first `shape:` to 1, and delete the remaining `shape` fields.
```log
feed_var {
name: "inputs"
alias_name: "inputs"
is_lod_tensor: false
feed_type: 20
shape: 1
}
```
- Modify part of the code of [`test_cpp_serving_client`](./test_cpp_serving_client.py)
1. Modify the [`load_client_config`](./test_cpp_serving_client.py#L28) part of the code, and change the path after `load_client_config` to `ResNet50_client/serving_client_conf.prototxt`.
2. Modify the [`feed={"inputs": image}`](./test_cpp_serving_client.py#L45) part of the code, and change `inputs` to match the `name` of `feed_var` in `ResNet50_client/serving_client_conf.prototxt`. Since `name` in some model client files is `x` instead of `inputs`, you need to pay attention to this when using such models for C++ Serving deployment.
- Start the service:
```shell
# Start the service, the service runs in the background, and the running log is saved in nohup.txt
# CPU deployment
sh run_cpp_serving.sh
# GPU deployment and specify card 0
sh run_cpp_serving.sh 0
```
- Send request:
```shell
# send service request
python3.7 test_cpp_serving_client.py
```
After a successful run, the results of the model prediction will be printed in the cmd window, and the results are as follows:
If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
```bash
python3.7 -m paddle_serving_server.serve stop
```
After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.
## 4. FAQ
**Q1**: No result is returned after the request is sent or an output decoding error is prompted
**A1**: Do not set the proxy when starting the service and sending the request. You can close the proxy before starting the service and sending the request. The command to close the proxy is:
```shell
unset https_proxy
unset http_proxy
```
**Q2**: nothing happens after starting the service
**A2**: You can check whether the path corresponding to `model_config` in `config.yml` exists, and whether the folder name is correct
For more service deployment types, such as `RPC prediction service`, you can refer to Serving's [github official website](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples)
English | [简体中文](../../zh_CN/inference_deployment/paddle_hub_serving_deploy.md)
# Service deployment based on PaddleHub Serving
---
PaddleClas supports rapid service deployment through PaddleHub. Currently, only image classification deployment is supported; image recognition deployment will be added in the future.
## Catalogue
- [1. Introduction](#1)
- [2. Prepare the environment](#2)
- [3. Download inference model](#3)
...
...
- [6. Send prediction requests](#6)
- [7. User defined service module modification](#7)
<a name="1"></a>
## 1. Introduction
The hubserving service deployment configuration package `clas` contains 3 required files; the directory is as follows:
```shell
deploy/hubserving/clas/
├── __init__.py # Empty file, required
├── config.json # Configuration file, optional, passed in as a parameter when starting the service with configuration
├── module.py # The main module, required, contains the complete logic of the service
└── params.py # Parameter file, required, including model path, pre- and post-processing parameters and other parameters
```
<a name="3"></a>
## 3. Download inference model

Before installing the service module, you need to prepare the inference model and put it in the correct path. The default model path is:

* Classification inference model structure file: `PaddleClas/inference/inference.pdmodel`
* Classification inference model weight file: `PaddleClas/inference/inference.pdiparams`
**Notice**:
* Model file paths can be viewed and modified in `PaddleClas/deploy/hubserving/clas/params.py`:
```python
"inference_model_dir":"../inference/"
```
* Model files (including `.pdmodel` and `.pdiparams`) must be named `inference`.
* We provide a large number of pre-trained models based on the ImageNet-1k dataset. For the model list and download address, see [Model Library Overview](../algorithm_introduction/ImageNet_models.md), or you can use your own trained and converted models.
<a name="4"></a>
## 4. Install the service module
* In the Linux environment, the installation example is as follows:
```shell
cd PaddleClas/deploy
# Install the service module:
hub install hubserving/clas/
```
* In the Windows environment (the folder separator is `\`), the installation example is as follows:
```shell
cd PaddleClas\deploy
# Install the service module:
hub install hubserving\clas\
```
<a name="5"></a>
## 5. Start service
<a name="5.1"></a>
### 5.1 Start with command line parameters
This method only supports prediction using CPU. Start command:
```shell
hub serving start \
--modules clas_system \
--port 8866
```
This completes the deployment of a service API, using the default port number 8866.
**Parameter Description**:
|parameters|uses|
|-|-|
|--modules/-m| [**required**] PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs<br>*`When no Version is specified, the latest version is selected by default`*|
|--port/-p| [**OPTIONAL**] Service port, default is 8866|
|--use_multiprocess| [**Optional**] Whether to enable the concurrent mode, the default is single-process mode, it is recommended to use this mode for multi-core CPU machines<br>*`Windows operating system only supports single-process mode`*|
|--workers| [**Optional**] The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores|
For more deployment details, see [PaddleHub Serving Model One-Click Service Deployment](https://paddlehub.readthedocs.io/zh_CN/release-v2.1/tutorial/serving.html)
<a name="5.2"></a>
### 5.2 Start with configuration file
This method supports prediction using either CPU or GPU. Start command:
```shell
hub serving start --config/-c config.json
```
The format of `config.json` is as follows:
```json
{
...
...
}
```
**Parameter Description**:
* The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them:
- When `use_gpu` is `true`, it means to use GPU to start the service.
- When `enable_mkldnn` is `true`, it means to use MKL-DNN acceleration.
* The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.
**Notice**:
* When using the configuration file to start the service, the parameter settings in the configuration file will be used, and other command line parameters will be ignored;
* If you use GPU prediction (ie, `use_gpu` is set to `true`), you need to set the `CUDA_VISIBLE_DEVICES` environment variable to specify the GPU card number used before starting the service, such as: `export CUDA_VISIBLE_DEVICES=0`;
* **`use_gpu` and `use_multiprocess` cannot be `true` at the same time;**
* **When both `use_gpu` and `enable_mkldnn` are `true`, `enable_mkldnn` will be ignored and the GPU will be used.**
For example, to start the service using GPU card No. 3:
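```shell
cd PaddleClas/deploy
# Specify GPU card No. 3, then start with the configuration file
export CUDA_VISIBLE_DEVICES=3
hub serving start -c hubserving/clas/config.json
```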
<a name="6"></a>
## 6. Send prediction requests

After the service starts, you can use the following command to send a prediction request and obtain the prediction result:
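The command below uses the test script shipped in `deploy/hubserving`; the image path is only an example and should be replaced with your own test image or image directory:
```shell
cd PaddleClas/deploy
# server_url format: http://[ip_address]:[port]/predict/[module_name]
python3.7 hubserving/test_hubserving.py \
--server_url http://127.0.0.1:8866/predict/clas_system \
--image_path ./hubserving/ILSVRC2012_val_00006666.JPEG \
--batch_size 8
```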
**Script parameter description**:
* **server_url**: Service address, the format is `http://[ip_address]:[port]/predict/[module_name]`.
* **image_path**: The test image path, which can be a single image path or an image collection directory path.
* **batch_size**: [**Optional**] Make predictions in `batch_size` size, default is `1`.
* **resize_short**: [**Optional**] When preprocessing, resize by short edge, default is `256`.
* **crop_size**: [**Optional**] The size of the center crop during preprocessing, the default is `224`.
* **normalize**: [**Optional**] Whether to perform `normalize` during preprocessing, the default is `True`.
* **to_chw**: [**Optional**] Whether to adjust to `CHW` order when preprocessing, the default is `True`.
**Notice**:
If you use `Transformer` series models, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input data size of the model: you need to specify `--resize_short=384 --crop_size=384`.
**Return result format description**:
The returned result is a list, including the top-k classification results, the corresponding scores, and the prediction time cost for each image, as follows:
```
list: return result
└── list: result of the first image
    ├── list: the top-k classification results, sorted in descending order of score
    ├── list: the scores corresponding to the top-k classification results, sorted in descending order of score
    └── float: the prediction time cost for the image, in seconds
```
**Note:** If you need to add, delete or modify the returned fields, you can modify the corresponding module. For details, refer to the next section on user-defined service module modification.
<a name="7"></a>
## 7. User defined service module modification
If you need to modify the service logic, you need to do the following:
1. Stop the service
```shell
hub serving stop --port/-p XXXX
```
2. Modify the code in the corresponding files, such as `module.py` and `params.py`, according to actual needs. After modifying `module.py`, you need to reinstall it (`hub install hubserving/clas/`) and redeploy. Before deploying, you can use the `python3.7 hubserving/clas/module.py` command to quickly test the code to be deployed.
3. Uninstall the old service pack
```shell
hub uninstall clas_system
```
4. Install the new modified service pack
```shell
hub install hubserving/clas/
```
5. Restart the service
```shell
hub serving start -m clas_system
```
**Notice**:
Common parameters can be modified in `PaddleClas/deploy/hubserving/clas/params.py`:
* To replace the model, you need to modify the model file path parameters:
```python
"inference_model_dir":
```
* To change the number of `top-k` results returned during post-processing:
```python
'topk':
```
* To change the mapping file between labels and class IDs used during post-processing:
```python
'class_id_map_file':
```
In order to avoid unnecessary delay and to support batch prediction, the data preprocessing logic (including `resize`, `crop` and other operations) is completed on the client side, so the related code in [PaddleClas/deploy/hubserving/test_hubserving.py#L41-L47](../../../deploy/hubserving/test_hubserving.py#L41-L47) and [PaddleClas/deploy/hubserving/test_hubserving.py#L51-L76](../../../deploy/hubserving/test_hubserving.py#L51-L76) needs to be modified accordingly.
- [3. Service Deployment for Image Classification](#3)
  - [3.1 Model Transformation](#3.1)
  - [3.2 Service Deployment and Request](#3.2)
- [4. Service Deployment for Image Recognition](#4)
  - [4.1 Model Transformation](#4.1)
  - [4.2 Service Deployment and Request](#4.2)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) is designed to provide easy deployment of online prediction services for deep learning developers. It supports one-click deployment of industrial-grade services, highly concurrent and efficient communication between client and server, and multiple programming languages for client development.
This section, exemplified by HTTP deployment of prediction service, describes how to deploy model services in PaddleClas with PaddleServing. Currently, only deployment on Linux platform is supported. Windows platform is not supported.
<a name="2"></a>
## 2. Installation of Serving
It is officially recommended to use Docker for the installation and environment deployment of Serving. First, pull the Docker image and create a Serving-based container:
```
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
```
Once you are in docker, install the Serving-related python packages.
```
pip3 install paddle-serving-client==0.7.0
pip3 install paddle-serving-server==0.7.0 # CPU
pip3 install paddle-serving-app==0.7.0
pip3 install paddle-serving-server-gpu==0.7.0.post102 #GPU with CUDA10.2 + TensorRT6
# For other GPU environments, confirm the environment before choosing which one to execute
pip3 install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
```
- Speed up the installation process by replacing the source with `-i https://pypi.tuna.tsinghua.edu.cn/simple`.
- For other environment configuration and installation, please refer to [Install Paddle Serving using docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
- To deploy CPU services, please install the CPU version of serving-server with the following command.
```
pip install paddle-serving-server
```
<a name="3"></a>
## 3. Service Deployment for Image Classification
<a name="3.1"></a>
### 3.1 Model Transformation
When adopting PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part takes the classic ResNet50_vd model as an example to introduce the deployment of image classification service.
- Enter the working directory:
```
cd deploy/paddleserving
```
- Download the inference model of ResNet50_vd:
```
# Download and decompress the ResNet50_vd model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
```
- Convert the downloaded inference model into a format that is readily deployable by Server with the help of paddle_serving_client.
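The conversion uses the `paddle_serving_client.convert` tool; the invocation below is a sketch whose flags follow the parameter table given earlier in this document, and the directory names match the files downloaded in the previous step:
```
# Convert the ResNet50_vd inference model into Serving format
python3 -m paddle_serving_client.convert --dirname ./ResNet50_vd_infer/ \
                                         --model_filename inference.pdmodel \
                                         --params_filename inference.pdiparams \
                                         --serving_server ./ResNet50_vd_serving/ \
                                         --serving_client ./ResNet50_vd_client/
```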
After the transformation, `ResNet50_vd_serving` and `ResNet50_vd_client` will be added to the current folder in the following format:
```
|- ResNet50_vd_serving/
  |- __model__
  |- __params__
  |- serving_server_conf.prototxt
  |- serving_server_conf.stream.prototxt
|- ResNet50_vd_client
  |- serving_client_conf.prototxt
  |- serving_client_conf.stream.prototxt
```
Having obtained the model files, modify the alias name in `serving_server_conf.prototxt` under the directory `ResNet50_vd_serving` by changing `alias_name` in `fetch_var` to `prediction`.
**Notes**: Serving supports input and output renaming to ensure its compatibility with the deployment of different models. In this case, modifying the alias_name of the configuration file is the only step needed to complete the inference and deployment of all kinds of models. The modified serving_server_conf.prototxt is shown below:
```
feed_var {
name: "inputs"
alias_name: "inputs"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "prediction"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
<a name="3.2"></a>
### 3.2 Service Deployment and Request
Paddleserving's directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # Configuration file for starting the service
pipeline_http_client.py # Script for sending pipeline prediction requests by http
pipeline_rpc_client.py # Script for sending pipeline prediction requests by rpc
classification_web_service.py # Script for starting the pipeline server
```
- Start the service:
```
# Start the service and the run log is saved in log.txt
python3 classification_web_service.py &>log.txt &
```
Once the service is successfully started, a log will be printed in log.txt similar to the following ![img](../../../deploy/paddleserving/imgs/start_server.png)
- Send request:
```
# Send service request
python3 pipeline_http_client.py
```
Once the service is successfully started, the prediction results will be printed in the cmd window, see the following example:![img](../../../deploy/paddleserving/imgs/results.png)
<a name="4"></a>
## 4. Service Deployment for Image Recognition
When using PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part, exemplified by the ultra-lightweight model for image recognition in PP-ShiTu, details the deployment of image recognition service.
<a name="4.1"></a>
### 4.1 Model Transformation
- Download inference models for general detection and general recognition
```
cd deploy
# Download and decompress general recognition models
```
After the transformation, `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` will be added to the current folder. Modify the alias name in `serving_server_conf.prototxt` under the directory `general_PPLCNet_x2_5_lite_v1.0_serving/` by changing `alias_name` to `features` in `fetch_var`. The modified `serving_server_conf.prototxt` is similar to the following:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
- Convert the inference model for detection into a Serving model:
After the transformation, `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` will be added to the current folder.
**Note:** The alias name in the `serving_server_conf.prototxt` under the directory `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` requires no modification.
- Download and decompress the constructed search library index
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar
```
<a name="4.2"></a>
### 4.2 Service Deployment and Request
**Note:** Since the recognition service involves multiple models, PipeLine is adopted for better performance. This deployment method does not support the windows platform for now.
- Enter the working directory
```
cd ./deploy/paddleserving/recognition
```
Paddleserving's directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # Configuration file for starting the service
pipeline_http_client.py # Script for sending pipeline prediction requests by http
pipeline_rpc_client.py # Script for sending pipeline prediction requests by rpc
recognition_web_service.py # Script for starting the pipeline server
```
- Start the service:
```
# Start the service and the run log is saved in log.txt
python3 recognition_web_service.py &>log.txt &
```
Once the service is successfully started, a log will be printed in log.txt similar to the following ![img](../../../deploy/paddleserving/imgs/start_server_shitu.png)
- Send request:
```
python3 pipeline_http_client.py
```
Once the service is successfully started, the prediction results will be printed in the cmd window, see the following example: ![img](../../../deploy/paddleserving/imgs/results_shitu.png)
<a name="5"></a>
## 5. FAQ
**Q1**: After sending a request, no result is returned or the output is prompted with a decoding error.
**A1**: Please turn off the proxy before starting the service and sending requests, try the following command:
```
unset https_proxy
unset http_proxy
```
For more types of service deployment, such as `RPC prediction services`, you can refer to the [github official website](https://github.com/PaddlePaddle/Serving/tree/v0.7.0/examples) of Serving.
- [3. Image recognition service deployment](#3-image-recognition-service-deployment)
  - [3.1 Model conversion](#31-model-conversion)
  - [3.2 Service deployment and request](#32-service-deployment-and-request)
    - [3.2.1 Python Serving](#321-python-serving)
    - [3.2.2 C++ Serving](#322-c-serving)
- [4. FAQ](#4-faq)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers easily deploy online prediction services. It supports one-click deployment of industrial-grade services, highly concurrent and efficient communication between client and server, and client development in multiple programming languages.
This section takes the HTTP prediction service deployment as an example to introduce how to use PaddleServing to deploy the model service in PaddleClas. Currently, only deployment on the Linux platform is supported; the Windows platform is not supported.
<a name="2"></a>
## 2. Serving installation
The Serving official website recommends using Docker to install and deploy the Serving environment. First, you need to pull the Docker image and create a Serving-based container.
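A typical setup is sketched below; it mirrors the Serving 0.7.0 example given earlier in this document, so adjust the image tag, package versions and CUDA variant to your own environment:
```shell
# Pull the Serving development image and create a container (GPU example)
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
# Inside the container, install the Serving-related python packages (CPU-only: paddle-serving-server)
pip3 install paddle-serving-client==0.7.0 paddle-serving-app==0.7.0 paddle-serving-server-gpu==0.7.0.post102
```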
* If the installation speed is too slow, you can change the source through `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed up the installation process.
* For other environment configuration installation, please refer to: [Install Paddle Serving with Docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md)
<a name="3"></a>
## 3. Image recognition service deployment
When using PaddleServing for image recognition service deployment, **you need to convert multiple saved inference models to Serving models**. The following takes the ultra-lightweight image recognition model in PP-ShiTu as an example to introduce the deployment of image recognition services.
<a name="3.1"></a>
### 3.1 Model conversion
- Go to the working directory:
```shell
cd deploy/
```
- Download generic detection inference model and generic recognition inference model
```shell
# Create and enter the models folder
mkdir models
cd models
# Download and unzip the generic recognition model
```
The meaning of the parameters of the above command is the same as [#3.1 Model conversion](#3.1)
After the recognition inference model is converted, there will be additional folders `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` in the current folder. Modify the alias names in `serving_server_conf.prototxt` under the `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` directories respectively: change `alias_name` in `fetch_var` to `features`. The content of the modified `serving_server_conf.prototxt` is as follows:
```log
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: false
fetch_type: 1
shape: 512
}
```
After the conversion of the general recognition inference model is completed, there will be additional `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` folders in the current folder, with the following structure:
```shell
├── general_PPLCNet_x2_5_lite_v1.0_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
│
└── general_PPLCNet_x2_5_lite_v1.0_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
```
- Convert general detection inference model to Serving model:
The meaning of the parameters of the above command is the same as [#3.1 Model conversion](#3.1)
After the conversion of the general detection inference model is completed, there will be additional folders `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` in the current folder, with the following structure:
The meaning of each parameter of the conversion command is as follows:
| parameters | type | default value | description |
| --- | --- | --- | --- |
| `dirname` | str | - | The storage path of the model file to be converted. The program structure file and parameter file are saved in this directory. |
| `model_filename` | str | None | The name of the file storing the model Inference Program structure that needs to be converted. If set to None, use `__model__` as the default filename |
| `params_filename` | str | None | The name of the file that stores all parameters of the model that need to be transformed. It needs to be specified if and only if all model parameters are stored in a single binary file. If the model parameters are stored in separate files, set it to None |
| `serving_server` | str | `"serving_server"` | The storage path of the converted model files and configuration files. Default is serving_server |
| `serving_client` | str | `"serving_client"` | The converted client configuration file storage path. Default is serving_client |
- Download and unzip the index of the retrieval library that has been built
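The commands below mirror the older recognition example earlier in this document, which uses the pre-built `drink_dataset_v1.0` index:
```shell
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar
```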
<a name="3.2"></a>
### 3.2 Service deployment and request

**Note:** The recognition service involves multiple models, so the PipeLine deployment method is used for performance reasons. The Pipeline deployment method currently does not support the Windows platform.
- Go to the working directory
```shell
cd ./deploy/paddleserving/recognition
```
The paddleserving directory contains code to start the Python Pipeline service, the C++ Serving service, and send prediction requests, including:
```shell
__init__.py
config.yml # The configuration file to start the python pipeline service
pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
recognition_web_service.py # Script to start the pipeline server
readme.md # Recognition model service deployment documents
run_cpp_serving.sh # Script to start C++ Pipeline Serving deployment
test_cpp_serving_client.py # Script for sending C++ Pipeline serving prediction requests by rpc
```
<a name="3.2.1"></a>
#### 3.2.1 Python Serving
- Start the service:
```shell
# Start the service and save the running log in log.txt
python3.7 recognition_web_service.py &>log.txt &
```
- Send request:
```shell
python3.7 pipeline_http_client.py
```
After a successful run, the results of the model prediction will be printed in the cmd window, and the results are as follows:
<a name="3.2.2"></a>
#### 3.2.2 C++ Serving

Different from Python Serving, the C++ Serving client calls C++ OP to predict, so before starting the service you need to compile and install the Serving server package and set `SERVING_BIN`.
- Compile and install the Serving server package
```shell
# Enter the working directory
cd PaddleClas/deploy/paddleserving
# One-click compile and install Serving server, set SERVING_BIN
source ./build_server.sh python3.7
```
**Note:** The path set by [build_server.sh](../build_server.sh#L55-L62) may need to be modified according to the actual machine environment such as CUDA, python version, etc., and then compiled.
- The input and output format used by C++ Serving is different from that of Python Serving, so you need to overwrite the 4 prototxt files generated in [3.1 Model conversion](#31-model-conversion) by copying the corresponding 4 prototxt files into those folders.
If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
```bash
python3.7 -m paddle_serving_server.serve stop
```
After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.
<a name="4"></a>
## 4. FAQ
**Q1**: No result is returned after the request is sent or an output decoding error is prompted
**A1**: Do not set the proxy when starting the service and sending the request. You can close the proxy before starting the service and sending the request. The command to close the proxy is:
```shell
unset https_proxy
unset http_proxy
```
**Q2**: nothing happens after starting the service
**A2**: You can check whether the path corresponding to `model_config` in `config.yml` exists, and whether the folder name is correct
For more service deployment types, such as `RPC prediction service`, you can refer to Serving's [github official website](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples)