简体中文 | [English](./readme_en.md)
# Classification model service deployment
## Table of contents
- [1. Introduction](#1-introduction)
- [2. Serving installation](#2-serving-installation)
- [3. Image classification service deployment](#3-image-classification-service-deployment)
- [3.1 Model conversion](#31-model-conversion)
- [3.2 Service deployment and request](#32-service-deployment-and-request)
- [3.2.1 Python Serving](#321-python-serving)
- [3.2.2 C++ Serving](#322-c-serving)
- [4. FAQ](#4-faq)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers deploy online prediction services easily. It supports one-click deployment of industrial-grade services, high-concurrency and efficient communication between client and server, and client development in multiple programming languages.
This section uses HTTP prediction service deployment as an example to show how to deploy a model service with PaddleServing in PaddleClas. Only Linux is currently supported; Windows is not.
<a name="2"></a>
## 2. Serving installation
The official Serving documentation recommends installing and deploying the Serving environment with docker. First pull the docker image and create a Serving-based container.
# Start the GPU docker container
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
# Start the CPU docker container
docker pull paddlepaddle/serving:0.7.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-devel bash
docker exec -it test bash
Inside the container, install the Serving-related Python packages.
python3.7 -m pip install paddle-serving-client==0.7.0
python3.7 -m pip install paddle-serving-app==0.7.0
python3.7 -m pip install faiss-cpu==1.7.1post2
python3.7 -m pip install paddle-serving-server==0.7.0 # CPU
python3.7 -m pip install paddlepaddle==2.2.0 # CPU
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post102 # GPU with CUDA10.2 + TensorRT6
python3.7 -m pip install paddlepaddle-gpu==2.2.0 # GPU with CUDA10.2
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
* If installation is too slow, switch the package index with `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed it up.
* For other environment configurations, see [Install Paddle Serving with Docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md)
<a name="3"></a>
## 3. Image classification service deployment
The following uses the classic ResNet50_vd model as an example to show how to deploy an image classification service.
<a name="3.1"></a>
### 3.1 Model conversion
When deploying a service with PaddleServing, the saved inference model must first be converted into a Serving model.
- Go to the working directory:
cd deploy/paddleserving
- Download and extract the ResNet50_vd inference model:
# Download the ResNet50_vd inference model
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar
# Extract the ResNet50_vd inference model
tar xf ResNet50_vd_infer.tar
- Use the paddle_serving_client command to convert the downloaded inference model into a format that is easy to deploy on the server:
# Convert the ResNet50_vd model
python3.7 -m paddle_serving_client.convert \
--dirname ./ResNet50_vd_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ResNet50_vd_serving/ \
--serving_client ./ResNet50_vd_client/
| Parameter | Type | Default | Description |
| ----------------- | ---- | ------------------ | ------------------------------------------------------------ |
| `dirname` | str | - | Path of the directory holding the model to convert. The program structure file and the parameter file are both stored in this directory. |
| `model_filename` | str | None | Name of the file that stores the Inference Program structure of the model to convert. If set to None, `__model__` is used as the default file name. |
| `params_filename` | str | None | Name of the file that stores all parameters of the model to convert. It needs to be specified if and only if all model parameters are stored in a single binary file. If the parameters are stored in separate files, set it to None. |
| `serving_server` | str | `"serving_server"` | Output path for the converted model files and configuration files. Defaults to serving_server. |
| `serving_client` | str | `"serving_client"` | Output path for the converted client configuration files. Defaults to serving_client. |
After the ResNet50_vd inference model is converted, two new folders, `ResNet50_vd_serving` and `ResNet50_vd_client`, appear in the current folder with the following structure:
├── ResNet50_vd_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── ResNet50_vd_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
- To stay compatible with different models, Serving lets you rename model inputs and outputs. When deploying a different model, you only need to change `alias_name` in the configuration files; no code changes are required. After the conversion, edit both `serving_server_conf.prototxt` under `ResNet50_vd_serving` and `serving_client_conf.prototxt` under `ResNet50_vd_client`, changing the value after `alias_name:` in `fetch_var` to `prediction`. The modified `serving_server_conf.prototxt` and `serving_client_conf.prototxt` look like this:
feed_var {
  name: "inputs"
  alias_name: "inputs"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "prediction"
  is_lod_tensor: false
  fetch_type: 1
  shape: 1000
}
<a name="3.2"></a>
### 3.2 Service deployment and request
The paddleserving directory contains the code for starting the pipeline service, starting the C++ Serving service, and sending prediction requests, mainly:
classification_web_service.py # Script that starts the pipeline server
config.yml # Configuration file for the pipeline service
pipeline_http_client.py # Script that sends pipeline prediction requests over HTTP
pipeline_rpc_client.py # Script that sends pipeline prediction requests over RPC
readme.md # Classification model service deployment document
run_cpp_serving.sh # Script that starts the C++ Serving deployment
test_cpp_serving_client.py # Script that sends C++ Serving prediction requests over RPC
<a name="3.2.1"></a>
#### 3.2.1 Python Serving
- Start the service:
# Start the service; the log is saved to log.txt
python3.7 classification_web_service.py &>log.txt &
- Send a request:
# Send a service request
python3.7 pipeline_http_client.py
{'err_no': 0, 'err_msg': '', 'key': ['label', 'prob'], 'value': ["['daisy']", '[0.9341402053833008]'], 'tensors': []}
- Stop the service:
python3.7 -m paddle_serving_server.serve stop
When the command finishes, the message `Process stopped` indicates that the service was shut down successfully.
<a name="3.2.2"></a>
#### 3.2.2 C++ Serving
Unlike Python Serving, the C++ Serving client calls C++ OPs for prediction, so before starting the service you need to compile and install the Serving server package and set `SERVING_BIN`.
- Compile and install the Serving server package
# Enter the working directory
cd PaddleClas/deploy/paddleserving
# Compile and install the Serving server and set SERVING_BIN in one step
source ./build_server.sh python3.7
- Modify the client file `ResNet50_vd_client/serving_client_conf.prototxt`: change the value after `feed_type:` to 20, change the value after the first `shape:` to 1, and delete the remaining `shape` fields.
feed_var {
  name: "inputs"
  alias_name: "inputs"
  is_lod_tensor: false
  feed_type: 20
  shape: 1
}
- Modify part of the code in [`test_cpp_serving_client`](./test_cpp_serving_client.py)
1. Modify the code at [`load_client_config`](./test_cpp_serving_client.py#L28): change the path after `load_client_config` to `ResNet50_vd_client/serving_client_conf.prototxt`.
2. Modify the code at [`feed={"inputs": image}`](./test_cpp_serving_client.py#L45): change `inputs` to match the `name` under the `feed_var` field in `ResNet50_vd_client/serving_client_conf.prototxt`. In some models' client files the `name` is `x` rather than `inputs`, so watch for this when deploying those models with C++ Serving.
- Start the service:
# Start the service; it runs in the background and the log is saved to nohup.txt
# CPU deployment
bash run_cpp_serving.sh
# GPU deployment on card 0
bash run_cpp_serving.sh 0
- Send a request:
# Send a service request
python3.7 test_cpp_serving_client.py
prediction: daisy, probability: 0.9341399073600769
- Stop the service:
python3.7 -m paddle_serving_server.serve stop
When the command finishes, the message `Process stopped` indicates that the service was shut down successfully.
## 4. FAQ
**Q1**: No result is returned after sending a request, or an output decoding error is reported
**A1**: Do not set a proxy when starting the service or sending requests. Disable the proxy before doing either, using:
unset https_proxy
unset http_proxy
**Q2**: Nothing happens after starting the service
**A2**: Check that the path given by `model_config` in `config.yml` exists and that the folder is named correctly.
For more service deployment types, such as the `RPC prediction service`, see the Serving [GitHub repository](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples)
English | [简体中文](./readme.md)
# Classification model service deployment
## Table of contents
- [1 Introduction](#1-introduction)
- [2. Serving installation](#2-serving-installation)
- [3. Image Classification Service Deployment](#3-image-classification-service-deployment)
- [3.1 Model conversion](#31-model-conversion)
- [3.2 Service deployment and request](#32-service-deployment-and-request)
- [3.2.1 Python Serving](#321-python-serving)
- [3.2.2 C++ Serving](#322-c-serving)
<a name="1"></a>
## 1 Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers deploy online prediction services easily. It supports one-click deployment of industrial-grade services, high-concurrency and efficient communication between client and server, and client development in multiple programming languages.
This section uses HTTP prediction service deployment as an example to show how to deploy a model service with PaddleServing in PaddleClas. Only Linux is currently supported; Windows is not.
<a name="2"></a>
## 2. Serving installation
The official Serving documentation recommends installing and deploying the Serving environment with docker. First pull the docker image and create a Serving-based container.
# start GPU docker
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
# start CPU docker
docker pull paddlepaddle/serving:0.7.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-devel bash
docker exec -it test bash
After entering docker, you need to install Serving-related python packages.
python3.7 -m pip install paddle-serving-client==0.7.0
python3.7 -m pip install paddle-serving-app==0.7.0
python3.7 -m pip install faiss-cpu==1.7.1post2
#If it is a CPU deployment environment:
python3.7 -m pip install paddle-serving-server==0.7.0 #CPU
python3.7 -m pip install paddlepaddle==2.2.0 # CPU
#If it is a GPU deployment environment
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post102 # GPU with CUDA10.2 + TensorRT6
python3.7 -m pip install paddlepaddle-gpu==2.2.0 # GPU with CUDA10.2
#Other GPU environments need to confirm the environment and then choose which one to execute
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
* If the installation speed is too slow, you can change the source through `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed up the installation process.
* For other environment configuration installation, please refer to: [Install Paddle Serving with Docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
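The server package to install depends on the CUDA version of the machine. As a minimal sketch, the choice can be written as a lookup table. The helper name `serving_server_requirement` is hypothetical; the version strings are copied from the install commands above, and no other CUDA versions are assumed to exist.

```python
# Hypothetical helper: map a CUDA version to the pip requirement listed in this guide.
SERVER_PACKAGES = {
    None: "paddle-serving-server==0.7.0",                # CPU
    "10.1": "paddle-serving-server-gpu==0.7.0.post101",  # CUDA10.1 + TensorRT6
    "10.2": "paddle-serving-server-gpu==0.7.0.post102",  # CUDA10.2 + TensorRT6
    "11.2": "paddle-serving-server-gpu==0.7.0.post112",  # CUDA11.2 + TensorRT8
}


def serving_server_requirement(cuda_version=None):
    """Return the pip requirement for the given CUDA version (None means CPU)."""
    if cuda_version not in SERVER_PACKAGES:
        raise ValueError(f"no package listed in this guide for CUDA {cuda_version!r}")
    return SERVER_PACKAGES[cuda_version]


if __name__ == "__main__":
    print(serving_server_requirement("10.2"))
```

The returned string can be passed to `python3.7 -m pip install`.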
<a name="3"></a>
## 3. Image Classification Service Deployment
The following takes the classic ResNet50_vd model as an example to introduce how to deploy the image classification service.
<a name="3.1"></a>
### 3.1 Model conversion
When using PaddleServing for service deployment, you need to convert the saved inference model into a Serving model.
- Go to the working directory:
cd deploy/paddleserving
- Download and unzip the inference model for ResNet50_vd:
# Download ResNet50_vd inference model
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar
# Decompress the ResNet50_vd inference model
tar xf ResNet50_vd_infer.tar
- Use the paddle_serving_client command to convert the downloaded inference model into a model format for easy server deployment:
# Convert ResNet50_vd model
python3.7 -m paddle_serving_client.convert \
--dirname ./ResNet50_vd_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ResNet50_vd_serving/ \
--serving_client ./ResNet50_vd_client/
The specific meaning of the parameters in the above command is shown in the following table
| parameter | type | default value | description |
| --------- | ---- | ------------- | ----------- |
| `dirname` | str | - | The storage path of the model file to be converted. The program structure file and parameter file are saved in this directory. |
| `model_filename` | str | None | The name of the file storing the model Inference Program structure that needs to be converted. If set to None, use `__model__` as the default filename |
| `params_filename` | str | None | File name where all parameters of the model to be converted are stored. It needs to be specified if and only if all model parameters are stored in a single binary file. If the model parameters are stored in separate files, set it to None |
| `serving_server` | str | `"serving_server"` | The storage path of the converted model files and configuration files. Default is serving_server |
| `serving_client` | str | `"serving_client"` | The converted client configuration file storage path. Default is serving_client |
After the ResNet50_vd inference model conversion is completed, there will be additional `ResNet50_vd_serving` and `ResNet50_vd_client` folders in the current folder, with the following structure:
├── ResNet50_vd_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── ResNet50_vd_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
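Before moving on, it can help to confirm that the conversion produced every file in the listing above. The following is a sketch only: the function name `missing_files` is hypothetical, and the expected file names are taken from the directory tree shown here.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED = {
    "ResNet50_vd_serving": [
        "inference.pdiparams",
        "inference.pdmodel",
        "serving_server_conf.prototxt",
        "serving_server_conf.stream.prototxt",
    ],
    "ResNet50_vd_client": [
        "serving_client_conf.prototxt",
        "serving_client_conf.stream.prototxt",
    ],
}


def missing_files(root="."):
    """Return the expected files that do not exist under root."""
    root = Path(root)
    return [
        str(Path(folder) / name)
        for folder, names in EXPECTED.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    print(missing_files() or "conversion output looks complete")
```

Run it from `deploy/paddleserving`; an empty result means both folders contain the expected files.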
- To stay compatible with different models, Serving lets you rename model inputs and outputs. When deploying a different model, you only need to change `alias_name` in the configuration files; no code changes are required. After the conversion, edit both `serving_server_conf.prototxt` under `ResNet50_vd_serving` and `serving_client_conf.prototxt` under `ResNet50_vd_client`, changing the `alias_name` in `fetch_var` to `prediction`. The modified `serving_server_conf.prototxt` and `serving_client_conf.prototxt` look like this:
feed_var {
  name: "inputs"
  alias_name: "inputs"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "prediction"
  is_lod_tensor: false
  fetch_type: 1
  shape: 1000
}
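The `alias_name` change can also be scripted instead of edited by hand. The sketch below works on the prototxt text with a regular expression rather than a protobuf parser, which is an assumption that holds only for flat `fetch_var { ... }` blocks like the ones shown here. The function name `set_fetch_alias` and the sample text are hypothetical.

```python
import re


def set_fetch_alias(prototxt_text, new_alias):
    """Set alias_name inside every fetch_var block; feed_var blocks are left untouched."""

    def rewrite(block):
        # A function replacement avoids backslash handling in the alias string.
        return re.sub(
            r'alias_name:\s*"[^"]*"',
            lambda _: f'alias_name: "{new_alias}"',
            block.group(0),
        )

    # A fetch_var block runs from "fetch_var {" to the next closing brace.
    return re.sub(r"fetch_var\s*\{[^}]*\}", rewrite, prototxt_text)


# Hypothetical sample; the pre-edit alias value is an assumption for illustration.
SAMPLE = """feed_var {
  name: "inputs"
  alias_name: "inputs"
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "save_infer_model/scale_0.tmp_1"
}
"""

if __name__ == "__main__":
    print(set_fetch_alias(SAMPLE, "prediction"))
```

To apply it to a real file, read the file, pass its text through `set_fetch_alias`, and write the result back, once for the server config and once for the client config.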
<a name="3.2"></a>
### 3.2 Service deployment and request
The paddleserving directory contains the code for starting the pipeline service, the C++ serving service and sending the prediction request, mainly including:
classification_web_service.py # Script to start the pipeline server
config.yml # Configuration file to start the pipeline service
pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
readme.md # Classification model service deployment document
run_cpp_serving.sh # Script to start the C++ Serving deployment
test_cpp_serving_client.py # Script for sending C++ serving prediction requests in rpc mode
<a name="3.2.1"></a>
#### 3.2.1 Python Serving
- Start the service:
# Start the service and save the running log in log.txt
python3.7 classification_web_service.py &>log.txt &
- Send a request:
# Send a service request
python3.7 pipeline_http_client.py
After a successful run, the prediction result is printed in the terminal:
{'err_no': 0, 'err_msg': '', 'key': ['label', 'prob'], 'value': ["['daisy']", '[0.9341402053833008]'], 'tensors': []}
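In the response shown above, each entry of `value` is a string holding a Python-style list. A minimal sketch of turning it into usable values follows; the function name `parse_classification_response` is hypothetical, and the response layout is assumed to match the sample printed by `pipeline_http_client.py`.

```python
import ast


def parse_classification_response(response):
    """Turn the pipeline response dict into a list of (label, probability) pairs."""
    if response.get("err_no") != 0:
        raise RuntimeError(response.get("err_msg") or "prediction failed")
    fields = dict(zip(response["key"], response["value"]))
    labels = ast.literal_eval(fields["label"])  # "['daisy']" -> ['daisy']
    probs = ast.literal_eval(fields["prob"])    # '[0.93...]' -> [0.93...]
    return list(zip(labels, probs))


if __name__ == "__main__":
    sample = {
        "err_no": 0,
        "err_msg": "",
        "key": ["label", "prob"],
        "value": ["['daisy']", "[0.9341402053833008]"],
        "tensors": [],
    }
    print(parse_classification_response(sample))  # [('daisy', 0.9341402053833008)]
```

`ast.literal_eval` is used instead of `eval` because it only accepts literals, so a malformed response cannot execute code.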
- Stop the service:
If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
python3.7 -m paddle_serving_server.serve stop
After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.
<a name="3.2.2"></a>
#### 3.2.2 C++ Serving
Unlike Python Serving, the C++ Serving client calls C++ OPs for prediction, so before starting the service you need to compile and install the Serving server package and set `SERVING_BIN`.
- Compile and install the Serving server package
# Enter the working directory
cd PaddleClas/deploy/paddleserving
# One-click compile and install Serving server, set SERVING_BIN
source ./build_server.sh python3.7
**Note:** The paths set in [build_server.sh](./build_server.sh#L55-L62) may need to be adjusted to the actual machine environment (CUDA, Python version, etc.) before compiling.
- Modify the client file `ResNet50_vd_client/serving_client_conf.prototxt`: change the value after `feed_type:` to 20, change the value after the first `shape:` to 1, and delete the remaining `shape` fields.
feed_var {
  name: "inputs"
  alias_name: "inputs"
  is_lod_tensor: false
  feed_type: 20
  shape: 1
}
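The `feed_var` change for C++ Serving can be sketched as a text transformation. As above, this is an assumption-laden sketch that treats the prototxt as plain text with flat `feed_var { ... }` blocks; the function name `patch_feed_var_for_cpp` is hypothetical.

```python
import re


def patch_feed_var_for_cpp(prototxt_text):
    """Set feed_type to 20 and reduce each feed_var block to a single `shape: 1`."""

    def rewrite(match):
        out, shape_seen = [], False
        for line in match.group(0).splitlines():
            stripped = line.strip()
            indent = line[: len(line) - len(line.lstrip())]
            if stripped.startswith("feed_type:"):
                out.append(f"{indent}feed_type: 20")
            elif stripped.startswith("shape:"):
                if not shape_seen:  # keep only the first shape entry, set to 1
                    out.append(f"{indent}shape: 1")
                    shape_seen = True
            else:
                out.append(line)
        return "\n".join(out)

    # Only feed_var blocks are rewritten; fetch_var blocks pass through unchanged.
    return re.sub(r"feed_var\s*\{[^}]*\}", rewrite, prototxt_text)


SAMPLE = """feed_var {
  name: "inputs"
  alias_name: "inputs"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "prediction"
  is_lod_tensor: false
  fetch_type: 1
  shape: 1000
}
"""

if __name__ == "__main__":
    print(patch_feed_var_for_cpp(SAMPLE))
```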
- Modify part of the code of [`test_cpp_serving_client`](./test_cpp_serving_client.py)
1. Modify the code at [`load_client_config`](./test_cpp_serving_client.py#L28): change the path after `load_client_config` to `ResNet50_vd_client/serving_client_conf.prototxt`.
2. Modify the code at [`feed={"inputs": image}`](./test_cpp_serving_client.py#L45): change `inputs` to match the `name` under the `feed_var` field in `ResNet50_vd_client/serving_client_conf.prototxt`. In some models' client files the `name` is `x` rather than `inputs`, so watch for this when deploying those models with C++ Serving.
- Start the service:
# Start the service, the service runs in the background, and the running log is saved in nohup.txt
# CPU deployment
sh run_cpp_serving.sh
# GPU deployment and specify card 0
sh run_cpp_serving.sh 0
- Send a request:
# Send a service request
python3.7 test_cpp_serving_client.py
After a successful run, the prediction result is printed in the terminal:
prediction: daisy, probability: 0.9341399073600769
- Stop the service:
If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
python3.7 -m paddle_serving_server.serve stop
After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.
## 4. FAQ
**Q1**: No result is returned after sending a request, or an output decoding error is reported
**A1**: Do not set the proxy when starting the service and sending the request. You can close the proxy before starting the service and sending the request. The command to close the proxy is:
unset https_proxy
unset http_proxy
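When the request is sent from a Python script, the same effect can be had inside the process. A minimal sketch follows; the function name `clear_proxy_env` is hypothetical, and clearing the upper-case variants is an assumption that goes beyond the two `unset` commands above.

```python
import os

# Lower-case names come from the commands above; upper-case variants are an assumption.
PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def clear_proxy_env(environ=os.environ):
    """Remove proxy variables from the given environment mapping; return the removed names."""
    removed = [name for name in PROXY_VARS if name in environ]
    for name in removed:
        del environ[name]
    return removed


if __name__ == "__main__":
    print("cleared:", clear_proxy_env())
```

Unlike `unset` in the shell, this only affects the current Python process and the child processes it starts afterwards.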
**Q2**: Nothing happens after starting the service
**A2**: Check that the path given by `model_config` in `config.yml` exists and that the folder is named correctly.
For more service deployment types, such as the `RPC prediction service`, see the Serving [GitHub repository](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples)
简体中文 | [English](./readme_en.md)
# Recognition model service deployment
## Table of contents
- [1. Introduction](#1-introduction)
- [2. Serving installation](#2-serving-installation)
- [3. Image recognition service deployment](#3-image-recognition-service-deployment)
- [3.1 Model conversion](#31-model-conversion)
- [3.2 Service deployment and request](#32-service-deployment-and-request)
- [3.2.1 Python Serving](#321-python-serving)
- [3.2.2 C++ Serving](#322-c-serving)
- [4. FAQ](#4-faq)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers deploy online prediction services easily. It supports one-click deployment of industrial-grade services, high-concurrency and efficient communication between client and server, and client development in multiple programming languages.
This section uses HTTP prediction service deployment as an example to show how to deploy a model service with PaddleServing in PaddleClas. Only Linux is currently supported; Windows is not.
<a name="2"></a>
## 2. Serving installation
The official Serving documentation recommends installing and deploying the Serving environment with docker. First pull the docker image and create a Serving-based container.
# Start the GPU docker container
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
# Start the CPU docker container
docker pull paddlepaddle/serving:0.7.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-devel bash
docker exec -it test bash
Inside the container, install the Serving-related Python packages.
python3.7 -m pip install paddle-serving-client==0.7.0
python3.7 -m pip install paddle-serving-app==0.7.0
python3.7 -m pip install faiss-cpu==1.7.1post2
python3.7 -m pip install paddle-serving-server==0.7.0 # CPU
python3.7 -m pip install paddlepaddle==2.2.0 # CPU
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post102 # GPU with CUDA10.2 + TensorRT6
python3.7 -m pip install paddlepaddle-gpu==2.2.0 # GPU with CUDA10.2
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
* If installation is too slow, switch the package index with `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed it up.
* For other environment configurations, see [Install Paddle Serving with Docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md)
<a name="3"></a>
## 3. Image recognition service deployment
When deploying an image recognition service with PaddleServing, **all of the saved inference models must be converted to Serving models**. The following uses the ultra-lightweight image recognition model in PP-ShiTu as an example to show how to deploy an image recognition service.
<a name="3.1"></a>
### 3.1 Model conversion
- Go to the working directory:
cd deploy/
- Download the generic detection inference model and the generic recognition inference model
# Create and enter the models folder
mkdir models
cd models
# Download and extract the generic recognition model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
# Download and extract the generic detection model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
- Convert the generic recognition inference model to a Serving model:
# Convert the generic recognition model
python3.7 -m paddle_serving_client.convert \
--dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \
--serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/
The parameters of the above command have the same meaning as in [#3.1 Model conversion](#3.1)
After the generic recognition inference model is converted, two new folders, `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/`, appear in the current folder with the following structure:
├── general_PPLCNet_x2_5_lite_v1.0_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── general_PPLCNet_x2_5_lite_v1.0_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
- Convert the generic detection inference model to a Serving model:
# Convert the generic detection model
python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
--serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
The parameters of the above command have the same meaning as in [#3.1 Model conversion](#3.1)
After the recognition inference model is converted, the folders `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` appear in the current folder. Edit the alias name in `serving_server_conf.prototxt` under `general_PPLCNet_x2_5_lite_v1.0_serving/` and in `serving_client_conf.prototxt` under `general_PPLCNet_x2_5_lite_v1.0_client/`: change the `alias_name` in `fetch_var` to `features`. The modified `serving_server_conf.prototxt` looks like this:
feed_var {
  name: "x"
  alias_name: "x"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "features"
  is_lod_tensor: false
  fetch_type: 1
  shape: 512
}
After the generic detection inference model is converted, two new folders, `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/`, appear in the current folder with the following structure:
├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
| Parameter | Type | Default | Description |
| ----------------- | ---- | ------------------ | ------------------------------------------------------------ |
| `dirname` | str | - | Path of the directory holding the model to convert. The program structure file and the parameter file are both stored in this directory. |
| `model_filename` | str | None | Name of the file that stores the Inference Program structure of the model to convert. If set to None, `__model__` is used as the default file name. |
| `params_filename` | str | None | Name of the file that stores all parameters of the model to convert. It needs to be specified if and only if all model parameters are stored in a single binary file. If the parameters are stored in separate files, set it to None. |
| `serving_server` | str | `"serving_server"` | Output path for the converted model files and configuration files. Defaults to serving_server. |
| `serving_client` | str | `"serving_client"` | Output path for the converted client configuration files. Defaults to serving_client. |
- Download and extract the prebuilt retrieval index
# Go back to the deploy directory
cd ../
# Download the prebuilt retrieval index
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar
# Extract the prebuilt retrieval index
tar -xf drink_dataset_v1.0.tar
<a name="3.2"></a>
### 3.2 Service deployment and request
**Note:** The recognition service involves multiple models, so the Pipeline deployment method is used for performance reasons. Pipeline deployment does not currently support Windows.
- Go to the working directory
cd ./deploy/paddleserving/recognition
The paddleserving directory contains the code for starting the Python Pipeline service, starting the C++ Serving service, and sending prediction requests:
config.yml # Configuration file for the Python pipeline service
pipeline_http_client.py # Script that sends pipeline prediction requests over HTTP
pipeline_rpc_client.py # Script that sends pipeline prediction requests over RPC
recognition_web_service.py # Script that starts the pipeline server
readme.md # Recognition model service deployment document
run_cpp_serving.sh # Script that starts the C++ Pipeline Serving deployment
test_cpp_serving_client.py # Script that sends C++ Pipeline Serving prediction requests over RPC
<a name="3.2.1"></a>
#### 3.2.1 Python Serving
- Start the service:
# Start the service; the log is saved to log.txt
python3.7 recognition_web_service.py &>log.txt &
- Send a request:
python3.7 pipeline_http_client.py
{'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [345, 95, 524, 576], 'rec_docs': '红牛-强化型', 'rec_scores': 0.79903316}]"], 'tensors': []}
<a name="3.2.2"></a>
#### 3.2.2 C++ Serving
Unlike Python Serving, the C++ Serving client calls C++ OPs for prediction, so before starting the service you need to compile and install the Serving server package and set `SERVING_BIN`.
- Compile and install the Serving server package
# Enter the working directory
cd PaddleClas/deploy/paddleserving
# Compile and install the Serving server and set SERVING_BIN in one step
source ./build_server.sh python3.7
- C++ Serving uses input and output formats different from Python Serving, so run the following commands to copy the 4 provided prototxt files over the corresponding 4 prototxt files in the folders produced in [3.1](#31-model-conversion).
# Enter the PaddleClas/deploy directory
cd PaddleClas/deploy/
# Overwrite the prototxt files
\cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_serving/
\cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_client/
\cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
\cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
- Start the service:
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
# The default port is 9400; the log is saved to log_PPShiTu.txt by default
# CPU deployment
bash run_cpp_serving.sh
# GPU deployment on card 0
bash run_cpp_serving.sh 0
- Send a request:
# Send a service request
python3.7 test_cpp_serving_client.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0614 03:01:36.273097 6084 naming_service_thread.cpp:202] brpc::policy::ListNamingService(""): added 1
I0614 03:01:37.393564 6084 general_model.cpp:490] [client]logid=0,client_cost=1107.82ms,server_cost=1101.75ms.
[{'bbox': [345, 95, 524, 585], 'rec_docs': '红牛-强化型', 'rec_scores': 0.8073724}]
- Stop the service:
python3.7 -m paddle_serving_server.serve stop
When the command finishes, the message `Process stopped` indicates that the service was shut down successfully.
<a name="4"></a>
## 4. FAQ
**Q1**: No result is returned after sending a request, or an output decoding error is reported
**A1**: Do not set a proxy when starting the service or sending requests. Disable the proxy before doing either, using:
unset https_proxy
unset http_proxy
**Q2**: Nothing happens after starting the service
**A2**: Check that the path given by `model_config` in `config.yml` exists and that the folder is named correctly.
For more service deployment types, such as the `RPC prediction service`, see the Serving [GitHub repository](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples)
English | [简体中文](./readme.md)
# Recognition model service deployment
## Table of contents
- [1 Introduction](#1-introduction)
- [2. Serving installation](#2-serving-installation)
- [3. Image recognition service deployment](#3-image-recognition-service-deployment)
- [3.1 Model conversion](#31-model-conversion)
- [3.2 Service deployment and request](#32-service-deployment-and-request)
- [3.2.1 Python Serving](#321-python-serving)
- [3.2.2 C++ Serving](#322-c-serving)
- [4. FAQ](#4-faq)
<a name="1"></a>
## 1 Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers deploy online prediction services easily. It supports one-click deployment of industrial-grade services, high-concurrency and efficient communication between client and server, and client development in multiple programming languages.
This section uses HTTP prediction service deployment as an example to show how to deploy a model service with PaddleServing in PaddleClas. Only Linux is currently supported; Windows is not.
<a name="2"></a>
## 2. Serving installation
The official Serving documentation recommends installing and deploying the Serving environment with docker. First pull the docker image and create a Serving-based container.
# start GPU docker
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
# start CPU docker
docker pull paddlepaddle/serving:0.7.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-devel bash
docker exec -it test bash
After entering docker, you need to install Serving-related python packages.
python3.7 -m pip install paddle-serving-client==0.7.0
python3.7 -m pip install paddle-serving-app==0.7.0
python3.7 -m pip install faiss-cpu==1.7.1post2
#If it is a CPU deployment environment:
python3.7 -m pip install paddle-serving-server==0.7.0 #CPU
python3.7 -m pip install paddlepaddle==2.2.0 # CPU
#If it is a GPU deployment environment
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post102 # GPU with CUDA10.2 + TensorRT6
python3.7 -m pip install paddlepaddle-gpu==2.2.0 # GPU with CUDA10.2
#Other GPU environments need to confirm the environment and then choose which one to execute
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
* If the installation speed is too slow, you can change the source through `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed up the installation process.
* For other environment configurations, see [Install Paddle Serving with Docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
<a name="3"></a>
## 3. Image recognition service deployment
When deploying an image recognition service with PaddleServing, **all of the saved inference models must be converted to Serving models**. The following uses the ultra-lightweight image recognition model in PP-ShiTu as an example to show how to deploy an image recognition service.
<a name="3.1"></a>
### 3.1 Model conversion
- Go to the working directory:
cd deploy/
- Download generic detection inference model and generic recognition inference model
# Create and enter the models folder
mkdir models
cd models
# Download and unzip the generic recognition model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
# Download and unzip the generic detection model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
- Convert the generic recognition inference model to the Serving model:
# Convert the generic recognition model
python3.7 -m paddle_serving_client.convert \
--dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \
--serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/
The meaning of the parameters of the above command is the same as [#3.1 Model conversion](#3.1)
After the conversion of the general recognition inference model is completed, there will be additional `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` folders in the current folder, with the following structure:
├── general_PPLCNet_x2_5_lite_v1.0_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── general_PPLCNet_x2_5_lite_v1.0_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
- Convert general detection inference model to Serving model:
# Convert generic detection model
python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
--serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
The meaning of the parameters of the above command is the same as [#3.1 Model conversion](#3.1)
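Because both conversions differ only in the model name, the two commands can be generated from one template. The sketch below only builds the argument lists and does not run them; the function name `convert_command` is hypothetical, and the flags are the ones used in the commands above.

```python
import sys


def convert_command(model_name, python=sys.executable):
    """Build the paddle_serving_client.convert argument list for `<model_name>_infer`."""
    return [
        python, "-m", "paddle_serving_client.convert",
        "--dirname", f"./{model_name}_infer/",
        "--model_filename", "inference.pdmodel",
        "--params_filename", "inference.pdiparams",
        "--serving_server", f"./{model_name}_serving/",
        "--serving_client", f"./{model_name}_client/",
    ]


# Model names taken from the download step above.
MODELS = [
    "general_PPLCNet_x2_5_lite_v1.0",
    "picodet_PPLCNet_x2_5_mainbody_lite_v1.0",
]

if __name__ == "__main__":
    for name in MODELS:
        # Print the command; to run it, pass the list to subprocess.run(..., check=True).
        print(" ".join(convert_command(name, python="python3.7")))
```

Passing the list form to `subprocess.run` avoids shell quoting issues with the paths.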
After the conversion of the general detection inference model is completed, there will be additional folders `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` in the current folder, with the following structure:
├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
The parameters in the above command are described in the following table:
| parameter | type | default value | description |
| --------- | ---- | ------------- | ----------- |
| `dirname` | str | - | The storage path of the model file to be converted. The program structure file and parameter file are saved in this directory. |
| `model_filename` | str | None | The name of the file storing the model Inference Program structure that needs to be converted. If set to None, use `__model__` as the default filename |
| `params_filename` | str | None | The name of the file that stores all parameters of the model that need to be transformed. It needs to be specified if and only if all model parameters are stored in a single binary file. If the model parameters are stored in separate files, set it to None |
| `serving_server` | str | `"serving_server"` | The storage path of the converted model files and configuration files. Default is serving_server |
| `serving_client` | str | `"serving_client"` | The converted client configuration file storage path. Default is serving_client |
- Download and unzip the index of the retrieval library that has been built
# Go back to the deploy directory
cd ../
# Download the built retrieval library index
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar
# Decompress the built retrieval library index
tar -xf drink_dataset_v1.0.tar
<a name="3.2"></a>
### 3.2 Service deployment and request
**Note:** The recognition service involves multiple models, so the Pipeline deployment method is used for performance reasons. The Pipeline deployment method currently does not support the Windows platform.
- Go to the working directory:
cd ./deploy/paddleserving/recognition
The paddleserving directory contains code to start the Python Pipeline service, the C++ Serving service, and send prediction requests, including:
config.yml # The configuration file to start the python pipeline service
pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
recognition_web_service.py # Script to start the pipeline server
readme.md # Recognition model service deployment document
run_cpp_serving.sh # Script to start C++ Pipeline Serving deployment
test_cpp_serving_client.py # Script for sending C++ Pipeline serving prediction requests by rpc
<a name="3.2.1"></a>
#### 3.2.1 Python Serving
- Start the service:
# Start the service and save the running log in log.txt
python3.7 recognition_web_service.py &>log.txt &
- Send the request:
python3.7 pipeline_http_client.py
After a successful run, the model prediction results are printed in the terminal window, as follows:
{'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [345, 95, 524, 576], 'rec_docs': 'Red Bull-Enhanced', 'rec_scores': 0.79903316}]"], 'tensors': []}
<a name="3.2.2"></a>
#### 3.2.2 C++ Serving
Different from Python Serving, the C++ Serving client calls C++ OP to predict, so before starting the service, you need to compile and install the serving server package, and set `SERVING_BIN`.
- Compile and install the Serving server package
# Enter the working directory
cd PaddleClas/deploy/paddleserving
# One-click compile and install Serving server, set SERVING_BIN
source ./build_server.sh python3.7
**Note:** The paths set in [build_server.sh](../build_server.sh#L55-L62) may need to be adjusted to the actual machine environment (CUDA version, Python version, etc.) before compiling.
- The input and output format used by C++ Serving differs from that of Python Serving, so run the following commands to overwrite the prototxt files in the four folders generated in [3.1](#3.1) with the provided ones.
# Enter PaddleClas/deploy directory
cd PaddleClas/deploy/
# Overwrite prototxt file
\cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_serving/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_serving/
\cp ./paddleserving/recognition/preprocess/general_PPLCNet_x2_5_lite_v1.0_client/*.prototxt ./models/general_PPLCNet_x2_5_lite_v1.0_client/
\cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
\cp ./paddleserving/recognition/preprocess/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/*.prototxt ./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
- Start the service:
# Enter the working directory
cd PaddleClas/deploy/paddleserving/recognition
# The default port number is 9400; the running log is saved in log_PPShiTu.txt by default
# CPU deployment
sh run_cpp_serving.sh
# GPU deployment, and specify card 0
sh run_cpp_serving.sh 0
- Send the request:
# send service request
python3.7 test_cpp_serving_client.py
After a successful run, the results of the model predictions are printed in the client's terminal window as follows:
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0614 03:01:36.273097 6084 naming_service_thread.cpp:202] brpc::policy::ListNamingService(""): added 1
I0614 03:01:37.393564 6084 general_model.cpp:490] [client]logid=0,client_cost=1107.82ms,server_cost=1101.75ms.
[{'bbox': [345, 95, 524, 585], 'rec_docs': 'Red Bull-Enhanced', 'rec_scores': 0.8073724}]
- Close the service:
If the service program is running in the foreground, you can press `Ctrl+C` to terminate the server program; if it is running in the background, you can use the kill command to close related processes, or you can execute the following command in the path where the service program is started to terminate the server program:
python3.7 -m paddle_serving_server.serve stop
After the execution is completed, the `Process stopped` message appears, indicating that the service was successfully shut down.
<a name="4"></a>
## 4. FAQ
**Q1**: No result is returned after the request is sent or an output decoding error is prompted
**A1**: Do not set the proxy when starting the service and sending the request. You can close the proxy before starting the service and sending the request. The command to close the proxy is:
unset https_proxy
unset http_proxy
**Q2**: Nothing happens after starting the service
**A2**: Check whether the path corresponding to `model_config` in `config.yml` exists and whether the folder name is correct.
For more service deployment types, such as `RPC prediction service`, refer to the examples in Serving's [GitHub repository](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples).
# Model Service Deployment
## Catalogue
- [1. Introduction](#1)
- [2. Installation of Serving](#2)
- [3. Service Deployment for Image Classification](#3)
- [3.1 Model Transformation](#3.1)
- [3.2 Service Deployment and Request](#3.2)
- [3.2.1 Python Serving](#3.2.1)
- [3.2.2 C++ Serving](#3.2.2)
- [4. Service Deployment for Image Recognition](#4)
- [4.1 Model Transformation](#4.1)
- [4.2 Service Deployment and Request](#4.2)
- [4.2.1 Python Serving](#4.2.1)
- [4.2.2 C++ Serving](#4.2.2)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers easily deploy online prediction services. It supports one-click deployment of industrial-grade services, high-concurrency and efficient communication between client and server, and client development in multiple programming languages.
This section takes HTTP prediction service deployment as an example to introduce how to use PaddleServing to deploy a model service in PaddleClas. Currently only Linux deployment is supported; Windows is not supported yet.
<a name="2"></a>
## 2. Installation of Serving
The official Serving documentation recommends using Docker to install and deploy the Serving environment. First, pull the Docker image and create a Serving container from it.
# start GPU docker
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
# start CPU docker
docker pull paddlepaddle/serving:0.7.0-devel
docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-devel bash
docker exec -it test bash
After entering the container, install the Serving-related Python packages.
python3.7 -m pip install paddle-serving-client==0.7.0
python3.7 -m pip install paddle-serving-app==0.7.0
python3.7 -m pip install faiss-cpu==1.7.1post2
# For a CPU deployment environment:
python3.7 -m pip install paddle-serving-server==0.7.0 # CPU
python3.7 -m pip install paddlepaddle==2.2.0 # CPU
# For a GPU deployment environment:
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post102 # GPU with CUDA10.2 + TensorRT6
python3.7 -m pip install paddlepaddle-gpu==2.2.0 # GPU with CUDA10.2
# For other GPU environments, confirm the environment first and then run the matching command:
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
python3.7 -m pip install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
* If the installation speed is too slow, you can change the source through `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed up the installation process.
* For other environment configuration installation, please refer to: [Install Paddle Serving with Docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_CN.md)
<a name="3"></a>
## 3. Service Deployment for Image Classification
The following takes the classic ResNet50_vd model as an example to introduce how to deploy the image classification service.
<a name="3.1"></a>
### 3.1 Model Transformation
When using PaddleServing for service deployment, you need to convert the saved inference model into a Serving model.
- Go to the working directory:
cd deploy/paddleserving
- Download and unzip the inference model for ResNet50_vd:
# Download ResNet50_vd inference model
wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar
# Decompress the ResNet50_vd inference model
tar xf ResNet50_vd_infer.tar
- Use the paddle_serving_client command to convert the downloaded inference model into a model format for easy server deployment:
# Convert ResNet50_vd model
python3.7 -m paddle_serving_client.convert \
--dirname ./ResNet50_vd_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ResNet50_vd_serving/ \
--serving_client ./ResNet50_vd_client/
The parameters in the above command are described in the following table:
| parameter | type | default value | description |
| ----------------- | ---- | ------------------ | ------------------------------------------------------------ |
| `dirname` | str | - | The storage path of the model file to be converted. The program structure file and parameter file are saved in this directory. |
| `model_filename` | str | None | The name of the file storing the model Inference Program structure that needs to be converted. If set to None, use `__model__` as the default filename |
| `params_filename` | str | None | File name where all parameters of the model to be converted are stored. It needs to be specified if and only if all model parameters are stored in a single binary file. If the model parameters are stored in separate files, set it to None |
| `serving_server` | str | `"serving_server"` | The storage path of the converted model files and configuration files. Default is serving_server |
| `serving_client` | str | `"serving_client"` | The converted client configuration file storage path. Default is serving_client |
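When several models need to be converted, the command can be assembled programmatically. The following is a minimal sketch: `build_convert_command` is a hypothetical helper, not part of Paddle Serving, and it only uses the flags listed in the table above.

```python
import sys


def build_convert_command(model_dir, server_dir, client_dir,
                          model_filename="inference.pdmodel",
                          params_filename="inference.pdiparams",
                          python=sys.executable):
    """Return the argument list for `python -m paddle_serving_client.convert`.

    Hypothetical helper for illustration; the flag names come from the table above.
    """
    return [
        python, "-m", "paddle_serving_client.convert",
        "--dirname", model_dir,
        "--model_filename", model_filename,
        "--params_filename", params_filename,
        "--serving_server", server_dir,
        "--serving_client", client_dir,
    ]


cmd = build_convert_command(
    "./ResNet50_vd_infer/", "./ResNet50_vd_serving/", "./ResNet50_vd_client/"
)
print(" ".join(cmd))
# To actually run the conversion (requires paddle-serving-client to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets the command be passed to `subprocess.run` without shell quoting issues.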
After the ResNet50_vd inference model is converted, there will be additional `ResNet50_vd_serving` and `ResNet50_vd_client` folders in the current folder, with the following structure:
├── ResNet50_vd_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── ResNet50_vd_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
- To stay compatible with different models, Serving supports renaming model inputs and outputs. When deploying a different model, you only need to change the `alias_name` in the configuration files; the inference code itself does not need to change. After the conversion, edit `serving_server_conf.prototxt` under `ResNet50_vd_serving` and `serving_client_conf.prototxt` under `ResNet50_vd_client`, and change the `alias_name` in `fetch_var` to `prediction`. The modified `serving_server_conf.prototxt` is as follows:
feed_var {
  name: "inputs"
  alias_name: "inputs"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "prediction"
  is_lod_tensor: false
  fetch_type: 1
  shape: 1000
}
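The alias change can also be scripted instead of edited by hand. The sketch below is one possible approach using plain text processing; `set_fetch_alias` is a hypothetical helper, and the sample text is an abbreviated, illustrative config rather than a real generated file.

```python
import re


def set_fetch_alias(prototxt_text, new_alias):
    """Replace alias_name inside every fetch_var block, leaving feed_var untouched."""
    out = []
    in_fetch = False
    for line in prototxt_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("fetch_var"):
            in_fetch = True
        elif stripped == "}":
            in_fetch = False
        elif in_fetch and stripped.startswith("alias_name:"):
            # Swap only the quoted value, keeping the original indentation.
            line = re.sub(r'"[^"]*"', lambda _: f'"{new_alias}"', line, count=1)
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated sample config, for illustration only.
sample = '''feed_var {
  name: "inputs"
  alias_name: "inputs"
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "save_infer_model/scale_0.tmp_1"
}
'''
updated = set_fetch_alias(sample, "prediction")
print(updated)
```

To apply it to a real file, read the prototxt with `open(...).read()`, pass the text through the function, and write the result back; the same call works for both the server and client configuration files.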
<a name="3.2"></a>
### 3.2 Service Deployment and Request
The paddleserving directory contains the code to start the pipeline service, C++ serving service and send prediction requests, including:
classification_web_service.py # Script to start the pipeline server
config.yml # Configuration file to start the pipeline service
pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
run_cpp_serving.sh # Script to start C++ Serving deployment
test_cpp_serving_client.py # Script for sending C++ serving prediction requests in rpc mode
<a name="3.2.1"></a>
#### 3.2.1 Python Serving
- Start the service:
# Start the service and save the running log in log.txt
python3.7 classification_web_service.py &>log.txt &
- Send the request:
# send service request
python3.7 pipeline_http_client.py
After a successful run, the model prediction results are printed in the terminal window, as follows:
{'err_no': 0, 'err_msg': '', 'key': ['label', 'prob'], 'value': ["['daisy']", '[0.9341402053833008]'], 'tensors': []}
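In the response above, each entry of `value` is a string containing a Python literal. A minimal sketch of decoding it into native objects follows; `decode_pipeline_response` is a hypothetical helper, and it assumes every value is a literal string as in the sample output.

```python
import ast


def decode_pipeline_response(resp):
    """Pair each key with its decoded value; raise if the service reported an error."""
    if resp["err_no"] != 0:
        raise RuntimeError(f"service error {resp['err_no']}: {resp['err_msg']}")
    # literal_eval only parses literals, so it is safer than eval for untrusted text.
    return {k: ast.literal_eval(v) for k, v in zip(resp["key"], resp["value"])}


# Sample response copied from the output shown above.
sample = {
    "err_no": 0,
    "err_msg": "",
    "key": ["label", "prob"],
    "value": ["['daisy']", "[0.9341402053833008]"],
    "tensors": [],
}
decoded = decode_pipeline_response(sample)
print(decoded["label"][0], decoded["prob"][0])
```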
<a name="3.2.2"></a>
#### 3.2.2 C++ Serving
- Start the service:
# Start the service, the service runs in the background, and the running log is saved in nohup.txt
sh run_cpp_serving.sh
- Send the request:
# send service request
python3.7 test_cpp_serving_client.py
After a successful run, the model prediction results are printed in the terminal window, as follows:
prediction: daisy, probability: 0.9341399073600769
<a name="4"></a>
## 4. Service Deployment for Image Recognition
In addition to the single-model deployment introduced in [Chapter 3 Service Deployment for Image Classification](#3), this chapter introduces how to deploy an **image recognition service** that combines multiple models.
When using PaddleServing to deploy an image recognition service, **multiple saved inference models need to be converted to Serving models**. The following takes the ultra-lightweight image recognition model in PP-ShiTu as an example to introduce the deployment of the image recognition service.
<a name="4.1"></a>
### 4.1 Model Transformation
- Go to the working directory:
cd deploy/
- Download generic detection inference model and generic recognition inference model
# Create and enter the models folder
mkdir models
cd models
# Download and unzip the generic recognition model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
# Download and unzip the generic detection model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
- Convert the generic recognition inference model to the Serving model:
# Convert the generic recognition model
python3.7 -m paddle_serving_client.convert \
--dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \
--serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/
The meaning of the parameters of the above command is the same as [#3.1 Model Transformation](#3.1)
After the transformation of the general recognition inference model is completed, there will be additional `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` folders in the current folder, with the following structure:
├── general_PPLCNet_x2_5_lite_v1.0_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── general_PPLCNet_x2_5_lite_v1.0_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
- Modify the alias names in `serving_server_conf.prototxt` under `general_PPLCNet_x2_5_lite_v1.0_serving/` and in `serving_client_conf.prototxt` under `general_PPLCNet_x2_5_lite_v1.0_client/`: change the `alias_name` in `fetch_var` to `features`.
The modified `serving_server_conf.prototxt` is as follows:
feed_var {
  name: "x"
  alias_name: "x"
  is_lod_tensor: false
  feed_type: 1
  shape: 3
  shape: 224
  shape: 224
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "features"
  is_lod_tensor: false
  fetch_type: 1
  shape: 512
}
- Convert general detection inference model to Serving model:
# Convert generic detection model
python3.7 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
--serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
The meaning of the parameters of the above command is the same as [#3.1 Model Transformation](#3.1)
After the general detection inference model transformation is completed, there will be additional folders `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` in the current folder, with the following structure:
├── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/
│ ├── inference.pdiparams
│ ├── inference.pdmodel
│ ├── serving_server_conf.prototxt
│ └── serving_server_conf.stream.prototxt
└── picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
├── serving_client_conf.prototxt
└── serving_client_conf.stream.prototxt
**Note:** There is no need to modify the alias name in serving_server_conf.prototxt under `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` directory.
- Download and unzip the built retrieval library index
# Go back to the deploy directory
cd ../
# Download the built retrieval library index
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar
# Decompress the built retrieval library index
tar -xf drink_dataset_v1.0.tar
<a name="4.2"></a>
### 4.2 Service Deployment and Request
**Note:** The recognition service involves multiple models, so the Pipeline deployment method is used for performance reasons. The Pipeline deployment method currently does not support the Windows platform.
- Go to the working directory:
cd ./deploy/paddleserving/recognition
The paddleserving directory contains code to start the Python Pipeline service, the C++ Serving service, and send prediction requests, including:
config.yml # The configuration file to start the python pipeline service
pipeline_http_client.py # Script for sending pipeline prediction requests in http mode
pipeline_rpc_client.py # Script for sending pipeline prediction requests in rpc mode
recognition_web_service.py # Script to start the pipeline server
run_cpp_serving.sh # Script to start C++ Pipeline Serving deployment
test_cpp_serving_client.py # Script for sending C++ Pipeline serving prediction requests by rpc
<a name="4.2.1"></a>
#### 4.2.1 Python Serving
- Start the service:
# Start the service and save the running log in log.txt
python3.7 recognition_web_service.py &>log.txt &
- Send the request:
python3.7 pipeline_http_client.py
After a successful run, the model prediction results are printed in the terminal window, as follows:
{'err_no': 0, 'err_msg': '', 'key': ['result'], 'value': ["[{'bbox': [345, 95, 524, 576], 'rec_docs': 'Red Bull-Enhanced', 'rec_scores': 0.79903316}]"], 'tensors': []}
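The recognition response carries a list of detections, each with a bounding box, a matched label and a score. As a sketch of how a client might post-process it, the hypothetical helper `best_match` below picks the highest-scoring detection above a threshold; the `min_score` parameter and its default are assumptions for illustration, not settings of the service.

```python
import ast


def best_match(resp, min_score=0.5):
    """Return the highest-scoring detection with rec_scores >= min_score, or None."""
    raw = resp["value"][resp["key"].index("result")]
    detections = ast.literal_eval(raw)  # the value is a string holding a Python literal
    kept = [d for d in detections if d["rec_scores"] >= min_score]
    return max(kept, key=lambda d: d["rec_scores"]) if kept else None


# Sample response copied from the output shown above.
sample = {
    "err_no": 0,
    "err_msg": "",
    "key": ["result"],
    "value": ["[{'bbox': [345, 95, 524, 576], 'rec_docs': 'Red Bull-Enhanced', 'rec_scores': 0.79903316}]"],
    "tensors": [],
}
match = best_match(sample)
print(match["rec_docs"], match["bbox"])
```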
<a name="4.2.2"></a>
#### 4.2.2 C++ Serving
- Start the service:
# Start the service: Here, the subject detection and feature extraction services will be started at the same time in the background, and the port numbers are 9293 and 9294 respectively;
# The running logs are saved in log_mainbody_detection.txt and log_feature_extraction.txt respectively
sh run_cpp_serving.sh
- Send the request:
# send service request
python3.7 test_cpp_serving_client.py
After a successful run, the model prediction results are printed in the terminal window, as follows:
[{'bbox': [345, 95, 524, 586], 'rec_docs': 'Red Bull-Enhanced', 'rec_scores': 0.8016462}]
<a name="5"></a>
## 5. FAQ
**Q1**: No result is returned after the request is sent or an output decoding error is prompted
**A1**: Do not set the proxy when starting the service and sending the request. You can close the proxy before starting the service and sending the request. The command to close the proxy is:
unset https_proxy
unset http_proxy
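When the request is sent from a Python script rather than from a shell, the same effect can be obtained inside the process. This is a minimal sketch; `clear_proxy_env` is a hypothetical helper, and it only affects the current process and the child processes it starts afterwards.

```python
import os


def clear_proxy_env(names=("http_proxy", "https_proxy")):
    """Remove the given proxy variables from os.environ; return the names removed."""
    removed = []
    for name in names:
        if os.environ.pop(name, None) is not None:
            removed.append(name)
    return removed


# Call this before creating the client or sending any request.
print(clear_proxy_env())
```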
**Q2**: Nothing happens after starting the service
**A2**: Check whether the path corresponding to `model_config` in `config.yml` exists and whether the folder name is correct.
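The path check can be done quickly before starting the service. The sketch below assumes the `model_config` paths have been copied out of `config.yml` by hand; `missing_model_dirs` is a hypothetical helper and the example path is illustrative.

```python
from pathlib import Path


def missing_model_dirs(model_dirs):
    """Return the entries of model_dirs that are not existing directories."""
    return [str(d) for d in model_dirs if not Path(d).is_dir()]


# Example path only; use the model_config values from your own config.yml.
missing = missing_model_dirs(["./ResNet50_vd_serving"])
if missing:
    print("model_config paths not found:", missing)
```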
For more service deployment types, such as `RPC prediction service`, refer to the examples in Serving's [GitHub repository](https://github.com/PaddlePaddle/Serving/tree/v0.9.0/examples).
<a name="1"></a>
## 1. Introduction
