Unverified commit a8048f4a, authored by Jiawei Wang, committed by GitHub

Merge branch 'develop' into xpu-example

......@@ -60,6 +60,13 @@ option(WITH_TRT "Compile Paddle Serving with TRT"
option(PADDLE_ON_INFERENCE "Compile for encryption" ON)
option(WITH_OPENCV "Compile Paddle Serving with OPENCV" OFF)
if(NOT DEFINED VERSION_TAG)
    set(VERSION_TAG "0.0.0")
endif()
if (WITH_PYTHON)
    message(STATUS "Compile Version Tag for wheel: ${VERSION_TAG}")
endif()
if (WITH_OPENCV)
    SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
    if(NOT DEFINED OPENCV_DIR)
......
......@@ -59,11 +59,11 @@ For example, after the Server Compilation step, the whl package will be produced
# Request parameters description
In order to deploy the serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
| param    | description                        | notes                                                               |
| :------- | :--------------------------------- | :------------------------------------------------------------------ |
| use_lite | use the Paddle-Lite engine          | uses the inference capability of Paddle-Lite                         |
| use_xpu  | use Baidu Kunlun XPU for inference  | must be used together with the use_lite option                       |
| ir_optim | enable graph optimization           | refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)  |
# Deployment examples
## Download the model
```
......@@ -78,15 +78,15 @@ There are mainly three deployment methods:
The first two deployment methods are recommended.

Start the RPC service, deploying on an ARM server with Baidu Kunlun chips, accelerated with Paddle-Lite and Baidu Kunlun XPU:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying on an ARM server, accelerated with Paddle-Lite:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying on an ARM server:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
```
......@@ -103,7 +103,7 @@ fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["p
print(fetch_map)
```
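For completeness, here is a minimal self-contained version of the client call above. This is a hedged sketch: the client config directory name (`uci_housing_client`) and the sample feature vector are illustrative assumptions, not taken from this diff.
```
import numpy as np
from paddle_serving_client import Client

client = Client()
# "uci_housing_client" is an assumed config directory name.
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One illustrative 13-feature sample for the uci_housing regression model.
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501, -0.0015,
        -0.0669, -0.1034, -0.0233, -0.0655, 0.0785, -0.0608]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1, 13, 1)},
                           fetch=["price"])
print(fetch_map)
```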
Some examples are provided below; other models can be modified with reference to these examples.
| sample name | sample links                                                 |
| :---------- | :----------------------------------------------------------- |
| fit_a_line  | [fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)       |
| resnet      | [resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)   |
# Low-Precision Deployment for Paddle Serving
(English|[简体中文](./LOW_PRECISION_DEPLOYMENT_CN.md))
Intel CPUs support int8 and bfloat16 models; NVIDIA TensorRT supports int8 and float16 models.
## Obtain the quantized model through the PaddleSlim tool
To train low-precision models, please refer to [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html).
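As a hedged sketch of one possible post-training quantization flow (API names follow PaddleSlim 2.x and may differ across releases; `ResNet50_pretrained` and the random calibration reader are placeholders, not from this diff):
```
import numpy as np
import paddle
from paddleslim.quant import quant_post_static

def calib_reader():
    # Placeholder calibration reader: replace the random tensors with
    # real preprocessed images for meaningful calibration statistics.
    for _ in range(32):
        yield [np.random.random((3, 224, 224)).astype("float32")]

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
quant_post_static(
    executor=exe,
    model_dir="ResNet50_pretrained",      # hypothetical float32 model dir
    quantize_model_path="ResNet50_quant",  # output int8 model dir
    sample_generator=calib_reader,
    batch_nums=8)
```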
## Deploy the quantized model from PaddleSlim using Paddle Serving with Nvidia TensorRT int8 mode
Firstly, download the [Resnet50 int8 model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to Paddle Serving's saved model format.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
python -m paddle_serving_client.convert --dirname ResNet50_quant
```
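Equivalently, the conversion can be done from Python. This is a hedged sketch using the `inference_model_to_serving` interface from `paddle_serving_client.io`; the default output directories `serving_server` and `serving_client` are then consumed by the commands below.
```
from paddle_serving_client.io import inference_model_to_serving

inference_model_to_serving(
    dirname="ResNet50_quant",
    serving_server="serving_server",  # model dir passed to --model below
    serving_client="serving_client")  # holds serving_client_conf.prototxt
```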
Start the RPC service, specifying the GPU id and the precision mode:
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_gpu --use_trt --precision int8
```
Request the service with the client:
```
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
# The convert step above writes the client config to ./serving_client by default.
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
```
## Reference
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* [Deploy the quantized model using Paddle Inference on Intel CPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* [Deploy the quantized model using Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
\ No newline at end of file
# Low-Precision Deployment for Paddle Serving
(Simplified Chinese|[English](./LOW_PRECISION_DEPLOYMENT.md))
For low-precision deployment, Intel CPUs support int8 and bfloat16 models, and NVIDIA TensorRT supports int8 and float16 models.
## Generate low-precision models through PaddleSlim quantization
For details, see [PaddleSlim quantization](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html).
## Deploy a PaddleSlim int8 quantized model with TensorRT int8 mode
First, download the ResNet50 [PaddleSlim quantized model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to the deployment model format supported by Paddle Serving.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz
python -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, setting the chosen GPU id and the model precision:
```
python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_gpu --use_trt --precision int8
```
Send requests with the client:
```
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
# The convert step above writes the client config to ./serving_client by default.
client.load_client_config("serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
    File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
    Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
```
## References
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* [Deploy quantized models with Paddle Inference on Intel CPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* [Deploy quantized models with Paddle Inference on Nvidia GPU](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
\ No newline at end of file
......@@ -47,3 +47,5 @@ elif package_name.endswith('xpu'):
path = "paddle_serving_" + sys.argv[1]
commit_id = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
update_info(path + "/version.py", "commit_id", commit_id)
update_info(path + "/version.py", "version_tag", "${VERSION_TAG}")
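The `update_info` helper is elided from this hunk. A hedged reconstruction of what such a helper typically does is sketched below; this is an assumption for illustration, not the repository's actual code.
```
import re

def update_info(version_file, key, value):
    # Rewrite a single `key = "..."` assignment in version.py in place.
    if isinstance(value, bytes):  # git rev-parse output arrives as bytes
        value = value.decode("ascii").strip()
    with open(version_file, "r") as f:
        content = f.read()
    content = re.sub(r'%s = ".*"' % key, '%s = "%s"' % (key, value), content)
    with open(version_file, "w") as f:
        f.write(content)
```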
......@@ -12,3 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from .models import ServingModels
from . import version
__version__ = version.version_tag
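With this change, the installed wheel reports the build tag directly; a quick hedged check (assuming the package is installed):
```
import paddle_serving_app
# Prints the VERSION_TAG baked in at build time, e.g. "0.0.0" by default.
print(paddle_serving_app.__version__)
```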
......@@ -12,5 +12,5 @@
# See the License for the specific language governing permissions and
# limitations under the License.
""" Paddle Serving App version string """
version_tag = "0.0.0"
commit_id = ""
......@@ -17,4 +17,4 @@ from . import version
from . import client
from .client import *
__version__ = version.version_tag
......@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
""" Paddle Serving Client version string """
version_tag = "0.0.0"
module_proto_version = "0.0.0"
commit_id = ""
......@@ -31,4 +31,4 @@ from paddle_serving_server import (
from .dag import *
from .server import *
__version__ = version.version_tag
......@@ -22,7 +22,7 @@ from .proto import general_model_config_pb2 as m_config
from .proto import multi_lang_general_model_service_pb2_grpc
import google.protobuf.text_format
import time
from .version import version_tag, version_suffix, device_type
from contextlib import closing
import argparse
......@@ -369,7 +369,7 @@ class Server(object):
version_file = open("{}/version.py".format(self.module_path), "r")
folder_name = "serving-%s-%s" % (self.get_serving_bin_name(),
                                 version_tag)
tar_name = "%s.tar.gz" % folder_name
bin_url = "https://paddle-serving.bj.bcebos.com/bin/%s" % tar_name
......
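To illustrate how the download URL above is assembled from `version_tag` (the device name and tag here are assumed values, not from this diff):
```
version_tag = "0.6.0"                    # hypothetical tag baked at build time
folder_name = "serving-%s-%s" % ("gpu", version_tag)  # "gpu": assumed device
tar_name = "%s.tar.gz" % folder_name
bin_url = "https://paddle-serving.bj.bcebos.com/bin/%s" % tar_name
print(bin_url)  # https://paddle-serving.bj.bcebos.com/bin/serving-gpu-0.6.0.tar.gz
```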
......@@ -12,10 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
""" Paddle Serving Server version string """
version_tag = "0.0.0"
version_suffix = ""
module_proto_version = "0.0.0"
device_type = "0"
cuda_version = "9"
commit_id = ""
......@@ -22,7 +22,7 @@ import os
from setuptools import setup, Distribution, Extension
from setuptools import find_packages
from setuptools import setup
from paddle_serving_app.version import version_tag
from pkg_resources import DistributionNotFound, get_distribution
def python_version():
......@@ -78,7 +78,7 @@ package_dir={'paddle_serving_app':
setup(
name='paddle-serving-app',
version=version_tag.replace('-', ''),
description=
('Paddle Serving Package for saved model with PaddlePaddle'),
url='https://github.com/PaddlePaddle/Serving',
......
......@@ -22,7 +22,7 @@ import sys
from setuptools import setup, Distribution, Extension
from setuptools import find_packages
from setuptools import setup
from paddle_serving_client.version import version_tag
import util
py_version = sys.version_info
......@@ -79,7 +79,7 @@ package_dir={'paddle_serving_client':
setup(
name='paddle-serving-client',
version=version_tag.replace('-', ''),
description=
('Paddle Serving Package for saved model with PaddlePaddle'),
url='https://github.com/PaddlePaddle/Serving',
......
......@@ -19,7 +19,7 @@ from __future__ import print_function
from setuptools import setup, Distribution, Extension
from setuptools import find_packages
from setuptools import setup
from paddle_serving.version import version_tag
from grpc_tools import protoc
import util
......@@ -43,7 +43,7 @@ package_dir={'paddle_serving.serving_client':
setup(
name='paddle-serving-client',
version=version_tag.replace('-', ''),
description=
('Paddle Serving Package for saved model with PaddlePaddle'),
url='https://github.com/PaddlePaddle/Serving',
......
......@@ -19,10 +19,10 @@ from __future__ import print_function
from setuptools import setup, Distribution, Extension
from setuptools import find_packages
from setuptools import setup
from paddle_serving_server.version import version_tag, version_suffix
import util
package_version = version_tag.replace('-', '')
if version_suffix != "":
    version_suffix = "post" + version_suffix
    package_version = package_version + "." + version_suffix
......
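A worked example of the suffix logic above, with assumed inputs:
```
version_tag, version_suffix = "0.6.0", "2"   # hypothetical build inputs
package_version = version_tag.replace('-', '')
if version_suffix != "":
    version_suffix = "post" + version_suffix
    package_version = package_version + "." + version_suffix
print(package_version)  # 0.6.0.post2
```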