# Request parameters description
In order to deploy the serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
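As a minimal sketch of such a launch, the snippet below starts the serving process with Lite/XPU acceleration enabled; the model directory, port, thread count, and the `--use_lite`/`--use_xpu`/`--ir_optim` flags are assumptions based on typical `paddle_serving_server.serve` usage, not parameters defined in this document.

```python
# Hedged sketch: launch Paddle Serving with Paddle-Lite + Kunlun XPU enabled.
# All values below are illustrative; adjust them to your deployment.
import subprocess

subprocess.run(
    [
        "python3", "-m", "paddle_serving_server.serve",
        "--model", "serving_server",  # hypothetical server-side model directory
        "--port", "9292",
        "--thread", "6",
        "--use_lite",                 # enable the Paddle-Lite engine
        "--use_xpu",                  # run inference on the Baidu Kunlun XPU
        "--ir_optim",                 # graph optimization, used together with Paddle-Lite
    ],
    check=True,
)
```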
Intel CPUs support int8 and bfloat16 models; NVIDIA TensorRT supports int8 and float16 models.
## Obtain the quantized model through PaddleSlim tool
To train low-precision models, please refer to [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html).
## Deploy the quantized model from PaddleSlim using Paddle Serving with NVIDIA TensorRT int8 mode
First, download the [ResNet50 int8 model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it to Paddle Serving's saved model.
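A minimal sketch of the conversion step is shown below, assuming the `inference_model_to_serving` helper in `paddle_serving_client.io`; the output directory names are illustrative, and `model_filename`/`params_filename` may need to be passed depending on how the files inside the extracted archive are named.

```python
# Sketch: convert the extracted int8 inference model into Paddle Serving's
# server/client model format. The archive is assumed to have been downloaded
# from the link above and extracted into ./ResNet50_quant (e.g. wget + tar zxvf).
import paddle_serving_client.io as serving_io

serving_io.inference_model_to_serving(
    dirname="ResNet50_quant",                # extracted inference model directory
    serving_server="resnet50_int8_serving",  # output dir for the server-side model (assumed name)
    serving_client="resnet50_int8_client",   # output dir for the client-side config (assumed name)
)
```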