# ResNet50 INT8 Example
(Simplified Chinese|[English](./README.md))

## Generating a Low-Precision Model with PaddleSlim Quantization
See [PaddleSlim quantization](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html) for details.

## Deploying a PaddleSlim INT8 Quantized Model with TensorRT INT8
First, download the ResNet50 [PaddleSlim quantized model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz) and convert it into the deployment model format supported by Paddle Serving.
```
wget https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz
tar zxvf ResNet50_quant.tar.gz

python3 -m paddle_serving_client.convert --dirname ResNet50_quant
```
Start the RPC service, specifying the GPU id to use and the precision for the deployed model:
```
python3 -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --use_trt --precision int8
```
Send a request with the client:
```
python3 resnet50_client.py
```
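Before sending a request, the client script typically preprocesses the input image into the tensor layout the model expects. The sketch below is only an illustration of a common ResNet50-style pipeline, not the exact code in `resnet50_client.py`; the mean/std values and the 224x224 input size are assumptions based on standard ImageNet preprocessing:

```python
import numpy as np

# ImageNet per-channel mean/std commonly used for ResNet50 (an assumption
# here; check resnet50_client.py for the values this model actually uses).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_hwc_uint8):
    """Scale pixels to [0, 1], normalize per channel, and convert HWC -> CHW."""
    img = img_hwc_uint8.astype(np.float32) / 255.0
    img = (img - MEAN) / STD
    return img.transpose(2, 0, 1)  # shape (3, H, W)

# Example with a dummy 224x224 RGB image:
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
chw = preprocess(dummy)
print(chw.shape)  # (3, 224, 224)
```

The resulting CHW float array is what would be passed as the feed tensor in the client's predict call.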

## Reference Documentation
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* Paddle Inference [documentation](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html) on deploying quantized models on Intel CPUs
* Paddle Inference [documentation](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html) on deploying quantized models on NVIDIA GPUs