Unverified commit d7c1c893, authored by TeslaZhao, committed by GitHub

Merge pull request #1790 from TeslaZhao/develop

Update doc
([简体中文](./README_CN.md)|English)

<p align="center">
<br>

@@ -24,37 +24,36 @@

***

The goal of Paddle Serving is to provide high-performance, flexible and easy-to-use industrial-grade online inference services for machine learning developers and enterprises. Paddle Serving supports multiple protocols such as RESTful, gRPC and bRPC, provides inference solutions for a variety of hardware and operating-system environments, and ships many classic pre-trained model examples. The core features are as follows:
- Integrates the high-performance server-side inference engine [Paddle Inference](https://paddleinference.paddlepaddle.org.cn/product_introduction/inference_intro.html) and the mobile-side engine [Paddle Lite](https://paddlelite.paddlepaddle.org.cn/introduction/tech_highlights.html). Models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
- Offers two frameworks: the high-performance C++ Serving and the easy-to-use Python Pipeline. C++ Serving builds high-throughput, low-latency inference services on the bRPC network framework, with performance ahead of competing products. Python Pipeline builds highly usable, high-throughput inference services on the gRPC/gRPC-Gateway network framework and the Python language. For how to choose between them, see [Technical Selection](doc/Serving_Design_EN.md#21-design-selection).
- Supports multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC and bRPC, and provides C++, Python and Java SDKs.
- Designs and implements a high-performance asynchronous pipeline inference framework based on a directed acyclic graph (DAG), with features such as multi-model composition, asynchronous scheduling, concurrent inference, dynamic batching, multi-card multi-stream inference and request caching.
- Adapts to a variety of common hardware, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, Kunlun XPU, Huawei Ascend 310/910, Hygon DCU and Nvidia Jetson.
- Integrates the Intel MKL-DNN and Nvidia TensorRT acceleration libraries, as well as low-precision quantized inference.
- Provides a model security deployment solution, including encrypted model deployment, authentication and an HTTPS security gateway, which has been used in real projects.
- Supports cloud deployment, with a deployment case for Baidu Intelligent Cloud Kubernetes clusters.
- Provides more than 40 classic pre-trained model deployment examples from suites such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP and PaddleRec, with more models being added continuously.
- Supports distributed deployment of large-scale sparse parameter index models, with features such as multiple tables, multiple shards, multiple replicas and a local high-frequency cache, deployable on a single machine or in the cloud.
- Supports service monitoring, with Prometheus-based performance statistics exposed on a metrics port.
<h2 align="center">教程与案例</h2>
<h2 align="center">Tutorial and Solutions</h2> - AIStudio 使用教程 : [Paddle Serving服务化部署框架](https://www.paddlepaddle.org.cn/tutorials/projectdetail/3946013)
- AIStudio OCR 实战 : [基于Paddle Serving的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
- 视频教程 : [深度学习服务化部署-以互联网应用为例](https://aistudio.baidu.com/aistudio/course/introduce/19084)
- 边缘 AI 解决方案 : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
- 政务问答解决方案 : [政务问答检索式 FAQ System](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_system)
- 智能问答解决方案 : [保险智能问答](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_finance)
- 语义索引解决方案 : [In-batch Negatives](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search/recall/in_batch_negative)
- AIStudio tutorial(Chinese) : [Paddle Serving服务化部署框架](https://www.paddlepaddle.org.cn/tutorials/projectdetail/3946013) <h2 align="center">论文</h2>
- AIStudio OCR practice(Chinese) : [基于PaddleServing的OCR服务化部署实战](https://aistudio.baidu.com/aistudio/projectdetail/3630726)
- Video tutorial(Chinese) : [深度学习服务化部署-以互联网应用为例](https://aistudio.baidu.com/aistudio/course/introduce/19084)
- Edge AI solution(Chinese) : [基于Paddle Serving&百度智能边缘BIE的边缘AI解决方案](https://mp.weixin.qq.com/s/j0EVlQXaZ7qmoz9Fv96Yrw)
- GOVT Q&A Solution(Chinese) : [政务问答检索式 FAQ System](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_system)
- Smart Q&A Solution(Chinese) : [保险智能问答](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/faq_finance)
- Semantic Indexing Solution(Chinese) : [In-batch Negatives](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search/recall/in_batch_negative)
<h2 align="center">Papers</h2> - 论文 : [JiZhi: A Fast and Cost-Effective Model-As-A-Service System for
- Paper : [JiZhi: A Fast and Cost-Effective Model-As-A-Service System for
Web-Scale Online Inference at Baidu](https://arxiv.org/pdf/2106.01674.pdf) Web-Scale Online Inference at Baidu](https://arxiv.org/pdf/2106.01674.pdf)
- Paper : [ERNIE 3.0 TITAN: EXPLORING LARGER-SCALE KNOWLEDGE - 论文 : [ERNIE 3.0 TITAN: EXPLORING LARGER-SCALE KNOWLEDGE
ENHANCED PRE-TRAINING FOR LANGUAGE UNDERSTANDING ENHANCED PRE-TRAINING FOR LANGUAGE UNDERSTANDING
AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf) AND GENERATION](https://arxiv.org/pdf/2112.12731.pdf)
@@ -62,118 +61,115 @@

<img src="doc/images/demo.gif" width="700">
</p>

<h2 align="center">Documentation</h2>
> Set up

This chapter guides you through installation and deployment. We strongly recommend deploying Paddle Serving with Docker; if you do not use Docker, skip the Docker-related steps. On cloud servers, Paddle Serving can be deployed with Kubernetes, and it runs on common hardware such as ARM CPU, Intel CPU, Nvidia GPU and Kunlun XPU. The latest development packages of the develop branch are built daily for developers.

- [Install Paddle Serving with Docker](doc/Install_EN.md)
- [Build Paddle Serving from source with Docker](doc/Compile_EN.md)
- [Install Paddle Serving on a native Linux system](doc/Install_Linux_Env_CN.md)
- [Deploy Paddle Serving on Kubernetes (Chinese)](doc/Run_On_Kubernetes_CN.md)
- [Deploy Paddle Serving with a security gateway (Chinese)](doc/Serving_Auth_Docker_CN.md)
- Deploy on more hardware: [ARM CPU, Baidu Kunlun XPU](doc/Run_On_XPU_EN.md), [Huawei Ascend](doc/Run_On_NPU_CN.md), [Hygon DCU](doc/Run_On_DCU_CN.md), [Nvidia Jetson](doc/Run_On_JETSON_CN.md)
- [Docker images](doc/Docker_Images_EN.md)
- [Download wheel packages](doc/Latest_Packages_EN.md)
> Use

After installing Paddle Serving, the Quick Start guide walks you through running a service. First, call the model-saving API to generate the model parameter configuration files (.prototxt) used by both the client and the server. Second, read the configuration and startup parameters and start the service. Third, based on the API documents and your use case, write client requests with the SDK and test the inference service. To learn more usage scenarios and features, read the documents below; a minimal end-to-end sketch of these three steps follows the list.

- [Quick Start](doc/Quick_Start_EN.md)
- [Save a servable model](doc/Save_EN.md)
- [Description of configuration and startup parameters](doc/Serving_Configure_EN.md)
- [Guide for RESTful/gRPC/bRPC APIs (Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Infer on quantized models](doc/Low_Precision_EN.md)
- [Data formats of classic models (Chinese)](doc/Process_data_CN.md)
- [Prometheus (Chinese)](doc/Prometheus_CN.md)
- [Set TensorRT dynamic shape (Chinese)](doc/TensorRT_Dynamic_Shape_CN.md)
- [C++ Serving overview (Chinese)](doc/C++_Serving/Introduction_CN.md)
  - [Asynchronous framework (Chinese)](doc/C++_Serving/Asynchronous_Framwork_CN.md)
  - [Protocols (Chinese)](doc/C++_Serving/Inference_Protocols_CN.md)
  - [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
  - [A/B test](doc/C++_Serving/ABTest_EN.md)
  - [Encryption](doc/C++_Serving/Encryption_EN.md)
  - [Analyze and optimize performance (Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
  - [Benchmarks (Chinese)](doc/C++_Serving/Benchmark_CN.md)
  - [Multiple models in series (Chinese)](doc/C++_Serving/2+_model.md)
  - [Request cache (Chinese)](doc/C++_Serving/Request_Cache_CN.md)
- [Python Pipeline overview (Chinese)](doc/Python_Pipeline/Pipeline_Int_CN.md)
  - [Architecture design (Chinese)](doc/Python_Pipeline/Pipeline_Design_CN.md)
  - [Core features (Chinese)](doc/Python_Pipeline/Pipeline_Features_CN.md)
  - [Performance optimization (Chinese)](doc/Python_Pipeline/Pipeline_Optimize_CN.md)
  - [Benchmarks (Chinese)](doc/Python_Pipeline/Pipeline_Benchmark_CN.md)
- Client SDKs
  - [Python SDK (Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
  - [Java SDK](doc/Java_SDK_EN.md)
  - [C++ SDK (Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
- [Large-scale sparse parameter server](doc/Cube_Local_EN.md)
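A minimal sketch of the three steps above, assuming the `uci_housing` example from the Quick Start (the model directories, port 9292, the feed name `x`, the fetch name `price` and the tensor shape all follow that example and may differ for your model):

```python
# Step 1: convert an inference model into Serving format, producing server and
# client directories that each contain a .prototxt configuration file:
#   python -m paddle_serving_client.convert --dirname uci_housing_model \
#       --serving_server uci_housing_server --serving_client uci_housing_client
#
# Step 2: start the server with the chosen startup parameters:
#   python -m paddle_serving_server.serve --model uci_housing_server --port 9292
#
# Step 3: send a request through the Python SDK and test the inference service.
from paddle_serving_client import Client
import numpy as np

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One normalized 13-feature sample from the UCI housing dataset.
data = np.array([0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501, -0.0215,
                 -0.0489, 0.0811, -0.0665, 0.1187, -0.0283, 0.1073],
                dtype="float32").reshape(1, 13)
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map["price"])
```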
<br>

> Developers

For Paddle Serving developers, we provide extended documents on topics such as custom operators and variable-length (LoD) data processing.

- [Custom operators](doc/C++_Serving/OP_EN.md)
- [Processing LoD data](doc/LOD_EN.md)
- [FAQ (Chinese)](doc/FAQ_CN.md)
<h2 align="center">Model Zoo</h2>

Paddle Serving works closely with the Paddle model suites and implements a large number of service deployment examples, covering image classification, object detection, language and text recognition, Chinese part-of-speech tagging, sentiment analysis, content recommendation and other types, as well as end-to-end Paddle pipelines, for a total of 47 models.
<p align="center"> <p align="center">
| PaddleOCR | PaddleDetection | PaddleClas | PaddleSeg | PaddleRec | Paddle NLP | Paddle Video | | PaddleOCR | PaddleDetection | PaddleClas | PaddleSeg | PaddleRec | Paddle NLP | Paddle Video |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| 8 | 12 | 14 | 2 | 3 | 6 | 1| | 8 | 12 | 14 | 2 | 3 | 7 | 1 |
</p> </p>
For more model examples, read the [Model Zoo](doc/Model_Zoo_EN.md).

<p align="center">
<img src="https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/imgs_results/PP-OCRv2/PP-OCRv2-pic003.jpg?raw=true" width="345"/>
<img src="doc/images/detection.png" width="350">
</p>
<h2 align="center">社区</h2>
<h2 align="center">Community</h2>
If you want to communicate with developers and other users? Welcome to join us, join the community through the following methods below. 您想要同开发者和其他用户沟通吗?欢迎加入我们,通过如下方式加入社群
### Wechat ### 微信
- WeChat scavenging - 微信用户请扫码
<p align="center"> <p align="center">
<img src="doc/images/wechat_group_1.jpeg" width="250"> <img src="doc/images/wechat_group_1.jpeg" width="250">
</p> </p>
### QQ ### QQ
- QQ Group(Group No.:697765514) - 飞桨推理部署交流群(Group No.:697765514)
<p align="center"> <p align="center">
<img src="doc/images/qq_group_1.png" width="200"> <img src="doc/images/qq_group_1.png" width="200">
</p> </p>
> Contribution

If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/Contribute_EN.md).

- Thanks to [@w5688414](https://github.com/w5688414) for providing the NLP ERNIE indexing example
- Thanks to [@loveululu](https://github.com/loveululu) for providing the Cube Python API
- Thanks to [@EtachGu](https://github.com/EtachGu) for updating the Docker run commands
- Thanks to [@BeyondYourself](https://github.com/BeyondYourself) for contributing the gRPC tutorial, updating the FAQ doc and reorganizing the file directories
- Thanks to [@mcl-stone](https://github.com/mcl-stone) for providing the faster_rcnn benchmark script
- Thanks to [@cg82616424](https://github.com/cg82616424) for providing the unet benchmark script and fixing comment errors
- Thanks to [@cuicheng01](https://github.com/cuicheng01) for providing 11 PaddleClas models
- Thanks to [@Jiaqi Liu](https://github.com/LiuChiachi) for adding prediction support for list[str] input
- Thanks to [@Bin Lu](https://github.com/Intsigstephon) for providing the PP-Shitu C++ example
> Feedback

For any feedback or to report a bug, please file a [GitHub Issue](https://github.com/PaddlePaddle/Serving/issues).

> License

[Apache 2.0 License](https://github.com/PaddlePaddle/Serving/blob/develop/LICENSE)
@@ -30,7 +30,7 @@ Failed to predict: (data_id=1 log_id=0) [det|0] Failed to postprocess: postproce
#### Q: Which data types does Paddle Serving support?

**A:** In the protobuf definition, the `feed_type` and `fetch_type` codes map to data types as follows; for complete information see [Save model parameters for Serving deployment](./Save_CN.md).

| Type | Type code |
|------|------|
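For orientation, these type codes appear in the generated client configuration. A sketch of one feed variable from a serving_client_conf.prototxt, with field values taken from the uci_housing example, where `feed_type: 1` denotes float32:

```
feed_var {
  name: "x"
  alias_name: "x"
  is_lod_tensor: false
  feed_type: 1
  shape: 13
}
```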
@@ -49,7 +49,7 @@
#### Q: Does Paddle Serving support native deployment on Windows and Linux?

**A:** Install `Linux Docker` and deploy Paddle Serving inside Docker; see the [installation guide](./Install_CN.md).

#### Q: How do I change the message size limit in Paddle Serving?
@@ -61,7 +61,7 @@
#### Q: Which network protocols does Paddle Serving support?

**A:** C++ Serving supports the HTTP, gRPC and bRPC protocols simultaneously, and its HTTP protocol accepts both HTTP + JSON and HTTP + proto payloads; for complete information read [C++ Serving communication protocols](./C++_Serving/Inference_Protocols_CN.md). Python Pipeline supports HTTP and gRPC; for more information read the [Python Pipeline design](./Python_Pipeline/Pipeline_Features_CN.md).

<a name="3"></a>
@@ -309,7 +309,7 @@ GLOG_v=2 python -m paddle_serving_server.serve --model xxx_conf/ --port 9999
#### Q: After a Python Pipeline service starts successfully, where are the log files, and where is the log level set?

**A:** For Python Pipeline service logs, read Section 3 (service logs) of the [Python Pipeline design](./Python_Pipeline/Pipeline_Design_CN.md).

#### Q: (With GLOG_v=2) the server logs look normal, but the client never gets correct predictions
@@ -4,6 +4,8 @@

Low-precision deployment: on Intel CPU, int8 and bfloat16 models are supported; with Nvidia TensorRT, int8 and float16 models are supported.

## Deploying quantized models with C++ Serving

### Generate a low-precision model with PaddleSlim quantization

See [PaddleSlim quantization](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/overview.html) for details.
@@ -41,7 +43,12 @@ fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
```
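The elided example above queries a quantized model with the Python client; on the server side, the precision is selected at startup. A sketch of launching C++ Serving with TensorRT int8 inference, assuming a converted model directory named `serving_server` and the `--use_trt`/`--precision` startup parameters described in the Serving configuration docs:

```
python -m paddle_serving_server.serve --model serving_server \
    --port 9393 --gpu_ids 0 --use_trt --precision int8
```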
## Deploying quantized models with Python Pipeline

See [Python Pipeline low-precision inference](./Python_Pipeline/Pipeline_Features_CN.md#低精度推理).

## References
* [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim)
* Paddle Inference: deploying quantized models on Intel CPU ([doc](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html))
* Paddle Inference: deploying quantized models on Nvidia GPU ([doc](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html))