([简体中文](./README_CN.md)|English)
***

The goal of Paddle Serving is to provide high-performance, flexible and easy-to-use industrial-grade online inference services for machine learning developers and enterprises. Paddle Serving supports multiple protocols such as RESTful, gRPC and bRPC, and provides inference solutions for a variety of hardware and operating system environments, along with many well-known pre-trained model examples. The core features are as follows:

- Integrate the high-performance server-side inference engine Paddle Inference and the mobile-side engine Paddle Lite. Models from other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to Paddle through [X2Paddle](https://github.com/PaddlePaddle/X2Paddle).
- Offer two frameworks: the high-performance C++ Serving and the easy-to-use Python pipeline. C++ Serving is built on the bRPC network framework to create a high-throughput, low-latency inference service, and its performance metrics are ahead of competing products. The Python pipeline is built on the gRPC/gRPC-Gateway network framework and the Python language to create an easy-to-use, high-throughput inference service. To choose between them, see [Technical Selection](doc/Serving_Design_EN.md#21-design-selection).
- Support multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC and bRPC, and provide C++, Python and Java SDKs.
- Design and implement a high-performance inference service framework for asynchronous pipelines based on a directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batching, multi-card multi-stream inference and request caching.
- Adapt to a variety of commonly used computing hardware, such as x86 (Intel) CPU, ARM CPU, NVIDIA GPU, Kunlun XPU, HUAWEI Ascend 310/910, HYGON DCU and NVIDIA Jetson.
- Integrate the Intel MKLDNN and NVIDIA TensorRT acceleration libraries, and support low-precision and quantized inference.
- Provide a secure model deployment solution that has been used in practice, including encrypted model deployment, an authentication mechanism and an HTTPS security gateway.
- Support cloud deployment, and provide a deployment case for a Baidu Intelligent Cloud Kubernetes cluster.
- Provide more than 40 classic pre-trained model deployment examples from suites such as PaddleOCR, PaddleClas, PaddleDetection, PaddleSeg, PaddleNLP and PaddleRec, with more models being added.
- Support distributed deployment of large-scale sparse parameter index models, with features such as multiple tables, multiple shards, multiple replicas and a local high-frequency cache. These models can be deployed on a single machine or in the cloud.
- Support service monitoring, and provide Prometheus-based performance statistics and port access.
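
To make the DAG-based asynchronous pipeline idea from the feature list more concrete, here is a minimal, generic sketch using only Python's standard `asyncio` module. It is not Paddle Serving's API or implementation: `run_dag`, the node functions and the `nodes`/`deps` layout are all hypothetical names chosen for illustration. It only shows how independent branches of a DAG can run concurrently while each node waits for its own upstream results.

```python
import asyncio


async def run_dag(nodes, deps, request):
    """Run every node once all of its upstream nodes have finished.

    nodes: dict mapping node name -> async callable(request, upstream_results)
    deps:  dict mapping node name -> list of upstream node names
    Assumes `deps` describes an acyclic graph; a cycle would never complete.
    """
    tasks = {}

    async def run(name):
        upstream = deps.get(name, [])
        # Wait only for this node's own inputs, so independent branches overlap.
        results = await asyncio.gather(*(tasks[d] for d in upstream))
        return await nodes[name](request, dict(zip(upstream, results)))

    # Task bodies start only after this coroutine first awaits, so `tasks`
    # is fully populated before any node looks up its upstream tasks.
    for name in nodes:
        tasks[name] = asyncio.create_task(run(name))
    outputs = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), outputs))


# Hypothetical stand-ins for model stages (not real Paddle Serving ops).
async def preprocess(request, upstream):
    return request.strip().lower()


async def tokenize(request, upstream):
    return upstream["preprocess"].split()


async def measure(request, upstream):
    return len(upstream["preprocess"])


async def combine(request, upstream):
    return {"tokens": upstream["tokenize"], "length": upstream["measure"]}


nodes = {
    "preprocess": preprocess,
    "tokenize": tokenize,
    "measure": measure,
    "combine": combine,
}
deps = {
    "tokenize": ["preprocess"],
    "measure": ["preprocess"],
    "combine": ["tokenize", "measure"],
}

result = asyncio.run(run_dag(nodes, deps, "  Hello Paddle Serving "))
print(result["combine"])
```

In this sketch `tokenize` and `measure` both depend on `preprocess` and can overlap, while `combine` runs only after both have finished. A production pipeline would add batching, error handling and per-node concurrency limits, which are omitted here.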

| PaddleOCR | PaddleDetection | PaddleClas | PaddleSeg | PaddleRec | PaddleNLP |
| :----: | :----: | :----: | :----: | :----: | :----: |
| 8 | 12 | 14 | 2 | 3 | 6 |

For more model examples, see the [Model Zoo](doc/Model_Zoo_EN.md).

### QQ
- QQ Group (Group No.: 697765514)

> Contribution

If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/Contribute_EN.md).
- Thanks to [@loveululu](https://github.com/loveululu) for providing the Python API for Cube.
- Thanks to [@EtachGu](https://github.com/EtachGu) for updating the Docker run commands.
- Thanks to [@BeyondYourself](https://github.com/BeyondYourself) for complementing the gRPC tutorial, updating the FAQ doc and modifying the mkdir command.
- Thanks to [@mcl-stone](https://github.com/mcl-stone) for updating the faster_rcnn benchmark.
- Thanks to [@cg82616424](https://github.com/cg82616424) for updating the unet benchmark and fixing an error in the resize comment.
- Thanks to [@cuicheng01](https://github.com/cuicheng01) for providing 11 PaddleClas models.

> Feedback

For any feedback or to report a bug, please open a [GitHub Issue](https://github.com/PaddlePaddle/Serving/issues).

> License

[Apache 2.0 License](https://github.com/PaddlePaddle/Serving/blob/develop/LICENSE)