diff --git a/README.md b/README.md
index 47e4f9d65389dec39c2b455497a0cb81e0bb7eb1..31db680c8abdacba5f126cdf13dda266f6ceb791 100755
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ The goal of Paddle Serving is to provide high-performance, flexible and easy-to-
 - Integrate high-performance server-side inference engine paddle Inference and mobile-side engine paddle Lite. Models of other machine learning platforms (Caffe/TensorFlow/ONNX/PyTorch) can be migrated to paddle through [x2paddle](https://github.com/PaddlePaddle/X2Paddle).
 - There are two frameworks, namely high-performance C++ Serving and high-easy-to-use Python pipeline. The C++ Serving is based on the bRPC network framework to create a high-throughput, low-latency inference service, and its performance indicators are ahead of competing products. The Python pipeline is based on the gRPC/gRPC-Gateway network framework and the Python language to build a highly easy-to-use and high-throughput inference service. How to choose which one please see [Techinical Selection](doc/Serving_Design_EN.md#21-design-selection).
 - Support multiple [protocols](doc/C++_Serving/Inference_Protocols_CN.md) such as HTTP, gRPC, bRPC, and provide C++, Python, Java language SDK.
-- Design and implement a high-performance inference service framework for asynchronous pipelines based on directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batch, multi-card multi-stream inference, etc.
+- Design and implement a high-performance inference service framework for asynchronous pipelines based on directed acyclic graph (DAG), with features such as multi-model combination, asynchronous scheduling, concurrent inference, dynamic batch, multi-card multi-stream inference, request cache, etc.
 - Adapt to a variety of commonly used computing hardwares, such as x86 (Intel) CPU, ARM CPU, Nvidia GPU, Kunlun XPU, HUAWEI Ascend 310/910, HYGON DCU、Nvidia Jetson etc.
 - Integrate acceleration libraries of Intel MKLDNN and Nvidia TensorRT, and low-precision and quantitative inference.
 - Provide a model security deployment solution, including encryption model deployment, and authentication mechanism, HTTPs security gateway, which is used in practice.
@@ -75,6 +75,7 @@ The first step is to call the model save interface to generate a model parameter
 - [Guide for RESTful/gRPC/bRPC APIs(Chinese)](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
 - [Infer on quantizative models](doc/Low_Precision_EN.md)
 - [Data format of classic models(Chinese)](doc/Process_data_CN.md)
+- [Prometheus(Chinese)](doc/Prometheus_CN.md)
 - [C++ Serving(Chinese)](doc/C++_Serving/Introduction_CN.md)
 - [Protocols(Chinese)](doc/C++_Serving/Inference_Protocols_CN.md)
 - [Hot loading models](doc/C++_Serving/Hot_Loading_EN.md)
@@ -83,6 +84,7 @@ The first step is to call the model save interface to generate a model parameter
 - [Analyze and optimize performance(Chinese)](doc/C++_Serving/Performance_Tuning_CN.md)
 - [Benchmark(Chinese)](doc/C++_Serving/Benchmark_CN.md)
 - [Multiple models in series(Chinese)](doc/C++_Serving/2+_model.md)
+ - [Request Cache(Chinese)](doc/C++_Serving/Request_Cache_CN.md)
 - [Python Pipeline](doc/Python_Pipeline/Pipeline_Design_EN.md)
 - [Analyze and optimize performance](doc/Python_Pipeline/Performance_Tuning_EN.md)
 - [TensorRT dynamic Shape](doc/TensorRT_Dynamic_Shape_EN.md)
diff --git a/README_CN.md b/README_CN.md
index 324ad49b406ac0849f4d313012ff24763f289741..fefdfbc75a5b4bdc6dc77f2cc0b3f3f3d5a5f2c9 100755
--- a/README_CN.md
+++ b/README_CN.md
@@ -29,13 +29,14 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
 - 集成高性能服务端推理引擎paddle Inference和移动端引擎paddle Lite,其他机器学习平台(Caffe/TensorFlow/ONNX/PyTorch)可通过[x2paddle](https://github.com/PaddlePaddle/X2Paddle)工具迁移模型
 - 具有高性能C++和高易用Python 2套框架。C++框架基于高性能bRPC网络框架打造高吞吐、低延迟的推理服务,性能领先竞品。Python框架基于gRPC/gRPC-Gateway网络框架和Python语言构建高易用、高吞吐推理服务框架。技术选型参考[技术选型](doc/Serving_Design_CN.md#21-设计选型)
 - 支持HTTP、gRPC、bRPC等多种[协议](doc/C++_Serving/Inference_Protocols_CN.md);提供C++、Python、Java语言SDK
-- 设计并实现基于有向无环图(DAG)的异步流水线高性能推理框架,具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理等特性
+- 设计并实现基于有向无环图(DAG)的异步流水线高性能推理框架,具有多模型组合、异步调度、并发推理、动态批量、多卡多流推理、请求缓存等特性
 - 适配x86(Intel) CPU、ARM CPU、Nvidia GPU、昆仑XPU、华为昇腾310/910、海光DCU、Nvidia Jetson等多种硬件
 - 集成Intel MKLDNN、Nvidia TensorRT加速库,以及低精度和量化推理
 - 提供一套模型安全部署解决方案,包括加密模型部署、鉴权校验、HTTPs安全网关,并在实际项目中应用
 - 支持云端部署,提供百度云智能云kubernetes集群部署Paddle Serving案例
 - 提供丰富的经典预模型部署示例,如PaddleOCR、PaddleClas、PaddleDetection、PaddleSeg、PaddleNLP、PaddleRec等套件,共计40+个预训练精品模型
 - 支持大规模稀疏参数索引模型分布式部署,具有多表、多分片、多副本、本地高频cache等特性、可单机或云端部署
+- 支持服务监控,提供基于普罗米修斯的性能数据统计及端口访问

教程

@@ -70,6 +71,7 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
 - [RESTful/gRPC/bRPC API指南](doc/C++_Serving/Introduction_CN.md#42-多语言多协议Client)
 - [低精度推理](doc/Low_Precision_CN.md)
 - [常见模型数据处理](doc/Process_data_CN.md)
+- [普罗米修斯](doc/Prometheus_CN.md)
 - [C++ Serving简介](doc/C++_Serving/Introduction_CN.md)
 - [协议](doc/C++_Serving/Inference_Protocols_CN.md)
 - [模型热加载](doc/C++_Serving/Hot_Loading_CN.md)
@@ -78,6 +80,7 @@ Paddle Serving依托深度学习框架PaddlePaddle旨在帮助深度学习开发
 - [性能优化指南](doc/C++_Serving/Performance_Tuning_CN.md)
 - [性能指标](doc/C++_Serving/Benchmark_CN.md)
 - [多模型串联](doc/C++_Serving/2+_model.md)
+ - [请求缓存](doc/C++_Serving/Request_Cache_CN.md)
 - [Python Pipeline设计](doc/Python_Pipeline/Pipeline_Design_CN.md)
 - [性能优化指南](doc/Python_Pipeline/Performance_Tuning_CN.md)
 - [TensorRT动态shape](doc/TensorRT_Dynamic_Shape_CN.md)
diff --git a/doc/Prometheus_CN.md b/doc/Prometheus_CN.md
index 4807d8c7029b0f5252aa632a92511b762002c4ff..67c20b0246bd6092170ac86da7a4a3b01ee276bb 100644
--- a/doc/Prometheus_CN.md
+++ b/doc/Prometheus_CN.md
@@ -7,7 +7,7 @@ curl http://localhost:19393/metrics
 ## 配置使用
-### C+ Server
+### C++ Server
 对于 C++ Server 来说,启动服务时请添加如下参数