From f8b05f1714f335b204ec472535004aa048b101e1 Mon Sep 17 00:00:00 2001
From: TeslaZhao
Date: Tue, 9 Nov 2021 09:52:59 +0800
Subject: [PATCH] Merge pull request #1475 from TeslaZhao/develop

Modify doc directory
---
 doc/HTTP_SERVICE_CN.md                        |  0
 doc/MULTI_SERVICE_ON_ONE_GPU_CN.md            | 15 -----
 doc/Makefile                                  | 20 -------
 doc/UWSGI_DEPLOY.md                           | 45 ---------------
 doc/UWSGI_DEPLOY_CN.md                        | 45 ---------------
 .../ABTEST_IN_PADDLE_SERVING.md               |  0
 .../ABTEST_IN_PADDLE_SERVING_CN.md            |  0
 doc/{ => cpp_pipeline_server}/C++DESIGN.md    |  0
 doc/{ => cpp_pipeline_server}/C++DESIGN_CN.md |  0
 doc/{ => cpp_pipeline_server}/ENCRYPTION.md   |  0
 .../ENCRYPTION_CN.md                          |  0
 .../HOT_LOADING_IN_SERVING.md                 |  0
 .../HOT_LOADING_IN_SERVING_CN.md              |  0
 doc/{ => cpp_pipeline_server}/NEW_OPERATOR.md |  0
 .../NEW_OPERATOR_CN.md                        |  0
 .../NEW_WEB_SERVICE.md                        |  0
 .../NEW_WEB_SERVICE_CN.md                     |  0
 .../MULTI_SERVING_OVER_SINGLE_GPU_CARD.md     | 32 -----------
 doc/deprecated/NEW_WEB_SERVICE.md             | 56 -------------------
 doc/deprecated/NEW_WEB_SERVICE_CN.md          | 56 -------------------
 doc/doc_test_list                             |  2 -
 .../PIPELINE_SERVING.md                       |  0
 .../PIPELINE_SERVING_CN.md                    |  0
 23 files changed, 271 deletions(-)
 mode change 100755 => 100644 doc/HTTP_SERVICE_CN.md
 delete mode 100644 doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
 delete mode 100644 doc/Makefile
 delete mode 100644 doc/UWSGI_DEPLOY.md
 delete mode 100644 doc/UWSGI_DEPLOY_CN.md
 rename doc/{ => cpp_pipeline_server}/ABTEST_IN_PADDLE_SERVING.md (100%)
 rename doc/{ => cpp_pipeline_server}/ABTEST_IN_PADDLE_SERVING_CN.md (100%)
 rename doc/{ => cpp_pipeline_server}/C++DESIGN.md (100%)
 rename doc/{ => cpp_pipeline_server}/C++DESIGN_CN.md (100%)
 rename doc/{ => cpp_pipeline_server}/ENCRYPTION.md (100%)
 rename doc/{ => cpp_pipeline_server}/ENCRYPTION_CN.md (100%)
 rename doc/{ => cpp_pipeline_server}/HOT_LOADING_IN_SERVING.md (100%)
 rename doc/{ => cpp_pipeline_server}/HOT_LOADING_IN_SERVING_CN.md (100%)
 rename doc/{ => cpp_pipeline_server}/NEW_OPERATOR.md (100%)
 rename doc/{ => cpp_pipeline_server}/NEW_OPERATOR_CN.md (100%)
 rename doc/{ => cpp_pipeline_server}/NEW_WEB_SERVICE.md (100%)
 rename doc/{ => cpp_pipeline_server}/NEW_WEB_SERVICE_CN.md (100%)
 delete mode 100644 doc/deprecated/MULTI_SERVING_OVER_SINGLE_GPU_CARD.md
 delete mode 100644 doc/deprecated/NEW_WEB_SERVICE.md
 delete mode 100644 doc/deprecated/NEW_WEB_SERVICE_CN.md
 delete mode 100644 doc/doc_test_list
 rename doc/{ => python_pipeline_server}/PIPELINE_SERVING.md (100%)
 rename doc/{ => python_pipeline_server}/PIPELINE_SERVING_CN.md (100%)

diff --git a/doc/HTTP_SERVICE_CN.md b/doc/HTTP_SERVICE_CN.md
old mode 100755
new mode 100644
diff --git a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
deleted file mode 100644
index 1de36af8..00000000
--- a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
+++ /dev/null
@@ -1,15 +0,0 @@
-# Multi-Model Prediction Service on a Single GPU Card
-
-When client requests arrive only infrequently, the computing resources of the server machine, especially its GPU, are wasted. In this case, multiple prediction services can be started on the server to improve resource utilization. Paddle Serving supports deploying multiple prediction services on a single GPU card: when starting each service, simply bind it to the card with the --gpu_ids parameter, so that several services share the same card.
-
-For example:
-
-```shell
-python -m paddle_serving_server.serve --model bert_seq128_model --port 9292 --gpu_ids 0
-python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
-```
-
-On card 0, the bert example and the imagenet example are deployed at the same time.
-
-**Note:** Inference on a single GPU card is still computed serially; this approach simply reduces the idle time of the server-side GPU.
-
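As a sanity check for the setup described in the deleted doc above, a client could query either of the two services over RPC. The sketch below is not part of the original patch; the client config path and the "image"/"score" variable names are assumptions modeled on the ResNet50 example:

```python
# Minimal sketch: query the ResNet50 service that shares GPU 0 with the BERT service.
# The client config path and the "image"/"score" variable names are assumptions.
import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config("ResNet50_vd_client_config/serving_client_conf.prototxt")  # assumed path
client.connect(["127.0.0.1:9393"])

# Placeholder input; a real request would carry a preprocessed 3x224x224 image tensor.
img = np.zeros((3, 224, 224), dtype="float32")
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map)
```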
diff --git a/doc/Makefile b/doc/Makefile
deleted file mode 100644
index d0c3cbf1..00000000
--- a/doc/Makefile
+++ /dev/null
@@ -1,20 +0,0 @@
-# Minimal makefile for Sphinx documentation
-#
-
-# You can set these variables from the command line, and also
-# from the environment for the first two.
-SPHINXOPTS    ?=
-SPHINXBUILD   ?= sphinx-build
-SOURCEDIR     = source
-BUILDDIR      = build
-
-# Put it first so that "make" without argument is like "make help".
-help:
-	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-
-.PHONY: help Makefile
-
-# Catch-all target: route all unknown targets to Sphinx using the new
-# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
-%: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/doc/UWSGI_DEPLOY.md b/doc/UWSGI_DEPLOY.md
deleted file mode 100644
index 1aa9c1fc..00000000
--- a/doc/UWSGI_DEPLOY.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# Deploy HTTP service with uWSGI
-
-([Simplified Chinese](./UWSGI_DEPLOY_CN.md)|English)
-
-In the fit_a_line example, after starting the HTTP prediction service, you will see the following information:
-
-```shell
-web service address:
-http://10.127.3.150:9393/uci/prediction
- * Serving Flask app "serve" (lazy loading)
- * Environment: production
-   WARNING: This is a development server. Do not use it in a production deployment.
-   Use a production WSGI server instead.
- * Debug mode: off
- * Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
-```
-
-Here you are warned that the HTTP service is running in development mode and cannot be used for production deployment.
-The prediction service started by Flask is not stable enough to withstand a large number of concurrent requests, so a WSGI (Web Server Gateway Interface) server should be used in an actual deployment.
-
-Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy an HTTP prediction service for production environments.
-
-Write the HTTP service script:
-
-```python
-# uwsgi_service.py
-from paddle_serving_server.web_service import WebService
-
-# Define the prediction service
-uci_service = WebService(name="uci")
-uci_service.load_model_config("./uci_housing_model")
-uci_service.prepare_server(workdir="./workdir", port=9500, device="cpu")
-uci_service.run_rpc_service()
-# Get the Flask application
-app_instance = uci_service.get_app_instance()
-```
-
-Start the service with uWSGI:
-
-```bash
-uwsgi --http :9393 --module uwsgi_service:app_instance
-```
-
-Use the --processes parameter to specify the number of service processes.
-
-For more information about uWSGI, please refer to the [uWSGI documentation](https://uwsgi-docs.readthedocs.io/en/latest/).
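To verify the uWSGI deployment described above, one can POST a request to the service endpoint. This sketch is not part of the original patch and assumes the uci/fit_a_line example, i.e. a 13-dimensional "x" feed variable and a "price" fetch variable:

```python
# Smoke test for the uWSGI-deployed HTTP service (assumes the uci/fit_a_line example above).
import requests

url = "http://127.0.0.1:9393/uci/prediction"
payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
                    -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post(url, json=payload, timeout=5)
print(resp.json())  # expected to contain the fetched "price" in the response body
```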
diff --git a/doc/UWSGI_DEPLOY_CN.md b/doc/UWSGI_DEPLOY_CN.md
deleted file mode 100644
index 96615516..00000000
--- a/doc/UWSGI_DEPLOY_CN.md
+++ /dev/null
@@ -1,45 +0,0 @@
-# Start an HTTP prediction service with uWSGI
-
-(Simplified Chinese|[English](./UWSGI_DEPLOY.md))
-
-In the provided fit_a_line example, after starting the HTTP prediction service you will see the following output:
-
-```shell
-web service address:
-http://10.127.3.150:9393/uci/prediction
- * Serving Flask app "serve" (lazy loading)
- * Environment: production
-   WARNING: This is a development server. Do not use it in a production deployment.
-   Use a production WSGI server instead.
- * Debug mode: off
- * Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
-```
-
-The message indicates that the HTTP service was started in development mode and cannot be used for production deployment. The service started by Flask is not stable enough to withstand a large number of concurrent requests, so in an actual deployment it needs to be used together with a WSGI (Web Server Gateway Interface) server.
-
-Below we show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy an HTTP prediction service for production environments.
-
-Write the HTTP service script:
-
-```python
-# uwsgi_service.py
-from paddle_serving_server.web_service import WebService
-
-# Configure the prediction service
-uci_service = WebService(name="uci")
-uci_service.load_model_config("./uci_housing_model")
-uci_service.prepare_server(workdir="./workdir", port=9500, device="cpu")
-uci_service.run_rpc_service()
-# Get the Flask application
-app_instance = uci_service.get_app_instance()
-```
-
-Start the HTTP service with uWSGI:
-
-```bash
-uwsgi --http :9393 --module uwsgi_service:app_instance
-```
-
-Use the --processes parameter to specify the number of service processes.
-
-For more information about uWSGI, please refer to the [uWSGI documentation](https://uwsgi-docs.readthedocs.io/en/latest/).
diff --git a/doc/ABTEST_IN_PADDLE_SERVING.md b/doc/cpp_pipeline_server/ABTEST_IN_PADDLE_SERVING.md
similarity index 100%
rename from doc/ABTEST_IN_PADDLE_SERVING.md
rename to doc/cpp_pipeline_server/ABTEST_IN_PADDLE_SERVING.md
diff --git a/doc/ABTEST_IN_PADDLE_SERVING_CN.md b/doc/cpp_pipeline_server/ABTEST_IN_PADDLE_SERVING_CN.md
similarity index 100%
rename from doc/ABTEST_IN_PADDLE_SERVING_CN.md
rename to doc/cpp_pipeline_server/ABTEST_IN_PADDLE_SERVING_CN.md
diff --git a/doc/C++DESIGN.md b/doc/cpp_pipeline_server/C++DESIGN.md
similarity index 100%
rename from doc/C++DESIGN.md
rename to doc/cpp_pipeline_server/C++DESIGN.md
diff --git a/doc/C++DESIGN_CN.md b/doc/cpp_pipeline_server/C++DESIGN_CN.md
similarity index 100%
rename from doc/C++DESIGN_CN.md
rename to doc/cpp_pipeline_server/C++DESIGN_CN.md
diff --git a/doc/ENCRYPTION.md b/doc/cpp_pipeline_server/ENCRYPTION.md
similarity index 100%
rename from doc/ENCRYPTION.md
rename to doc/cpp_pipeline_server/ENCRYPTION.md
diff --git a/doc/ENCRYPTION_CN.md b/doc/cpp_pipeline_server/ENCRYPTION_CN.md
similarity index 100%
rename from doc/ENCRYPTION_CN.md
rename to doc/cpp_pipeline_server/ENCRYPTION_CN.md
diff --git a/doc/HOT_LOADING_IN_SERVING.md b/doc/cpp_pipeline_server/HOT_LOADING_IN_SERVING.md
similarity index 100%
rename from doc/HOT_LOADING_IN_SERVING.md
rename to doc/cpp_pipeline_server/HOT_LOADING_IN_SERVING.md
diff --git a/doc/HOT_LOADING_IN_SERVING_CN.md b/doc/cpp_pipeline_server/HOT_LOADING_IN_SERVING_CN.md
similarity index 100%
rename from doc/HOT_LOADING_IN_SERVING_CN.md
rename to doc/cpp_pipeline_server/HOT_LOADING_IN_SERVING_CN.md
diff --git a/doc/NEW_OPERATOR.md b/doc/cpp_pipeline_server/NEW_OPERATOR.md
similarity index 100%
rename from doc/NEW_OPERATOR.md
rename to doc/cpp_pipeline_server/NEW_OPERATOR.md
diff --git a/doc/NEW_OPERATOR_CN.md b/doc/cpp_pipeline_server/NEW_OPERATOR_CN.md
similarity index 100%
rename from doc/NEW_OPERATOR_CN.md
rename to doc/cpp_pipeline_server/NEW_OPERATOR_CN.md
diff --git a/doc/NEW_WEB_SERVICE.md b/doc/cpp_pipeline_server/NEW_WEB_SERVICE.md
similarity index 100%
rename from doc/NEW_WEB_SERVICE.md
rename to doc/cpp_pipeline_server/NEW_WEB_SERVICE.md
diff --git a/doc/NEW_WEB_SERVICE_CN.md b/doc/cpp_pipeline_server/NEW_WEB_SERVICE_CN.md
similarity index 100%
rename from doc/NEW_WEB_SERVICE_CN.md
rename to doc/cpp_pipeline_server/NEW_WEB_SERVICE_CN.md
diff --git a/doc/deprecated/MULTI_SERVING_OVER_SINGLE_GPU_CARD.md b/doc/deprecated/MULTI_SERVING_OVER_SINGLE_GPU_CARD.md
deleted file mode 100644
index 1bbcaf16..00000000
--- a/doc/deprecated/MULTI_SERVING_OVER_SINGLE_GPU_CARD.md
+++ /dev/null
@@ -1,32 +0,0 @@
-# Multiple Serving Instances over Single GPU Card
-
-Paddle Serving relies on the PaddlePaddle inference library to perform the actual prediction computation. Due to limitations of the current GPU inference library, a single Serving instance can be bound to only one GPU card, and all worker threads within the process share one GPU stream. In other words, no matter how many worker threads Serving starts, all requests are computed strictly serially on the GPU, so extra threads bring no speedup. This leads to a problem: if the model's computation load is small, the Serving process will not actually use the full computing power of the GPU.
-
-To make full use of the GPU's computing power, consider starting multiple Serving instances on a single card; with multiple GPU streams the card can be driven much closer to full utilization. The startup commands can look like this:
-
-```
-bin/serving --gpuid=0 --bthread_concurrency=4 --bthread_min_concurrency=4 --port=8010&
-bin/serving --gpuid=0 --bthread_concurrency=4 --bthread_min_concurrency=4 --port=8011&
-```
-
-The two commands above start two Serving instances, listening on port 8010 and port 8011 respectively, but both bound to the same card (gpuid = 0).
-
-Meaning of the command-line parameters:
-```
--gpuid=N: the ID of the GPU card to bind to
--bthread_concurrency and bthread_min_concurrency together limit the number of workers started by the process: in GPU prediction mode, adding worker threads does not improve concurrency, so they are capped to save resources; both are set to 4 because that is the minimum value allowed by bthread
--port xxx: the port the Serving instance listens on
-```
-
-However, whether this approach can actually improve GPU utilization without hurting other metrics such as response time is constrained by several factors. Specifically:
-
-1. GPU computing power used by a single stream: if a single stream already occupies more than 50% of the GPU's computing power, adding another stream will most likely make the jobs of the two streams queue up separately and slow down both response times.
-2. GPU memory: a Serving process needs to load the model parameters into GPU memory and allocate temporary variables from the GPU memory pool during computation; if a single Serving process already uses more than 50% of the GPU memory, adding another Serving process will run out of GPU memory and the process will exit with an error.
-
-Therefore, the following steps can be used for testing:
-
-1. When loading the model, select FLUID_GPU_ANALYSIS or FLUID_GPU_ANALYSIS_DIR as the model type in model_toolkit.prototxt; this statically analyzes the model and optimizes GPU memory usage to some extent.
-2. After step 1, start a single Serving process with the parameters `--gpuid=N --bthread_concurrency=4 --bthread_min_concurrency=4`; start a client and run a stress test with concurrency 1, increasing the batch size step by step while recording the average response time; because of the limited computing power, the response time should rise noticeably once the batch size grows beyond a certain point, or it may simply stop meeting the system requirements.
-3. Start one more Serving process with the same parameters as in step 2 except for the port: `--gpuid=N --bthread_concurrency=4 --bthread_min_concurrency=4 --port=8011`, where --port=8011 lets the new process use a new service port; then stress-test the two Serving processes at the same time and keep observing how the average response time changes as the batch size grows, until a good trade-off between batch size and response time is found.
-4. Repeat steps 2-3.
-5. Use the tests from steps 2-4 to decide how many Serving processes can share a single GPU card; in the actual deployment, start that many Serving processes on one GPU card to serve simultaneously.
diff --git a/doc/deprecated/NEW_WEB_SERVICE.md b/doc/deprecated/NEW_WEB_SERVICE.md
deleted file mode 100644
index 441ad146..00000000
--- a/doc/deprecated/NEW_WEB_SERVICE.md
+++ /dev/null
@@ -1,56 +0,0 @@
-# How to develop a new Web service?
-
-([Simplified Chinese](NEW_WEB_SERVICE_CN.md)|English)
-
-This document takes the image classification service based on the Imagenet data set as an example to introduce how to develop a new web service. The complete code can be found [here](../../python/examples/imagenet/resnet50_web_service.py).
-
-## WebService base class
-
-Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `preprocess` and `postprocess` methods. The default implementation is as follows:
-
-```python
-class WebService(object):
-
-    def preprocess(self, feed={}, fetch=[]):
-        return feed, fetch
-    def postprocess(self, feed={}, fetch=[], fetch_map=None):
-        return fetch_map
-```
-
-### preprocess
-
-The preprocess method has two input parameters, `feed` and `fetch`. For an HTTP request `request`:
-
-- The value of `feed` is the feed part of the request data, `request.json["feed"]`
-- The value of `fetch` is the fetch part of the request data, `request.json["fetch"]`
-
-The return values are the feed and fetch values used in the prediction.
-
-### postprocess
-
-The postprocess method has three input parameters, `feed`, `fetch` and `fetch_map`:
-
-- The value of `feed` is the feed part of the request data, `request.json["feed"]`
-- The value of `fetch` is the fetch part of the request data, `request.json["fetch"]`
-- The value of `fetch_map` is the model output value.
-
-The return value will be wrapped as `{"result": fetch_map}` and used as the response to the HTTP request.
-
-## Develop the ImageService class
-
-```python
-class ImageService(WebService):
-
-    def preprocess(self, feed={}, fetch=[]):
-        reader = ImageReader()
-        feed_batch = []
-        for ins in feed:
-            if "image" not in ins:
-                raise ValueError("feed data error!")
-            sample = base64.b64decode(ins["image"])
-            img = reader.process_image(sample)
-            feed_batch.append({"image": img})
-        return feed_batch, fetch
-```
-
-For the `ImageService` above, only the `preprocess` method is overridden, converting the Base64-encoded image data into the data format required by the prediction.
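The deleted doc above overrides only `preprocess`; a `postprocess` override would follow the same pattern. The sketch below is not from the original docs; the "score" key and the label list are illustrative assumptions:

```python
# Hypothetical postprocess override: map raw softmax scores in fetch_map to top-1 labels.
from paddle_serving_server.web_service import WebService

class LabeledImageService(WebService):
    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        labels = ["class_0", "class_1", "class_2"]  # placeholder label set
        results = []
        # fetch_map is assumed to hold a batch of score vectors under the key "score"
        for scores in fetch_map.get("score", []):
            best = max(range(len(scores)), key=lambda i: scores[i])
            results.append({"label": labels[best], "score": float(scores[best])})
        return {"prediction": results}
```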
diff --git a/doc/deprecated/NEW_WEB_SERVICE_CN.md b/doc/deprecated/NEW_WEB_SERVICE_CN.md
deleted file mode 100644
index 5458c934..00000000
--- a/doc/deprecated/NEW_WEB_SERVICE_CN.md
+++ /dev/null
@@ -1,56 +0,0 @@
-# How to develop a new Web Service?
-
-(Simplified Chinese|[English](NEW_WEB_SERVICE.md))
-
-This document takes the Imagenet image classification service as an example to introduce how to develop a new Web Service. The complete code can be found [here](../../python/examples/imagenet/resnet50_web_service.py).
-
-## The WebService base class
-
-Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `preprocess` and `postprocess` methods. The default implementation is as follows:
-
-```python
-class WebService(object):
-
-    def preprocess(self, feed={}, fetch=[]):
-        return feed, fetch
-    def postprocess(self, feed={}, fetch=[], fetch_map=None):
-        return fetch_map
-```
-
-### The preprocess method
-
-The preprocess method has two input parameters, `feed` and `fetch`. For an HTTP request `request`:
-
-- The value of `feed` is the feed part of the request data, `request.json["feed"]`
-- The value of `fetch` is the fetch part of the request data, `request.json["fetch"]`
-
-The return values are the feed and fetch values used in the prediction.
-
-### The postprocess method
-
-The postprocess method has three input parameters, `feed`, `fetch` and `fetch_map`:
-
-- The value of `feed` is the feed part of the request data, `request.json["feed"]`
-- The value of `fetch` is the fetch part of the request data, `request.json["fetch"]`
-- The value of `fetch_map` is the fetched model output.
-
-The return value will be wrapped as `{"result": fetch_map}` and used as the response to the HTTP request.
-
-## Develop the ImageService class
-
-```python
-class ImageService(WebService):
-
-    def preprocess(self, feed={}, fetch=[]):
-        reader = ImageReader()
-        feed_batch = []
-        for ins in feed:
-            if "image" not in ins:
-                raise ValueError("feed data error!")
-            sample = base64.b64decode(ins["image"])
-            img = reader.process_image(sample)
-            feed_batch.append({"image": img})
-        return feed_batch, fetch
-```
-
-For the `ImageService` above, only the preprocess method is overridden, converting the Base64-encoded image data into the data format required by model prediction.
diff --git a/doc/doc_test_list b/doc/doc_test_list
deleted file mode 100644
index 8812f85e..00000000
--- a/doc/doc_test_list
+++ /dev/null
@@ -1,2 +0,0 @@
-BERT_10_MINS.md
-ABTEST_IN_PADDLE_SERVING.md
diff --git a/doc/PIPELINE_SERVING.md b/doc/python_pipeline_server/PIPELINE_SERVING.md
similarity index 100%
rename from doc/PIPELINE_SERVING.md
rename to doc/python_pipeline_server/PIPELINE_SERVING.md
diff --git a/doc/PIPELINE_SERVING_CN.md b/doc/python_pipeline_server/PIPELINE_SERVING_CN.md
similarity index 100%
rename from doc/PIPELINE_SERVING_CN.md
rename to doc/python_pipeline_server/PIPELINE_SERVING_CN.md
--
GitLab
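For completeness, a hypothetical client for the `ImageService` defined in the deleted NEW_WEB_SERVICE docs above: the image is Base64-encoded into the "image" feed field to match the `preprocess` method shown there; the endpoint path and the "score" fetch name are assumptions:

```python
# Hypothetical client for the ImageService above; endpoint name and fetch key are assumed.
import base64
import requests

with open("daisy.jpg", "rb") as f:  # any local JPEG
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {"feed": [{"image": image_b64}], "fetch": ["score"]}
resp = requests.post("http://127.0.0.1:9393/image/prediction", json=payload, timeout=10)
print(resp.json())
```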