Merge pull request #1475 from TeslaZhao/develop

Modify doc directory

f8b05f17 · TeslaZhao · 0bfa26a1
23 changed files
--- a/doc/HTTP_SERVICE_CN.md
+++ b/doc/HTTP_SERVICE_CN.md
--- a/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
+++ b/doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
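-# Multiple Prediction Services on a Single GPU
-
-When clients send requests only infrequently, server-side compute resources, GPU resources in particular, sit idle. In that case you can start multiple prediction services on the server to improve utilization. Paddle Serving supports deploying multiple prediction services on a single GPU: when starting each service, bind it to a card with the `--gpu_ids` option, and bind several services to the same card.
-
-For example: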
-
-```shell
-python -m paddle_serving_server.serve --model bert_seq128_model --port 9292 --gpu_ids 0
-python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
-```
-
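-This deploys the BERT example and the ImageNet example together on card 0.
-
-**Note:** Inference inside a single GPU still runs serially; this approach only reduces the idle time of the GPU on the server side.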
- 
--- a/doc/Makefile
+++ b/doc/Makefile
-# Minimal makefile for Sphinx documentation
-#
-
-# You can set these variables from the command line, and also
-# from the environment for the first two.
-SPHINXOPTS    ?=
-SPHINXBUILD   ?= sphinx-build
-SOURCEDIR     = source
-BUILDDIR      = build
-
-# Put it first so that "make" without argument is like "make help".
-help:
-	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
-
-.PHONY: help Makefile
-
-# Catch-all target: route all unknown targets to Sphinx using the new
-# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
-%: Makefile
-	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
--- a/doc/UWSGI_DEPLOY.md
+++ b/doc/UWSGI_DEPLOY.md
-# Deploy HTTP service with uWSGI
-
-([简体中文](./UWSGI_DEPLOY_CN.md)|English)
-
-In the fit_a_line example, after starting the HTTP prediction service, you will see the following output:
-
-```shell
-web service address:
-http://10.127.3.150:9393/uci/prediction
- * Serving Flask app "serve" (lazy loading)
- * Environment: production
-   WARNING: This is a development server. Do not use it in a production deployment.
-   Use a production WSGI server instead.
- * Debug mode: off
- * Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
-```
-
-This warns that the HTTP service was started with Flask's development server, which is not suitable for production deployment.
-A prediction service started this way is not stable enough to withstand a large number of concurrent requests. For actual deployments, run it behind a production WSGI (Web Server Gateway Interface) server.
-
-Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments.
-
-
-```python
-# uwsgi_service.py
-from paddle_serving_server.web_service import WebService
-
-# Define the prediction service
-uci_service = WebService(name="uci")
-uci_service.load_model_config("./uci_housing_model")
-uci_service.prepare_server(workdir="./workdir", port=9500, device="cpu")
-uci_service.run_rpc_service()
-# Get the Flask application
-app_instance = uci_service.get_app_instance()
-```
-
-Start the service with uWSGI:
-
-```bash
-uwsgi --http :9393 --module uwsgi_service:app_instance
-```
-
-Use the `--processes` parameter to specify the number of service processes.
-
-For more information about uWSGI, please refer to the [uWSGI documentation](https://uwsgi-docs.readthedocs.io/en/latest/).
--- a/doc/UWSGI_DEPLOY_CN.md
+++ b/doc/UWSGI_DEPLOY_CN.md
-# Start the HTTP prediction service with uWSGI
-
-(简体中文|[English](./UWSGI_DEPLOY.md))
-
-In the provided fit_a_line example, after starting the HTTP prediction service, you will see the following output:
-
-```shell
-web service address:
-http://10.127.3.150:9393/uci/prediction
- * Serving Flask app "serve" (lazy loading)
- * Environment: production
-   WARNING: This is a development server. Do not use it in a production deployment.
-   Use a production WSGI server instead.
- * Debug mode: off
- * Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
-```
-
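-This warns that the HTTP service was started in development mode and cannot be used for production deployment. A service started by Flask is not stable enough and cannot withstand a large number of concurrent requests; for actual deployments it needs to be used together with a WSGI (Web Server Gateway Interface) server.
-
-Next, we show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy the HTTP prediction service for production environments.
-
-Write the HTTP service script: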
-
-```python
-# uwsgi_service.py
-from paddle_serving_server.web_service import WebService
-
-# Configure the prediction service
-uci_service = WebService(name="uci")
-uci_service.load_model_config("./uci_housing_model")
-uci_service.prepare_server(workdir="./workdir", port=9500, device="cpu")
-uci_service.run_rpc_service()
-# Get the Flask application
-app_instance = uci_service.get_app_instance()
-```
-
-Start the HTTP service with uWSGI:
-
-```bash
-uwsgi --http :9393 --module uwsgi_service:app_instance
-```
-
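-Use the `--processes` parameter to specify the number of service processes.
-
-For more information about uWSGI, please refer to the [uWSGI documentation](https://uwsgi-docs.readthedocs.io/en/latest/).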
--- a/doc/ABTEST_IN_PADDLE_SERVING.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING.md
--- a/doc/ABTEST_IN_PADDLE_SERVING_CN.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING_CN.md
--- a/doc/C++DESIGN.md
+++ b/doc/C++DESIGN.md
--- a/doc/C++DESIGN_CN.md
+++ b/doc/C++DESIGN_CN.md
--- a/doc/ENCRYPTION.md
+++ b/doc/ENCRYPTION.md
--- a/doc/ENCRYPTION_CN.md
+++ b/doc/ENCRYPTION_CN.md
--- a/doc/HOT_LOADING_IN_SERVING.md
+++ b/doc/HOT_LOADING_IN_SERVING.md
--- a/doc/HOT_LOADING_IN_SERVING_CN.md
+++ b/doc/HOT_LOADING_IN_SERVING_CN.md
--- a/doc/NEW_OPERATOR.md
+++ b/doc/NEW_OPERATOR.md
--- a/doc/NEW_OPERATOR_CN.md
+++ b/doc/NEW_OPERATOR_CN.md
--- a/doc/NEW_WEB_SERVICE.md
+++ b/doc/NEW_WEB_SERVICE.md
--- a/doc/NEW_WEB_SERVICE_CN.md
+++ b/doc/NEW_WEB_SERVICE_CN.md
--- a/doc/deprecated/MULTI_SERVING_OVER_SINGLE_GPU_CARD.md
+++ b/doc/deprecated/MULTI_SERVING_OVER_SINGLE_GPU_CARD.md
-# Multiple Serving Instances over Single GPU Card
-
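-Paddle Serving relies on the PaddlePaddle inference library to perform the actual prediction computation. Because of current limitations in the GPU inference library, a single Serving instance can bind to only one GPU card, and all worker threads in the process share one GPU stream. In other words, no matter how many worker threads Serving starts, all requests are computed strictly serially on the GPU, so extra threads give no speedup. The consequence is that if the model's computation is light, the Serving process will not saturate the GPU's compute capacity.
-
-To make full use of the card's compute capacity, consider starting multiple Serving instances on a single card so that multiple GPU streams can saturate the GPU. The start commands can look like this: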
-
-```
-bin/serving --gpuid=0 --bthread_concurrency=4 --bthread_min_concurrency=4 --port=8010&
-bin/serving --gpuid=0 --bthread_concurrency=4 --bthread_min_concurrency=4 --port=8011&
-```
-
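-The two commands above start two Serving instances listening on ports 8010 and 8011 respectively, both bound to the same card (gpuid = 0).
-
-Meaning of the command-line arguments: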
-```
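-gpuid=N: the ID of the GPU card to bind to
-bthread_concurrency and bthread_min_concurrency: together limit the number of workers the process starts. In GPU inference mode, adding worker threads does not increase concurrency, so they are capped to save resources; both are set to 4 because that is the minimum bthread allows.
-port=xxx: the port the Serving instance listens on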
-```
-
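-However, whether this approach actually raises GPU utilization without hurting response time and other metrics depends on several limiting factors, specifically:
-
-1. Compute used by a single stream: if a single stream already uses more than 50% of the GPU's compute capacity, adding a stream will likely make the jobs of the two streams queue behind each other and slow down both response times
-2. GPU memory: a Serving process must load the model parameters into GPU memory and allocate temporary variables from the GPU memory pool during computation; if a single Serving process already uses more than 50% of the GPU memory, adding another Serving process will run out of memory and the process will exit with an error
-
-You can therefore test with the following steps:
-
-1. When loading the model, set the model type in model_toolkit.prototxt to FLUID_GPU_ANALYSIS or FLUID_GPU_ANALYSIS_DIR; this runs static analysis on the model and applies some GPU memory optimization
-2. After step 1, start a single Serving process with the parameters `--gpuid=N --bthread_concurrency=4 --bthread_min_concurrency=4`; start one client and run a load test with concurrency 1, increasing the batch size from small to large and recording the average response time. Because compute is limited, once the batch size grows large enough the response time should increase noticeably, or, even without a noticeable increase, stop meeting the system requirements
-3. Start one more Serving process with the same parameters as in step 2 plus a new port: `--gpuid=N --bthread_concurrency=4 --bthread_min_concurrency=4 --port=8011`, where `--port=8011` makes the new process use a new service port. Then load-test both Serving processes at the same time and keep observing how the average response time changes as the batch size grows, until you find an acceptable trade-off between batch size and response time
-4. Repeat steps 2-3
-5. Use the tests in steps 2-4 to decide how many Serving processes can share a single GPU card; in the actual deployment, start that many Serving processes on one GPU card to serve requests together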
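
The decision in the last step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pick_instance_count`, its parameters, and the latency and memory figures in the usage note are hypothetical and do not come from Paddle Serving or from any real measurement. It takes the average response times recorded for each instance count and returns the largest count that still meets a latency budget and fits in GPU memory.

```python
def pick_instance_count(avg_latency_ms, latency_budget_ms,
                        mem_per_instance_mb, gpu_mem_mb):
    """Return the largest number of Serving processes that can share one card.

    avg_latency_ms maps an instance count to the average response time
    measured while that many processes were load-tested together.
    All names and units here are hypothetical, for illustration only.
    """
    best = 0
    for count in sorted(avg_latency_ms):
        fits_memory = count * mem_per_instance_mb <= gpu_mem_mb
        meets_latency = avg_latency_ms[count] <= latency_budget_ms
        if fits_memory and meets_latency:
            best = count
        else:
            # Adding more processes only increases contention, so stop here.
            break
    return best
```

For example, with made-up measurements `{1: 12.0, 2: 15.0, 3: 31.0}`, a 20 ms budget, 3000 MB per process and a 16000 MB card, the function settles on two processes, because the third pushes the average response time past the budget.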
--- a/doc/deprecated/NEW_WEB_SERVICE.md
+++ b/doc/deprecated/NEW_WEB_SERVICE.md
-# How to develop a new Web service?
-
-([简体中文](NEW_WEB_SERVICE_CN.md)|English)
-
-This document uses an image classification service based on the ImageNet dataset as an example to show how to develop a new web service. The complete code is available [here](../../python/examples/imagenet/resnet50_web_service.py).
-
-## WebService base class
-
-Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `preprocess` and `postprocess` methods. The default implementation is as follows:
-
-```python
-class WebService(object):
-  
-    def preprocess(self, feed={}, fetch=[]):
-        return feed, fetch
-    def postprocess(self, feed={}, fetch=[], fetch_map=None):
-        return fetch_map
-```
-
-### preprocess
-
-The preprocess method has two input parameters, `feed` and `fetch`. For an HTTP request `request`:
-
- The value of `feed` is the feed part `request.json["feed"]` in the request data 
- The value of `fetch` is the fetch part `request.json["fetch"]` in the request data
-
-The return values are the feed and fetch values used in the prediction.
-
-### postprocess
-
-The postprocess method has three input parameters, `feed`, `fetch` and `fetch_map`:
-
- The value of `feed` is the feed part `request.json["feed"]` in the request data 
- The value of `fetch` is the fetch part `request.json["fetch"]` in the request data
- The value of `fetch_map` is the model output value.
-
-The return value is wrapped as `{"result": fetch_map}` and returned as the response to the HTTP request.
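
The way the two hooks fit around prediction can be sketched as follows. This is a minimal stand-in, not the Paddle Serving implementation: `SketchService`, `handle_request` and `fake_predict` are hypothetical names, the model call is replaced by a dummy function, and the response key is assumed to be `result`.

```python
class SketchService(object):
    """Hypothetical stand-in with the same hook names as the base class above."""

    def preprocess(self, feed, fetch):
        # Default behaviour: pass the request data through unchanged.
        return feed, fetch

    def postprocess(self, feed, fetch, fetch_map):
        # Default behaviour: return the model output unchanged.
        return fetch_map


def handle_request(service, request_json, predict):
    """Run preprocess -> predict -> postprocess for one request body."""
    feed, fetch = service.preprocess(request_json["feed"], request_json["fetch"])
    fetch_map = predict(feed, fetch)
    fetch_map = service.postprocess(
        feed=request_json["feed"], fetch=request_json["fetch"], fetch_map=fetch_map)
    return {"result": fetch_map}


def fake_predict(feed, fetch):
    # Dummy model: report how many samples were fed for each fetch name.
    return {name: len(feed) for name in fetch}


response = handle_request(
    SketchService(), {"feed": [{"x": [1.0]}], "fetch": ["price"]}, fake_predict)
```

Overriding only `preprocess` or only `postprocess` in a subclass changes one stage of this flow and leaves the other at its pass-through default.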
-
-## Develop ImageService class
-
-```python
-class ImageService(WebService):
-
-    def preprocess(self, feed={}, fetch=[]):
-        reader = ImageReader()
-        feed_batch = []
-        for ins in feed:
-            if "image" not in ins:
-                raise ValueError("feed data error!")
-            sample = base64.b64decode(ins["image"])
-            img = reader.process_image(sample)
-            feed_batch.append({"image": img})
-        return feed_batch, fetch
-```
-
-The `ImageService` above overrides only the `preprocess` method, which converts Base64-encoded image data into the format required for prediction.
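
To make the Base64 round trip concrete, here is a small sketch of what a client could send and how the feed entries decode back to raw bytes. It uses only the Python standard library; `build_image_request` and `decode_images` are hypothetical helper names, the fetch name `score` is made up, and the image preprocessing done by `ImageReader` is deliberately left out.

```python
import base64
import json


def build_image_request(image_bytes, fetch_names):
    """Build a JSON body with one Base64-encoded image in the feed list."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"feed": [{"image": encoded}], "fetch": fetch_names})


def decode_images(request_body):
    """Decode every feed entry back to raw bytes, as preprocess does first."""
    payload = json.loads(request_body)
    samples = []
    for ins in payload["feed"]:
        if "image" not in ins:
            raise ValueError("feed data error!")
        samples.append(base64.b64decode(ins["image"]))
    return samples


body = build_image_request(b"\x89PNG", ["score"])
images = decode_images(body)
```

Base64 is used here because JSON cannot carry raw bytes; the decoded bytes are identical to what the client encoded.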
--- a/doc/deprecated/NEW_WEB_SERVICE_CN.md
+++ b/doc/deprecated/NEW_WEB_SERVICE_CN.md
-# How to develop a new Web Service?
-
-(简体中文|[English](NEW_WEB_SERVICE.md))
-
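-This document uses an ImageNet image classification service as an example to show how to develop a new Web Service. You can find the complete code [here](../../python/examples/imagenet/resnet50_web_service.py).
-
-## WebService base class
-
-Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `preprocess` and `postprocess` methods. The default implementation is as follows: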
-
-```python
-class WebService(object):
-  
-    def preprocess(self, feed={}, fetch=[]):
-        return feed, fetch
-    def postprocess(self, feed={}, fetch=[], fetch_map=None):
-        return fetch_map
-```
-
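-### preprocess method
-
-The preprocess method has two input parameters, `feed` and `fetch`. For an HTTP request `request`:
-
- The value of `feed` is the feed part of the request data, `request.json["feed"]`
- The value of `fetch` is the fetch part of the request data, `request.json["fetch"]`
-
-The return values are the feed and fetch values used during prediction.
-
-### postprocess method
-
-The postprocess method has three input parameters, `feed`, `fetch` and `fetch_map`:
-
- The value of `feed` is the feed part of the request data, `request.json["feed"]`
- The value of `fetch` is the fetch part of the request data, `request.json["fetch"]`
- The value of `fetch_map` is the fetched model output
-
-The return value is wrapped as `{"result": fetch_map}` and returned as the response to the HTTP request.
-
-## Develop the ImageService class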
-
-```python
-class ImageService(WebService):
-
-    def preprocess(self, feed={}, fetch=[]):
-        reader = ImageReader()
-        feed_batch = []
-        for ins in feed:
-            if "image" not in ins:
-                raise ValueError("feed data error!")
-            sample = base64.b64decode(ins["image"])
-            img = reader.process_image(sample)
-            feed_batch.append({"image": img})
-        return feed_batch, fetch
-```
-
-The `ImageService` above overrides only the preprocessing method, which converts Base64-encoded image data into the format required for model prediction.
--- a/doc/doc_test_list
+++ b/doc/doc_test_list
-BERT_10_MINS.md
-ABTEST_IN_PADDLE_SERVING.md
--- a/doc/PIPELINE_SERVING.md
+++ b/doc/PIPELINE_SERVING.md
--- a/doc/PIPELINE_SERVING_CN.md
+++ b/doc/PIPELINE_SERVING_CN.md