From 9612b0bc44715bfd80fcd9b56b8813e10c4db92a Mon Sep 17 00:00:00 2001
From: Helin Wang
Date: Wed, 13 Sep 2017 15:04:55 -0700
Subject: [PATCH] refine serve README

---
 mnist-client/README.md |   4 +-
 serve/README.md        | 113 ++++++++++++++++++++++++++++++++---------
 2 files changed, 90 insertions(+), 27 deletions(-)

diff --git a/mnist-client/README.md b/mnist-client/README.md
index 830f011..7e91e85 100644
--- a/mnist-client/README.md
+++ b/mnist-client/README.md
@@ -53,8 +53,8 @@ PaddlePaddle. Please see [here](TODO) for more details.
 ## Build
 
 We have already prepared the pre-built docker image
-`paddlepaddle/book:mnist`, here is the command if you want build the
-docker image again.
+`paddlepaddle/book:mnist`, here is the command if you want to build
+the docker image again.
 
 ```bash
 docker build -t paddlepaddle/book:mnist .
diff --git a/serve/README.md b/serve/README.md
index bcae06d..8f4904b 100644
--- a/serve/README.md
+++ b/serve/README.md
@@ -1,21 +1,28 @@
-# PaddlePaddle Serving Example
+# Inference Server Example
-
-## Build
-
-    $ docker build -t serve .
+The inference server can be used to run inference on any model
+trained by PaddlePaddle. It provides an HTTP endpoint.
 
 ## Run
 
-    $ docker run -v `pwd`:/data -it -p 8000:80 -e WITH_GPU=0 paddlepaddle/book:serve
-    $ curl -H "Content-Type: application/json" -X POST -d '{"img":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]}' http://localhost:8000/
+The inference server reads a trained model (a topology file and a
+parameter file) and serves HTTP requests at port `8000`.
+
+We will first show how to obtain the PaddlePaddle model, and then how
+to start the server.
 
-## How to save PaddlePaddle model
+We will use Docker to run the demo. If you are not familiar with
+Docker, please check out this
+[tutorial](https://github.com/PaddlePaddle/Paddle/wiki/TLDR-for-new-docker-user).
 
-Neural network model in PaddlePaddle contains two parts, the parameter, and the topology.
+### Obtain the PaddlePaddle Model
 
-Paddle training scripts contain the neural network topology, which is representing by layers. For example,
+A neural network model in PaddlePaddle contains two parts: the
+parameters and the topology.
+
+A PaddlePaddle training script contains the neural network topology,
+which is represented by layers. For example,
 
 ```python
 img = paddle.layer.data(name="img", type=paddle.data_type.dense_vector(784))
@@ -23,7 +30,8 @@ hidden = fc_layer(input=type, size=200)
 prediction = fc_layer(input=hidden, size=10, act=paddle.activation.Softmax())
 ```
 
-The parameter instance is created by topology and updated by the `train` method.
+The parameter instance is created by the topology and updated by the
+`train` method.
 
 ```python
 ...
 params = paddle.parameters.create(cost)
 ...
 trainer = paddle.trainer.SGD(cost=cost, parameters=params)
 ...
@@ -35,7 +43,9 @@ trainer = paddle.trainer.SGD(cost=cost, parameters=params)
 
 PaddlePaddle stores the topology and parameter separately.
 
-1. To serialize a topology, we need to create a topology instance explicitly by the outputs of the neural network. Then, invoke `serialize_for_inference` method. The example code is
+1. To serialize a topology, we need to create a topology instance
+   explicitly from the outputs of the neural network, then invoke the
+   `serialize_for_inference` method. The example code is
@@ -44,28 +54,59 @@ PaddlePaddle stores the topology and parameter separately.
 
   ```python
   # Save the inference topology to protobuf.
   inference_topology = paddle.topology.Topology(layers=prediction)
   with open("inference_topology.pkl", 'wb') as f:
       inference_topology.serialize_for_inference(f)
   ```
 
-2. To save a parameter, we need to invoke `to_tar` method in Parameter class. The example code is,
+2. To save the parameters, we need to invoke the `to_tar` method of
+   the `Parameter` class. The example code is
 
   ```python
   with open('param.tar', 'w') as f:
       params.to_tar(f)
   ```
 
-   After we serialize the parameter and topology to two files, we could use that two files to set up an inference server.
+   After we serialize the parameters and topology to two files, we
+   could use those two files to set up an inference server.
+
+   For a working example, please see [here](https://github.com/reyoung/paddle_mnist_v2_demo/blob/master/train.py).
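+
+Putting the two steps together, below is a minimal sketch of a save
+script. It is only a sketch, assuming the v2 API (`import paddle.v2
+as paddle`); for brevity it saves freshly created (untrained)
+parameters, whereas a real script would train `params` first, as in
+the working example linked above.
+
+```python
+import paddle.v2 as paddle
+
+paddle.init(use_gpu=False, trainer_count=1)
+
+# The same MNIST-like topology as in the snippets above.
+img = paddle.layer.data(name="img",
+                        type=paddle.data_type.dense_vector(784))
+hidden = paddle.layer.fc(input=img, size=200)
+prediction = paddle.layer.fc(input=hidden, size=10,
+                             act=paddle.activation.Softmax())
+label = paddle.layer.data(name="label",
+                          type=paddle.data_type.integer_value(10))
+cost = paddle.layer.classification_cost(input=prediction, label=label)
+params = paddle.parameters.create(cost)  # would be trained in practice
+
+# 1. Serialize the inference topology to protobuf.
+inference_topology = paddle.topology.Topology(layers=prediction)
+with open("inference_topology.pkl", 'wb') as f:
+    inference_topology.serialize_for_inference(f)
+
+# 2. Save the parameters.
+with open('param.tar', 'w') as f:
+    params.to_tar(f)
+```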
-## How to set up an inference server
-...
+### Start the Server
+
+Make sure the `inference_topology.pkl` and `param.tar` mentioned in
+the last section are in your current working directory, and run the
+command:
+
+```bash
+docker run --name paddle_serve -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=0 paddlepaddle/book:serve
+```
+
+The above command will mount the current working directory to the
+`/data` directory inside the docker container. The inference server
+will load the model topology and parameters that we just created
+from there.
+
+To run the inference server with GPU support, please
+install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker)
+first, and run:
 
-## What is the data format of inference server
+```bash
+nvidia-docker run --name paddle_serve -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve
+```
+
+After you are done with the demo, you can run `docker stop
+paddle_serve` to stop this docker container.
+
+## HTTP API
 
-The inference server will handle a post request on uri `/`. The contant type of the request and response is json. You need to manually add `Content-Type` request header as `Content-Type: application/json`.
+The inference server will handle HTTP POST requests on path `/`. The
+content type of the request and response is JSON. You need to
+manually set the `Content-Type` request header to
+`Content-Type: application/json`.
 
-The request json object is a single json object, which key is the layer name of input data. The value of that object is decided by data type.
+The request is a single JSON dictionary object, whose key is the
+layer name of the input data. The type of the corresponding value is
+decided by the data type; in most cases it will be a list of floats.
+For completeness, we list all the data types below:
 
-There are tweleve data types are supported by PaddePaddle, and they are organized in a matrix.
+There are twelve data types supported by PaddlePaddle:
 
 | | plain | a sequence | a sequence of sequence |
 | --- | --- | --- | ---|
 | sparse | [i, i, ...] | [[i, i, ...], [i, i, ...], ...] | [[[i, i, ...], [i, i, ...], ...], [[i, i, ...], [i, i, ...], ...], ...] |
 | sparse | [[i, f], [i, f], ... ] | [[[i, f], [i, f], ... ], ...] | [[[[i, f], [i, f], ... ], ...], ...]
 
-In that table, `i` stands for a `int` value and `f` stands for a `float` value.
+In the table, `i` stands for an `int` value and `f` stands for a
+`float` value.
 
-What `data_type` should be used is decided by the training topology. For example,
+What `data_type` should be used is decided by the training
+topology. For example,
 
-* For image data, they are usually a plain dense vector, we flatten the image into a vector. The pixels of that image are usually normalized in `[-1.0, 1.0]` or `[0.0, 1.0]`(it depends on each neural network.).
+* Image data is usually a plain dense vector: we flatten
+  the image into a vector. The pixel values of that image are usually
+  normalized to `[-1.0, 1.0]` or `[0.0, 1.0]` (depending on the
+  neural network). A request sketch for this case is shown after
+  this list.
 
  ```text
 +-------+
 |243 241|
 |139 211|  +---->[0.95, 0.95, 0.54, 0.82]
 +-------+
 ```
 
-* For text data, each word of that text is represented by a integer. The association map between word and integer is decided by the training process. A sentence is represented by a list of integer.
+
+* For text data, each word of the text is represented by an
+  integer. The mapping between words and integers is decided by the
+  training process. A sentence is represented by a list of
+  integers.
 
 ```text
 I am good .
 123 356 25208 28
 ```
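+
+To make the request format concrete, below is a client sketch for
+the image case above, written against Python 3's standard library.
+It is an illustration only: it assumes the server from the previous
+section is running locally, and that the input layer is named `img`
+and takes a 784-dimensional dense vector, as in the MNIST topology
+shown earlier.
+
+```python
+import json
+from urllib.request import Request, urlopen
+
+# A 28x28 grayscale image, flattened to 784 floats normalized to
+# [0.0, 1.0]. An all-zero "blank" image is used here for brevity.
+payload = {"img": [0.0] * 784}  # the key must match the input layer name
+
+req = Request("http://localhost:8000/",
+              data=json.dumps(payload).encode("utf-8"),
+              headers={"Content-Type": "application/json"})
+print(urlopen(req).read().decode("utf-8"))
+```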
@@ -138,4 +188,17 @@ The response is a json object, too. The example of return data are:
 }
 ```
 
-The `code` and `message` represent the status of the request. The `data` are the outputs of the neural network; they could be a probability of each class, could be the IDs of output sentence, and so on.
+The `code` and `message` represent the status of the request. The
+`data` are the outputs of the neural network; they could be the
+probability of each class, the IDs of an output sentence, and so
+on.
+
+## Build
+
+We have already prepared the pre-built docker image
+`paddlepaddle/book:serve`; here is the command if you want to build
+the docker image again.
+
+```bash
+docker build -t paddlepaddle/book:serve .
+```
-- 
GitLab