# Inference Server Example

The inference server can be used to perform inference on any model trained on
PaddlePaddle. It provides an HTTP endpoint.

## Run

The inference server reads a trained model (a topology file and a
parameter file) and serves HTTP requests at port `8000`. Because models
differ in the number and types of inputs, **the HTTP API differs
slightly for each model.** Please see [HTTP API](#http-api) for the
API spec, and
[here](https://github.com/PaddlePaddle/book/wiki/Using-Pre-trained-Models) for
request examples for different models that illustrate the
differences.

We will first show how to obtain the PaddlePaddle model, and then how
to start the server.

We will use Docker to run the demo. If you are not familiar with
Docker, please check out this
[TLDR](https://github.com/PaddlePaddle/Paddle/wiki/Docker-for-Beginners).

### Obtain the PaddlePaddle Model

A neural network model in PaddlePaddle contains two parts: the
**parameters** and the **topology**.

A PaddlePaddle training script contains the neural network topology,
which is represented by layers. For example,

```python
img = paddle.layer.data(name="img", type=paddle.data_type.dense_vector(784))
hidden = paddle.layer.fc(input=img, size=200)
prediction = paddle.layer.fc(input=hidden, size=10, act=paddle.activation.Softmax())
```

The parameter instance is created from the topology and updated by the
trainer's `train` method.

```python
...
params = paddle.parameters.create(cost)
...
trainer = paddle.trainer.SGD(cost=cost, parameters=params)
...
```

PaddlePaddle stores the topology and the parameters separately.

1. To serialize a topology, we need to create a topology instance
   explicitly from the outputs of the neural network. Then, invoke the
   `serialize_for_inference` method.

   ```python
   # Save the inference topology to protobuf.
   inference_topology = paddle.topology.Topology(layers=prediction)
   with open("inference_topology.pkl", 'wb') as f:
       inference_topology.serialize_for_inference(f)
   ```

2. To save the parameters, we need to invoke the `save_parameter_to_tar`
   method of the `trainer`.

   ```python
   with open('param.tar', 'w') as f:
       trainer.save_parameter_to_tar(f)
   ```

After serializing the parameters and topology into two files, we can
use them to set up an inference server.

For a working example, please see [train.py](https://github.com/reyoung/paddle_mnist_v2_demo/blob/master/train.py).
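
For reference, here is a minimal sketch that stitches the snippets above together. The label layer, cost function, and optimizer settings are illustrative assumptions, not part of the original example:

```python
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# Topology: the same small network as in the snippets above.
img = paddle.layer.data(name="img", type=paddle.data_type.dense_vector(784))
hidden = paddle.layer.fc(input=img, size=200)
prediction = paddle.layer.fc(input=hidden, size=10,
                             act=paddle.activation.Softmax())

# Assumed for illustration: a label layer and a classification cost.
label = paddle.layer.data(name="label",
                          type=paddle.data_type.integer_value(10))
cost = paddle.layer.classification_cost(input=prediction, label=label)

params = paddle.parameters.create(cost)
trainer = paddle.trainer.SGD(cost=cost, parameters=params,
                             update_method=paddle.optimizer.Momentum())

# ... train with trainer.train(...) ...

# Save the two artifacts the inference server needs.
inference_topology = paddle.topology.Topology(layers=prediction)
with open("inference_topology.pkl", 'wb') as f:
    inference_topology.serialize_for_inference(f)
with open('param.tar', 'w') as f:
    trainer.save_parameter_to_tar(f)
```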


### Start the Server

Make sure the `inference_topology.pkl` and `param.tar` mentioned in
the last section are in your current working directory, and run the
command:

```bash
docker run --name paddle_serve -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=0 paddlepaddle/book:serve
```

The above command will mount the current working directory to the
`/data/` directory inside the docker container. The inference server
will load the model topology and parameters that we just created from
there.

To run the inference server with GPU support, please make sure you have
[nvidia-docker](https://github.com/NVIDIA/nvidia-docker)
installed first, and run:

```bash
nvidia-docker run --name paddle_serve -v `pwd`:/data -d -p 8000:80 -e WITH_GPU=1 paddlepaddle/book:serve-gpu
```

This command will start a server on port `8000`.

After you are done with the demo, you can run `docker stop
paddle_serve` to stop this docker container.

## HTTP API

The inference server handles HTTP POST requests on path `/`. The
content type of the request and the response is JSON. You need to
manually set the `Content-Type` request header to `application/json`.

The request body is a single JSON dictionary whose keys are the layer
names of the input data. The type of each corresponding value is
determined by the data type of that layer. In most cases the value
will be a list of floats. For completeness, we list all data types
below:

There are twelve data types supported by PaddlePaddle:

| | plain | a sequence | a sequence of sequences |
| --- | --- | --- | --- |
| dense | [f, f, f, f, ...] | [[f, f, f, ...], [f, f, f, ...]] | [[[f, f, ...], [f, f, ...]], [[f, f, ...], [f, f, ...]], ...] |
| integer | i | [i, i, ...] | [[i, i, ...], [i, i, ...], ...] |
| sparse (binary) | [i, i, ...] | [[i, i, ...], [i, i, ...], ...] | [[[i, i, ...], [i, i, ...], ...], [[i, i, ...], [i, i, ...], ...], ...] |
| sparse (value) | [[i, f], [i, f], ...] | [[[i, f], [i, f], ...], ...] | [[[[i, f], [i, f], ...], ...], ...] |

In the table, `i` stands for an `int` value and `f` stands for a
`float` value.
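
As a rough illustration, here is what values of several of these types could look like in Python (the layer names are hypothetical):

```python
# Hypothetical layer names; the value shapes follow the table above.
request = {
    "img":      [0.95, 0.95, 0.54, 0.82],  # dense, plain
    "word_id":  7,                         # integer, plain
    "sentence": [23, 942, 402, 19],        # integer, a sequence
    "tags":     [0, 5, 12],                # sparse (binary), plain: indices of nonzero entries
    "weights":  [[0, 0.5], [5, 1.2]],      # sparse (value), plain: [index, value] pairs
}
```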

Which `data_type` should be used is decided by the training
topology. For example:

* For image data, the input is usually a plain dense vector: we flatten
  the image into a vector. The pixel values of that image are usually
  normalized to `[-1.0, 1.0]` or `[0.0, 1.0]` (depending on the neural
  network); see the sketch after this list.

  ```text
  +-------+
  |243 241|
  |139 211| +----> [0.95, 0.95, 0.54, 0.82]
  +-------+
  ```

* For text data, each word of the text is represented by an
  integer. The mapping between words and integers is decided by the
  training process. A sentence is represented by a list of
  integers.

  ```text
   I am good .
       +
       |
       v
  23 942 402 19  +----->  [23, 942, 402, 19]
  ```
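
To make this concrete, below is a small illustrative sketch that builds both kinds of input. The pixel values and the word-to-ID map are hypothetical, taken from the diagrams above:

```python
import numpy as np

# Flatten a 2x2 grayscale image (pixel values 0-255) into a plain dense
# vector, scaled into [0.0, 1.0].
pixels = np.array([[243, 241],
                   [139, 211]], dtype=np.float32)
img = (pixels / 255.0).flatten().tolist()

# Map each word of a sentence to its integer ID. This dictionary is
# hypothetical; the real mapping comes from the training process.
word_dict = {"I": 23, "am": 942, "good": 402, ".": 19}
sentence = [word_dict[w] for w in "I am good .".split()]
```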

A sample request for a `2x2` image and a sentence could be:

```json
{
    "img": [
        0.95,
        0.95,
        0.54,
        0.82
    ],
    "sentence": [
        23,
        942,
        402,
        19
    ]
}
```

The response is a JSON object, too. An example response is:

```json
{
  "code": 0,
  "data": [
    [
      0.10060056298971176,
      0.057179879397153854,
      0.1453431099653244,
      0.15825574100017548,
      0.04464773088693619,
      0.1566203236579895,
      0.05657859891653061,
      0.12077419459819794,
      0.08073269575834274,
      0.07926714420318604
    ]
  ],
  "message": "success"
}
```

Here, `code` and `message` represent the status of the request.
`data` corresponds to the outputs of the neural network; they could be
the probabilities of each class, the word IDs of an output sentence, and
so on.
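
Here is a minimal client sketch using the Python `requests` library. It assumes the server from the previous section is running locally and that the model has a single input layer named `img`, fed with a flattened, normalized image like the one built earlier:

```python
import requests

# requests' `json=` argument serializes the payload and sets the
# `Content-Type: application/json` header for us.
img = [0.95, 0.95, 0.54, 0.82]  # a flattened, normalized image
resp = requests.post("http://localhost:8000/", json={"img": img})
result = resp.json()

if result["code"] == 0:
    probs = result["data"][0]  # per-class probabilities, as in the example above
    print("predicted class:", probs.index(max(probs)))
else:
    print("inference failed:", result["message"])
```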

## MNIST Demo Client

If you have trained a model with [train.py](https://github.com/reyoung/paddle_mnist_v2_demo/blob/master/train.py) and
started an inference server, you can use this [client](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits/client/client.py) to test whether it works correctly.

## Build

We have already prepared the pre-built Docker image
`paddlepaddle/book:serve`. Here are the commands if you want to build the
Docker images yourself:

```bash
docker build -t paddlepaddle/book:serve .
docker build -t paddlepaddle/book:serve-gpu -f Dockerfile.gpu .
```