([简体中文](./README_CN.md)|English)

<p align="center">
    <br>
<img src='doc/serving_logo.png' width = "600" height = "130">
    <br>
</p>

<p align="center">
    <br>
    <a href="https://travis-ci.com/PaddlePaddle/Serving">
        <img alt="Build Status" src="https://img.shields.io/travis/com/PaddlePaddle/Serving/develop">
    </a>
    <img alt="Release" src="https://img.shields.io/badge/Release-0.0.3-yellowgreen">
    <img alt="Issues" src="https://img.shields.io/github/issues/PaddlePaddle/Serving">
    <img alt="License" src="https://img.shields.io/github/license/PaddlePaddle/Serving">
    <img alt="Slack" src="https://img.shields.io/badge/Join-Slack-green">
    <br>
</p>

- [Motivation](./README.md#motivation)
- [AIStudio Tutorial](./README.md#aistudio-tutorial)
- [Installation](./README.md#installation)
- [Quick Start Example](./README.md#quick-start-example)
- [Document](README.md#document)
- [Community](README.md#community)

<h2 align="center">Motivation</h2>

We expect deploying deep learning inference services online to become a user-facing application in the future. **The goal of this project**: once you have trained a deep neural network with [Paddle](https://github.com/PaddlePaddle/Paddle), you can also deploy the model online easily. A demo of Paddle Serving is as follows:

<h3 align="center">Some Key Features of Paddle Serving</h3>

- Integrates seamlessly with the Paddle training pipeline; most Paddle models can be deployed **with a one-line command**.
- **Industrial serving features** are supported, such as model management, online loading, and online A/B testing.
- **Highly concurrent and efficient communication** between clients and servers is supported.
- **Multiple programming languages** are supported on the client side, such as C++, Python and Java.

***

- Any model trained with [PaddlePaddle](https://github.com/paddlepaddle/paddle) can be used directly, or converted with the [Model Conversion Interface](./doc/SAVE.md), for online deployment with Paddle Serving.
- Supports [Multi-model Pipeline Deployment](./doc/PIPELINE_SERVING.md) and provides both a REST interface and an RPC interface; see the [Pipeline example](./python/examples/pipeline).
- Supports model zoos from the Paddle ecosystem, such as [PaddleDetection](./python/examples/detection), [PaddleOCR](./python/examples/ocr), and [PaddleRec](https://github.com/PaddlePaddle/PaddleRec/tree/master/recserving/movie_recommender).
- Provides a variety of pre-processing and post-processing utilities that help users with training, deployment and related code, bridging the gap between AI developers and application developers. See the [Serving Examples](./python/examples/).

<p align="center">
    <img src="doc/demo.gif" width="700">
</p>


<h2 align="center">AIStudio Tutorial</h2>

We provide a tutorial on AIStudio (in Chinese): [AIStudio Tutorial: Paddle Serving Service Deployment Framework](https://www.paddlepaddle.org.cn/tutorials/projectdetail/1555945)

The tutorial covers:
<ul>
<li>Paddle Serving Environment Setup</li>
  <ul>
    <li>Running in Docker images</li>
    <li>Installing Paddle Serving with pip</li>
  </ul>
<li>Quick Experience of Paddle Serving</li>
<li>Advanced Tutorial of Model Deployment</li>
  <ul>
    <li>Save/Convert Models for Paddle Serving</li>
    <li>Setup Online Inference Service</li>
  </ul>
<li>Paddle Serving Examples</li>
  <ul>
    <li>Paddle Serving for Detections</li>
    <li>Paddle Serving for OCR</li>
  </ul>
</ul>


<h2 align="center">Installation</h2>

We **highly recommend** that you **run Paddle Serving in Docker**; see [Run in Docker](doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more Docker images.

**Attention:** Currently, the default GPU environment of PaddlePaddle 2.1 is CUDA 10.2, so the GPU Docker sample code is based on CUDA 10.2. We also provide Docker images and whl packages for other GPU environments. If you use another environment, carefully check and select the appropriate version.

**Attention:** In the commands below, `python` and `pip` refer to one of Python 3.6/3.7/3.8.

```
# Run CPU Docker
docker pull registry.baidubce.com/paddlepaddle/serving:0.6.0-devel
docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.6.0-devel bash
docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```
# Run GPU Docker
nvidia-docker pull registry.baidubce.com/paddlepaddle/serving:0.6.0-cuda10.2-cudnn8-devel
nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.6.0-cuda10.2-cudnn8-devel bash
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
Install the Python dependencies:
```
cd Serving
pip install -r python/requirements.txt
```

```shell
pip install paddle-serving-client==0.6.0
pip install paddle-serving-server==0.6.0 # CPU
pip install paddle-serving-app==0.6.0
pip install paddle-serving-server-gpu==0.6.0.post102 # GPU with CUDA10.2 + TensorRT7
# DO NOT RUN ALL COMMANDS! Check your GPU environment and select the right one
pip install paddle-serving-server-gpu==0.6.0.post101 # GPU with CUDA10.1 + TensorRT6
pip install paddle-serving-server-gpu==0.6.0.post11 # GPU with CUDA11 + TensorRT7
```

You may need a domestic mirror to speed up downloads. In China, you can use the Tsinghua mirror by adding `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command.

If you need to install modules compiled from the develop branch, download the packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with `pip install`. If you want to compile them yourself, refer to [How to compile Paddle Serving?](./doc/COMPILE.md)

The paddle-serving-server and paddle-serving-server-gpu packages support CentOS 6/7, Ubuntu 16/18, and Windows 10.

The paddle-serving-client and paddle-serving-app packages support Linux and Windows, but paddle-serving-client supports only Python 3.6/3.7/3.8.

**The latest version no longer supports CUDA 9.0, CUDA 10.0, Python 2.7, or Python 3.5.**

We recommend installing paddle >= 2.1.0.


```
# CPU users, please run
pip install paddlepaddle==2.1.0

# GPU users with CUDA 10.2, please run
pip install paddlepaddle-gpu==2.1.0
```

**Note**: If your CUDA version is not 10.2, do not run the commands above directly. Refer instead to the [multi-version whl package list in the official Paddle documentation](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables_en.html#multi-version-whl-package-list-release).

Select the URL for your GPU environment and install it. For example, Python 3.6 users on CUDA 10.1 should pick the URL matching `cp36-cp36m` and `cuda10.1-cudnn7-mkl-gcc8.2-avx-trt6.0.1.5`, copy it, and run:
```
pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.1.0-gpu-cuda10.1-cudnn7-mkl-gcc8.2/paddlepaddle_gpu-2.1.0.post101-cp36-cp36m-linux_x86_64.whl
```
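
If you are unsure which `cpXY` tag to look for in the whl table, it can be derived from the Python version. The sketch below is only an aid: `wheel_python_tag` is a hypothetical helper name, not part of Paddle or Paddle Serving, and it encodes nothing more than the standard CPython wheel naming, where the `m` ABI suffix was dropped starting with Python 3.8.

```python
import sys


def wheel_python_tag(major: int, minor: int) -> str:
    """Return the '<interpreter>-<abi>' tag pair used in CPython wheel names."""
    interpreter = f"cp{major}{minor}"
    # CPython builds before 3.8 carry an 'm' suffix in the ABI tag.
    abi = interpreter + ("m" if (major, minor) < (3, 8) else "")
    return f"{interpreter}-{abi}"


# Tag for the interpreter that is running this script.
print(wheel_python_tag(sys.version_info.major, sys.version_info.minor))
```

Run it with the same `python` you will use for `pip install`, then search the table for the printed tag together with your CUDA version.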

The default `paddlepaddle-gpu==2.1.0` targets CUDA 10.2 without TensorRT. If you want to install PaddlePaddle with TensorRT, check the multi-version whl package list and look for the keyword `cuda10.2-cudnn8.0-trt7.1.3`. For more information, see [Paddle Serving uses TensorRT](./doc/TENSOR_RT.md).

For other environments and Python versions, find the corresponding link in the table and install it with pip.


**Windows users** should read [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md).

<h2 align="center">Quick Start Example</h2>

This quick start example is mainly for users who already have a model to deploy; we also provide a model that can be used for deployment. If you want to learn the complete process from offline training to online serving, refer to the AIStudio tutorial above.

### Boston House Price Prediction model

Enter the Serving git directory and change to the `fit_a_line` example directory:
```shell
cd Serving/python/examples/fit_a_line
sh get_data.sh
```

Paddle Serving provides both HTTP-based and RPC-based services.

### RPC service

You can start an RPC service with `paddle_serving_server.serve`. An RPC service is usually faster than an HTTP service, although it requires some coding against Paddle Serving's Python client API. Note that `--name` is not specified here.
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
<center>

| Argument                                       | Type | Default | Description                                           |
| ---------------------------------------------- | ---- | ------- | ----------------------------------------------------- |
| `thread`                                       | int  | `4`     | Concurrency of current service                        |
| `port`                                         | int  | `9292`  | Exposed port of current service to users              |
| `model`                                        | str  | `""`    | Path of paddle model directory to be served           |
| `mem_optim_off`                                | -    | -       | Disable memory / graphic memory optimization          |
| `ir_optim`                                     | bool | False   | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version)               | -    | -       | Run inference with MKL                                |
| `use_trt` (Only for trt version)               | -    | -       | Run inference with TensorRT                           |
| `use_lite` (Only for Intel x86 CPU or ARM CPU) | -    | -       | Run PaddleLite inference                              |
| `use_xpu`                                      | -    | -       | Run PaddleLite inference with Baidu Kunlun XPU        |
| `precision`                                    | str  | FP32    | Precision Mode, support FP32, FP16, INT8              |
| `use_calib`                                    | bool | False   | Only for deployment with TensorRT                     |

</center>
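
The launch command can also be assembled programmatically, for example from a deployment script. This is a minimal sketch under stated assumptions: `build_serve_command` is a hypothetical helper, not part of Paddle Serving, and it only uses the `--model`, `--thread`, `--port` and `--name` options that appear in this README, with the `thread` and `port` defaults taken from the table.

```python
def build_serve_command(model, thread=4, port=9292, name=None):
    """Return the argument list for `python -m paddle_serving_server.serve`."""
    if not model:
        raise ValueError("a model directory is required")
    command = [
        "python", "-m", "paddle_serving_server.serve",
        "--model", model,
        "--thread", str(thread),
        "--port", str(port),
    ]
    # --name is only added when given; this README omits it for the RPC service.
    if name is not None:
        command += ["--name", name]
    return command


# Rebuild the RPC launch command from this section.
print(" ".join(build_serve_command("uci_housing_model", thread=10)))
```

The returned list can be passed to `subprocess.Popen` if you want to start the server from Python instead of a shell.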

```python
# A user can visit rpc service through paddle_serving_client API
from paddle_serving_client import Client
import numpy as np
client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
Here, `client.predict` takes two arguments. `feed` is a Python `dict` that maps each model input variable's alias name to its value. `fetch` lists the prediction variables to be returned from the server. In this example, the names `"x"` and `"price"` were assigned when the servable model was saved during training.
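
To make the `feed` layout concrete, the following sketch builds the same dictionary with NumPy alone, so it runs without a server. `make_feed` and `FEATURE_COUNT` are hypothetical names introduced for illustration; the only thing taken from the example above is the `(1, 13, 1)` reshape of a 13-value sample under the key `"x"`.

```python
import numpy as np

FEATURE_COUNT = 13  # number of values in the sample used above


def make_feed(sample):
    """Wrap one flat sample in the feed dict shape used by the example above."""
    values = np.asarray(sample, dtype=float)
    if values.shape != (FEATURE_COUNT,):
        raise ValueError(f"expected {FEATURE_COUNT} values, got shape {values.shape}")
    # Same reshape as in the client example: (1, 13, 1).
    return {"x": values.reshape(1, FEATURE_COUNT, 1)}


data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
feed = make_feed(data)
print(feed["x"].shape)
```

Checking the length before reshaping gives a clear error for a malformed sample instead of a less obvious failure later on.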

### WEB service

Users can also move the data-format processing logic to the server side so that the service can be accessed directly with curl. The following example lives in `python/examples/fit_a_line`:

```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
On the client side:
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
The response is:
```
{"result":{"price":[[18.901151657104492]]}}
```
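
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL, JSON body and `Content-Type` header). The helper names `build_request`, `extract_price` and `predict` are hypothetical, and `predict` assumes the web service started above is listening on `127.0.0.1:9292`.

```python
import json
import urllib.request

URL = "http://127.0.0.1:9292/uci/prediction"


def build_request(features):
    """Build the POST request with the same JSON body as the curl example."""
    payload = {"feed": [{"x": features}], "fetch": ["price"]}
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_price(body):
    """Pull the single predicted price out of a response body like the one above."""
    return json.loads(body)["result"]["price"][0][0]


def predict(features):
    """Send the request; requires the web service to be running."""
    with urllib.request.urlopen(build_request(features)) as response:
        return extract_price(response.read().decode("utf-8"))
```

Calling `predict` with the 13 feature values from the curl example should return the number found in the `price` field of the response.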
<h3 align="center">Pipeline Service</h3>

Paddle Serving provides industry-leading multi-model pipeline services that support the production business scenarios of major companies. For an example, see [OCR word recognition](./python/examples/pipeline/ocr).

First, download the two models:
```
python -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz
python -m paddle_serving_app.package --get_model ocr_det
tar -xzvf ocr_det.tar.gz
```
Then start the server side, which launches the two models as one standalone web service:
```
python web_service.py
```
Send an HTTP request:
```
python pipeline_http_client.py
```
Send a gRPC request:
```
python pipeline_rpc_client.py
```
Output:
```
{'err_no': 0, 'err_msg': '', 'key': ['res'], 'value': ["['土地整治与土壤修复研究中心', '华南农业大学1素图']"]}
```
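
In the output above, each entry of `value` is a string that holds a Python list literal. The sketch below shows one way to turn such a response back into Python objects. `decode_pipeline_response` is a hypothetical helper, and it assumes that `err_no == 0` signals success and that every value is a valid Python literal, as in this sample.

```python
import ast


def decode_pipeline_response(response):
    """Map each key of a pipeline response to its parsed value."""
    if response["err_no"] != 0:
        raise RuntimeError(f"pipeline error {response['err_no']}: {response['err_msg']}")
    # literal_eval parses literals only and never executes code.
    return {
        key: ast.literal_eval(value)
        for key, value in zip(response["key"], response["value"])
    }


sample = {'err_no': 0, 'err_msg': '', 'key': ['res'],
          'value': ["['土地整治与土壤修复研究中心', '华南农业大学1素图']"]}
texts = decode_pipeline_response(sample)["res"]
for text in texts:
    print(text)
```

`ast.literal_eval` is used instead of `eval` so that a malformed or hostile response cannot run arbitrary code on the client.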

<h2 align="center">Document</h2>

### New to Paddle Serving
- [How to save a servable model?](doc/SAVE.md)
- [Write Bert-as-Service in 10 minutes](doc/BERT_10_MINS.md)
- [Paddle Serving Examples](python/examples)
- [How to process natural data in Paddle Serving? (Chinese)](doc/PROCESS_DATA.md)
- [How to process level of detail (LOD)?](doc/LOD.md)

### Developers
- [How to deploy Paddle Serving on K8S? (Chinese)](doc/PADDLE_SERVING_ON_KUBERNETES.md)
- [How to route Paddle Serving to secure endpoint? (Chinese)](doc/SERVIING_AUTH_DOCKER.md)
- [How to develop a new Web Service?](doc/NEW_WEB_SERVICE.md)
- [Compile from source code](doc/COMPILE.md)
- [Develop Pipeline Serving](doc/PIPELINE_SERVING.md)
- [Deploy Web Service with uWSGI](doc/UWSGI_DEPLOY.md)
- [Hot loading for model file](doc/HOT_LOADING_IN_SERVING.md)
- [Paddle Serving uses TensorRT](doc/TENSOR_RT.md)

### About Efficiency
- [How to profile Paddle Serving latency?](python/examples/util)
- [How to optimize performance?](doc/PERFORMANCE_OPTIM.md)
- [Deploy multi-services on one GPU (Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [GPU Benchmarks (Chinese)](doc/BENCHMARKING_GPU.md)

### Design
- [Design Doc](doc/DESIGN_DOC.md)

### FAQ
- [FAQ(Chinese)](doc/FAQ.md)

<h2 align="center">Community</h2>

### Slack

To connect with other users and contributors, join our [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ).

### Contribution

If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/CONTRIBUTE.md).

- Special thanks to [@BeyondYourself](https://github.com/BeyondYourself) for supplementing the gRPC tutorial, updating the FAQ doc, and fixing the mkdir command
- Special thanks to [@mcl-stone](https://github.com/mcl-stone) for updating the faster_rcnn benchmark
- Special thanks to [@cg82616424](https://github.com/cg82616424) for updating the unet benchmark and fixing an error in the resize comment
- Special thanks to [@cuicheng01](https://github.com/cuicheng01) for providing 11 PaddleClas models

### Feedback

For any feedback or to report a bug, please open a [GitHub Issue](https://github.com/PaddlePaddle/Serving/issues).

### License

[Apache 2.0 License](https://github.com/PaddlePaddle/Serving/blob/develop/LICENSE)