HOT_LOADING_IN_SERVING_CN.md 9.5 KB
Newer Older
B
barrierye 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
# Paddle Serving中的模型热加载

## 背景

在实际的工业场景下,通常是定期不间断产出模型,服务端需要在服务不中断的情况下按时更新迭代模型。

这里用本地搭建FTP的形式,模拟监控远程模型,拉取更新本地模型,来展示Paddle Serving的模型热加载功能。

## 示例

示例目录结构,示例中用`local_path`来模拟本地,用`remote_path`来模拟远程:

```shell
.
├── local_path
└── remote_path
```

### 远程部分

进入`remote_path`文件夹:

```shell
cd remote_path
```

#### 生产远程模型

运行下面的Python代码生产模型。

```python
import os
import time
import paddle
import paddle.fluid as fluid
import paddle_serving_client.io as serving_io

train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.train(), buf_size=500),
    batch_size=16)

test_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.uci_housing.test(), buf_size=500),
    batch_size=16)

x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')

y_predict = fluid.layers.fc(input=x, size=1, act=None)
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_loss)

place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

for pass_id in range(30):
    for data_train in train_reader():
        avg_loss_value, = exe.run(fluid.default_main_program(),
                                  feed=feeder.feed(data_train),
                                  fetch_list=[avg_loss])
    time.sleep(60) # Simulate the production model every other period of time
    serving_io.save_model("uci_housing_model", "uci_housing_client",
                          {"x": x}, {"price": y_predict},
                          fluid.default_main_program())
    os.system('touch donefile')
    print('save {}'.format(pass_id))
```

上面的代码会每隔 60 秒在当前目录下产出 Boston 房价预测模型`uci_housing_model`,并在每次产出后更新时间戳文件`donefile`

```shell
.
├── donefile           # timestamp file 
B
barrierye 已提交
80
├── produce_model.py
B
barrierye 已提交
81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193
├── uci_housing_client
└── uci_housing_model  # output model
```

#### 启动FTP服务

这里使用`pyftpdlib`开启FTP服务,执行下面的命令(您可能需要使用`pip install pyftpdlib `来安装相关的库):

```shell
python -m pyftpdlib -p 8080
```



### 本地部分

进入`local_path`文件夹:

```shell
cd local_path
```

#### 用初始模型启动Server端

这里使用预训练的 Boston 房价预测模型作为初始模型:

```shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```

启动Server端:

```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```

#### 执行监控程序

用下面的命令来执行监控程序,通过轮询方式监控远程地址的时间戳文件`donefile`,当时间戳变更则认为远程模型已经更新,将远程模型拉取到本地临时路径下(默认为`./tmp`),更新本地模型以及Paddle Serving的时间戳文件`fluid_time_file`

```shell
python -m paddle_serving_server.monitor --type='ftp' --ftp_ip='127.0.0.1' --ftp_port='8080' --remote_path='/' --remote_model_name='uci_housing_model' --remote_donefile_name='donefile' --local_path='./' --local_model_name='uci_housing_model' --local_donefile_name='fluid_time_file' --local_tmp_dir='tmp'
```

上面的代码会监控远程路径`ftp://127.0.0.1:8080/`下的`donefile`文件来判断远程模型是否更新,若已经更新则将远程模型`ftp://127.0.0.1:8080/uci_housing_model`拉取到本地`./tmp`路径下,之后更新本地路径的模型`./uci_housing_model`,并更新Paddle Serving的时间戳文件`./uci_housing_model/fluid_time_file`

#### 查看Server日志

通过下面命令查看Server的运行日志:

```shell
tail -f log/serving.INFO
```

日志中显示模型已经被热加载:

```shell
W0327 19:00:38.498729  5559 infer.h:509] td_core[0x7f20e8068f10] clone model from pd_core[0x7f20e8005f90] succ, cur_idx[1].
W0327 19:00:38.498737  5559 infer.h:489] Succ load clone model, path[uci_housing_model]
W0327 19:00:38.498744  5559 infer.h:656] Succ reload version engine: 18446744073709551615
I0327 19:00:38.498752  5559 manager.h:131] Finish reload 1 workflow(s)
I0327 19:00:48.498860  5559 server.cpp:150] Begin reload framework...
W0327 19:00:48.498947  5559 infer.h:656] Succ reload version engine: 18446744073709551615
I0327 19:00:48.498970  5559 manager.h:131] Finish reload 1 workflow(s)
I0327 19:00:58.499076  5559 server.cpp:150] Begin reload framework...
W0327 19:00:58.499167  5559 infer.h:656] Succ reload version engine: 18446744073709551615
I0327 19:00:58.499181  5559 manager.h:131] Finish reload 1 workflow(s)
I0327 19:01:08.499277  5559 server.cpp:150] Begin reload framework...
W0327 19:01:08.499366  5559 infer.h:656] Succ reload version engine: 18446744073709551615
I0327 19:01:08.499379  5559 manager.h:131] Finish reload 1 workflow(s)
I0327 19:01:18.499492  5559 server.cpp:150] Begin reload framework...
W0327 19:01:18.499637  5559 infer.h:656] Succ reload version engine: 18446744073709551615
I0327 19:01:18.499655  5559 manager.h:131] Finish reload 1 workflow(s)
I0327 19:01:28.499745  5559 server.cpp:150] Begin reload framework...
W0327 19:01:28.499814  5559 infer.h:250] begin reload model[uci_housing_model].
I0327 19:01:28.500083  5559 infer.h:66] InferEngineCreationParams: model_path = uci_housing_model, enable_memory_optimization = 0, static_optimization = 0, force_update_static_cache = 0
I0327 19:01:28.500160  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.500176  5559 init.cc:159] AVX is available, Please re-compile on local machine
I0327 19:01:28.500628  5559 analysis_predictor.cc:833] MODEL VERSION: 0.0.0
I0327 19:01:28.500653  5559 analysis_predictor.cc:835] PREDICTOR VERSION: 1.7.1
I0327 19:01:28.502399  5559 graph_pattern_detector.cc:101] ---  detected 1 subgraphs
I0327 19:01:28.504007  5559 analysis_predictor.cc:462] ======= optimize end =======
W0327 19:01:28.504101  5559 infer.h:472] Succ load common model[0x7f20e806b8b0], path[uci_housing_model].
I0327 19:01:28.504154  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504194  5559 infer.h:509] td_core[0x7f20e80b9680] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504287  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504330  5559 infer.h:509] td_core[0x7f20e80bf1e0] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504365  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504403  5559 infer.h:509] td_core[0x7f20e80af2a0] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504436  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504483  5559 infer.h:509] td_core[0x7f20e8004a00] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504516  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504551  5559 infer.h:509] td_core[0x7f20e80a8960] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504580  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504611  5559 infer.h:509] td_core[0x7f20e80a4bd0] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504639  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504669  5559 infer.h:509] td_core[0x7f20e80b8f20] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504699  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504730  5559 infer.h:509] td_core[0x7f20e80a4ab0] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504760  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504796  5559 infer.h:509] td_core[0x7f20e807ee40] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.504827  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.504904  5559 infer.h:509] td_core[0x7f20e8085900] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
I0327 19:01:28.505043  5559 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
W0327 19:01:28.505097  5559 infer.h:509] td_core[0x7f20e8088500] clone model from pd_core[0x7f20e806b8b0] succ, cur_idx[0].
W0327 19:01:28.505110  5559 infer.h:489] Succ load clone model, path[uci_housing_model]
W0327 19:01:28.505120  5559 infer.h:656] Succ reload version engine: 18446744073709551615
I0327 19:01:28.505131  5559 manager.h:131] Finish reload 1 workflow(s)
I0327 19:01:38.505468  5559 server.cpp:150] Begin reload framework...
W0327 19:01:38.505568  5559 infer.h:656] Succ reload version engine: 18446744073709551615
I0327 19:01:38.505584  5559 manager.h:131] Finish reload 1 workflow(s)
```