# How to Use Paddle Serving for A/B Testing

(Simplified Chinese|[English](./ABTEST_IN_PADDLE_SERVING.md))

This document uses a text classification task on the IMDB dataset as an example to show how to build an A/B testing setup with Paddle Serving. The client and server structure used in the example is shown in the figure below.

<img src="abtest.png" style="zoom:33%;" />

Note that A/B testing only works in RPC mode, not in WEB mode.

### Download the data and models

``` shell
cd Serving/python/examples/imdb
sh get_data.sh
```

### Process the data

The following Python code processes the data in `test_data/part-0` and writes the result to the `processed.data` file.

```python
from imdb_reader import IMDBDataset

# Load the vocabulary used to map raw text to word ids.
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')

# Write one sample per line: "<comma-separated word ids>;<label>".
with open('test_data/part-0') as fin:
    with open('processed.data', 'w') as fout:
        for line in fin:
            word_ids, label = imdb_dataset.get_words_and_label(line)
            fout.write("{};{}\n".format(','.join([str(x) for x in word_ids]), label[0]))
```
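
To spot-check the output, a minimal sketch like the one below (not part of the original example) reads the first line of `processed.data` back and splits it the same way the client code later does:

```python
# Sketch: each line of processed.data is "<comma-separated word ids>;<label>".
with open('processed.data') as f:
    first_line = f.readline().strip()

word_ids_str, label = first_line.split(';')
print('{} word ids, label = {}'.format(len(word_ids_str.split(',')), label))
```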

### Start the servers

Here the server-side services are started with [Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER_CN.md).

First, start the BOW server, which serves on port `8000`:

```bash
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:0.1.3
docker exec -it bow-server bash
pip install paddle-serving-server
python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
exit
```

Similarly, start the LSTM server, which serves on port `9000`:

```bash
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:0.1.3
docker exec -it lstm-server bash
pip install paddle-serving-server
python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
exit
```

### Start the client

Run the following Python code on the host to start the client. Make sure the `paddle-serving-client` package is installed on the host.

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
# Register two variants: tag "bow" with weight 10 and tag "lstm" with weight 90,
# so client traffic is split 10:90 between the two servers.
client.add_variant("bow", ["127.0.0.1:8000"], 10)
client.add_variant("lstm", ["127.0.0.1:9000"], 90)
client.connect()

with open('processed.data') as f:
    cnt = {"bow": {'acc': 0, 'total': 0}, "lstm": {'acc': 0, 'total': 0}}
    for line in f:
        word_ids, label = line.split(';')
        word_ids = [int(x) for x in word_ids.split(',')]
        feed = {"words": word_ids}
        fetch = ["acc", "cost", "prediction"]
        # need_variant_tag=True also returns the tag of the variant that served the request.
        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
        # Count the prediction as correct when it falls on the same side of 0.5 as the label.
        if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
            cnt[tag]['acc'] += 1
        cnt[tag]['total'] += 1

    for tag, data in cnt.items():
        print('[{}](total: {}) acc: {}'.format(tag, data['total'], float(data['acc']) / float(data['total'])))
```

In this code, `client.add_variant(tag, clusters, variant_weight)` adds a variant with label `tag` and traffic weight `variant_weight`. In this example, a BOW variant labeled `bow` with traffic weight `10` is added, along with an LSTM variant labeled `lstm` with traffic weight `90`. Client traffic is distributed to the two variants at a `10:90` ratio.
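
To check the split empirically, a minimal sketch like the one below (not part of the original example) reuses the `cnt` dictionary filled in by the prediction loop above and prints the share of requests each variant actually served; with weights of `10` and `90`, the shares should be roughly 10% and 90%.

```python
# Sketch: observed traffic split per variant, based on the `cnt` dictionary
# populated by the prediction loop above.
total = sum(data['total'] for data in cnt.values())
for tag, data in cnt.items():
    share = 100.0 * data['total'] / total if total else 0.0
    print('{}: {} requests ({:.1f}% of traffic)'.format(tag, data['total'], share))
```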

When the client makes a prediction with `need_variant_tag=True`, the return value also includes the tag of the variant that the request was routed to.
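
For comparison, a minimal sketch assuming the default behavior of `Client.predict`: without `need_variant_tag`, only the fetch results are returned, so there is no way to tell which variant served the request.

```python
# Sketch (assumed default behavior): without need_variant_tag, predict()
# returns only the fetch results and no variant tag.
fetch_map = client.predict(feed=feed, fetch=fetch)
```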

### Expected results

``` bash
[lstm](total: 1867) acc: 0.490091055169
[bow](total: 217) acc: 0.73732718894
```