# ABTEST in Paddle Serving

([简体中文](./ABTEST_IN_PADDLE_SERVING_CN.md)|English)

This document uses a text classification task on the IMDB dataset as an example to show how to build an A/B Test framework with Paddle Serving. The structural relationship between the client and the servers in the example is shown in the figure below.

<img src="abtest.png" style="zoom:33%;" />

Note: A/B Test is only applicable to RPC mode, not Web mode.

### Download Data and Models

```shell
cd Serving/python/examples/imdb
sh get_data.sh
```
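
The script fetches the IMDB vocabulary file `imdb.vocab`, the test data `test_data/part-0`, and the pretrained `imdb_bow_model` and `imdb_lstm_model` directories used in the steps below.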

### Processing Data

The following Python code processes the data in `test_data/part-0` and writes the result to the `processed.data` file.

[//file]:#process.py
``` python
from paddle_serving_app.reader import IMDBDataset
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')

with open('test_data/part-0') as fin:
    with open('processed.data', 'w') as fout:
        for line in fin:
            word_ids, label = imdb_dataset.get_words_and_label(line)
            fout.write("{};{}\n".format(','.join([str(x) for x in word_ids]), label[0]))
```
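
Each line of `processed.data` contains the comma-separated word ids, a semicolon, and the label; this is the format the client code below parses. For example (the ids here are illustrative):

```
9,233,52,601;0
```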

### Start Server

Here, we [use Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md) to start the server-side services.

First, start the BOW server, which listens on port `8000`:

``` shell
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it bow-server bash
pip install paddle-serving-server
python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
exit
```

Similarly, start the LSTM server, which listens on port `9000`:

```bash
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it lstm-server bash
pip install paddle-serving-server
python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
exit
```
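
To confirm that both services started, you can inspect the logs written by the redirections above (a quick sanity check; the relative log paths assume the containers' default working directory):

```shell
docker exec bow-server tail std.log err.log
docker exec lstm-server tail std.log err.log
```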

### Start Client

Run the following Python code on the host to start the client. Make sure the `paddle-serving-client` package is installed on the host, for example:
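
```shell
pip install paddle-serving-client
```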

[//file]:#ab_client.py
``` python
from paddle_serving_client import Client

client = Client()
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
client.add_variant("bow", ["127.0.0.1:8000"], 10)
client.add_variant("lstm", ["127.0.0.1:9000"], 90)
client.connect()

with open('processed.data') as f:
    cnt = {"bow": {'acc': 0, 'total': 0}, "lstm": {'acc': 0, 'total': 0}}
    for line in f:
        word_ids, label = line.split(';')
        word_ids = [int(x) for x in word_ids.split(',')]
        feed = {"words": word_ids}
        fetch = ["acc", "cost", "prediction"]
        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
        if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
            cnt[tag]['acc'] += 1
        cnt[tag]['total'] += 1

    for tag, data in cnt.items():
        print('[{}](total: {}) acc: {}'.format(tag, data['total'], float(data['acc']) / float(data['total'])))
```

In the code, the call `client.add_variant(tag, clusters, variant_weight)` adds a variant with label `tag`, server cluster `clusters`, and traffic weight `variant_weight`. In this example, a BOW variant labeled `bow` with a weight of `10` and an LSTM variant labeled `lstm` with a weight of `90` are added. Traffic on the client side is distributed to the two variants at a ratio of `10:90`.
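
As a sanity check on the split, a variant's expected share of requests is its weight divided by the sum of all weights. A minimal sketch (the tags and weights mirror the `add_variant` calls above):

``` python
# Expected traffic share implied by the variant weights above.
weights = {"bow": 10, "lstm": 90}
total = sum(weights.values())
for tag, weight in weights.items():
    print("{}: expected {:.0%} of requests".format(tag, float(weight) / total))
# bow: expected 10% of requests
# lstm: expected 90% of requests
```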

When making a prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will also contain the tag of the variant that served the request.
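
The two call forms look like this (a sketch reusing the `feed` and `fetch` variables from the client code above):

``` python
# Default: only the fetched results are returned.
fetch_map = client.predict(feed=feed, fetch=fetch)

# With need_variant_tag=True, the tag of the variant that served the
# request is returned as well, so results can be grouped per variant.
fetch_map, tag = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
```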

### Expected Results

```
[lstm](total: 1867) acc: 0.490091055169
[bow](total: 217) acc: 0.73732718894
```
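
The per-variant totals (217 requests for `bow` and 1867 for `lstm`, out of 2084) roughly match the configured `10:90` traffic split; the exact counts can vary from run to run.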

<!--
cp ../Serving/python/examples/imdb/get_data.sh .
cp ../Serving/python/examples/imdb/imdb_reader.py .
pip install -U paddle_serving_server
pip install -U paddle_serving_client
pip install -U paddlepaddle
sh get_data.sh
python process.py
python -m paddle_serving_server.serve --model imdb_bow_model --port 8000 --workdir workdir1 &
sleep 5
python -m paddle_serving_server.serve --model imdb_lstm_model --port 9000  --workdir workdir2 &
sleep 5
python ab_client.py >log.txt
if [[ $? -eq 0 ]]; then
    echo "test success"
else
    echo "test fail"
fi
ps -ef | grep "paddle_serving_server" | grep -v grep | awk '{print $2}' | xargs kill
-->