# Example: Train the GRU4Rec Model with the FedAvg Strategy


This document introduces how to use PaddleFL to train a model with an FL strategy.

### Dependencies
- paddlepaddle>=1.6

### How to install PaddleFL
Please use a Python environment that already has PaddlePaddle installed.
```sh
python setup.py install
```

### Model
[GRU4Rec](https://arxiv.org/abs/1511.06939) is a classical session-based recommendation model. A PaddlePaddle implementation is available [here](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/gru4rec).


### Datasets
We use the public dataset [RSC15](https://2015.recsyschallenge.com).

```sh
#download data
cd example/gru4rec_demo
sh download.sh
```

### How to work in PaddleFL
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined with fl_master. In RunTime, a federated learning job is trained with fl_server and fl_trainer. The whole demo can be launched with:

```sh
sh run.sh
```

### How to work in CompileTime
In this example, we implement it in fl_master.py
```sh
# please run fl_master to generate fl_job
python fl_master.py
```
In fl_master.py, we first define the FL-Strategy, the User-Defined-Program, and the Distributed-Config. The FL-Job-Generator then generates FL-Jobs for the federated server and workers.
```python
# define model
model = Model()
model.gru4rec_network()

# define JobGenerator and set model config
# feed_name and target_name are config for save model.
job_generator = JobGenerator()
optimizer = fluid.optimizer.SGD(learning_rate=2.0)
job_generator.set_optimizer(optimizer)
job_generator.set_losses([model.loss])
job_generator.set_startup_program(model.startup_program)
job_generator.set_infer_feed_and_target_names(
    [x.name for x in model.inputs], [model.loss.name, model.recall.name])

# define FL-Strategy. Two strategies are currently supported: fed_avg and dpsgd.
# inner_step means each fl_trainer locally trains inner_step mini-batches per round.
build_strategy = FLStrategyFactory()
build_strategy.fed_avg = True
build_strategy.inner_step = 1
strategy = build_strategy.create_fl_strategy()

# define Distributed-Config and generate fl_job 
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
job_generator.generate_fl_job(
    strategy, server_endpoints=endpoints, worker_num=2, output=output)

```
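To build intuition for what `fed_avg` with `inner_step = 1` does: each trainer runs one local mini-batch, then the server replaces every parameter with the element-wise mean across trainers. Below is a minimal pure-Python sketch of that averaging step; the function name and the plain-list parameter format are illustrative, not PaddleFL's internals.

```python
from typing import Dict, List


def fed_avg(trainer_params: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Average each named parameter element-wise across trainers (uniform weights)."""
    n = len(trainer_params)
    averaged = {}
    for name in trainer_params[0]:
        vectors = [params[name] for params in trainer_params]
        averaged[name] = [sum(vals) / n for vals in zip(*vectors)]
    return averaged


# Two trainers holding the same parameter after their local inner steps.
print(fed_avg([
    {"gru_0.w": [1.0, 2.0]},
    {"gru_0.w": [3.0, 4.0]},
]))  # {'gru_0.w': [2.0, 3.0]}
```

In the real job, this averaging happens on fl_server after every round of `inner_step` local batches.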

### How to work in RunTime

```sh 
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 data/ >trainer0.log &
python -u fl_trainer.py 1 data/ >trainer1.log &
```
In fl_trainer.py, you can define your own reader according to your data format.
```python
r = Gru4rec_Reader()
train_reader = r.reader(train_file_dir, place, batch_size=10)
```
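The demo's `Gru4rec_Reader` parses the downloaded session files; a custom reader only needs to be a batch generator. Here is a minimal sketch, assuming sessions are stored one per line as space-separated item ids, with each session split into (input, next-item) sequences — the file format and all names below are illustrative assumptions, not the demo's actual reader.

```python
def session_reader(lines, batch_size=10):
    """Yield batches of (input_items, target_items) pairs from session lines.

    A session "i1 i2 ... iN" becomes inputs i1..i(N-1) and targets i2..iN,
    i.e. the model predicts the next item at every step.
    """
    batch = []
    for line in lines:
        items = [int(tok) for tok in line.split()]
        if len(items) < 2:
            continue  # a session needs at least one (input, target) pair
        batch.append((items[:-1], items[1:]))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


# Example: two sessions in one batch.
batches = list(session_reader(["1 2 3", "4 5"], batch_size=2))
print(batches[0])  # [([1, 2], [2, 3]), ([4], [5])]
```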

### Performance
We train the gru4rec model with the FedAvg strategy for 40 epochs. We use the first 1/20 of the RSC15 data as our dataset, which contains about 400,000 sessions and an item dictionary of about 37,000 items. We also construct baselines, including a standard single-machine mode and a distributed parameter-server mode.

```sh
# download code and readme
wget https://paddle-zwh.bj.bcebos.com/gru4rec_paddlefl_benchmark/gru4rec_benchmark.tar
```

| Dataset | single/distributed | distributed mode | recall@20 |
| --- | --- | --- |---|
| all data | single | - | 0.508 | 
| all data | distributed 4 node | Parameter Server  | 0.504 |
| all data | distributed 4 node | FedAvg | 0.504 | 
| 1/4 part-0 | single | - | 0.286 | 
| 1/4 part-1 | single | - | 0.277 | 
| 1/4 part-2 | single | - | 0.269 | 
| 1/4 part-3 | single | - | 0.282 | 

We can see that the distributed Parameter Server mode and FedAvg reach the same recall@20, and that training on more data yields better results.
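The recall@20 column counts a prediction as a hit when the true next item appears among the 20 highest-scoring items. A minimal sketch of the metric (illustrative, not the benchmark script; the toy example uses k=2 for readability):

```python
def recall_at_k(scores, targets, k=20):
    """Fraction of steps whose target item id is among the top-k scored items.

    scores: per-step lists of scores indexed by item id.
    targets: the true next-item id for each step.
    """
    hits = 0
    for step_scores, target in zip(scores, targets):
        top_k = sorted(range(len(step_scores)),
                       key=lambda i: step_scores[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy example: target item 1 is in the top-2 of the first step only.
print(recall_at_k([[0.1, 0.9, 0.3], [0.8, 0.05, 0.7]], [1, 1], k=2))  # 0.5
```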


<img src="fl_benchmark.png" height=300 width=500 hspace='10'/> <br />
