## Instructions for PaddleFL-MPC UCI Housing Demo

([简体中文](./README_CN.md)|English)

This document describes how to run the UCI Housing demo based on PaddleFL-MPC. The demo can be run in two ways: on a single machine or on multiple machines.

### 1. Running on Single Machine

#### (1). Prepare Data

Generate the encrypted data with `generate_encrypted_data()` in the `process_data.py` script. For example, write the following code into a Python script named `prepare.py`, then run it with `python prepare.py`.

```python
import process_data
process_data.generate_encrypted_data()
```

Encrypted feature and label files are generated and saved in the `/tmp` directory. File-name suffixes indicate which computation party owns each file; for instance, `house_feature.part0` is the feature file of party 0.
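
To check which encrypted files belong to which party, the suffix convention can be applied programmatically. The following is a minimal sketch, assuming every share file ends in `.part<N>`; the helper `group_by_party` is hypothetical and is not part of `process_data.py`.

```python
import os
import re
from collections import defaultdict

# Matches names such as "house_feature.part0"; the ".part<N>" suffix
# convention is the one described above.
PART_SUFFIX = re.compile(r"\.part(\d+)$")


def group_by_party(directory="/tmp"):
    """Return a dict mapping party ID -> sorted list of share file names."""
    parties = defaultdict(list)
    for name in os.listdir(directory):
        match = PART_SUFFIX.search(name)
        if match:
            parties[int(match.group(1))].append(name)
    return {party: sorted(files) for party, files in sorted(parties.items())}


if __name__ == "__main__":
    for party, files in group_by_party().items():
        print("party %d: %s" % (party, ", ".join(files)))
```

Files without a `.part<N>` suffix are ignored, so other contents of `/tmp` do not interfere.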

#### (2). Launch Demo with A Shell Script

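Set the following environment variables:

```bash
export PYTHON=/your/python                   # Python interpreter with PaddleFL installed
export PATH_TO_REDIS_BIN=/path/to/redis_bin  # Redis binary path
export LOCALHOST=/your/localhost             # local host address
export REDIS_PORT=/your/redis/port           # Redis server port
```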

Then launch the demo with the `run_standalone.sh` script:

```bash
bash run_standalone.sh uci_demo.py
```

During training, the loss in ciphertext format is printed on the screen. The loss data is also saved in the `/tmp` directory, with file names following the same suffix convention described in Step 1.

Once training finishes, the demo also makes predictions, and the ciphertext predictions are saved in the `/tmp` directory as well.

#### (3). Decrypt Data

Finally, `load_decrypt_data()` in the `process_data.py` script decrypts the loss and predictions, which can then be compared with the results of the corresponding Paddle plaintext model.

For example, write the following code into a Python script named `decrypt_save.py`, then run it with `python decrypt_save.py decrypt_loss_file decrypt_prediction_file`. The decrypted loss and predictions are saved into the two given files respectively.

```python
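import sys

import process_data

# Usage: python decrypt_save.py decrypt_loss_file decrypt_prediction_file
if len(sys.argv) != 3:
    sys.exit("usage: python decrypt_save.py decrypt_loss_file decrypt_prediction_file")

decrypt_loss_file = sys.argv[1]        # output file for the decrypted loss
decrypt_prediction_file = sys.argv[2]  # output file for the decrypted predictions
BATCH_SIZE = 10  # should match the batch size used for training

# Shape (1,) for the loss and (BATCH_SIZE,) for the predictions.
process_data.load_decrypt_data("/tmp/uci_loss", (1, ), decrypt_loss_file)
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE, ), decrypt_prediction_file)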
```
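
`load_decrypt_data()` hides how the shares are combined. Purely as a conceptual illustration, and not PaddleFL's actual share format, the sketch below shows additive secret sharing among three parties over the ring of integers modulo 2^64 with a fixed-point encoding. The 16 fractional bits and all helper names are assumptions made for this example. ABY3 itself uses replicated sharing, where each party holds two of the three shares, but the secret is still the sum of the three shares modulo 2^64.

```python
import random

RING = 1 << 64   # shares are integers modulo 2^64
FRAC_BITS = 16   # assumed fixed-point precision (illustrative only)


def encode(x):
    """Fixed-point encode a float as an element of the ring."""
    return int(round(x * (1 << FRAC_BITS))) % RING


def decode(v):
    """Interpret a ring element as a signed fixed-point number."""
    if v >= RING // 2:  # upper half of the ring represents negative values
        v -= RING
    return v / (1 << FRAC_BITS)


def share(x, n_parties=3, rng=random):
    """Split x into n additive shares that sum to encode(x) modulo 2^64."""
    shares = [rng.randrange(RING) for _ in range(n_parties - 1)]
    shares.append((encode(x) - sum(shares)) % RING)
    return shares


def reconstruct(shares):
    """Sum all shares modulo 2^64 and decode the result back to a float."""
    return decode(sum(shares) % RING)


if __name__ == "__main__":
    parts = share(738.39491)
    print(parts)               # each share on its own looks random
    print(reconstruct(parts))  # equals 738.39491 up to 2**-16 rounding
```

A single share reveals nothing about the value; only the combination of all shares recovers it, which is why the loss and predictions stay in ciphertext form until they are decrypted.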

**Note:** remember to delete the loss and prediction files left in the `/tmp` directory by the previous run, so that they do not affect the decrypted results of the current run. To simplify this, `run_standalone.sh` includes the following commands, which delete these files whenever the script is run:

```bash
# remove temp data generated by the previous run
LOSS_FILE="/tmp/uci_loss.*"
PRED_FILE="/tmp/uci_prediction.*"
if [ "$LOSS_FILE" ]; then
        rm -rf $LOSS_FILE
fi

if [ "$PRED_FILE" ]; then
        rm -rf $PRED_FILE
fi
```



### 2. Running on Multiple Machines

#### (1). Prepare Data

The data owner encrypts the data. The steps are the same as in “Prepare Data” under “Running on Single Machine”.

#### (2). Distribute Encrypted Data

Distribute the encrypted data files to the `/tmp` directories of all three computation parties according to their file-name suffixes. For example, use `scp` to send `house_feature.part0` and `house_label.part0` to `/tmp` on party 0.
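
The distribution step can be scripted. A minimal sketch, assuming `scp` access from the data owner to each party; the host names and the helper `build_scp_commands` are hypothetical. It only builds the command lines, so they can be reviewed before being run, for example with `subprocess.run(cmd, check=True)`.

```python
import os

# Hypothetical addresses of the three computation parties; replace with real ones.
PARTY_HOSTS = {0: "party0.example.com", 1: "party1.example.com", 2: "party2.example.com"}


def build_scp_commands(party_hosts, directory="/tmp",
                       prefixes=("house_feature", "house_label")):
    """Return one scp argument list per party, copying that party's .part<N> files."""
    commands = []
    for party_id, host in sorted(party_hosts.items()):
        files = [os.path.join(directory, "%s.part%d" % (prefix, party_id))
                 for prefix in prefixes]
        # Destination is the same directory on the remote party.
        commands.append(["scp"] + files + ["%s:%s/" % (host, directory)])
    return commands


if __name__ == "__main__":
    for cmd in build_scp_commands(PARTY_HOSTS):
        print(" ".join(cmd))
```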

#### (3). Modify uci_demo.py

Each computation party makes the following modification to `uci_demo.py` according to its machine's environment.

* Modify IP Information

  Replace `localhost` in the following code with the IP address of the machine.

  ```python
  pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
  ```

#### (4). Launch Demo on Each Party

**Note:** the demo requires a running Redis service. Before launching the demo on each computation party, clear the cache of the Redis server to avoid interference from records cached by earlier runs. The following command clears Redis, where `REDIS_BIN` is the `redis-cli` executable, and `SERVER` and `PORT` are the IP address and port of the Redis server.

```bash
$REDIS_BIN -h $SERVER -p $PORT flushall
```

Launch the demo on each computation party with the following command:

```bash
$PYTHON_EXECUTABLE uci_demo.py $PARTY_ID $SERVER $PORT
```

where `PYTHON_EXECUTABLE` is the Python interpreter with PaddleFL installed, `PARTY_ID` is the ID of the computation party (0, 1, or 2), and `SERVER` and `PORT` are the IP address and port of the Redis server.
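
When the parties are started from scripts, the launch command can be assembled programmatically, which also catches an invalid party ID early. A sketch only: `launch_command` and `launch` are hypothetical helpers, and they default to the current interpreter (`sys.executable`) as `PYTHON_EXECUTABLE`.

```python
import subprocess
import sys


def launch_command(party_id, server, port, python_executable=sys.executable,
                   script="uci_demo.py"):
    """Build argv for `$PYTHON_EXECUTABLE uci_demo.py $PARTY_ID $SERVER $PORT`."""
    if party_id not in (0, 1, 2):
        raise ValueError("PARTY_ID must be 0, 1, or 2, got %r" % (party_id,))
    return [python_executable, script, str(party_id), server, str(port)]


def launch(party_id, server, port):
    """Run the demo for one party in the foreground; raises if it exits non-zero."""
    subprocess.run(launch_command(party_id, server, port), check=True)
```

For example, party 1 would call `launch(1, "192.168.0.10", 6379)`, where the Redis address and port are placeholders.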

As in the single-machine case, the training loss in ciphertext format is printed on the screen of each computation party, and the loss and predictions are saved in the `/tmp` directory.

**Note:** remember to delete the loss and prediction files left in the `/tmp` directory by the previous run, so that they do not affect the decrypted results of the current run.
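
On each party, the stale files can be removed with a few lines of Python that mirror the cleanup commands in `run_standalone.sh` from the single-machine section. This is a sketch; the function name is hypothetical, and the glob patterns are the ones used in that script.

```python
import glob
import os


def remove_stale_outputs(directory="/tmp", patterns=("uci_loss.*", "uci_prediction.*")):
    """Delete loss/prediction files left over from a previous run; return the removed paths."""
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(directory, pattern)):
            if os.path.isfile(path):  # skip anything that is not a regular file
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```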

#### (5). Decrypt Loss and Prediction Data

Each computation party sends its `uci_loss.part*` and `uci_prediction.part*` files from `/tmp` to the `/tmp` directory of the data owner. The data owner then decrypts them with `load_decrypt_data()` in `process_data.py` to obtain the plaintext loss and predictions.
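
Since every party sends its files to the data owner, it can help to confirm that all of them have arrived before decrypting. A minimal sketch, assuming the files are named `<prefix>.part0` to `<prefix>.part2` as in the single-machine run; `missing_shares` is a hypothetical helper.

```python
import os


def missing_shares(prefix, n_parties=3):
    """Return the share files (prefix.part0 .. prefix.part<n-1>) that do not exist yet."""
    expected = ["%s.part%d" % (prefix, i) for i in range(n_parties)]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    for prefix in ("/tmp/uci_loss", "/tmp/uci_prediction"):
        missing = missing_shares(prefix)
        if missing:
            print("still waiting for:", ", ".join(missing))
```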

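For example, the data owner can write the following code into a Python script (such as `decrypt_save.py`) and run it with `python decrypt_save.py decrypt_loss_file decrypt_prediction_file` to decrypt the training loss and predictions and save them into the two given files.

```python
import sys

import process_data

# Usage: python decrypt_save.py decrypt_loss_file decrypt_prediction_file
if len(sys.argv) != 3:
    sys.exit("usage: python decrypt_save.py decrypt_loss_file decrypt_prediction_file")

decrypt_loss_file = sys.argv[1]        # output file for the decrypted loss
decrypt_prediction_file = sys.argv[2]  # output file for the decrypted predictions
BATCH_SIZE = 10  # should match the batch size used for training

# Shape (1,) for the loss and (BATCH_SIZE,) for the predictions.
process_data.load_decrypt_data("/tmp/uci_loss", (1, ), decrypt_loss_file)
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE, ), decrypt_prediction_file)
```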

### 3. Convergence of paddle_fl.mpc vs. Paddle

Below are the results of our experiment testing the convergence of paddle_fl.mpc on a single machine, compared with Paddle.


#### (1). Training Parameters

- Dataset: Boston house price dataset
- Number of Epochs: 20
- Batch Size: 10

#### (2). Experiment Results

| Epoch/Step | paddle_fl.mpc Loss | Paddle Loss |
| ---------- | ------------- | ------ |
| Epoch=0, Step=0  | 738.39491 | 738.46204 |
| Epoch=1, Step=0  | 630.68834 | 629.9071 |
| Epoch=2, Step=0  | 539.54683 | 538.1757 |
| Epoch=3, Step=0  | 462.41159 | 460.64722 |
| Epoch=4, Step=0  | 397.11516 | 395.11017 |
| Epoch=5, Step=0  | 341.83102 | 339.69815 |
| Epoch=6, Step=0  | 295.01114 | 292.83597 |
| Epoch=7, Step=0  | 255.35141 | 253.19429 |
| Epoch=8, Step=0  | 221.74739 | 219.65132 |
| Epoch=9, Step=0  | 193.26459 | 191.25981 |
| Epoch=10, Step=0  | 169.11423 | 167.2204 |
| Epoch=11, Step=0  | 148.63138 | 146.85835 |
| Epoch=12, Step=0  | 131.25081 | 129.60391 |
| Epoch=13, Step=0  | 116.49708 | 114.97599 |
| Epoch=14, Step=0  | 103.96669 | 102.56854 |
| Epoch=15, Step=0  | 93.31706 | 92.03858 |
| Epoch=16, Step=0  | 84.26219 | 83.09653 |
| Epoch=17, Step=0  | 76.55664 | 75.49785 |
| Epoch=18, Step=0  | 69.99673 | 69.03561 |
| Epoch=19, Step=0  | 64.40562 | 63.53539 |
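
To quantify how closely the two loss curves track each other, the relative gap can be computed directly from the table. The snippet below uses four rows copied from it; `relative_gap` is an illustrative helper.

```python
# (epoch, paddle_fl.mpc loss, Paddle loss), copied from rows of the table above
ROWS = [
    (0, 738.39491, 738.46204),
    (5, 341.83102, 339.69815),
    (10, 169.11423, 167.2204),
    (19, 64.40562, 63.53539),
]


def relative_gap(mpc_loss, plain_loss):
    """Relative difference of the MPC loss with respect to the plaintext loss."""
    return (mpc_loss - plain_loss) / plain_loss


for epoch, mpc, plain in ROWS:
    print("epoch %2d: gap %+.2f%%" % (epoch, 100 * relative_gap(mpc, plain)))
```

For the four rows used here, the gap stays under 1.5% in magnitude.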