## Instructions for PaddleFL-MPC MNIST Demo

([简体中文](./README_CN.md)|English)

This document describes how to run the MNIST demo based on PaddleFL-MPC. The demo can be run in two ways: on a single machine or across multiple machines.

### 1. Running on Single Machine

#### (1). Prepare Data

Generate encrypted training and testing data with `generate_encrypted_data()` and `generate_encrypted_test_data()` in the `process_data.py` script. Running `python process_data.py` generates encrypted feature and label files in a given directory, e.g., `./mpc_data/`. Set `class_num` (2 or 10) to choose whether the encrypted data is for the `logisticfc_sigmoid` network (two classes) or for the `lenet` and `logistic_fc_softmax` networks (10 classes). File name suffixes indicate which computation party owns each file. For instance, `mnist2_feature.part0` is a feature file of party 0.
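
As a rough illustration of the naming convention above, the following sketch builds the file names each party would expect to own. The helper `party_files` is hypothetical and not part of `process_data.py`; the `mnist2_label` prefix is taken from the distribution step later in this document, and three parties (0, 1, 2) are assumed.

```python
def party_files(party_id, prefixes=("mnist2_feature", "mnist2_label")):
    """Return the encrypted file names owned by one computation party.

    Hypothetical helper: it only mirrors the `<prefix>.part<party_id>`
    naming convention described in this README.
    """
    if party_id not in (0, 1, 2):
        raise ValueError("party_id must be 0, 1, or 2")
    return ["{}.part{}".format(prefix, party_id) for prefix in prefixes]


if __name__ == "__main__":
    # Print the expected files for each of the three parties.
    for pid in range(3):
        print(pid, party_files(pid))
```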

#### (2). Launch Demo with A Shell Script

Set the following environment variables:

```
export PYTHON=/your/python
export PATH_TO_REDIS_BIN=/path/to/redis_bin
export LOCALHOST=/your/localhost
export REDIS_PORT=/your/redis/port
```
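
Before launching, it can help to confirm that all four variables are actually exported. The check below is only a sketch: `missing_env_vars` is a hypothetical helper, not part of PaddleFL, and it treats an empty value as missing.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("PYTHON", "PATH_TO_REDIS_BIN", "LOCALHOST", "REDIS_PORT")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Please export: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```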

Launch the demo with the `run_standalone.sh` script:

```bash
bash run_standalone.sh train_fc_sigmoid.py
```

The current epoch and step are displayed on screen during training, and the total time cost is displayed when training finishes.

In addition, predictions are made once training is finished. The predictions, in ciphertext format, are saved in the `./mpc_infer_data/` directory (users can change this in the Python script `train_fc_sigmoid.py`), and the file names follow the format described in Step 1.

#### (3). Decrypt Data

Decrypt the saved prediction data and save the decrypted results to a specified file using `decrypt_data_to_file()` in the `process_data.py` script. For example, put the following code into a Python script named `decrypt_save.py`, and then run it with `python decrypt_save.py decrypt_file`. The decrypted prediction results are saved to `decrypt_file`.

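```python
import sys

import process_data

# Path of the output file, passed as the first command-line argument.
decrypt_file = sys.argv[1]

# BATCH_SIZE must be defined first (presumably the batch size used in train_fc_sigmoid.py).
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), decrypt_file)
```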


### 2. Running on Multi Machines

#### (1). Prepare Data

The data owner encrypts the data. The concrete steps are the same as in “Prepare Data” under “Running on Single Machine”.

#### (2). Distribute Encrypted Data

Distribute the encrypted data files to the `./mpc_data/` directories of the 3 computation parties according to their file name suffixes. For example, send `mnist2_feature.part0` and `mnist2_label.part0` to `./mpc_data/` of party 0 with the `scp` command.
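
The distribution step can be scripted. The sketch below only builds the `scp` argument lists and does not run them. `scp_commands` is a hypothetical helper, and the host strings passed to it (such as `user@party0.example.com`) are placeholders. It assumes the encrypted files are in the local `./mpc_data/` directory and that the remote `./mpc_data/` path is resolved relative to the remote user's home directory.

```python
def scp_commands(party_hosts, prefixes=("mnist2_feature", "mnist2_label"),
                 local_dir="./mpc_data/", remote_dir="./mpc_data/"):
    """Build one scp argument list per (party, file) pair.

    party_hosts[i] is the (hypothetical) host of party i, e.g. "user@host".
    """
    commands = []
    for party_id, host in enumerate(party_hosts):
        for prefix in prefixes:
            filename = "{}.part{}".format(prefix, party_id)
            commands.append(
                ["scp", local_dir + filename, "{}:{}".format(host, remote_dir)]
            )
    return commands


if __name__ == "__main__":
    hosts = ["user@party0.example.com", "user@party1.example.com",
             "user@party2.example.com"]  # placeholder hosts
    for cmd in scp_commands(hosts):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to perform the copy.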

#### (3). Modify train_fc_sigmoid.py

Each computation party replaces `localhost` in the following code with the IP address of its own machine.

```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```

#### (4). Launch Demo on Each Party

**Note** that a Redis service is required to run the demo. Clear the Redis server's cache before launching the demo on each computation party, so that stale cached records do not interfere with the run. The following command clears Redis, where REDIS_BIN is the `redis-cli` executable, and SERVER and PORT are the IP address and port of the Redis server.

```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
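
The same flush can be issued from Python by invoking `redis-cli` as a subprocess. This is a minimal sketch assuming Python 3.5 or later; `flushall_command` and `clear_redis` are hypothetical helper names. Note that `flushall` removes every key in every database on the server, so it should only be used on a Redis instance dedicated to the demo.

```python
import subprocess


def flushall_command(redis_bin, server, port):
    """Build the argument list for `$REDIS_BIN -h $SERVER -p $PORT flushall`."""
    return [redis_bin, "-h", server, "-p", str(port), "flushall"]


def clear_redis(redis_bin, server, port):
    """Run the flush command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(flushall_command(redis_bin, server, port), check=True)
```

For example, `clear_redis("/path/to/redis-cli", "127.0.0.1", 6379)` would flush a local server listening on port 6379 (both values are placeholders).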

Launch the demo on each computation party with the following command:

```
$PYTHON_EXECUTABLE train_fc_sigmoid.py $PARTY_ID $SERVER $PORT
```

where PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP address and port of the Redis server.
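
To catch a mistyped party ID or port before training starts, the three arguments can be validated up front. The following is a sketch of how a script might parse them; `parse_party_args` is a hypothetical helper, and the actual argument handling in `train_fc_sigmoid.py` may differ.

```python
def parse_party_args(argv):
    """Parse [script, PARTY_ID, SERVER, PORT] into (role, server, port)."""
    if len(argv) != 4:
        raise ValueError("usage: train_fc_sigmoid.py PARTY_ID SERVER PORT")
    # int() raises ValueError itself if PARTY_ID or PORT is not numeric.
    role, server, port = int(argv[1]), argv[2], int(argv[3])
    if role not in (0, 1, 2):
        raise ValueError("PARTY_ID must be 0, 1, or 2")
    if not 0 < port < 65536:
        raise ValueError("PORT must be between 1 and 65535")
    return role, server, port
```

Called as `parse_party_args(sys.argv)`, it returns values that could be passed on as the `role`, `server`, and `port` arguments shown in the `pfl_mpc.init` call above.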

Similarly, predictions in ciphertext format are saved in the `./mpc_infer_data/` directory, for example, in a file named `mnist_output_prediction.part0` for party 0.

#### (5). Decrypt Prediction Data

Each computation party sends its `mnist_output_prediction.part*` file (e.g., `mnist_output_prediction.part0` for party 0) from its `./mpc_infer_data/` directory to the `./mpc_infer_data/` directory of the data owner. The data owner decrypts the prediction data and saves the decrypted results to a specified file using `decrypt_data_to_file()` in the `process_data.py` script. For example, put the following code into a Python script named `decrypt_save.py`, and then run it with `python decrypt_save.py decrypt_file`. The decrypted prediction results are saved to the file `decrypt_file`.

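```python
import sys

import process_data

# Path of the output file, passed as the first command-line argument.
decrypt_file = sys.argv[1]

# BATCH_SIZE must be defined first (presumably the batch size used in train_fc_sigmoid.py).
process_data.decrypt_data_to_file("./mpc_infer_data/mnist_output_prediction", (BATCH_SIZE,), decrypt_file)
```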
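
Since the data owner collects one prediction file from each party, it may be worth checking that all of them have arrived before decrypting. The sketch below assumes that decryption needs the `.part0`, `.part1`, and `.part2` files from all three parties; `missing_share_files` is a hypothetical helper, not part of `process_data.py`.

```python
import os


def missing_share_files(prefix, num_parties=3):
    """Return the expected `<prefix>.part<i>` files that do not exist."""
    expected = ["{}.part{}".format(prefix, i) for i in range(num_parties)]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    missing = missing_share_files("./mpc_infer_data/mnist_output_prediction")
    if missing:
        print("Still waiting for: " + ", ".join(missing))
    else:
        print("All prediction files are present.")
```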