Commit d35b2c7b authored by xukun07

rename prepare_data and add readme for examples

Parent 5302307d
......@@ -106,7 +106,6 @@ def _recv_align_result(host):
The host is represented as "id:ip:port".
:return: set. The received align result.
"""
from multiprocessing.connection import Listener
ip_addr = host.split(":")[1]
port = int(host.split(":")[2])
server = Listener((ip_addr, port))
......
## Instructions for PaddleFL-MPC MNIST Demo
([简体中文](./README_CN.md)|English)
This document introduces how to run the MNIST demo based on Paddle-MPC, which can be run in two ways: on a single machine or on multiple machines.
### 1. Running on Single Machine
#### (1). Prepare Data
Generate encrypted training and testing data using `generate_encrypted_data()` and `generate_encrypted_test_data()` in the `process_data.py` script. For example, users can write the following code into a python script named `prepare.py`, and then run the script with the command `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
process_data.generate_encrypted_test_data()
```
Encrypted feature and label files will be generated and saved in the `/tmp` directory. Different suffixes are used to indicate which computation party each file belongs to. For instance, a file named `mnist2_feature.part0` is the feature file of party 0.
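If data generation succeeds, one feature file and one label file per party should appear in `/tmp`. A minimal check, with file names following the naming convention above:
```bash
# the three parties' encrypted shares produced by prepare.py
ls /tmp/mnist2_feature.part0 /tmp/mnist2_feature.part1 /tmp/mnist2_feature.part2
ls /tmp/mnist2_label.part0 /tmp/mnist2_label.part1 /tmp/mnist2_label.part2
```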
#### (2). Launch Demo with a Shell Script
Launch the demo with the `run_standalone.sh` script. The concrete command is:
```bash
bash run_standalone.sh mnist_demo.py
```
The current epoch and step are displayed on screen during training, as well as the total elapsed time when training finishes.
Besides, predictions are made in this demo once training is finished. The predictions in ciphertext format are saved in the `/tmp` directory, and the file names follow the convention described in Step (1).
#### (3). Decrypt Data
Decrypt the saved prediction data and save the decrypted results into a specified file using `decrypt_data_to_file()` in the `process_data.py` script. For example, users can write the following code into a python script named `decrypt_save.py`, and then run the script with the command `python decrypt_save.py`. The decrypted prediction results would be saved into the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
**Note**: remember to delete the prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run. To simplify this, `run_standalone.sh` contains the following commands, which delete those files when the script is run.
```bash
# remove temp data generated in last time
PRED_FILE="/tmp/mnist_output_prediction.*"
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### (1). Prepare Data
The data owner encrypts the data. The concrete operations are the same as in “Prepare Data” of “Running on Single Machine”.
#### (2). Distribute Encrypted Data
According to the file name suffix, distribute the encrypted data files to the `/tmp` directories of all 3 computation parties. For example, send `mnist2_feature.part0` and `mnist2_label.part0` to `/tmp` of party 0 with the `scp` command, as sketched below.
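A minimal sketch of the distribution step, assuming party 0 can be reached at the hypothetical address `party0-host` as user `user`:
```bash
# hypothetical host and user; adjust to your environment
scp /tmp/mnist2_feature.part0 /tmp/mnist2_label.part0 user@party0-host:/tmp/
# repeat with the .part1 and .part2 files for party 1 and party 2
```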
#### (3). Modify mnist_demo.py
Each computation party modifies `localhost` in the following code to the IP address of its own machine.
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
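For example, a party whose machine IP is `192.168.1.10` (a hypothetical address) would change the call to:
```python
# hypothetical IP address of this party's machine
pfl_mpc.init("aby3", int(role), "192.168.1.10", server, int(port))
```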
#### (4). Launch Demo on Each Party
**Note** that a Redis service is required to run the demo. Remember to clear the cache of the Redis server before launching the demo on each computation party, in order to avoid any negative influence from records cached in Redis. The following command can be used to clear Redis, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
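For instance, with hypothetical values for the Redis server address and port, this amounts to:
```bash
# hypothetical Redis address and port
redis-cli -h 192.168.1.1 -p 6379 flushall
```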
Launch the demo on each computation party with the following command,
```
$PYTHON_EXECUTABLE mnist_demo.py $PARTY_ID $SERVER $PORT
```
where PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP and port of the Redis server respectively.
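As a concrete sketch, assuming the Redis server runs at `192.168.1.1:6379` (hypothetical values) and `python` is the interpreter with PaddleFL installed, the three parties would run:
```bash
python mnist_demo.py 0 192.168.1.1 6379   # on party 0
python mnist_demo.py 1 192.168.1.1 6379   # on party 1
python mnist_demo.py 2 192.168.1.1 6379   # on party 2
```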
Similarly, predictions in ciphertext format are saved in the `/tmp` directory, for example, in a file named `mnist_output_prediction.part0` for party 0.
**Note**: remember to delete the prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run.
#### (5). Decrypt Prediction Data
Each computation party sends its `mnist_output_prediction.part` file in the `/tmp` directory to the `/tmp` directory of the data owner. The data owner decrypts the prediction data and saves the decrypted results into a specified file using `decrypt_data_to_file()` in the `process_data.py` script. For example, users can write the following code into a python script named `decrypt_save.py`, and then run the script with the command `python decrypt_save.py`. The decrypted prediction results would be saved into the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
## Instructions for Running the PaddleFL-MPC MNIST Demo
(简体中文|[English](./README.md))
This example describes how to train and predict on the MNIST dataset with PaddleFL-MPC, in two modes: single-machine and multi-machine.
### 1. Running on a Single Machine
#### 1. Prepare Data
Use `generate_encrypted_data()` and `generate_encrypted_test_data()` in the `process_data.py` script to generate encrypted training and test data. For example, write the following content into a `prepare.py` script, and then run `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
process_data.generate_encrypted_test_data()
```
Encrypted feature and label files for the 3 computation parties will be generated in the `/tmp` directory, with file name suffixes distinguishing the data of different parties. For example, `mnist2_feature.part0` is the feature data of party 0.
#### 2. Launch the Demo with a Shell Script
Launch and run the demo with the `run_standalone.sh` script, using the following command:
```bash
bash run_standalone.sh mnist_demo.py
```
While running, the current epoch and step of training are printed on screen, and the total training time is printed when training finishes.
In addition, after training finishes, the demo continues with prediction and saves the ciphertext prediction results to files in the `/tmp` directory, named in the format described in Step 1.
#### 3. Decrypt Data
Use `decrypt_data_to_file()` in the `process_data.py` script to decrypt the saved ciphertext prediction results and save the plaintext results into a specified file. For example, write the following content into a `decrypt_save.py` script and then run `python decrypt_save.py`; the plaintext prediction results will be saved in the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
**Note**: before running the demo again, delete the ciphertext prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run. To simplify this, `run_standalone.sh` contains the following content, which deletes the previous data when the script is run.
```bash
# remove temp data generated in last time
PRED_FILE="/tmp/mnist_output_prediction.*"
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### 1. Prepare Data
The data owner encrypts the data. The concrete operations are the same as the data preparation step in single-machine mode.
#### 2. Distribute Data
According to the file name suffix, send the data prepared in Step 1 to the `/tmp` directory of the corresponding computation party. For example, use the `scp` command to send `mnist2_feature.part0` and `mnist2_label.part0` to `/tmp` of party 0.
#### 3. Each Computation Party Modifies mnist_demo.py
Each computation party modifies `localhost` in the following code of the script to the IP address of its own machine:
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
#### 4. Each Computation Party Launches the Demo
**Note**: running the demo requires a Redis service. To make sure that data already stored in Redis does not affect the demo, clear Redis with the following command before each computation party launches the demo, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP address and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
Run the following command on each computation party to launch the demo:
```
$PYTHON_EXECUTABLE mnist_demo.py $PARTY_ID $SERVER $PORT
```
Here, PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP address and port of the Redis server respectively.
Similarly, the ciphertext prediction data will be saved to files in the `/tmp` directory. For example, on party 0 it is saved as the file `mnist_output_prediction.part0`.
**Note**: before running the demo again, delete the prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run.
#### 5. Decrypt Prediction Data
Each computation party sends the `mnist_output_prediction.part` file in its `/tmp` directory to the `/tmp` directory of the data owner. The data owner uses `decrypt_data_to_file()` in the `process_data.py` script to decrypt the ciphertext prediction results and save the plaintext results into a specified file. For example, write the following content into a `decrypt_save.py` script and then run `python decrypt_save.py`; the plaintext prediction results will be saved in the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
......@@ -24,7 +24,7 @@ import paddle
import paddle.fluid as fluid
import paddle_fl.mpc as pfl_mpc
import paddle_fl.mpc.data_utils.aby3 as aby3
import prepare_data
role, server, port = sys.argv[1], sys.argv[2], sys.argv[3]
# modify host(localhost).
......@@ -72,7 +72,7 @@ test_batch_sample = paddle.reader.compose(test_batch_feature, test_batch_label)
test_loader.set_batch_generator(test_batch_sample, places=place)
# loss file
loss_file = "/tmp/mnist_output_loss.part{}".format(role)
# loss_file = "/tmp/mnist_output_loss.part{}".format(role)
# train
exe = fluid.Executor(place)
......@@ -83,9 +83,9 @@ step = 0
for epoch_id in range(epoch_num):
# feed data via loader
for sample in loader():
loss = exe.run(feed=sample, fetch_list=[cost.name])
exe.run(feed=sample, fetch_list=[cost.name])
if step % 50 == 0:
print('Epoch={}, Step={}, loss={}'.format(epoch_id, step, loss))
print('Epoch={}, Step={}'.format(epoch_id, step))
step += 1
end_time = time.time()
......@@ -98,9 +98,3 @@ for sample in test_loader():
prediction = exe.run(program=infer_program, feed=sample, fetch_list=[cost])
with open(prediction_file, 'ab') as f:
f.write(np.array(prediction).tostring())
# decrypt
#if 0 == role:
# prepare_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
......@@ -12,10 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Prepare of MNIST data for MPC usage
Process data for MNIST.
"""
import sys
import numpy as np
import paddle
import six
......@@ -81,7 +79,7 @@ def load_decrypt_data(filepath, shape):
def decrypt_data_to_file(filepath, shape, decrypted_filepath):
"""
load the encrypted data and reconstruct
load the encrypted data and reconstruct to a file
"""
part_readers = []
for id in six.moves.range(3):
......@@ -93,7 +91,3 @@ def decrypt_data_to_file(filepath, shape, decrypted_filepath):
with open(decrypted_filepath, 'a+') as f:
for i in p:
f.write(str(i) + '\n')
generate_encrypted_data()
generate_encrypted_test_data()
......@@ -62,12 +62,7 @@ fi
$REDIS_BIN -h $SERVER -p $PORT flushall
# remove temp data generated in last time
LOSS_FILE="/tmp/mnist_output_loss.*"
PRED_FILE="/tmp/mnist_output_prediction.*"
if [ "$LOSS_FILE" ]; then
rm -rf $LOSS_FILE
fi
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
......
## Instructions for PaddleFL-MPC UCI Housing Demo
([简体中文](./README_CN.md)|English)
This document introduces how to run the UCI Housing demo based on Paddle-MPC, which can be run in two ways: on a single machine or on multiple machines.
### 1. Running on Single Machine
#### (1). Prepare Data
Generate encrypted data using `generate_encrypted_data()` in the `process_data.py` script. For example, users can write the following code into a python script named `prepare.py`, and then run the script with the command `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
```
Encrypted feature and label files will be generated and saved in the `/tmp` directory. Different suffixes are used to indicate which computation party each file belongs to. For instance, a file named `house_feature.part0` is the feature file of party 0.
#### (2). Launch Demo with a Shell Script
Launch the demo with the `run_standalone.sh` script. The concrete command is:
```bash
bash run_standalone.sh uci_housing_demo.py
```
The loss in ciphertext format is displayed on screen during training. At the same time, the loss data is also saved in the `/tmp` directory, and the file names follow the convention described in Step (1).
Besides, predictions are made in this demo once training is finished. The predictions in ciphertext format are also saved in the `/tmp` directory.
Finally, using `load_decrypt_data()` in the `process_data.py` script, this demo decrypts and prints the loss and predictions, which can be compared with the corresponding results of a plain-text Paddle model.
**Note**: remember to delete the loss and prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run. To simplify this, `run_standalone.sh` contains the following commands, which delete those files when the script is run.
```bash
# remove temp data generated in last time
LOSS_FILE="/tmp/uci_loss.*"
PRED_FILE="/tmp/uci_prediction.*"
if [ "$LOSS_FILE" ]; then
rm -rf $LOSS_FILE
fi
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### (1). Prepare Data
The data owner encrypts the data. The concrete operations are the same as in “Prepare Data” of “Running on Single Machine”.
#### (2). Distribute Encrypted Data
According to the file name suffix, distribute the encrypted data files to the `/tmp` directories of all 3 computation parties. For example, send `house_feature.part0` and `house_label.part0` to `/tmp` of party 0 with the `scp` command, as sketched below.
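A minimal sketch, again assuming party 0 is reachable at the hypothetical address `party0-host` as user `user`:
```bash
# hypothetical host and user; adjust to your environment
scp /tmp/house_feature.part0 /tmp/house_label.part0 user@party0-host:/tmp/
# repeat with the .part1 and .part2 files for party 1 and party 2
```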
#### (3). Modify uci_housing_demo.py
Each computation party makes the following modifications to `uci_housing_demo.py` according to its machine environment.
* Modify IP Information
Modify `localhost` in the following code to the IP address of the machine.
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
* Comment Out Code for Single-Machine Running
Comment out the following code, which is only used when running on a single machine.
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
print("prediction:")
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
#### (4). Launch Demo on Each Party
**Note** that a Redis service is required to run the demo. Remember to clear the cache of the Redis server before launching the demo on each computation party, in order to avoid any negative influence from records cached in Redis. The following command can be used to clear Redis, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
Launch the demo on each computation party with the following command,
```
$PYTHON_EXECUTABLE uci_housing_demo.py $PARTY_ID $SERVER $PORT
```
where PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP and port of the Redis server respectively.
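A hypothetical example, with the Redis server at `192.168.1.1:6379`:
```bash
# run on party 0; use PARTY_ID 1 and 2 on the other two parties
python uci_housing_demo.py 0 192.168.1.1 6379
```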
Similarly, the training loss in ciphertext format is printed on the screen of each computation party. At the same time, the loss and predictions are saved in the `/tmp` directory.
**Note**: remember to delete the loss and prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run.
#### (5). Decrypt Loss and Prediction Data
Each computation party sends its `uci_loss.part` and `uci_prediction.part` files in the `/tmp` directory to the `/tmp` directory of the data owner. The data owner decrypts and obtains the plain text of the loss and predictions with `load_decrypt_data()` in `process_data.py`.
For example, the following code can be written into a python script to decrypt and print the training loss.
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
```
And the following code can be written into a python script to decrypt and print predictions.
```python
import process_data

print("prediction:")
# BATCH_SIZE must match the batch size used in uci_housing_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 10
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
## Instructions for Running the PaddleFL-MPC UCI Housing Demo
(简体中文|[English](./README.md))
This example describes how to train and predict a UCI housing price model with PaddleFL-MPC, in two modes: single-machine and multi-machine.
### 1. Running on a Single Machine
#### 1. Prepare Data
Use `generate_encrypted_data()` in the `process_data.py` script to generate encrypted data. For example, write the following content into a `prepare.py` script, and then run `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
```
Encrypted feature and label files for the 3 computation parties will be generated in the `/tmp` directory, with file name suffixes distinguishing the data of different parties. For example, `house_feature.part0` is the feature data of party 0.
#### 2. Launch the Demo with a Shell Script
Launch and run the demo with the `run_standalone.sh` script, using the following command:
```bash
bash run_standalone.sh uci_housing_demo.py
```
While running, the ciphertext loss is printed on screen during training; at the same time, the corresponding ciphertext loss data is saved to files in the `/tmp` directory, named in the format described in Step 1.
In addition, after training finishes, the demo continues with prediction and also saves the ciphertext prediction results to files in the `/tmp` directory.
Finally, the demo uses `load_decrypt_data()` in the `process_data.py` script to recover and print the plaintext loss and prediction results, which can be compared with the results of a plaintext Paddle model.
**Note**: before running the demo again, delete the loss and prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run. To simplify this, `run_standalone.sh` contains the following content, which deletes the previous data when the script is run.
```bash
# remove temp data generated in last time
LOSS_FILE="/tmp/uci_loss.*"
PRED_FILE="/tmp/uci_prediction.*"
if [ "$LOSS_FILE" ]; then
rm -rf $LOSS_FILE
fi
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### 1. Prepare Data
The data owner encrypts the data. The concrete operations are the same as the data preparation step in single-machine mode.
#### 2. Distribute Data
According to the file name suffix, send the data prepared in Step 1 to the `/tmp` directory of the corresponding computation party. For example, use the `scp` command to send `house_feature.part0` and `house_label.part0` to `/tmp` of party 0.
#### 3. Each Computation Party Modifies uci_housing_demo.py
Each computation party makes the following changes to `uci_housing_demo.py` according to its machine environment:
* Modify the IP information
Modify `localhost` in the following code of the script to the party's own IP address:
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
* Comment out the code needed only for single-machine running
Comment out the following code in the script; it is only used in the single-machine case.
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
print("prediction:")
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
#### 4. Each Computation Party Launches the Demo
**Note**: running the demo requires a Redis service. To make sure that data already stored in Redis does not affect the demo, clear Redis with the following command before each computation party launches the demo, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP address and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
Run the following command on each computation party to launch the demo:
```
$PYTHON_EXECUTABLE uci_housing_demo.py $PARTY_ID $SERVER $PORT
```
Here, PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP address and port of the Redis server respectively.
Similarly, the ciphertext loss is printed on the screen of each computation party during training. At the same time, the corresponding ciphertext loss and prediction data will be saved to files in the `/tmp` directory, named in the format described in Step 1.
**Note**: before running the demo again, delete the loss and prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run.
#### 5. The Data Owner Decrypts the Loss and Prediction
Each computation party sends the `uci_loss.part` and `uci_prediction.part` files in its `/tmp` directory to the `/tmp` directory of the data owner. The data owner uses `load_decrypt_data()` in the `process_data.py` script to decrypt and recover the loss and prediction data.
For example, use a python script with the following content to print the decrypted loss data:
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
```
Use a python script with the following content to print the decrypted prediction data:
```python
import process_data

print("prediction:")
# BATCH_SIZE must match the batch size used in uci_housing_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 10
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
......@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Prepare of UCI housing data for MPC usage
Process data for UCI Housing.
"""
import numpy as np
import paddle
......@@ -60,6 +60,3 @@ def load_decrypt_data(filepath, shape):
for instance in aby3_share_reader():
p = aby3.reconstruct(np.array(instance))
print(p)
generate_encrypted_data()
......@@ -93,8 +93,8 @@ for sample in loader():
f.write(np.array(prediction).tostring())
break
import prepare_data
import process_data
print("uci_loss:")
prepare_data.load_decrypt_data("/tmp/uci_loss", (1, ))
process_data.load_decrypt_data("/tmp/uci_loss", (1, ))
print("prediction:")
prepare_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE, ))
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE, ))