Commit d35b2c7b authored by xukun07

rename prepare_data and add readme for examples

Parent 5302307d
......@@ -106,7 +106,6 @@ def _recv_align_result(host):
The host is represented as "id:ip:port".
:return: set. The received align result.
"""
from multiprocessing.connection import Listener
ip_addr = host.split(":")[1]
port = int(host.split(":")[2])
server = Listener((ip_addr, port))
......
## Instructions for PaddleFL-MPC MNIST Demo
([简体中文](./README_CN.md)|English)
This document introduces how to run the MNIST demo based on Paddle-MPC, which can be run in two ways: on a single machine or on multiple machines.
### 1. Running on Single Machine
#### (1). Prepare Data
Generate encrypted training and testing data using `generate_encrypted_data()` and `generate_encrypted_test_data()` in the `process_data.py` script. For example, users can write the following code into a python script named `prepare.py`, and then run the script with the command `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
process_data.generate_encrypted_test_data()
```
Encrypted feature and label files will be generated and saved in the `/tmp` directory. Different suffixes are used to indicate which computation party each file belongs to. For instance, a file named `mnist2_feature.part0` is the feature file of party 0.
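If data generation succeeds, one feature file and one label file per party should appear in `/tmp`. A minimal check, with file names following the naming convention above:
```bash
# the three parties' encrypted shares produced by prepare.py
ls /tmp/mnist2_feature.part0 /tmp/mnist2_feature.part1 /tmp/mnist2_feature.part2
ls /tmp/mnist2_label.part0 /tmp/mnist2_label.part1 /tmp/mnist2_label.part2
```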
#### (2). Launch Demo with a Shell Script
Launch the demo with the `run_standalone.sh` script. The concrete command is:
```bash
bash run_standalone.sh mnist_demo.py
```
The current epoch and step are displayed on screen during training, as well as the total elapsed time when training finishes.
Besides, predictions are made in this demo once training is finished. The predictions in ciphertext format are saved in the `/tmp` directory, and the file names follow the convention described in Step (1).
#### (3). Decrypt Data
Decrypt the saved prediction data and save the decrypted results into a specified file using `decrypt_data_to_file()` in the `process_data.py` script. For example, users can write the following code into a python script named `decrypt_save.py`, and then run the script with the command `python decrypt_save.py`. The decrypted prediction results would be saved into the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
**Note**: remember to delete the prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run. To simplify this, `run_standalone.sh` contains the following commands, which delete those files when the script is run.
```bash
# remove temp data generated in last time
PRED_FILE="/tmp/mnist_output_prediction.*"
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### (1). Prepare Data
The data owner encrypts the data. The concrete operations are the same as in “Prepare Data” of “Running on Single Machine”.
#### (2). Distribute Encrypted Data
According to the file name suffix, distribute the encrypted data files to the `/tmp` directories of all 3 computation parties. For example, send `mnist2_feature.part0` and `mnist2_label.part0` to `/tmp` of party 0 with the `scp` command, as sketched below.
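A minimal sketch of the distribution step, assuming party 0 can be reached at the hypothetical address `party0-host` as user `user`:
```bash
# hypothetical host and user; adjust to your environment
scp /tmp/mnist2_feature.part0 /tmp/mnist2_label.part0 user@party0-host:/tmp/
# repeat with the .part1 and .part2 files for party 1 and party 2
```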
#### (3). Modify mnist_demo.py
Each computation party modifies `localhost` in the following code to the IP address of its own machine.
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
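For example, a party whose machine IP is `192.168.1.10` (a hypothetical address) would change the call to:
```python
# hypothetical IP address of this party's machine
pfl_mpc.init("aby3", int(role), "192.168.1.10", server, int(port))
```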
#### (4). Launch Demo on Each Party
**Note** that a Redis service is required to run the demo. Remember to clear the cache of the Redis server before launching the demo on each computation party, in order to avoid any negative influence from records cached in Redis. The following command can be used to clear Redis, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
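For instance, with hypothetical values for the Redis server address and port, this amounts to:
```bash
# hypothetical Redis address and port
redis-cli -h 192.168.1.1 -p 6379 flushall
```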
Launch the demo on each computation party with the following command,
```
$PYTHON_EXECUTABLE mnist_demo.py $PARTY_ID $SERVER $PORT
```
where PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP and port of the Redis server respectively.
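As a concrete sketch, assuming the Redis server runs at `192.168.1.1:6379` (hypothetical values) and `python` is the interpreter with PaddleFL installed, the three parties would run:
```bash
python mnist_demo.py 0 192.168.1.1 6379   # on party 0
python mnist_demo.py 1 192.168.1.1 6379   # on party 1
python mnist_demo.py 2 192.168.1.1 6379   # on party 2
```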
Similarly, predictions in ciphertext format are saved in the `/tmp` directory, for example, in a file named `mnist_output_prediction.part0` for party 0.
**Note**: remember to delete the prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run.
#### (5). Decrypt Prediction Data
Each computation party sends its `mnist_output_prediction.part` file in the `/tmp` directory to the `/tmp` directory of the data owner. The data owner decrypts the prediction data and saves the decrypted results into a specified file using `decrypt_data_to_file()` in the `process_data.py` script. For example, users can write the following code into a python script named `decrypt_save.py`, and then run the script with the command `python decrypt_save.py`. The decrypted prediction results would be saved into the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
## Instructions for Running the PaddleFL-MPC MNIST Demo
(简体中文|[English](./README.md))
This example describes how to train and predict on the MNIST dataset with PaddleFL-MPC, in two modes: single-machine and multi-machine.
### 1. Running on a Single Machine
#### 1. Prepare Data
Use `generate_encrypted_data()` and `generate_encrypted_test_data()` in the `process_data.py` script to generate encrypted training and test data. For example, write the following content into a `prepare.py` script, and then run `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
process_data.generate_encrypted_test_data()
```
Encrypted feature and label files for the 3 computation parties will be generated in the `/tmp` directory, with file name suffixes distinguishing the data of different parties. For example, `mnist2_feature.part0` is the feature data of party 0.
#### 2. Launch the Demo with a Shell Script
Launch and run the demo with the `run_standalone.sh` script, using the following command:
```bash
bash run_standalone.sh mnist_demo.py
```
While running, the current epoch and step of training are printed on screen, and the total training time is printed when training finishes.
In addition, after training finishes, the demo continues with prediction and saves the ciphertext prediction results to files in the `/tmp` directory, named in the format described in Step 1.
#### 3. Decrypt Data
Use `decrypt_data_to_file()` in the `process_data.py` script to decrypt the saved ciphertext prediction results and save the plaintext results into a specified file. For example, write the following content into a `decrypt_save.py` script and then run `python decrypt_save.py`; the plaintext prediction results will be saved in the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
**Note**: before running the demo again, delete the ciphertext prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run. To simplify this, `run_standalone.sh` contains the following content, which deletes the previous data when the script is run.
```bash
# remove temp data generated in last time
PRED_FILE="/tmp/mnist_output_prediction.*"
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### 1. Prepare Data
The data owner encrypts the data. The concrete operations are the same as the data preparation step in single-machine mode.
#### 2. Distribute Data
According to the file name suffix, send the data prepared in Step 1 to the `/tmp` directory of the corresponding computation party. For example, use the `scp` command to send `mnist2_feature.part0` and `mnist2_label.part0` to `/tmp` of party 0.
#### 3. Each Computation Party Modifies mnist_demo.py
Each computation party modifies `localhost` in the following code of the script to the IP address of its own machine:
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
#### 4. Each Computation Party Launches the Demo
**Note**: running the demo requires a Redis service. To make sure that data already stored in Redis does not affect the demo, clear Redis with the following command before each computation party launches the demo, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP address and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
Run the following command on each computation party to launch the demo:
```
$PYTHON_EXECUTABLE mnist_demo.py $PARTY_ID $SERVER $PORT
```
Here, PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP address and port of the Redis server respectively.
Similarly, the ciphertext prediction data will be saved to files in the `/tmp` directory. For example, on party 0 it is saved as the file `mnist_output_prediction.part0`.
**Note**: before running the demo again, delete the prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run.
#### 5. Decrypt Prediction Data
Each computation party sends the `mnist_output_prediction.part` file in its `/tmp` directory to the `/tmp` directory of the data owner. The data owner uses `decrypt_data_to_file()` in the `process_data.py` script to decrypt the ciphertext prediction results and save the plaintext results into a specified file. For example, write the following content into a `decrypt_save.py` script and then run `python decrypt_save.py`; the plaintext prediction results will be saved in the file `mpc_label`.
```python
import process_data

# BATCH_SIZE must match the batch size used in mnist_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 128
process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
```
......@@ -24,7 +24,7 @@ import paddle
import paddle.fluid as fluid
import paddle_fl.mpc as pfl_mpc
import paddle_fl.mpc.data_utils.aby3 as aby3
import prepare_data
role, server, port = sys.argv[1], sys.argv[2], sys.argv[3]
# modify host(localhost).
......@@ -72,7 +72,7 @@ test_batch_sample = paddle.reader.compose(test_batch_feature, test_batch_label)
test_loader.set_batch_generator(test_batch_sample, places=place)
# loss file
loss_file = "/tmp/mnist_output_loss.part{}".format(role)
# loss_file = "/tmp/mnist_output_loss.part{}".format(role)
# train
exe = fluid.Executor(place)
......@@ -83,9 +83,9 @@ step = 0
for epoch_id in range(epoch_num):
# feed data via loader
for sample in loader():
loss = exe.run(feed=sample, fetch_list=[cost.name])
exe.run(feed=sample, fetch_list=[cost.name])
if step % 50 == 0:
print('Epoch={}, Step={}, loss={}'.format(epoch_id, step, loss))
print('Epoch={}, Step={}'.format(epoch_id, step))
step += 1
end_time = time.time()
......@@ -98,9 +98,3 @@ for sample in test_loader():
prediction = exe.run(program=infer_program, feed=sample, fetch_list=[cost])
with open(prediction_file, 'ab') as f:
f.write(np.array(prediction).tostring())
# decrypt
#if 0 == role:
# prepare_data.decrypt_data_to_file("/tmp/mnist_output_prediction", (BATCH_SIZE,), "mpc_label")
......@@ -12,10 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Prepare of MNIST data for MPC usage
Process data for MNIST.
"""
import sys
import numpy as np
import paddle
import six
......@@ -81,7 +79,7 @@ def load_decrypt_data(filepath, shape):
def decrypt_data_to_file(filepath, shape, decrypted_filepath):
"""
load the encrypted data and reconstruct
load the encrypted data and reconstruct to a file
"""
part_readers = []
for id in six.moves.range(3):
......@@ -93,7 +91,3 @@ def decrypt_data_to_file(filepath, shape, decrypted_filepath):
with open(decrypted_filepath, 'a+') as f:
for i in p:
f.write(str(i) + '\n')
generate_encrypted_data()
generate_encrypted_test_data()
......@@ -62,12 +62,7 @@ fi
$REDIS_BIN -h $SERVER -p $PORT flushall
# remove temp data generated in last time
LOSS_FILE="/tmp/mnist_output_loss.*"
PRED_FILE="/tmp/mnist_output_prediction.*"
if [ "$LOSS_FILE" ]; then
rm -rf $LOSS_FILE
fi
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
......
## Instructions for PaddleFL-MPC UCI Housing Demo
([简体中文](./README_CN.md)|English)
This document introduces how to run the UCI Housing demo based on Paddle-MPC, which can be run in two ways: on a single machine or on multiple machines.
### 1. Running on Single Machine
#### (1). Prepare Data
Generate encrypted data using `generate_encrypted_data()` in the `process_data.py` script. For example, users can write the following code into a python script named `prepare.py`, and then run the script with the command `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
```
Encrypted feature and label files will be generated and saved in the `/tmp` directory. Different suffixes are used to indicate which computation party each file belongs to. For instance, a file named `house_feature.part0` is the feature file of party 0.
#### (2). Launch Demo with a Shell Script
Launch the demo with the `run_standalone.sh` script. The concrete command is:
```bash
bash run_standalone.sh uci_housing_demo.py
```
The loss in ciphertext format is displayed on screen during training. At the same time, the loss data is also saved in the `/tmp` directory, and the file names follow the convention described in Step (1).
Besides, predictions are made in this demo once training is finished. The predictions in ciphertext format are also saved in the `/tmp` directory.
Finally, using `load_decrypt_data()` in the `process_data.py` script, this demo decrypts and prints the loss and predictions, which can be compared with the corresponding results of a plain-text Paddle model.
**Note**: remember to delete the loss and prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run. To simplify this, `run_standalone.sh` contains the following commands, which delete those files when the script is run.
```bash
# remove temp data generated in last time
LOSS_FILE="/tmp/uci_loss.*"
PRED_FILE="/tmp/uci_prediction.*"
if [ "$LOSS_FILE" ]; then
rm -rf $LOSS_FILE
fi
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### (1). Prepare Data
The data owner encrypts the data. The concrete operations are the same as in “Prepare Data” of “Running on Single Machine”.
#### (2). Distribute Encrypted Data
According to the file name suffix, distribute the encrypted data files to the `/tmp` directories of all 3 computation parties. For example, send `house_feature.part0` and `house_label.part0` to `/tmp` of party 0 with the `scp` command, as sketched below.
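A minimal sketch, again assuming party 0 is reachable at the hypothetical address `party0-host` as user `user`:
```bash
# hypothetical host and user; adjust to your environment
scp /tmp/house_feature.part0 /tmp/house_label.part0 user@party0-host:/tmp/
# repeat with the .part1 and .part2 files for party 1 and party 2
```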
#### (3). Modify uci_housing_demo.py
Each computation party makes the following modifications to `uci_housing_demo.py` according to its machine environment.
* Modify IP Information
Modify `localhost` in the following code to the IP address of the machine.
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
* Comment Out Code for Single-Machine Running
Comment out the following code, which is only used when running on a single machine.
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
print("prediction:")
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
#### (4). Launch Demo on Each Party
**Note** that a Redis service is required to run the demo. Remember to clear the cache of the Redis server before launching the demo on each computation party, in order to avoid any negative influence from records cached in Redis. The following command can be used to clear Redis, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
Launch the demo on each computation party with the following command,
```
$PYTHON_EXECUTABLE uci_housing_demo.py $PARTY_ID $SERVER $PORT
```
where PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP and port of the Redis server respectively.
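A hypothetical example, with the Redis server at `192.168.1.1:6379`:
```bash
# run on party 0; use PARTY_ID 1 and 2 on the other two parties
python uci_housing_demo.py 0 192.168.1.1 6379
```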
Similarly, the training loss in ciphertext format is printed on the screen of each computation party. At the same time, the loss and predictions are saved in the `/tmp` directory.
**Note**: remember to delete the loss and prediction files generated in the previous run from the `/tmp` directory, so that they do not affect the decrypted results of the current run.
#### (5). Decrypt Loss and Prediction Data
Each computation party sends its `uci_loss.part` and `uci_prediction.part` files in the `/tmp` directory to the `/tmp` directory of the data owner. The data owner decrypts and obtains the plain text of the loss and predictions with `load_decrypt_data()` in `process_data.py`.
For example, the following code can be written into a python script to decrypt and print the training loss.
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
```
And the following code can be written into a python script to decrypt and print predictions.
```python
import process_data

print("prediction:")
# BATCH_SIZE must match the batch size used in uci_housing_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 10
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
## Instructions for Running the PaddleFL-MPC UCI Housing Demo
(简体中文|[English](./README.md))
This example describes how to train and predict a UCI housing price model with PaddleFL-MPC, in two modes: single-machine and multi-machine.
### 1. Running on a Single Machine
#### 1. Prepare Data
Use `generate_encrypted_data()` in the `process_data.py` script to generate encrypted data. For example, write the following content into a `prepare.py` script, and then run `python prepare.py`.
```python
import process_data
process_data.generate_encrypted_data()
```
Encrypted feature and label files for the 3 computation parties will be generated in the `/tmp` directory, with file name suffixes distinguishing the data of different parties. For example, `house_feature.part0` is the feature data of party 0.
#### 2. Launch the Demo with a Shell Script
Launch and run the demo with the `run_standalone.sh` script, using the following command:
```bash
bash run_standalone.sh uci_housing_demo.py
```
While running, the ciphertext loss is printed on screen during training; at the same time, the corresponding ciphertext loss data is saved to files in the `/tmp` directory, named in the format described in Step 1.
In addition, after training finishes, the demo continues with prediction and also saves the ciphertext prediction results to files in the `/tmp` directory.
Finally, the demo uses `load_decrypt_data()` in the `process_data.py` script to recover and print the plaintext loss and prediction results, which can be compared with the results of a plaintext Paddle model.
**Note**: before running the demo again, delete the loss and prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run. To simplify this, `run_standalone.sh` contains the following content, which deletes the previous data when the script is run.
```bash
# remove temp data generated in last time
LOSS_FILE="/tmp/uci_loss.*"
PRED_FILE="/tmp/uci_prediction.*"
if [ "$LOSS_FILE" ]; then
rm -rf $LOSS_FILE
fi
if [ "$PRED_FILE" ]; then
rm -rf $PRED_FILE
fi
```
### 2. Running on Multiple Machines
#### 1. Prepare Data
The data owner encrypts the data. The concrete operations are the same as the data preparation step in single-machine mode.
#### 2. Distribute Data
According to the file name suffix, send the data prepared in Step 1 to the `/tmp` directory of the corresponding computation party. For example, use the `scp` command to send `house_feature.part0` and `house_label.part0` to `/tmp` of party 0.
#### 3. Each Computation Party Modifies uci_housing_demo.py
Each computation party makes the following changes to `uci_housing_demo.py` according to its machine environment:
* Modify the IP information
Modify `localhost` in the following code of the script to the party's own IP address:
```python
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
* Comment out the code needed only for single-machine running
Comment out the following code in the script; it is only used in the single-machine case.
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
print("prediction:")
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
#### 4. Each Computation Party Launches the Demo
**Note**: running the demo requires a Redis service. To make sure that data already stored in Redis does not affect the demo, clear Redis with the following command before each computation party launches the demo, where REDIS_BIN is the redis-cli executable, and SERVER and PORT are the IP address and port of the Redis server respectively.
```
$REDIS_BIN -h $SERVER -p $PORT flushall
```
Run the following command on each computation party to launch the demo:
```
$PYTHON_EXECUTABLE uci_housing_demo.py $PARTY_ID $SERVER $PORT
```
Here, PYTHON_EXECUTABLE is the Python interpreter with PaddleFL installed, PARTY_ID is the ID of the computation party (0, 1, or 2), and SERVER and PORT are the IP address and port of the Redis server respectively.
Similarly, the ciphertext loss is printed on the screen of each computation party during training. At the same time, the corresponding ciphertext loss and prediction data will be saved to files in the `/tmp` directory, named in the format described in Step 1.
**Note**: before running the demo again, delete the loss and prediction files saved in `/tmp` from the previous run, so that they do not affect the recovered results of the current run.
#### 5. The Data Owner Decrypts the Loss and Prediction
Each computation party sends the `uci_loss.part` and `uci_prediction.part` files in its `/tmp` directory to the `/tmp` directory of the data owner. The data owner uses `load_decrypt_data()` in the `process_data.py` script to decrypt and recover the loss and prediction data.
For example, use a python script with the following content to print the decrypted loss data:
```python
import process_data
print("uci_loss:")
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
```
Use a python script with the following content to print the decrypted prediction data:
```python
import process_data

print("prediction:")
# BATCH_SIZE must match the batch size used in uci_housing_demo.py (assumed value; adjust as needed)
BATCH_SIZE = 10
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
......@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Prepare of UCI housing data for MPC usage
Process data for UCI Housing.
"""
import numpy as np
import paddle
......@@ -60,6 +60,3 @@ def load_decrypt_data(filepath, shape):
for instance in aby3_share_reader():
p = aby3.reconstruct(np.array(instance))
print(p)
generate_encrypted_data()
......@@ -93,8 +93,8 @@ for sample in loader():
f.write(np.array(prediction).tostring())
break
import prepare_data
import process_data
print("uci_loss:")
prepare_data.load_decrypt_data("/tmp/uci_loss", (1, ))
process_data.load_decrypt_data("/tmp/uci_loss", (1, ))
print("prediction:")
prepare_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE, ))
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE, ))