Commit 8468db4b, author: jhjiangcs

add model encryption and decryption demos.

Parent 13cba8d8
@@ -105,4 +105,5 @@ REGISTER_OPERATOR(
     ops::MpcSGDOpInferVarType);
 REGISTER_OP_CPU_KERNEL(
     mpc_sgd,
-    ops::MpcSGDOpKernel<paddle::platform::CPUDeviceContext, int64_t, float>);
+    ops::MpcSGDOpKernel<paddle::platform::CPUDeviceContext, int64_t, float>,
+    ops::MpcSGDOpKernel<paddle::platform::CPUDeviceContext, int64_t, double>);
## Instructions for PaddleFL-MPC Model Decryption Demo
([简体中文](./README_CN.md)|English)
### 1. Introduction
With Paddle-MPC, users can decrypt an encrypted model. The decrypted model can then be used for training and prediction.
### 2. Usage
This section shows how to decrypt a prediction model.
1. **Decrypt Model**: Users decrypt the encrypted model with the API `aby3.decrypt_model`.
```python
aby3.decrypt_model(mpc_model_dir=mpc_model_dir,
plain_model_path=decrypted_paddle_model_dir,
mpc_model_filename=mpc_model_filename,
plain_model_filename=paddle_model_filename)
```
2. **Predict**: Users can run prediction on plaintext data with the decrypted model.
1) Load the decrypted model with the API `fluid.io.load_inference_model`.
```python
infer_prog, feed_names, fetch_targets = fluid.io.load_inference_model(executor=exe,
dirname=decrypted_paddle_model_dir,
model_filename=paddle_model_filename)
```
2) Run prediction on plaintext data with the decrypted model.
```python
results = exe.run(infer_prog,
feed={feed_names[0]: np.array(infer_feature)},
fetch_list=fetch_targets)
```
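## Model Decryption User Guide
(简体中文|[English](./README.md))
### 1. Introduction
Using the functionality provided by Paddle-MPC, users can decrypt an MPC ciphertext model to obtain a plaintext model. Specifically, model decryption serves users who need a plaintext model: after collecting the ciphertext models from all parties, they decrypt them to obtain the final plaintext model, which is functionally identical to a regular Paddle model.
### 2. Scenarios
Starting from the ciphertext model produced by multi-party training or updating, decryption recovers the complete plaintext model. The plaintext model can be used for further training and for prediction.
### 3. Usage
Because the decryption steps for trained, updated, and prediction models are essentially the same, this section uses the decryption of a prediction model as an example to introduce the main steps.
1. **Decrypt Model**: The party that needs the decrypted model collects the saved ciphertext prediction models (i.e. the model shares) from all parties and uses the model decryption API `aby3.decrypt_model` provided by Paddle-MPC to recover the plaintext prediction model.
Assuming the three ciphertext model shares are stored under the directory `mpc_model_dir`, decrypt them with `aby3.decrypt_model`: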
```python
aby3.decrypt_model(mpc_model_dir=mpc_model_dir,
plain_model_path=decrypted_paddle_model_dir,
mpc_model_filename=mpc_model_filename,
plain_model_filename=paddle_model_filename)
```
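2. **Predict**: Use the decrypted prediction model to run prediction on the input data and output the results.
This step is the same as using a regular Paddle prediction model. First, load the prediction model with `fluid.io.load_inference_model`: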
```python
infer_prog, feed_names, fetch_targets = fluid.io.load_inference_model(executor=exe,
dirname=decrypted_paddle_model_dir,
model_filename=paddle_model_filename)
```
Then run prediction to obtain the results:
```python
results = exe.run(infer_prog,
feed={feed_names[0]: np.array(infer_feature)},
fetch_list=fetch_targets)
```
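### 4. Example
An example of decrypting and using the UCI Housing price prediction model is provided; run the script `decrypt_inference_model.py` directly to obtain prediction results. **Note** that the model to be decrypted in `decrypt_inference_model.py` is set to the model specified in the script `model_encryption/predict/predict.py`, so before running the script, make sure the ciphertext prediction model has been saved under the corresponding path.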
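Before running the decryption script, it can help to confirm that all three model shares are actually in place. The sketch below is a hypothetical helper, not part of paddle_fl. It assumes the shares live in sub-directories named `model_share_0` to `model_share_2` under `mpc_model_dir`, a layout inferred from the paths used in the prediction demo rather than documented behaviour.

```python
import os


def missing_model_shares(mpc_model_dir, num_parties=3):
    """Return the expected share directories that do not exist.

    Assumes a layout of <mpc_model_dir>/model_share_<i>, which is inferred
    from the demo paths and not guaranteed by paddle_fl.
    """
    expected = [os.path.join(mpc_model_dir, "model_share_{}".format(i))
                for i in range(num_parties)]
    return [path for path in expected if not os.path.isdir(path)]


if __name__ == "__main__":
    # Path taken from decrypt_inference_model.py in this demo.
    missing = missing_model_shares(
        "../model_encryption/predict/tmp/mpc_models_to_predict")
    if missing:
        print("Missing encrypted model shares:", missing)
    else:
        print("All three model shares found.")
```

If the list is non-empty, run the prediction demo's training and encryption script first so that the shares are generated.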
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Decrypt MPC inference model into paddle model and make prediction.
"""
import numpy as np
import paddle
import paddle.fluid as fluid
from paddle_fl.mpc.data_utils import aby3
mpc_model_dir = '../model_encryption/predict/tmp/mpc_models_to_predict'
mpc_model_filename = 'model_to_predict'
decrypted_paddle_model_dir = './tmp/paddle_inference_model'
paddle_model_filename = 'decrypted_model'
def infer():
    """
    Predict with decrypted model.
    """
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    # Step 1. load decrypted model.
    infer_prog, feed_names, fetch_targets = fluid.io.load_inference_model(
        executor=exe,
        dirname=decrypted_paddle_model_dir,
        model_filename=paddle_model_filename)
    # Step 2. make prediction
    batch_size = 10
    infer_reader = fluid.io.batch(
        paddle.dataset.uci_housing.test(), batch_size=batch_size)
    infer_data = next(infer_reader())
    infer_feat = np.array(
        [data[0] for data in infer_data]).astype("float32")
    assert feed_names[0] == 'x'
    results = exe.run(infer_prog,
                      feed={feed_names[0]: np.array(infer_feat)},
                      fetch_list=fetch_targets)
    print("infer results: (House Price)")
    for idx, val in enumerate(results[0]):
        print("%d: %.2f" % (idx, val))
if __name__ == '__main__':
    # decrypt mpc model
    aby3.decrypt_model(mpc_model_dir=mpc_model_dir,
                       plain_model_path=decrypted_paddle_model_dir,
                       mpc_model_filename=mpc_model_filename,
                       plain_model_filename=paddle_model_filename)
    print('Successfully decrypt inference model. The decrypted model is saved in: {}'
          .format(decrypted_paddle_model_dir))
    # infer with decrypted model
    infer()
## Instructions for PaddleFL-MPC Model Encryption Demo
([简体中文](./README_CN.md)|English)
### 1. Introduction
This document describes how to encrypt a PaddlePaddle model and then train or update the encrypted model, or predict with it. Model encryption is suited to scenarios where both the users' data and the model need protection.
### 2. Scenarios
The model encryption demo covers three scenarios:
* **Encrypt Model and Train**
Each party loads a PaddlePaddle model and encrypts it. The parties then feed encrypted data to train the encrypted model. Each party ends up with one share of the encrypted model. The PaddlePaddle model can be reconstructed from the three encrypted model shares.
* **Encrypt Pre-trained Model and Update**
A pre-trained model is encrypted and distributed to multiple parties. All parties update the encrypted model by feeding encrypted data. The PaddlePaddle model can be reconstructed from the three encrypted model shares.
* **Encrypt Pre-trained Model and Predict**
A pre-trained model is encrypted and distributed to multiple parties. All parties predict on encrypted data with the encrypted model. The prediction output can be reconstructed from the three encrypted prediction shares.
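To make the idea of reconstructing a value from three shares concrete, here is a toy sketch of additive secret sharing over 64-bit integers. It is a simplified illustration and not the ABY3 protocol that Paddle-MPC uses: the fixed-point scale of 2^16 and the helper names `toy_make_shares` and `toy_reconstruct` are assumptions made for this example only.

```python
import secrets

RING = 2 ** 64   # all arithmetic is done modulo 2^64
SCALE = 2 ** 16  # assumed fixed-point scaling factor (illustrative only)


def encode(value):
    """Map a float to a ring element using fixed-point encoding."""
    return int(round(value * SCALE)) % RING


def decode(element):
    """Map a ring element back to a float; the top half of the ring is negative."""
    if element >= RING // 2:
        element -= RING
    return element / SCALE


def toy_make_shares(value):
    """Split one float into three additive shares that sum to it mod 2^64."""
    s0 = secrets.randbelow(RING)
    s1 = secrets.randbelow(RING)
    s2 = (encode(value) - s0 - s1) % RING
    return [s0, s1, s2]


def toy_reconstruct(shares):
    """Recombine all three shares into the plaintext float."""
    return decode(sum(shares) % RING)


shares = toy_make_shares(-3.25)
print(toy_reconstruct(shares))  # -3.25
```

In this toy scheme each individual share is a uniformly random ring element, so a single party learns nothing about the value from its own share; the plaintext only appears once all three shares are combined.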
### 3. Usage
#### 3.1 Train Model
<img src='images/model_training.png' width = "500" height = "550" align="middle"/>
The figure above shows encrypted-model training with Paddle-MPC.
1. **Load Model**: Users initialize the MPC context with the `mpc_init` OP, then load or define a PaddlePaddle network.
```python
pfl_mpc.init("aby3", role, ip, server, port)
[_, _, _, loss] = network.model_network()
exe.run(fluid.default_startup_program())
```
2. **Transpile (Encrypt) Model**: Users encrypt the PaddlePaddle model with the API `aby3.transpile`.
```python
aby3.transpile()
```
3. **Train Model**: Users train the encrypted model with encrypted data.
```python
for epoch_id in range(epoch_num):
    for mpc_sample in loader():
        mpc_loss = exe.run(feed=mpc_sample, fetch_list=[loss.name])
```
4. **Save Model**: Users save the encrypted model using `aby3.save_trainable_model`.
```python
aby3.save_trainable_model(exe=exe,
model_dir=model_save_dir,
model_filename=model_filename)
```
5. **Decrypt Model**: Users can decrypt the model from its model shares.
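The `loader()` in step 3 yields one mini-batch of shares per iteration. The demo's data helper builds its batches with `drop_last=True`, which fits the fixed batch dimension declared in the network's `fluid.data` inputs. The sketch below is a plain-Python stand-in for that batching behaviour, written for illustration only; `batch` and `toy_reader` are hypothetical helpers here and not the `aby3.batch` implementation.

```python
def batch(reader, batch_size, drop_last=False):
    """Wrap a sample reader so that it yields lists of batch_size samples."""
    def batched_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) == batch_size:
                yield buf
                buf = []
        # A trailing partial batch is only emitted when drop_last is False.
        if buf and not drop_last:
            yield buf
    return batched_reader


def toy_reader():
    """Stand-in for a share reader: yields 25 dummy samples."""
    for i in range(25):
        yield i


sizes = [len(b) for b in batch(toy_reader, 10, drop_last=True)()]
print(sizes)  # [10, 10]
```

With `drop_last=False` the same reader would also yield a final batch of 5 samples, which would not match an input declared with a batch dimension of 10.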
#### 3.2 Update Model
<img src='images/model_updating.png' width = "500" height = "380" align="middle"/>
The figure above shows how to update a pre-trained model with Paddle-MPC.
1. **Pre-train Model**: Users train a PaddlePaddle model with plaintext data.
2. **Encrypt Model**: Users encrypt the pre-trained model with the API `aby3.encrypt_model` and distribute the model shares to other users.
```python
# Step 1. Load pre-trained model.
main_prog, _, _ = fluid.io.load_inference_model(executor=exe,
dirname=paddle_model_dir,
model_filename=model_filename)
# Step 2. Encrypt pre-trained model.
aby3.encrypt_model(program=main_prog,
mpc_model_dir=mpc_model_dir,
model_filename=model_filename)
```
3. **Update Model**: Users initialize the MPC context with the `mpc_init` OP, then load the encrypted model with `aby3.load_mpc_model`. Users update the encrypted model with encrypted data.
```python
# Step 1. initialize MPC environment and load MPC model into
# default_main_program to update.
pfl_mpc.init("aby3", role, ip, server, port)
aby3.load_mpc_model(exe=exe,
mpc_model_dir=mpc_model_dir,
mpc_model_filename=mpc_model_filename)
# Step 2. MPC update
for epoch_id in range(epoch_num):
    for mpc_sample in loader():
        mpc_loss = exe.run(feed=mpc_sample, fetch_list=[loss.name])
```
4. **Decrypt Model**: Users can decrypt the model from its model shares.
#### 3.3 Predict Model
<img src='images/model_infer.png' width = "500" height = "380" align="middle"/>
The figure above shows how to predict with an encrypted model.
1. **Train Model**: Users train a PaddlePaddle model with plaintext data.
2. **Encrypt Model**: Users encrypt the model with the API `aby3.encrypt_model` and distribute the model shares to other users. The API usage is the same as in `Update Model`.
3. **Predict Model**: Users initialize the MPC context with the `mpc_init` OP, then load the encrypted model with the API `aby3.load_mpc_model`. Users predict on encrypted data with the encrypted model.
```python
# Step 1. initialize MPC environment and load MPC model to predict
pfl_mpc.init("aby3", role, ip, server, port)
infer_prog, feed_names, fetch_targets = aby3.load_mpc_model(
    exe=exe,
    mpc_model_dir=mpc_model_dir,
    mpc_model_filename=mpc_model_filename,
    inference=True)
# Step 2. MPC predict
prediction = exe.run(program=infer_prog, feed={feed_names[0]: np.array(mpc_sample)}, fetch_list=fetch_targets)
# Step 3. save prediction results
with open(pred_file, 'ab') as f:
    f.write(np.array(prediction).tostring())
```
4. **Decrypt Model**: Users can decrypt the model from its model shares.
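Step 3 appends raw bytes to a per-party file, so the file ends up as a sequence of fixed-size records with no header. The sketch below shows that pattern with the standard-library `array` module instead of NumPy. The record length of 3 and the helper names are assumptions for illustration; the real files hold the share tensors produced by the MPC program.

```python
import array
import os
import tempfile


def append_record(path, values):
    """Append one fixed-length record of int64 values to a binary file."""
    with open(path, "ab") as f:
        f.write(array.array("q", values).tobytes())


def read_records(path, record_len):
    """Yield each complete record as a list of record_len int64 values."""
    record_bytes = record_len * array.array("q").itemsize
    with open(path, "rb") as f:
        while True:
            chunk = f.read(record_bytes)
            if len(chunk) < record_bytes:
                break  # end of file (or a truncated trailing record)
            record = array.array("q")
            record.frombytes(chunk)
            yield record.tolist()


# Hypothetical file name, modelled on the per-party naming used in the demo.
path = os.path.join(tempfile.mkdtemp(), "uci_prediction.part0")
append_record(path, [1, 2, 3])
append_record(path, [4, 5, 6])
print(list(read_records(path, 3)))  # [[1, 2, 3], [4, 5, 6]]
```

Because the file is opened in append mode, results from an earlier run stay in the file unless it is deleted first, which is why the training demo removes its loss file before writing.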
### 4. Usage Demo
#### 4.1 Train Model
Instructions for model encryption and training with PaddleFL-MPC using UCI Housing dataset: [Here](./train).
#### 4.2 Update Model
Instructions for pre-trained model encryption and update with Paddle-MPC using UCI Housing dataset: [Here](./update).
#### 4.3 Predict Model
Instructions for pre-trained model encryption and prediction with Paddle-MPC using UCI Housing dataset: [Here](./predict).
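## Model Encryption User Guide
(简体中文|[English](./README.md))
### 1. Introduction
Using the functionality provided by Paddle-MPC, users can encrypt a plaintext PaddlePaddle model and then, depending on their needs, train, update, or predict with the encrypted model using ciphertext data. Model encryption is therefore suited to scenarios where both the users' training data and the model must be protected.
### 2. Scenarios
Depending on user needs, model encryption is mainly used in the following three scenarios:
* **Train After Encrypting the Model**
Multiple parties jointly train an existing model with their own data. In this scenario, each party directly loads a network from the model zoo or a custom network (an untrained model program) and encrypts it; the parties then jointly train and save the ciphertext model using ciphertext training data. After training, each party holds only a ciphertext model, i.e. one share of the plaintext model, and the complete plaintext model can be recovered from the shares by decryption when needed.
* **Update After Encrypting a Pre-trained Model**
Multiple parties jointly update an existing pre-trained model with their own new data. In this scenario, the pre-trained plaintext model is encrypted and distributed to the parties, who jointly update and save the ciphertext model using new ciphertext training data. After the update, each party holds only one share of the complete plaintext model, and the complete plaintext model can be recovered from the shares by decryption when needed.
* **Predict After Encrypting a Pre-trained Model**
Multiple parties jointly run prediction on their own data with a prediction model. In this scenario, the plaintext prediction model is encrypted and distributed to the parties, who jointly predict on ciphertext data with the ciphertext model. After prediction, each party holds only one share of the prediction results, and the plaintext results can be recovered from the shares by decryption when needed.
### 3. Usage
#### 3.1 Model Training
<img src='images/model_training.png' width = "500" height = "550" align="middle"/>
The figure above illustrates encrypted-model training with Paddle-MPC. The main steps are:
1. **Load Model**: Each party initializes the MPC environment with the `mpc_init` OP, then directly loads a network from the model zoo or a custom network and initializes its parameters. Example API usage: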
```python
pfl_mpc.init("aby3", role, ip, server, port)
[_, _, _, loss] = network.model_network()
exe.run(fluid.default_startup_program())
```
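2. **Transpile (Encrypt) Model**: Each party uses the model transpile API `aby3.transpile` provided by Paddle-MPC to convert (encrypt) the plaintext model into a ciphertext model. Example API usage: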
```python
aby3.transpile()
```
3. **Joint Training**: The parties jointly train the ciphertext model using ciphertext training data. Example API usage:
```python
for epoch_id in range(epoch_num):
    for mpc_sample in loader():
        mpc_loss = exe.run(feed=mpc_sample, fetch_list=[loss.name])
```
4. **Save Model**: After training, each party saves the trained ciphertext model with the `aby3.save_trainable_model` API. Example API usage:
```python
aby3.save_trainable_model(exe=exe,
model_dir=model_save_dir,
model_filename=model_filename)
```
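5. **Decrypt Model**: If needed, the party that requires the decrypted model collects the saved ciphertext models from all parties and uses the model decryption functionality provided by Paddle-MPC to recover the plaintext model. See the introduction in the `model_decryption` directory for details.
#### 3.2 Model Update
<img src='images/model_updating.png' width = "500" height = "380" align="middle"/>
The figure above illustrates encrypted-model updating with Paddle-MPC. The main steps are:
1. **Pre-train Model**: Pre-train the plaintext model with plaintext data, then save the resulting pre-trained model. This step is performed by the owner of the pre-trained model and takes place before model encryption.
2. **Encrypt Model**: The owner of the pre-trained model encrypts it with the model encryption API `aby3.encrypt_model` provided by Paddle-MPC. The three resulting ciphertext models (i.e. shares of the plaintext model) are sent to the three updating parties, one each, to be saved. Example API usage: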
```python
# Step 1. Load pre-trained model.
main_prog, _, _ = fluid.io.load_inference_model(executor=exe,
dirname=paddle_model_dir,
model_filename=model_filename)
# Step 2. Encrypt pre-trained model.
aby3.encrypt_model(program=main_prog,
mpc_model_dir=mpc_model_dir,
model_filename=model_filename)
```
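3. **Joint Update**: The three updating parties initialize the MPC environment with the `mpc_init` OP, then load the saved ciphertext model with the model loading API `aby3.load_mpc_model` provided by Paddle-MPC. They update the ciphertext model using ciphertext data and save the updated ciphertext model. Example API usage: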
```python
# Step 1. initialize MPC environment and load MPC model into
# default_main_program to update.
pfl_mpc.init("aby3", role, ip, server, port)
aby3.load_mpc_model(exe=exe,
mpc_model_dir=mpc_model_dir,
mpc_model_filename=mpc_model_filename)
# Step 2. MPC update
for epoch_id in range(epoch_num):
    for mpc_sample in loader():
        mpc_loss = exe.run(feed=mpc_sample, fetch_list=[loss.name])
```
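4. **Decrypt Model**: If needed, the party that requires the decrypted model collects the saved ciphertext models from all parties and uses the model decryption functionality provided by Paddle-MPC to recover the plaintext model.
#### 3.3 Model Prediction
<img src='images/model_infer.png' width = "500" height = "380" align="middle"/>
The figure above illustrates encrypted-model prediction with Paddle-MPC. The main steps are:
1. **Train Model**: Train and save the plaintext prediction model with plaintext data. This step is performed by the owner of the prediction model.
2. **Encrypt Model**: The owner of the prediction model encrypts it with the model encryption API `aby3.encrypt_model` provided by Paddle-MPC. The three resulting ciphertext models (i.e. shares of the plaintext model) are sent to the three predicting parties, one each, to be saved. The API usage is the same as described under Model Update.
3. **Joint Prediction**: The three predicting parties initialize the MPC environment with the `mpc_init` OP, then load the ciphertext prediction model with the model loading API `aby3.load_mpc_model` provided by Paddle-MPC. They run prediction with the ciphertext model on ciphertext data and save the ciphertext prediction results. Example API usage: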
```python
# Step 1. initialize MPC environment and load MPC model to predict
pfl_mpc.init("aby3", role, ip, server, port)
infer_prog, feed_names, fetch_targets = aby3.load_mpc_model(
    exe=exe,
    mpc_model_dir=mpc_model_dir,
    mpc_model_filename=mpc_model_filename,
    inference=True)
# Step 2. MPC predict
prediction = exe.run(program=infer_prog, feed={feed_names[0]: np.array(mpc_sample)}, fetch_list=fetch_targets)
# Step 3. save prediction results
with open(pred_file, 'ab') as f:
    f.write(np.array(prediction).tostring())
```
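4. **Decrypt Results**: The party requesting the prediction results collects the saved ciphertext prediction results from all parties and uses the data decryption functionality provided by Paddle-MPC to recover the plaintext prediction results.
### 4. Examples
#### 4.1 Model Training
For an example of encrypted training with the UCI Housing price prediction model, see [here](./train).
#### 4.2 Model Update
For an example of encrypted updating with the UCI Housing price prediction model, see [here](./update).
#### 4.3 Model Prediction
For an example of encrypted prediction with the UCI Housing price prediction model, see [here](./predict).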
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides networks.
"""
import paddle
import paddle.fluid as fluid
UCI_BATCH_SIZE = 10
BATCH_SIZE = 10
TRAIN_EPOCH = 20
PADDLE_UPDATE_EPOCH = 10
MPC_UPDATE_EPOCH = TRAIN_EPOCH - PADDLE_UPDATE_EPOCH
def uci_network():
    """
    Build a network for uci housing.
    """
    x = fluid.data(name='x', shape=[UCI_BATCH_SIZE, 13], dtype='float32')
    y = fluid.data(name='y', shape=[UCI_BATCH_SIZE, 1], dtype='float32')
    param_attr = paddle.fluid.param_attr.ParamAttr(name="fc_0.w_0",
                                                   initializer=fluid.initializer.ConstantInitializer(0.0))
    bias_attr = paddle.fluid.param_attr.ParamAttr(name="fc_0.b_0")
    y_pre = fluid.layers.fc(input=x, size=1, param_attr=param_attr, bias_attr=bias_attr)
    # add infer_program
    infer_program = fluid.default_main_program().clone(for_test=False)
    cost = fluid.layers.square_error_cost(input=y_pre, label=y)
    avg_loss = fluid.layers.mean(cost)
    optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    optimizer.minimize(avg_loss)
    return_list = [x, y, y_pre, avg_loss]
    return return_list
## Instructions for pre-trained model encryption and prediction with Paddle-MPC
([简体中文](./README_CN.md)|English)
This document describes how to encrypt a pre-trained plaintext model and predict with it using Paddle-MPC.
### 1. Train PaddlePaddle Model, Encrypt, and Save
Train a plaintext PaddlePaddle model, encrypt it, and save it with the following script.
```bash
python train_and_encrypt_model.py
```
### 2. Prepare Data
Run the script `../process_data.py` to generate encrypted training and testing data.
### 3. Predict with MPC Model
Run prediction with the MPC model using the following script.
```bash
bash run_standalone.sh predict_with_mpc_model.py
```
### 4. Decrypt Prediction Data
Decrypt the prediction data with the following script.
```bash
python decrypt_mpc_prediction.py
```
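## Encrypted Prediction with the UCI Housing Price Prediction Model
(简体中文|[English](./README.md))
### 1. Train the Plaintext Model, Encrypt, and Save
Use the following command to train, encrypt, and save the plaintext model: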
```bash
python train_and_encrypt_model.py
```
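### 2. Prepare Encrypted Data for Prediction
Run the script `../process_data.py` to encrypt the data needed for prediction.
### 3. Encrypted Prediction
Use the following command to run prediction with the ciphertext model: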
```bash
bash run_standalone.sh predict_with_mpc_model.py
```
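### 4. Decrypt the Saved Results to Verify Ciphertext Model Prediction
Use the following command to decrypt and view the saved prediction results: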
```bash
python decrypt_mpc_prediction.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Decrypt MPC prediction results.
"""
import sys
sys.path.append('..')
import process_data
print("********decrypted uci_loss*********")
LOSS_SIZE = 1
process_data.load_decrypt_data("./tmp/uci_prediction", (LOSS_SIZE,))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
MPC prediction.
"""
import sys
import time
import numpy as np
import paddle.fluid as fluid
import paddle_fl.mpc as pfl_mpc
from paddle_fl.mpc.data_utils import aby3
sys.path.append('..')
import process_data
import network
def load_mpc_model_and_predict(role, ip, server, port, mpc_model_dir, mpc_model_filename):
    """
    Predict based on MPC inference model, save prediction results into files.
    """
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    # Step 1. initialize MPC environment and load MPC model to predict
    pfl_mpc.init("aby3", role, ip, server, port)
    infer_prog, feed_names, fetch_targets = aby3.load_mpc_model(
        exe=exe,
        mpc_model_dir=mpc_model_dir,
        mpc_model_filename=mpc_model_filename,
        inference=True)
    # Step 2. MPC predict
    batch_size = network.BATCH_SIZE
    feature_file = "/tmp/house_feature"
    feature_shape = (13,)
    pred_file = "./tmp/uci_prediction.part{}".format(role)
    loader = process_data.get_mpc_test_dataloader(feature_file, feature_shape, role, batch_size)
    start_time = time.time()
    for sample in loader():
        prediction = exe.run(program=infer_prog, feed={feed_names[0]: np.array(sample)}, fetch_list=fetch_targets)
        # Step 3. save prediction results
        with open(pred_file, 'ab') as f:
            f.write(np.array(prediction).tostring())
        break
    end_time = time.time()
    print('Mpc Predict with samples of {}, cost time in seconds:{}'
          .format(batch_size, (end_time - start_time)))
if __name__ == '__main__':
    role, server, port = int(sys.argv[1]), sys.argv[2], int(sys.argv[3])
    mpc_model_dir = './tmp/mpc_models_to_predict/model_share_{}'.format(role)
    mpc_model_filename = 'model_to_predict'
    load_mpc_model_and_predict(role=role,
                               ip='localhost',
                               server=server,
                               port=port,
                               mpc_model_dir=mpc_model_dir,
                               mpc_model_filename=mpc_model_filename)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Prepare work before MPC model inference, which includes create paddle
model to inference, and encrypt paddle model into MPC model.
"""
import paddle
import paddle.fluid as fluid
import sys
import time
from paddle_fl.mpc.data_utils import aby3
sys.path.append('..')
import network
def train_infer_model(model_dir, model_filename):
    """
    Original Training: train and save paddle inference model.
    """
    # Step 1. load paddle net
    [x, y, y_pre, loss] = network.uci_network()
    # Step 2. train
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())
    feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
    train_reader = paddle.batch(
        paddle.dataset.uci_housing.train(), batch_size=network.BATCH_SIZE, drop_last=True)
    start_time = time.time()
    for epoch_id in range(network.TRAIN_EPOCH):
        step = 0
        for data in train_reader():
            avg_loss = exe.run(feed=feeder.feed(data), fetch_list=[loss.name])
            if step % 50 == 0:
                print('Epoch={}, Step={}, Loss={}'.format(epoch_id, step, avg_loss[0]))
            step += 1
    end_time = time.time()
    print('For Prediction: Paddle Training of Epoch={} Batch_size={}, cost time in seconds:{}'
          .format(network.TRAIN_EPOCH, network.BATCH_SIZE, (end_time - start_time)))
    # Step 3. save inference model
    fluid.io.save_inference_model(executor=exe,
                                  main_program=fluid.default_main_program(),
                                  dirname=model_dir,
                                  model_filename=model_filename,
                                  feeded_var_names=[x.name],
                                  target_vars=[y_pre])
def encrypt_paddle_model(paddle_model_dir, mpc_model_dir, model_filename):
    """
    Load, encrypt and save model.
    """
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    # Step 1. Load inference model.
    main_prog, _, _ = fluid.io.load_inference_model(executor=exe,
                                                    dirname=paddle_model_dir,
                                                    model_filename=model_filename)
    # Step 2. Encrypt inference model.
    aby3.encrypt_model(program=main_prog,
                       mpc_model_dir=mpc_model_dir,
                       model_filename=model_filename)
if __name__ == '__main__':
    model_to_predict_dir = './tmp/paddle_model_to_predict'
    model_to_predict_name = 'model_to_predict'
    train_infer_model(model_dir=model_to_predict_dir,
                      model_filename=model_to_predict_name)
    print('Successfully train and save paddle model to predict. The model is saved in: {}.'
          .format(model_to_predict_dir))
    mpc_model_to_predict_dir = './tmp/mpc_models_to_predict'
    encrypt_paddle_model(paddle_model_dir=model_to_predict_dir,
                         mpc_model_dir=mpc_model_to_predict_dir,
                         model_filename=model_to_predict_name)
    print('Successfully encrypt paddle model to predict. The encrypted models are saved in: {}.'
          .format(mpc_model_to_predict_dir))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides utils for model processing.
"""
import os
import numpy as np
import six
import paddle
import paddle.fluid as fluid
from paddle_fl.mpc.data_utils import aby3
#BATCH_SIZE = 10
#TRAIN_EPOCH = 20
#PADDLE_UPDATE_EPOCH = 10
#MPC_UPDATE_EPOCH = TRAIN_EPOCH - PADDLE_UPDATE_EPOCH
def get_mpc_dataloader(feature_file, label_file, feature_shape, label_shape,
                       feature_name, label_name, role, batch_size):
    """
    Read feature and label training data from files.
    """
    x = fluid.default_main_program().global_block().var(feature_name)
    y = fluid.default_main_program().global_block().var(label_name)
    feature_reader = aby3.load_aby3_shares(feature_file, id=role, shape=feature_shape)
    label_reader = aby3.load_aby3_shares(label_file, id=role, shape=label_shape)
    batch_feature = aby3.batch(feature_reader, batch_size, drop_last=True)
    batch_label = aby3.batch(label_reader, batch_size, drop_last=True)
    # async data loader
    loader = fluid.io.DataLoader.from_generator(feed_list=[x, y], capacity=batch_size)
    batch_sample = paddle.reader.compose(batch_feature, batch_label)
    place = fluid.CPUPlace()
    loader.set_batch_generator(batch_sample, places=place)
    return loader
def get_mpc_test_dataloader(feature_file, feature_shape, role, batch_size):
    """
    Read feature test data for prediction.
    """
    feature_reader = aby3.load_aby3_shares(feature_file, id=role, shape=feature_shape)
    batch_feature = aby3.batch(feature_reader, batch_size, drop_last=True)
    return batch_feature
def load_decrypt_data(filepath, shape):
    """
    Load the encrypted data and reconstruct.
    """
    part_readers = []
    for id in six.moves.range(3):
        part_readers.append(
            aby3.load_aby3_shares(
                filepath, id=id, shape=shape))
    aby3_share_reader = paddle.reader.compose(part_readers[0], part_readers[1],
                                              part_readers[2])
    for instance in aby3_share_reader():
        p = aby3.reconstruct(np.array(instance))
        print(p)
def generate_encrypted_data(mpc_data_dir):
    """
    Generate encrypted samples
    """
    sample_reader = paddle.dataset.uci_housing.train()
    def encrypted_housing_features():
        """
        feature reader
        """
        for instance in sample_reader():
            yield aby3.make_shares(instance[0])
    def encrypted_housing_labels():
        """
        label reader
        """
        for instance in sample_reader():
            yield aby3.make_shares(instance[1])
    aby3.save_aby3_shares(encrypted_housing_features, mpc_data_dir + "house_feature")
    aby3.save_aby3_shares(encrypted_housing_labels, mpc_data_dir + "house_label")
if __name__ == '__main__':
    mpc_data_dir = "./mpc_data/"
    if not os.path.exists(mpc_data_dir):
        os.mkdir(mpc_data_dir)
    generate_encrypted_data(mpc_data_dir)
## Instructions for PaddleFL-MPC model encryption and training
([简体中文](./README_CN.md)|English)
This document describes how to encrypt a plaintext model and train it with Paddle-MPC.
### 1. Prepare Data
Run the script `../process_data.py` to generate encrypted training and testing data.
### 2. Encrypt Model, Train, and Save
Encrypt the plaintext PaddlePaddle model, train it, and save it with the following script.
```bash
bash run_standalone.sh encrypt_and_train_model.py
```
### 3. Decrypt Loss Data
Decrypt the loss data to check the correctness of MPC training by running the following script.
```bash
python decrypt_mpc_loss.py
```
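One way to act on the decrypted losses, if a plaintext run of the same network is available for comparison, is to check that the two loss sequences agree within a tolerance. The sketch below is an illustration only: `losses_match` is a hypothetical helper, the tolerances are assumptions, and the numbers are made up rather than taken from a real run.

```python
import math


def losses_match(plain_losses, mpc_losses, rel_tol=1e-2, abs_tol=1e-3):
    """Return True when every decrypted loss is close to its plaintext twin."""
    if len(plain_losses) != len(mpc_losses):
        return False
    return all(math.isclose(p, m, rel_tol=rel_tol, abs_tol=abs_tol)
               for p, m in zip(plain_losses, mpc_losses))


# Made-up values for illustration, not real training output.
print(losses_match([590.2, 412.7], [590.3, 412.6]))  # True
print(losses_match([590.2, 412.7], [590.2, 300.0]))  # False
```

An exact equality check would be too strict here, since the encrypted computation is not expected to reproduce floating-point results bit for bit.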
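## Encrypted Training with the UCI Housing Price Prediction Model
(简体中文|[English](./README.md))
This example explains how to encrypt a plaintext model and then train it with PaddleFL-MPC.
### 1. Prepare Encrypted Data
Run the script `../process_data.py` to encrypt the training data.
### 2. Encrypt the Model, Train, and Save
Use the following command to encrypt the model, then train and save the ciphertext model: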
```bash
bash run_standalone.sh encrypt_and_train_model.py
```
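### 3. Decrypt Loss Data to Verify Ciphertext Model Training
Use the following command to decrypt and view the loss data saved during training and verify the correctness of training: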
```bash
python decrypt_mpc_loss.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Decrypt MPC training loss.
"""
import sys
sys.path.append('..')
import process_data
print("********decrypted uci_loss*********")
LOSS_SIZE = 1
process_data.load_decrypt_data("./tmp/uci_mpc_loss", (LOSS_SIZE,))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
MPC training.
"""
import numpy as np
import os
import sys
import time
import paddle.fluid as fluid
import paddle_fl.mpc as pfl_mpc
from paddle_fl.mpc.data_utils import aby3
sys.path.append('..')
import network
import process_data
def encrypt_model_and_train(role, ip, server, port, model_save_dir, model_filename):
    """
    Load uci network and train MPC model.
    """
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    # Step 1. Initialize MPC environment and load paddle model network and initialize parameter.
    pfl_mpc.init("aby3", role, ip, server, port)
    [_, _, _, loss] = network.uci_network()
    exe.run(fluid.default_startup_program())
    # Step 2. TRANSPILE: encrypt default_main_program into MPC program
    aby3.transpile()
    # Step 3. MPC-TRAINING: model training based on MPC program.
    mpc_data_dir = "../mpc_data/"
    feature_file = mpc_data_dir + "house_feature"
    feature_shape = (13,)
    label_file = mpc_data_dir + "house_label"
    label_shape = (1,)
    if not os.path.exists('./tmp'):
        os.makedirs('./tmp')
    loss_file = "./tmp/uci_mpc_loss.part{}".format(role)
    if os.path.exists(loss_file):
        os.remove(loss_file)
    batch_size = network.UCI_BATCH_SIZE
    epoch_num = network.TRAIN_EPOCH
    feature_name = 'x'
    label_name = 'y'
    loader = process_data.get_mpc_dataloader(feature_file, label_file, feature_shape, label_shape,
                                             feature_name, label_name, role, batch_size)
    start_time = time.time()
    for epoch_id in range(epoch_num):
        step = 0
        for sample in loader():
            mpc_loss = exe.run(feed=sample, fetch_list=[loss.name])
            if step % 50 == 0:
                print('Epoch={}, Step={}, Loss={}'.format(epoch_id, step, mpc_loss))
                with open(loss_file, 'ab') as f:
                    f.write(np.array(mpc_loss).tostring())
            step += 1
    end_time = time.time()
    print('Mpc Training of Epoch={} Batch_size={}, cost time in seconds:{}'
          .format(epoch_num, batch_size, (end_time - start_time)))
    # Step 4. SAVE trained MPC model as a trainable model.
    aby3.save_trainable_model(exe=exe,
                              model_dir=model_save_dir,
                              model_filename=model_filename)
    print('Successfully save mpc trained model into:{}'.format(model_save_dir))
role, server, port = int(sys.argv[1]), sys.argv[2], int(sys.argv[3])
model_save_dir = './tmp/mpc_models_trained/trained_model_share_{}'.format(role)
trained_model_name = 'mpc_trained_model'
encrypt_model_and_train(role=role,
                        ip='localhost',
                        server=server,
                        port=port,
                        model_save_dir=model_save_dir,
                        model_filename=trained_model_name)
## Instructions for pre-trained model encryption and update with Paddle-MPC
([简体中文](./README_CN.md)|English)
This document describes how to encrypt a pre-trained plaintext model and update it with Paddle-MPC.
### 1. Train PaddlePaddle Model, Encrypt, and Save
Train a plaintext PaddlePaddle model, encrypt it, and save it with the following script.
```bash
python train_and_encrypt_model.py
```
### 2. Prepare Data
Run the script `../process_data.py` to generate encrypted training and testing data.
### 3. Update MPC Model
Update the MPC model with the following script.
```bash
bash run_standalone.sh update_mpc_model.py
```
### 4. Decrypt Loss Data
Decrypt the loss data to check the correctness of MPC training by running the following script.
```bash
python decrypt_mpc_loss.py
```
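## Encrypted Update of the UCI Housing Price Prediction Model
(简体中文|[English](./README.md))
This example explains how to encrypt a pre-trained plaintext model and then update it through further training with PaddleFL-MPC.
### 1. Train the Plaintext Model, Encrypt, and Save
Use the following command to train, encrypt, and save the plaintext model: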
```bash
python train_and_encrypt_model.py
```
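### 2. Prepare Encrypted Data for Updating the Model
Run the script `../process_data.py` to encrypt the data needed for the model update.
### 3. Update the Ciphertext Model
Use the following command to train and save the ciphertext model: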
```bash
bash run_standalone.sh update_mpc_model.py
```
### 4. Decrypt Loss Data to Verify the Ciphertext Model Update
Use the following command to decrypt and view the loss data saved during the update:
```bash
python decrypt_mpc_loss.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Decrypt MPC training loss.
"""
import sys
sys.path.append('..')
import process_data
print("********decrypted uci_loss*********")
LOSS_SIZE = 1
process_data.load_decrypt_data("./tmp/uci_mpc_loss", (LOSS_SIZE,))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Preparation before MPC model updating, which includes creating the paddle
model to update and encrypting the paddle model into an MPC model.
"""
import paddle
import paddle.fluid as fluid
import time
import sys
from paddle_fl.mpc.data_utils import aby3
sys.path.append('..')
import network
def original_train(model_dir, model_filename):
    """
    Original Training: train and save pre-trained paddle model
    """
    # Step 1. load paddle net
    [x, y, _, loss] = network.uci_network()

    # Step 2. train
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())
    feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
    train_reader = paddle.batch(
        paddle.dataset.uci_housing.train(), batch_size=network.BATCH_SIZE, drop_last=True)
    start_time = time.time()
    for epoch_id in range(network.PADDLE_UPDATE_EPOCH):
        step = 0
        for data in train_reader():
            avg_loss = exe.run(feed=feeder.feed(data), fetch_list=[loss.name])
            if step % 50 == 0:
                print('Epoch={}, Step={}, Loss={}'.format(epoch_id, step, avg_loss[0]))
            step += 1
    end_time = time.time()
    print('Paddle Training of Epoch={} Batch_size={}, cost time in seconds:{}'
          .format(network.PADDLE_UPDATE_EPOCH, network.BATCH_SIZE, (end_time - start_time)))

    # Step 3. save model to update
    aby3.save_trainable_model(exe=exe,
                              program=fluid.default_main_program(),
                              model_dir=model_dir,
                              model_filename=model_filename)


def encrypt_paddle_model(paddle_model_dir, mpc_model_dir, model_filename):
    """
    Load, encrypt and save model.
    """
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)

    # Step 1. Load pre-trained model.
    main_prog, _, _ = fluid.io.load_inference_model(executor=exe,
                                                    dirname=paddle_model_dir,
                                                    model_filename=model_filename)

    # Step 2. Encrypt pre-trained model.
    aby3.encrypt_model(program=main_prog,
                       mpc_model_dir=mpc_model_dir,
                       model_filename=model_filename)


if __name__ == '__main__':
    # train paddle model
    model_to_update_dir = './tmp/paddle_model_to_update'
    model_to_update_name = 'model_to_update'
    original_train(model_dir=model_to_update_dir,
                   model_filename=model_to_update_name)
    print('Successfully train and save paddle model for update. The model is saved in: {}.'
          .format(model_to_update_dir))

    # encrypt paddle model
    mpc_model_to_update_dir = './tmp/mpc_models_to_update'
    encrypt_paddle_model(paddle_model_dir=model_to_update_dir,
                         mpc_model_dir=mpc_model_to_update_dir,
                         model_filename=model_to_update_name)
    print('Successfully encrypt paddle model for update. The encrypted models are saved in: {}.'
          .format(mpc_model_to_update_dir))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
MPC updating.
"""
import os
import sys
import time
import numpy as np
import paddle.fluid as fluid
import paddle_fl.mpc as pfl_mpc
from paddle_fl.mpc.data_utils import aby3
sys.path.append('..')
import network
import process_data
def load_uci_update(role, ip, server, port, mpc_model_dir, mpc_model_filename, updated_model_dir):
    """
    Load, update and save uci MPC model.
    """
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)

    # Step 1. initialize MPC environment and load MPC model into default_main_program to update.
    pfl_mpc.init("aby3", role, ip, server, port)
    aby3.load_mpc_model(exe=exe,
                        mpc_model_dir=mpc_model_dir,
                        mpc_model_filename=mpc_model_filename)

    # Step 2. MPC update
    epoch_num = network.MPC_UPDATE_EPOCH
    batch_size = network.BATCH_SIZE
    mpc_data_dir = "../mpc_data/"
    feature_file = mpc_data_dir + "house_feature"
    feature_shape = (13,)
    label_file = mpc_data_dir + "house_label"
    label_shape = (1,)
    loss_file = "./tmp/uci_mpc_loss.part{}".format(role)
    if os.path.exists(loss_file):
        os.remove(loss_file)
    updated_model_name = 'mpc_updated_model'
    feature_name = 'x'
    label_name = 'y'
    # fetch loss if needed
    loss = fluid.default_main_program().global_block().var('mean_0.tmp_0')
    loader = process_data.get_mpc_dataloader(feature_file, label_file, feature_shape, label_shape,
                                             feature_name, label_name, role, batch_size)
    start_time = time.time()
    for epoch_id in range(epoch_num):
        step = 0
        for sample in loader():
            mpc_loss = exe.run(feed=sample, fetch_list=[loss.name])
            if step % 50 == 0:
                print('Epoch={}, Step={}, Loss={}'.format(epoch_id, step, mpc_loss))
                with open(loss_file, 'ab') as f:
                    # ndarray.tostring() is deprecated; tobytes() writes the same raw bytes
                    f.write(np.array(mpc_loss).tobytes())
            step += 1
    end_time = time.time()
    print('Mpc Updating of Epoch={} Batch_size={}, cost time in seconds:{}'
          .format(epoch_num, batch_size, (end_time - start_time)))

    # Step 3. save updated MPC model as a trainable model.
    aby3.save_trainable_model(exe=exe,
                              model_dir=updated_model_dir,
                              model_filename=updated_model_name)
    print('Successfully save mpc updated model into:{}'.format(updated_model_dir))


if __name__ == '__main__':
    role, server, port = int(sys.argv[1]), sys.argv[2], int(sys.argv[3])
    mpc_model_dir = './tmp/mpc_models_to_update/model_share_{}'.format(role)
    mpc_model_filename = 'model_to_update'
    updated_model_dir = './tmp/mpc_models_updated/updated_model_share_{}'.format(role)
    load_uci_update(role=role,
                    ip='localhost',
                    server=server,
                    port=port,
                    mpc_model_dir=mpc_model_dir,
                    mpc_model_filename=mpc_model_filename,
                    updated_model_dir=updated_model_dir)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/bin/bash
#
# A tool to facilitate the parallel running of fluid_encrypted test scripts.
# A test script is EXPECTED to accept arguments in the following format:
#
# SCRIPT_NAME $ROLE $SERVER $PORT
# ROLE: the role of the running party
# SERVER: the address of the party discovering service
# PORT: the port of the party discovering service
#
# This tool will try to fill the above three arguments to the test script,
# so that a total of three processes running the script will be started, to
# simulate a three-party run on a standalone machine.
#
# Usage of this script:
#
# bash run_standalone.sh TEST_SCRIPT_NAME
#
# modify the following vars according to your environment
PYTHON=${PYTHON}
REDIS_HOME=${PATH_TO_REDIS_BIN}
SERVER=${LOCALHOST}
PORT=${REDIS_PORT}
echo "redis home in ${REDIS_HOME}, server is ${SERVER}, port is ${PORT}"
function usage(){
    echo 'run_standalone.sh SCRIPT_NAME [ARG...]'
    exit 0
}

if [ $# -lt 1 ]; then
    usage
fi

SCRIPT=$1
if [ ! -f $SCRIPT ]; then
    echo 'Could not find script of '$SCRIPT
    exit 1
fi

REDIS_BIN=$REDIS_HOME/redis-cli
if [ ! -f $REDIS_BIN ]; then
    echo 'Could not find redis cli in '$REDIS_HOME
    exit 1
fi
# clear the redis cache
$REDIS_BIN -h $SERVER -p $PORT flushall

# remove temp data generated by the previous run
LOSS_FILE="/tmp/uci_loss.*"
PRED_FILE="/tmp/uci_prediction.*"
ls ${LOSS_FILE}
if [ $? -eq 0 ]; then
    rm -rf $LOSS_FILE
fi
ls ${PRED_FILE}
if [ $? -eq 0 ]; then
    rm -rf $PRED_FILE
fi

TRAINING_FILE="/tmp/house_feature.part*"
ls ${TRAINING_FILE}
if [ $? -ne 0 ]; then
    # single quotes keep the inner double quotes in the printed message
    echo 'There is no data in /tmp, please prepare data with "python prepare.py" first'
    exit 1
else
    echo "There are data for uci:"
    echo "`ls ${TRAINING_FILE}`"
fi
# kick off script with roles of 1 and 2, and redirect output to /dev/null
for role in {1..2}; do
    # stdout is redirected first so that 2>&1 also sends stderr to /dev/null
    $PYTHON $SCRIPT $role $SERVER $PORT >/dev/null 2>&1 &
done

# for party of role 0, run in a foreground mode and show the output
$PYTHON $SCRIPT 0 $SERVER $PORT