Commit f400b58d authored by J jingqinghe

update document

Parent fb8a15f3
......@@ -9,6 +9,36 @@ PaddleFL is an open source federated learning framework based on PaddlePaddle. R
Data is becoming more and more expensive nowadays, and sharing raw data across organizations is very hard. Federated Learning aims to solve the problem of data isolation and secure sharing of data knowledge among organizations. The concept of federated learning was proposed by researchers at Google [1, 2, 3].
## Overview of PaddleFL
### Horizontal Federated Learning
<img src='images/FL-framework.png' width = "1000" height = "320" align="middle"/>
In PaddleFL, horizontal and vertical federated learning strategies will be implemented according to the categorization given in [4]. Application demonstrations in natural language processing, computer vision and recommendation will be provided in PaddleFL.
#### A. Federated Learning Strategy
- **Vertical Federated Learning**: Logistic Regression with PrivC, Neural Network with third-party PrivC [5]
- **Horizontal Federated Learning**: Federated Averaging [2], Differential Privacy [6], Secure Aggregation
#### B. Training Strategy
- **Multi Task Learning** [7]
- **Transfer Learning** [8]
- **Active Learning**
### Paddle Encrypted
Paddle Fluid Encrypted is a framework for privacy-preserving deep learning based on PaddlePaddle. It follows the same running mechanism and programming paradigm as PaddlePaddle, while using secure multi-party computation (MPC) to enable secure training and prediction.
With Paddle Fluid Encrypted, it is easy to train models or run predictions over encrypted data, just as on PaddlePaddle, without any need for cryptography expertise. Furthermore, the rich industry-oriented models and algorithms built on PaddlePaddle can be smoothly migrated to secure versions on Paddle Fluid Encrypted with little effort.
As a key product of PaddleFL, Paddle Fluid Encrypted intrinsically supports federated learning, including horizontal, vertical and transfer learning scenarios. It provides both provable security (semantic security) and competitive performance.
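To make the programming model concrete, below is a minimal sketch of what MPC-based training looks like. It assumes the `paddle_fl.mpc` module and the ABY3 role/endpoint parameters found in the public PaddleFL examples; exact module paths and signatures may differ across versions.

```python
# A minimal sketch of secure training with Paddle Fluid Encrypted.
# Assumptions: the paddle_fl.mpc module as in the public examples, a
# local redis service for party discovery, and ABY3 as the protocol.
import sys

import paddle.fluid as fluid
import paddle_fl.mpc as pfl_mpc

# Each of the three computing parties runs this script with its own role.
role, server, port = int(sys.argv[1]), sys.argv[2], int(sys.argv[3])
pfl_mpc.init("aby3", role, "localhost", server, port)

# Encrypted (secret-shared) inputs; each party feeds its own shares.
x = pfl_mpc.data(name='x', shape=[8, 8], dtype='int64')
y = pfl_mpc.data(name='y', shape=[8, 1], dtype='int64')

# The network is written exactly like a plaintext PaddlePaddle model.
y_pre = pfl_mpc.layers.fc(input=x, size=1)
cost = pfl_mpc.layers.square_error_cost(input=y_pre, label=y)
avg_loss = pfl_mpc.layers.mean(cost)
pfl_mpc.optimizer.SGD(learning_rate=0.001).minimize(avg_loss)

exe = fluid.Executor(place=fluid.CPUPlace())
exe.run(fluid.default_startup_program())
```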
## Compilation and Installation
### Docker Installation
......@@ -55,7 +85,6 @@ Then you can put the directory in the following command and make:
cmake ../ -DPYTHON_EXECUTABLE=${PYTHON_EXECUTABLE} -DPYTHON_INCLUDE_DIRS=${python_include_dir} -DCMAKE_CXX_COMPILER=${g++_path}
make -j$(nproc)
```
Install the package:
```sh
......@@ -72,36 +101,6 @@ tar -xf redis-stable.tar
cd redis-stable && make
```
## Framework design of PaddleFL
### Horizontal Federated Learning
......@@ -128,6 +127,7 @@ In PaddleFL, components for defining a federated learning task and training a fe
- **FL-scheduler**: Decide which set of trainers can join the training before each updating cycle.
For more instructions, please refer to the [examples](./python/paddle_fl/paddle_fl/examples)
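As a sketch of how these components fit together at CompileTime, an fl_master program builds a strategy and generates the job configs that FL-server and FL-trainer consume at RunTime. The imports and the small model below follow the paddle_fl examples; module paths and defaults may vary by version.

```python
# CompileTime sketch: fl_master defines a model, picks an FL strategy,
# and emits job configs for the server and trainers.
# Module paths follow the paddle_fl examples and may vary by version.
import paddle.fluid as fluid
from paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.core.strategy.fl_strategy_factory import FLStrategyFactory

# A plain PaddlePaddle model definition.
x = fluid.layers.data(name='x', shape=[10], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='int64')
predict = fluid.layers.fc(input=x, size=2, act='softmax')
avg_cost = fluid.layers.mean(
    fluid.layers.cross_entropy(input=predict, label=y))

# FedAvg [2] with 10 local steps per update cycle.
build_strategy = FLStrategyFactory()
build_strategy.fed_avg = True
build_strategy.inner_step = 10
strategy = build_strategy.create_fl_strategy()

job_generator = JobGenerator()
job_generator.set_optimizer(fluid.optimizer.SGD(learning_rate=0.1))
job_generator.set_losses([avg_cost])
job_generator.set_startup_program(fluid.default_startup_program())
job_generator.set_infer_feed_and_target_names(['x', 'y'], [predict.name])

# One FL-server endpoint, two trainers, configs written to fl_job_config.
job_generator.generate_fl_job(
    strategy, server_endpoints=["127.0.0.1:8181"], worker_num=2,
    output="fl_job_config")
```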
### Paddle Encrypted
Paddle Fluid Encrypted implements secure training and inference tasks based on the underlying MPC protocol of ABY3, in which participants can be classified into roles of Input Party (IP), Computing Party (CP) and Result Party (RP).
......
......@@ -6,6 +6,36 @@ PaddleFL is an open source federated learning framework based on PaddlePaddle. Researchers can
Data is becoming more and more expensive nowadays, and sharing raw data across organizations is very hard. Federated Learning aims to solve the problem of data isolation and secure sharing of data knowledge among organizations. The concept of federated learning was proposed by researchers at Google [1, 2, 3].
## Overview of PaddleFL
### Horizontal Federated Learning
<img src='images/FL-framework-zh.png' width = "1300" height = "310" align="middle"/>
In PaddleFL, horizontal and vertical federated learning strategies will be implemented according to the categorization given in [4]. PaddleFL will also provide application demonstrations in natural language processing, computer vision and recommendation.
#### A. Federated Learning Strategy
- **Vertical Federated Learning**: Logistic Regression with PrivC, Neural Network with third-party PrivC [5]
- **Horizontal Federated Learning**: Federated Averaging [2], Differential Privacy [6], Secure Aggregation
#### B. Training Strategy
- **Multi Task Learning** [7]
- **Transfer Learning** [8]
- **Active Learning**
### Paddle Encrypted
Paddle Encrypted is a privacy-preserving deep learning framework based on PaddlePaddle. It implements secure training and prediction with secure multi-party computation (MPC), while keeping the same running mechanism and programming paradigm as PaddlePaddle.
Paddle Encrypted is designed to be similar to PaddlePaddle, so users without a background in cryptography can easily train models and run predictions on encrypted data. Moreover, the rich models and algorithms in PaddlePaddle can be migrated to Paddle Encrypted with little effort.
As a key component of PaddleFL, Paddle Encrypted supports federated learning well, including horizontal, vertical and federated transfer learning scenarios. It provides both reliable security and competitive performance.
## Compilation and Installation
### Docker Installation
......@@ -52,7 +82,6 @@ ${PYTHON_EXECUTABLE} -c "from distutils.sysconfig import get_python_inc;print(ge
cmake ../ -DPYTHON_EXECUTABLE=${PYTHON_EXECUTABLE} -DPYTHON_INCLUDE_DIRS=${python_include_dir} -DCMAKE_CXX_COMPILER=${g++_path}
make -j$(nproc)
```
Install the package:
```sh
......@@ -70,36 +99,6 @@ tar -xf redis-stable.tar
cd redis-stable && make
```
## Framework design of PaddleFL
### Horizontal Federated Learning
......@@ -131,7 +130,7 @@ Paddle Encrypted is designed to be similar to PaddlePaddle, so users without a cryptography background
Secure training and inference tasks in Paddle Encrypted are implemented on top of the underlying ABY3 multi-party computation protocol. In ABY3, participants are classified into the roles of Input Party (IP), Computing Party (CP) and Result Party (RP).
The input parties hold the training data and the model; they are responsible for encrypting the data and the model and sending them to the computing parties. The computing parties carry out the training based on a specific secure multi-party computation protocol; they only ever see encrypted data and models, which guarantees data privacy. When the computation finishes, the result parties receive the results and recover the plaintext. Each participant can take multiple roles; for example, a data owner can also act as a computing party and take part in training.
The whole training and inference pipeline of Paddle Encrypted consists of three main parts: data preparation, training/inference, and result reconstruction.
......@@ -139,7 +138,7 @@ The whole training and inference pipeline of Paddle Encrypted consists of three main parts: data
##### 1. Private data alignment
Paddle Encrypted allows data owners (data parties) to find the set of samples they share without leaking their own data. This is essential in vertical federated learning, where multiple data parties must align their data before training while keeping user data private. Thanks to the PSI algorithm, Paddle Encrypted can align 60,000 records within one second.
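As an illustration only, later PaddleFL releases expose this alignment step through a data-utils helper; the module path and the align() signature below are assumptions and should be checked against the installed version.

```python
# Illustrative PSI alignment sketch. The module path and the align()
# signature are assumptions based on later PaddleFL releases; verify
# them against your installed version before use.
from paddle_fl.mpc.data_utils import alignment

party_id = 0
endpoints = "0:127.0.0.1:11111,1:127.0.0.1:22222"
my_ids = {"uid_0001", "uid_0007", "uid_0042"}

# Each party calls align() with its own ID set; only the intersection
# is revealed, never the IDs that are not shared.
shared_ids = alignment.align(input_set=my_ids, party_id=party_id,
                             endpoints=endpoints,
                             is_receiver=(party_id == 0))
print(shared_ids)
```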
##### 2. Data encryption and distribution
......
......@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy.
Please use a pip environment that has paddlepaddle installed.
```
```sh
pip install paddle_fl
```
......@@ -18,7 +18,7 @@ pip install paddle_fl
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
```
```sh
sh run.sh
```
......@@ -26,7 +26,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
```
```sh
python fl_master.py
```
......@@ -84,7 +84,7 @@ job_generator.generate_fl_job(
#### How to work in RunTime
```
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 >trainer0.log &
......@@ -92,7 +92,7 @@ python -u fl_trainer.py 1 >trainer1.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```
```python
worker_num = 2
server_num = 1
# Define the number of worker/server and the port for scheduler
......@@ -104,7 +104,7 @@ scheduler.start_fl_training()
```
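For reference, a complete fl_scheduler.py following the PaddleFL examples looks roughly like this; the port number is the one used in the examples and can be changed freely.

```python
# Scheduler sketch based on the PaddleFL examples; module path and
# port are taken from the examples and may differ in your setup.
from paddle_fl.core.scheduler.agent_master import FLScheduler

worker_num = 2
server_num = 1
# Define the number of workers/servers and the port the scheduler
# listens on for registration.
scheduler = FLScheduler(worker_num, server_num, port=9091)
# Let every worker join each update cycle (subsampling is also possible).
scheduler.set_sample_worker_num(worker_num)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```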
In fl_server.py, we load and run the FL server job.
```
```python
server = FLServer()
server_id = 0
job_path = "fl_job_config"
......@@ -118,7 +118,7 @@ server.start()
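A complete fl_server.py, again following the examples; the scheduler and server endpoints are illustrative values that must match the generated job config.

```python
# Server sketch based on the PaddleFL examples; the endpoints are
# illustrative and must match the generated job config.
from paddle_fl.core.master.fl_job import FLRunTimeJob
from paddle_fl.core.server.fl_server import FLServer

server = FLServer()
server_id = 0
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_server_job(job_path, server_id)
job._scheduler_ep = "127.0.0.1:9091"   # address of the FL-scheduler
server.set_server_job(job)
server._current_ep = "127.0.0.1:8181"  # this server's own endpoint
server.start()
```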
In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy on test data and compute the privacy budget. The dataset is randomly generated.
```
```python
def reader():
for i in range(1000):
data_dict = {}
......
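To round out the truncated trainer above, a condensed fl_trainer.py following the examples is sketched below; the random reader, the feed names and the stopping bound are illustrative assumptions and must match the program defined in fl_master.

```python
# Condensed trainer sketch based on the PaddleFL examples; feed names
# must match the variables declared in fl_master.py.
import sys

import numpy as np
from paddle_fl.core.master.fl_job import FLRunTimeJob
from paddle_fl.core.trainer.fl_trainer import FLTrainerFactory

def reader():
    # Randomly generated samples, as in the example's DataSet.
    for _ in range(1000):
        data_dict = {}
        for i in range(3):
            data_dict[str(i)] = np.random.rand(1, 5).astype('float32')
        data_dict["label"] = np.random.randint(2, size=(1, 1)).astype('int64')
        yield data_dict

trainer_id = int(sys.argv[1])  # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_trainer_job(job_path, trainer_id)
job._scheduler_ep = "127.0.0.1:9091"
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer.start()

epoch_id = 0
while not trainer.stop():
    epoch_id += 1
    if epoch_id > 10:  # illustrative bound
        break
    for data in reader():
        trainer.run(feed=data, fetch=[])
```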
......@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy:
Please use a pip environment that has paddlepaddle installed.
```
```sh
pip install paddle_fl
```
......@@ -35,7 +35,7 @@ The dataset will downloaded automatically in the API and will be located under `
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
```
```sh
sh run.sh
```
......@@ -43,7 +43,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
```
```sh
python fl_master.py
```
......@@ -100,7 +100,7 @@ job_generator.generate_fl_job(
#### How to work in RunTime
```
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 >trainer0.log &
......@@ -110,7 +110,7 @@ python -u fl_trainer.py 3 >trainer3.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```
```python
worker_num = 4
server_num = 1
#Define number of worker/server and the port for scheduler
......@@ -122,7 +122,7 @@ scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```
```python
server = FLServer()
server_id = 0
job_path = "fl_job_config"
......@@ -136,18 +136,15 @@ server.start()
In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy with test data and compute the privacy budget.
```
```python
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_trainer_job(job_path, trainer_id)
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer.start()
```
```python
def train_test(train_test_program, train_test_feed, train_test_reader):
acc_set = []
for test_data in train_test_reader():
......@@ -195,4 +192,4 @@ while not trainer.stop():
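For completeness, the elided body of train_test in this example computes mean accuracy over the test reader, roughly as below; `exe` is the example's global executor, and the fetched accuracy variable name is taken from the example and may differ in your generated program.

```python
# Sketch of the elided train_test body; exe is the global executor
# used elsewhere in fl_trainer.py, and the fetch target name follows
# the example's generated program.
import numpy

def train_test(train_test_program, train_test_feed, train_test_reader):
    acc_set = []
    for test_data in train_test_reader():
        acc_np = exe.run(program=train_test_program,
                         feed=train_test_feed.feed(test_data),
                         fetch_list=["accuracy_0.tmp_0"])
        acc_set.append(float(acc_np[0]))
    # Mean accuracy over the whole test set.
    return numpy.array(acc_set).mean()
```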
To show the effectiveness of DPSGD-based federated learning with PaddleFL, a simulated experiment was conducted on the open-source MNIST dataset. As the figure below shows, model evaluation results are similar between DPSGD-based federated learning and traditional parameter-server training when the overall privacy budget *epsilon* is 1.3 or 0.13.
<img src="fl_dpsgd_benchmark.png" height=400 width=600 hspace='10'/> <br />
<img src="../../../../../docs/source/examples/md/fl_dpsgd_benchmark.png" height=400 width=600 hspace='10'/> <br />
......@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy:
Please use a pip environment that has paddlepaddle installed.
```
```sh
pip install paddle_fl
```
......@@ -26,7 +26,7 @@ Public Dataset FEMNIST in [LEAF](https://github.com/TalwalkarLab/leaf)
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
```
```sh
sh run.sh
```
......@@ -34,7 +34,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
```
```sh
python fl_master.py
```
......@@ -99,7 +99,7 @@ job_generator.generate_fl_job(
#### How to work in RunTime
```
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
for ((i=0;i<4;i++))
......@@ -109,7 +109,7 @@ done
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```
```python
worker_num = 4
server_num = 1
# Define the number of worker/server and the port for scheduler
......@@ -121,7 +121,7 @@ scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```
```python
server = FLServer()
server_id = 0
job_path = "fl_job_config"
......@@ -135,7 +135,7 @@ server.start()
In fl_trainer.py, we load and run the FL trainer job.
```
```python
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
......
......@@ -9,7 +9,7 @@ This document introduces how to load a pre-defined model, and transfer into prog
Please use a pip environment that has paddlepaddle installed.
```
```sh
pip install paddle_fl
```
......@@ -18,7 +18,8 @@ pip install paddle_fl
#### How to save a program
In program_saver.py, you can define a model and save the program into 'load_file'.
```
```python
input = fluid.layers.data(name='input', shape=[1, 28, 28], dtype="float32")
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[input, label], place=fluid.CPUPlace())
......@@ -42,7 +43,7 @@ job_generator.save_program(program_path, [input, label],
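A fuller sketch of program_saver.py is given below; the trailing arguments of save_program are an assumption completed from the truncated call above, so check them against the installed API.

```python
# Sketch of program_saver.py; the save_program argument list is partly
# an assumption, completed from the truncated call shown above.
import paddle.fluid as fluid
from paddle_fl.core.master.job_generator import JobGenerator

input = fluid.layers.data(name='input', shape=[1, 28, 28], dtype="float32")
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[input, label], place=fluid.CPUPlace())
predict = fluid.layers.fc(input=input, size=10, act='softmax')
cost = fluid.layers.cross_entropy(input=predict, label=label)
accuracy = fluid.layers.accuracy(input=predict, label=label)
avg_cost = fluid.layers.mean(cost)

program_path = './load_file'
job_generator = JobGenerator()
# Assumed argument order: path, feed variables, fetch targets, loss.
job_generator.save_program(program_path, [input, label],
                           [predict, accuracy], avg_cost)
```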
In fl_master.py, you can load the program from 'load_file' and transform it into an FL program.
```
```python
build_strategy = FLStrategyFactory()
build_strategy.fed_avg = True
build_strategy.inner_step = 10
......@@ -62,7 +63,7 @@ job_generator.generate_fl_job_from_program(
#### How to work in RunTime
```
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 >trainer0.log &
......@@ -70,7 +71,7 @@ python -u fl_trainer.py 1 >trainer1.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```
```python
worker_num = 2
server_num = 1
#Define number of worker/server and the port for scheduler
......@@ -82,7 +83,7 @@ scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```
```python
server = FLServer()
server_id = 0
job_path = "fl_job_config"
......@@ -95,7 +96,8 @@ server.start()
```
In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy with test data.
```
```python
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
......
......@@ -104,4 +104,4 @@ wget https://paddle-zwh.bj.bcebos.com/gru4rec_paddlefl_benchmark/gru4rec_benchma
| 1/4 of the whole dataset | private training | - | 0.269 |
| 1/4 of the whole dataset | private training | - | 0.282 |
<img src="fl_benchmark.png" height=300 width=500 hspace='10'/> <br />
<img src="../../../../../docs/source/examples/md/fl_benchmark.png" height=300 width=500 hspace='10'/> <br />
......@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy:
Please use a pip environment that has paddlepaddle installed.
```
```sh
pip install paddle_fl
```
......@@ -35,7 +35,7 @@ The dataset will downloaded automatically in the API and will be located under `
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
```
```sh
sh run.sh
```
......@@ -43,7 +43,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
```
```sh
python fl_master.py
```
......@@ -98,7 +98,7 @@ job_generator.generate_fl_job(
#### How to work in RunTime
```shell
```sh
python3 fl_master.py
sleep 2
python3 -u fl_server.py >log/server0.log &
......@@ -109,7 +109,7 @@ python3 -u fl_trainer.py 1 >log/trainer1.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```
```python
worker_num = 2
server_num = 1
#Define number of worker/server and the port for scheduler
......@@ -122,7 +122,7 @@ scheduler.start_fl_training()
In fl_server.py, we load and run the FL server job.
```
```python
server = FLServer()
server_id = 0
job_path = "fl_job_config"
......
......@@ -10,7 +10,7 @@ This document introduces how to submit an FL job to mpi cluster
Please use a pip environment that has paddlepaddle installed.
```
```sh
pip install paddle_fl
```
......@@ -26,7 +26,7 @@ pip install paddle_fl
The cluster information is defined in config.txt and passed to client.py. A function called job_generator() then generates jobs for fl_server and fl_trainer, and finally the job is submitted.
train_program.py is the program executed on the cluster.
```
```sh
#use the python prepared above to submit job
python/bin/python client.py config.txt
```
......