@@ -9,6 +9,36 @@ PaddleFL is an open source federated learning framework based on PaddlePaddle. R
Data is becoming more and more expensive, and sharing raw data across organizations is very difficult. Federated learning aims to solve the problem of data isolation and to share data knowledge among organizations securely. The concept of federated learning was proposed by researchers at Google [1, 2, 3].
In PaddleFL, horizontal and vertical federated learning strategies will be implemented according to the categorization given in [4]. Application demonstrations in natural language processing, computer vision and recommendation will also be provided.
#### A. Federated Learning Strategy
-**Vertical Federated Learning**: Logistic Regression with PrivC, Neural Network with third-party PrivC [5]
Paddle Fluid Encrypted is a framework for privacy-preserving deep learning based on PaddlePaddle. It follows the same running mechanism and programming paradigm as PaddlePaddle, while using secure multi-party computation (MPC) to enable secure training and prediction.
With Paddle Fluid Encrypted, it is easy to train models or run predictions over encrypted data, just as on PaddlePaddle, without any cryptography expertise. Furthermore, the rich industry-oriented models and algorithms built on PaddlePaddle can be migrated to secure versions on Paddle Fluid Encrypted with little effort.
As a key product of PaddleFL, Paddle Fluid Encrypted intrinsically supports federated learning, including horizontal, vertical and transfer learning scenarios. It provides both provable security (semantic security) and competitive performance.
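To make this concrete, here is a hedged sketch of what secure training code can look like, modeled on the style of Paddle Fluid Encrypted's linear-regression examples. The module path `paddle_fl.mpc`, the `aby3` protocol tag, and the Redis coordination endpoint are assumptions drawn from example code and may differ across versions:

```python
# Hedged sketch, not the definitive API: names follow PaddleFL's MPC
# examples but may differ between releases.
import sys
import paddle_fl.mpc as pfl_mpc

# Each of the three computing parties runs this script with its own role id;
# parties discover each other through a Redis endpoint (assumed convention).
role, redis_server, redis_port = sys.argv[1], sys.argv[2], sys.argv[3]
pfl_mpc.init("aby3", int(role), "localhost", redis_server, int(redis_port))

# Define the network as in plain PaddlePaddle, but with mpc layers that
# operate on secret-shared (encrypted) int64 tensors.
x = pfl_mpc.data(name='x', shape=[8, 13], dtype='int64')
y = pfl_mpc.data(name='y', shape=[8, 1], dtype='int64')
y_pre = pfl_mpc.layers.fc(input=x, size=1)
cost = pfl_mpc.layers.square_error_cost(input=y_pre, label=y)
avg_loss = pfl_mpc.layers.mean(cost)
pfl_mpc.optimizer.SGD(learning_rate=0.001).minimize(avg_loss)
```

The point of the paradigm is visible here: apart from `pfl_mpc.init` and the `pfl_mpc` layer namespace, the code reads like ordinary PaddlePaddle.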
## Compilation and Installation
### Docker Installation
...
...
@@ -55,7 +85,6 @@ Then you can put the directory in the following command and make:
In PaddleFL, horizontal and vertical federated learning strategies will be implemented according to the categorization given in [4]. Application demonstrations in natural language processing, computer vision and recommendation will also be provided.
#### A. Federated Learning Strategy
-**Vertical Federated Learning**: Logistic Regression with PrivC, Neural Network with third-party PrivC [5]
Paddle Fluid Encrypted is a framework for privacy-preserving deep learning based on PaddlePaddle. It follows the same running mechanism and programming paradigm as PaddlePaddle, while using secure multi-party computation (MPC) to enable secure training and prediction.
With Paddle Fluid Encrypted, it is easy to train models or run predictions over encrypted data, just as on PaddlePaddle, without any cryptography expertise. Furthermore, the rich industry-oriented models and algorithms built on PaddlePaddle can be migrated to secure versions on Paddle Fluid Encrypted with little effort.
As a key product of PaddleFL, Paddle Fluid Encrypted intrinsically supports federated learning, including horizontal, vertical and transfer learning scenarios. It provides both provable security (semantic security) and competitive performance.
## Framework design of PaddleFL
### Horizontal Federated Learning
...
...
@@ -128,6 +127,7 @@ In PaddleFL, components for defining a federated learning task and training a fe
-**FL-scheduler**: Decide which set of trainers can join the training before each updating cycle.
For more instructions, please refer to the [examples](./python/paddle_fl/paddle_fl/examples)
### Paddle Encrypted
Paddle Fluid Encrypted implements secure training and inference tasks based on the ABY3 MPC protocol, in which participants are classified into the roles of Input Party (IP), Computing Party (CP) and Result Party (RP).
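For intuition about these roles, the toy sketch below is plain Python rather than PaddleFL code: an Input Party splits private values into additive shares, Computing Parties operate on shares locally, and the Result Party reconstructs only the output. ABY3 itself uses a replicated secret sharing among three computing parties, so this is a deliberate simplification:

```python
# Toy additive secret sharing over the ring Z_{2^64} (illustration only).
import random

RING = 2 ** 64

def share(secret, n_parties=3):
    """Split a secret into n shares that sum to the secret mod 2^64."""
    shares = [random.randrange(RING) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % RING)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod 2^64."""
    return sum(shares) % RING

# IP: secret-share two private inputs.
a_shares = share(20)
b_shares = share(22)

# CPs: each party adds its local shares; no single party learns a or b.
sum_shares = [(a + b) % RING for a, b in zip(a_shares, b_shares)]

# RP: reconstruct only the final result.
assert reconstruct(sum_shares) == 42
```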
@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with FL Strategy.
Please use a pip environment that has paddlepaddle installed.
-```
+```sh
pip install paddle_fl
```
...
...
@@ -18,7 +18,7 @@ pip install paddle_fl
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, the federated learning job is executed by fl_server and fl_trainer in distributed clusters.
-```
+```sh
sh run.sh
```
...
...
@@ -26,7 +26,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
In fl_scheduler.py, we let the server and trainers register with the scheduler.
-```
+```python
worker_num=2
server_num=1
# Define the number of worker/server and the port for scheduler
...
...
@@ -104,7 +104,7 @@ scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
-```
+```python
server=FLServer()
server_id=0 # index of this server in the job config
job_path="fl_job_config" # job config generated by fl_master at CompileTime
...
...
@@ -118,7 +118,7 @@ server.start()
In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy on test data and compute the privacy budget. The dataset is randomly generated.
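For intuition about "compute the privacy budget": a standard, framework-independent way to bound the overall (epsilon, delta) guarantee of many DP-SGD steps is the advanced composition theorem. The helper below is an illustrative sketch under that assumption, not PaddleFL's actual accounting code:

```python
import math

def advanced_composition(eps_step, delta_step, steps, delta_slack):
    """Bound the overall budget of `steps` adaptive mechanisms, each
    (eps_step, delta_step)-DP, via advanced composition (Dwork & Roth):
    the result is (eps_total, steps * delta_step + delta_slack)-DP."""
    eps_total = (eps_step * math.sqrt(2 * steps * math.log(1 / delta_slack))
                 + steps * eps_step * (math.exp(eps_step) - 1))
    return eps_total, steps * delta_step + delta_slack

# Example: 1000 steps at (0.01, 1e-7)-DP each, with 1e-5 slack.
eps, delta = advanced_composition(0.01, 1e-7, 1000, 1e-5)
print("overall budget: (%.3f, %.2e)-DP" % (eps, delta))
```

Tighter accountants (e.g. the moments accountant commonly used in DP-SGD analyses) give a smaller epsilon for the same noise; this sketch only shows the shape of the computation.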
@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with FL Strategy:
Please use a pip environment that has paddlepaddle installed.
-```
+```sh
pip install paddle_fl
```
...
...
@@ -35,7 +35,7 @@ The dataset will be downloaded automatically by the API and will be located under `
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, the federated learning job is executed by fl_server and fl_trainer in distributed clusters.
-```
+```sh
sh run.sh
```
...
...
@@ -43,7 +43,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
To show the effectiveness of DPSGD-based federated learning with PaddleFL, a simulated experiment is conducted on the open-source MNIST dataset. As the figure below shows, model evaluation results of DPSGD-based federated learning are similar to those of traditional parameter server training when the overall privacy budget *epsilon* is 1.3 or 0.13.
@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with FL Strategy:
Please use a pip environment that has paddlepaddle installed.
-```
+```sh
pip install paddle_fl
```
...
...
@@ -26,7 +26,7 @@ Public Dataset FEMNIST in [LEAF](https://github.com/TalwalkarLab/leaf)
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, the federated learning job is executed by fl_server and fl_trainer in distributed clusters.
-```
+```sh
sh run.sh
```
...
...
@@ -34,7 +34,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
-```
+```sh
python fl_master.py
```
...
...
@@ -99,7 +99,7 @@ job_generator.generate_fl_job(
#### How to work in RunTime
-```
+```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
for((i=0;i<4;i++))
...
...
@@ -109,7 +109,7 @@ done
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
-```
+```python
worker_num=4
server_num=1
# Define the number of worker/server and the port for scheduler
...
...
@@ -121,7 +121,7 @@ scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
-```
+```python
server=FLServer()
server_id=0 # index of this server in the job config
job_path="fl_job_config" # job config generated by fl_master at CompileTime
...
...
@@ -135,7 +135,7 @@ server.start()
In fl_trainer.py, we load and run the FL trainer job.
-```
+```python
trainer_id=int(sys.argv[1]) # trainer id for each guest
@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with FL Strategy:
Please use a pip environment that has paddlepaddle installed.
-```
+```sh
pip install paddle_fl
```
...
...
@@ -35,7 +35,7 @@ The dataset will be downloaded automatically by the API and will be located under `
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, the federated learning job is executed by fl_server and fl_trainer in distributed clusters.
-```
+```sh
sh run.sh
```
...
...
@@ -43,7 +43,7 @@ sh run.sh
In this example, we implement the compile-time program in fl_master.py.
@@ -10,7 +10,7 @@ This document introduces how to submit an FL job to an MPI cluster
Please use a pip environment that has paddlepaddle installed.
-```
+```sh
pip install paddle_fl
```
...
...
@@ -26,7 +26,7 @@ pip install paddle_fl
The cluster information is defined in config.txt and passed to client.py. A function called job_generator() then generates the jobs for fl_server and fl_trainer. Finally, the job is submitted.
train_program.py is the program executed on the cluster.
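As a purely hypothetical illustration of that flow, since config.txt's actual format and client.py are not shown here, the sketch below invents a simple `key=value` layout and stubs the later steps as comments:

```python
# Hypothetical sketch: config.txt's real keys and client.py's real logic
# are not shown in this document, so everything here is illustrative.
def load_cluster_config(path="config.txt"):
    """Parse assumed `key=value` lines describing the MPI cluster."""
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                config[key.strip()] = value.strip()
    return config

if __name__ == "__main__":
    config = load_cluster_config()
    # client.py would hand these values to job_generator(), which emits
    # the programs for fl_server and fl_trainer, and then submit the
    # generated job to the MPI cluster.
    print(config)
```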