Unverified commit 7d144c3b, authored by Qinghe JING, committed by GitHub

Merge pull request #73 from qjing666/add_document

Update document in new_version
......@@ -11,17 +11,19 @@ Data is becoming more and more expensive nowadays, and sharing of raw data is ve
## Overview of PaddleFL
### Horizontal Federated Learning
<img src='images/FL-framework.png' width = "1000" height = "320" align="middle"/>
In PaddleFL, horizontal and vertical federated learning strategies will be implemented according to the categorization given in [4]. Application demonstrations in natural language processing, computer vision and recommendation will be provided in PaddleFL.
#### Federated Learning Strategy
#### A. Federated Learning Strategy
- **Vertical Federated Learning**: Logistic Regression with PrivC, Neural Network with third-party PrivC [5]
- **Vertical Federated Learning**: Logistic Regression with PrivC[5], Neural Network with ABY3 [11]
- **Horizontal Federated Learning**: Federated Averaging [2], Differential Privacy [6], Secure Aggregation
#### Training Strategy
#### B. Training Strategy
- **Multi Task Learning** [7]
......@@ -29,14 +31,86 @@ In PaddleFL, horizontal and vertical federated learning strategies will be imple
- **Active Learning**
### Federated Learning with MPC
<img src='images/PFM-overview.png' width = "1000" height = "446" align="middle"/>
Paddle FL MPC (PFM) is a framework for privacy-preserving deep learning based on PaddlePaddle. It follows the same running mechanism and programming paradigm with PaddlePaddle, while using secure multi-party computation (MPC) to enable secure training and prediction.
With PFM, it is easy to train models or conduct prediction as on PaddlePaddle over encrypted data, without the need for cryptography expertise. Furthermore, the rich industry-oriented models and algorithms built on PaddlePaddle can be smoothly migrated to secure versions on PFM with little effort.
As a key product of PaddleFL, PFM intrinsically supports federated learning well, including horizontal, vertical and transfer learning scenarios. It provides both provable security (semantic security) and competitive performance.
## Compilation and Installation
### Docker Installation
```sh
#Pull and run the docker
docker pull hub.baidubce.com/paddlefl/paddle_fl:latest
docker run --name <docker_name> --net=host -it -v $PWD:/root <image id> /bin/bash
#Install paddle_fl
pip install paddle_fl
```
### Compile From Source Code
#### A. Environment preparation
* CentOS 6 or CentOS 7 (64 bit)
* Python 2.7.15+/3.5.1+/3.6/3.7 (64 bit) or above
* pip or pip3 9.0.1+ (64 bit)
* PaddlePaddle release 1.8
* Redis 5.0.8 (64 bit)
* GCC or G++ 4.8.3+
* cmake 3.15+
#### B. Clone the source code, compile and install
Fetch the source code and check out a stable release:
```sh
git clone https://github.com/PaddlePaddle/PaddleFL
cd /path/to/PaddleFL
# Checkout stable release
mkdir build && cd build
```
Execute the compile commands, where `PYTHON_EXECUTABLE` is the path to the python binary with PaddlePaddle installed, `CMAKE_CXX_COMPILER` is the path of g++, and `PYTHON_INCLUDE_DIRS` is the corresponding python include directory. You can get `PYTHON_INCLUDE_DIRS` via the following command:
```sh
${PYTHON_EXECUTABLE} -c "from distutils.sysconfig import get_python_inc;print(get_python_inc())"
```
Then pass that directory to the following commands and build:
```sh
cmake ../ -DPYTHON_EXECUTABLE=${PYTHON_EXECUTABLE} -DPYTHON_INCLUDE_DIRS=${python_include_dir} -DCMAKE_CXX_COMPILER=${g++_path}
make -j$(nproc)
```
Install the package:
```sh
make install
cd /path/to/PaddleFL/python
${PYTHON_EXECUTABLE} setup.py sdist bdist_wheel
pip or pip3 install dist/***.whl -U
```
We also provide a stable Redis package for you to download and install:
```sh
wget --no-check-certificate https://paddlefl.bj.bcebos.com/redis-stable.tar
tar -xf redis-stable.tar
cd redis-stable && make
```
## Framework design of PaddleFL
### Horizontal Federated Learning
<img src='images/FL-training.png' width = "1000" height = "400" align="middle"/>
In PaddleFL, components for defining a federated learning task and training a federated learning job are as follows:
#### Compile Time
#### A. Compile Time
- **FL-Strategy**: a user can define federated learning strategies with FL-Strategy such as Fed-Avg[2]
......@@ -46,7 +120,7 @@ In PaddleFL, components for defining a federated learning task and training a fe
- **FL-Job-Generator**: Given FL-Strategy, User-Defined Program and Distributed Training Config, FL-Job for federated server and worker will be generated through FL Job Generator. FL-Jobs will be sent to organizations and federated parameter server for run-time execution.
#### Run Time
#### B. Run Time
- **FL-Server**: federated parameter server that usually runs in cloud or third-party clusters.
......@@ -54,36 +128,110 @@ In PaddleFL, components for defining a federated learning task and training a fe
- **FL-scheduler**: Decide which set of trainers can join the training before each updating cycle.
## Install Guide and Quick-Start
For more instructions, please refer to the [examples](./python/paddle_fl/paddle_fl/examples)
### Federated Learning with MPC
Paddle FL MPC implements secure training and inference tasks based on underlying MPC protocols such as ABY3 [11], a highly efficient three-party computing model.
In ABY3, participants are classified into the roles of Input Party (IP), Computing Party (CP) and Result Party (RP). Input Parties (e.g., the training data/model owners) encrypt and distribute data or models to Computing Parties. Computing Parties (e.g., VMs in the cloud) conduct training or inference tasks based on specific MPC protocols; they are restricted to seeing only the encrypted data or models, which guarantees data privacy. When the computation is completed, one or more Result Parties (e.g., data owners or a designated third party) receive the encrypted results from the Computing Parties and reconstruct the plaintext results. Roles can overlap, e.g., a data owner can also act as a computing party.
A full training or inference process in PFM consists mainly of three phases: data preparation, training/inference, and result reconstruction.
#### A. Data preparation
##### 1. Private data alignment
PFM enables data owners (IPs) to find records with identical keys (like UUID) without revealing private data to each other. This is especially useful in vertical learning cases, where segmented features with the same keys need to be identified and aligned across all owners in a private manner before training. Using the OT-based PSI (Private Set Intersection) algorithm, PFM can perform private alignment at a speed of up to 60k records per second.
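The toy snippet below only illustrates the *outcome* of alignment by computing the key intersection in the clear; the OT-based PSI protocol in PFM produces the same intersection without either owner revealing its non-overlapping keys.

```python
# Toy illustration only: plain-text key intersection.
# PFM's OT-based PSI computes the same intersection without exposing raw keys.
party_a_keys = {"uuid-001", "uuid-002", "uuid-003"}
party_b_keys = {"uuid-002", "uuid-003", "uuid-004"}

aligned_keys = sorted(party_a_keys & party_b_keys)
print(aligned_keys)  # ['uuid-002', 'uuid-003'] -> records usable for vertical training
```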
##### 2. Encryption and distribution
In PFM, data and models from IPs are encrypted using secret sharing [10] and then sent to the CPs, via direct transmission or distributed storage such as HDFS. Each CP obtains only one share of each piece of data and is therefore unable to recover the original value under the semi-honest model.
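As a rough illustration of the idea, a value can be split into random shares that reveal nothing individually; this is a minimal additive-sharing sketch, not PFM's actual ABY3 share format (which uses replicated shares over a fixed-point ring).

```python
import secrets

RING = 2 ** 64  # shares live in a ring of 64-bit integers (illustrative choice)

def share(value, n_parties=3):
    """Split `value` into additive shares that sum to `value` mod RING."""
    shares = [secrets.randbelow(RING) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % RING)
    return shares

shares = share(42)
# Each computing party receives one share; no single share leaks the secret.
print(shares, sum(shares) % RING)  # the last value prints 42
```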
#### B. Training/inference
<img src='images/PFM-design.png' width = "1000" height = "622" align="middle"/>
As in PaddlePaddle, a training or inference job can be separated into the compile-time phase and the run-time phase:
Please refer to [Quick Start](https://paddlefl.readthedocs.io/en/latest/instruction.html) for installation and a quick-start example.
##### 1. Compile time
* **MPC environment specification**: a user needs to choose an MPC protocol and configure the network settings. In the current version, PFM provides only the "ABY3" protocol; more protocol implementations will be provided in the future.
* **User-defined job program**: a user can define the machine learning model structure and the training strategies (or inference task) in a PFM program, using the secure operators, as sketched below.
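A minimal compile-time sketch is given below. The `pfl_mpc.init(...)` call is taken from the MNIST demo later in this document; the role and Redis values are placeholders, and the secure operators that would define the actual model are only indicated by a comment.

```python
import paddle_fl.mpc as pfl_mpc

# Placeholder values: this party's role (0, 1 or 2) and the Redis server used
# by the three computing parties to exchange network information.
role, server, port = 0, "127.0.0.1", 6379
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))

# The (encrypted) network, loss and optimizer are then described with the
# secure operators in paddle_fl.mpc, mirroring an ordinary PaddlePaddle program.
```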
##### 2. Run time
A PFM program is exactly a PaddlePaddle program and is executed in the same way as normal PaddlePaddle programs. For example, at run time a PFM program is transpiled into a ProgramDesc and then passed to and run by the Executor. The main concepts in the run-time phase are as follows:
* **Computing nodes**: a computing node is an entity corresponding to a Computing Party. In real deployment, it can be a bare-metal machine, a cloud VM, a docker or even a process. PFM requires exactly three computing nodes in each run, which is determined by the underlying ABY3 protocol. A PFM program will be deployed and run in parallel on all three computing nodes.
* **Operators using MPC**: PFM provides typical machine learning operators in `paddle_fl.mpc` over encrypted data. Such operators are implemented upon PaddlePaddle framework, based on MPC protocols like ABY3. Like other PaddlePaddle operators, in run time, instances of PFM operators are created and run in order by Executor.
#### C. Result reconstruction
Upon completion of the secure training (or inference) job, the models (or prediction results) will be output by CPs in encrypted form. Result Parties can collect the encrypted results, decrypt them using the tools in PFM, and deliver the plaintext results to users.
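Conceptually, reconstruction is the inverse of the sharing sketch above: adding all shares of an output modulo the ring size recovers the plaintext. PFM's decryption tools perform the equivalent step for the ABY3 share format. A toy example:

```python
RING = 2 ** 64  # must match the ring used when the shares were created

def reconstruct(shares):
    """Recover the plaintext from additive shares (illustrative, not the ABY3 format)."""
    return sum(shares) % RING

print(reconstruct([123, 456, (42 - 579) % RING]))  # 42
```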
For more instructions, please refer to [mpc examples](./python/paddle_fl/mpc/examples)
## Easy deployment with kubernetes
### Horizontal Federated Learning
```sh
kubectl apply -f ./paddle_fl/examples/k8s_deployment/master.yaml
kubectl apply -f ./python/paddle_fl/paddle_fl/examples/k8s_deployment/master.yaml
```
Please refer [K8S deployment example](./paddle_fl/examples/k8s_deployment/README.md) for details
Please refer to [K8S deployment example](./python/paddle_fl/paddle_fl/examples/k8s_deployment/README.md) for details
You can also refer to [K8S cluster application and kubectl installation](./python/paddle_fl/paddle_fl/examples/k8s_deployment/deploy_instruction.md) to deploy your own K8S cluster
### Federated Learning with MPC
To be added.
You can also refer [K8S cluster application and kubectl installation](./paddle_fl/examples/k8s_deployment/deploy_instruction.md) to deploy your K8S cluster
## Benchmark task
### Horizontal Federated Learning
Gru4Rec [9] introduces a recurrent neural network model for session-based recommendation. PaddlePaddle's Gru4Rec implementation is at https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/gru4rec. An example is given in [Gru4Rec in Federated Learning](https://paddlefl.readthedocs.io/en/latest/examples/gru4rec_examples.html)
## Release note
- v0.2.0 released
- Support Kubernetes easy deployment
- Add an API for the [LEAF](https://arxiv.org/abs/1812.01097) dataset, which targets federated settings, to support benchmark experiments.
- Add FL-scheduler, acting as a central controller during the training phase.
- Add FL-Submitter to support cluster task submission
- Add secure aggregation algorithm
- Support more optimizers in PaddleFL such as Adam
- More examples available
### Federated Learning with MPC
#### A. Convergence of paddle_fl.mpc vs paddle
##### 1. Training Parameters
- Dataset: Boston house price dataset
- Number of Epoch: 20
- Batch Size: 10
##### 2. Experiment Results
| Epoch/Step | paddle_fl.mpc | Paddle |
| ---------- | ------------- | ------ |
| Epoch=0, Step=0 | 738.39491 | 738.46204 |
| Epoch=1, Step=0 | 630.68834 | 629.9071 |
| Epoch=2, Step=0 | 539.54683 | 538.1757 |
| Epoch=3, Step=0 | 462.41159 | 460.64722 |
| Epoch=4, Step=0 | 397.11516 | 395.11017 |
| Epoch=5, Step=0 | 341.83102 | 339.69815 |
| Epoch=6, Step=0 | 295.01114 | 292.83597 |
| Epoch=7, Step=0 | 255.35141 | 253.19429 |
| Epoch=8, Step=0 | 221.74739 | 219.65132 |
| Epoch=9, Step=0 | 193.26459 | 191.25981 |
| Epoch=10, Step=0 | 169.11423 | 167.2204 |
| Epoch=11, Step=0 | 148.63138 | 146.85835 |
| Epoch=12, Step=0 | 131.25081 | 129.60391 |
| Epoch=13, Step=0 | 116.49708 | 114.97599 |
| Epoch=14, Step=0 | 103.96669 | 102.56854 |
| Epoch=15, Step=0 | 93.31706 | 92.03858 |
| Epoch=16, Step=0 | 84.26219 | 83.09653 |
| Epoch=17, Step=0 | 76.55664 | 75.49785 |
| Epoch=18, Step=0 | 69.99673 | 69.03561 |
| Epoch=19, Step=0 | 64.40562 | 63.53539 |
## On Going and Future Work
- Vertical Federated Learning Strategies and more horizontal federated learning strategies will be open sourced.
- Vertical Federated Learning will support more algorithms.
- Add K8S deployment scheme for Paddle Encrypted.
## Reference
......@@ -95,7 +243,7 @@ Gru4Rec [9] introduces recurrent neural network model in session-based recommend
[4]. Qiang Yang, Yang Liu, Tianjian Chen, Yongxin Tong. **Federated Machine Learning: Concept and Applications.** 2019
[5]. Kai He, Liu Yang, Jue Hong, Jinghua Jiang, Jieming Wu, Xu Dong et al. **PrivC - A framework for efficient Secure Two-Party Computation. In Proceedings of 15th EAI International Conference on Security and Privacy in Communication Networks.** SecureComm 2019
[5]. Kai He, Liu Yang, Jue Hong, Jinghua Jiang, Jieming Wu, Xu Dong et al. **PrivC - A framework for efficient Secure Two-Party Computation.** In Proc. of SecureComm 2019
[6]. Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang. **Deep Learning with Differential Privacy.** 2016
......@@ -104,3 +252,7 @@ Gru4Rec [9] introduces recurrent neural network model in session-based recommend
[8]. Yang Liu, Tianjian Chen, Qiang Yang. **Secure Federated Transfer Learning.** 2018
[9]. Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk. **Session-based Recommendations with Recurrent Neural Networks.** 2016
[10]. https://en.wikipedia.org/wiki/Secret_sharing
[11]. Payman Mohassel and Peter Rindal. **ABY3: A Mixed Protocol Framework for Machine Learning.** In Proc. of CCS 2018
......@@ -8,17 +8,19 @@ PaddleFL是一个基于PaddlePaddle的开源联邦学习框架。研究人员可
## PaddleFL概述
### 横向联邦方案
<img src='images/FL-framework-zh.png' width = "1300" height = "310" align="middle"/>
在PaddleFL中,横向和纵向联邦学习策略将根据[4]中给出的分类来实现。PaddleFL也将提供在自然语言处理,计算机视觉和推荐算法等领域的应用示例。
#### 联邦学习策略
#### A. 联邦学习策略
- **纵向联邦学习**: 带privc的逻辑回归,带第三方privc的神经网络[5]
- **纵向联邦学习**: 带privc的逻辑回归,带ABY3[11]的神经网络
- **横向联邦学习**: 联邦平均 [2],差分隐私 [6],安全聚合
#### 训练策略
#### B. 训练策略
- **多任务学习** [7]
......@@ -26,14 +28,88 @@ PaddleFL是一个基于PaddlePaddle的开源联邦学习框架。研究人员可
- **主动学习**
### Federated Learning with MPC
<img src='images/PFM-overview.png' width = "1000" height = "446" align="middle"/>
Paddle FL MPC(PFM) 是一个基于PaddlePaddle的隐私保护深度学习框架。Paddle Encrypted基于多方计算(MPC)实现安全训练及预测,拥有与PaddlePaddle相同的运行机制及编程范式。
PFM 设计与PaddlePaddle相似,没有密码学相关背景的用户亦可简单的对加密的数据进行训练和预测。同时,PaddlePaddle中丰富的模型和算法可以轻易地迁移到PFM中。
作为PaddleFL的一个重要组成部分,PFM可以很好地支持联邦学习,包括横向、纵向及联邦迁移学习等多个场景。既提供了可靠的安全性,也拥有可观的性能。
## 编译与安装
### 使用docker安装
```sh
#Pull and run the docker
docker pull hub.baidubce.com/paddlefl/paddle_fl:latest
docker run --name <docker_name> --net=host -it -v $PWD:/root <image id> /bin/bash
#Install paddle_fl
pip install paddle_fl
```
### 从源码编译
#### A. 环境准备
* CentOS 6 or CentOS 7 (64 bit)
* Python 2.7.15+/3.5.1+/3.6/3.7 (64 bit) or above
* pip or pip3 9.0.1+ (64 bit)
* PaddlePaddle release 1.8
* Redis 5.0.8 (64 bit)
* GCC or G++ 4.8.3+
* cmake 3.15+
#### B. 克隆源代码并编译安装
获取源代码
```sh
git clone https://github.com/PaddlePaddle/PaddleFL
cd /path/to/PaddleFL
# Checkout stable release
mkdir build && cd build
```
执行编译指令, `PYTHON_EXECUTABLE` 为安装了PaddlePaddle的可执行python路径, `CMAKE_CXX_COMPILER` 为指定的g++路径。 `PYTHON_INCLUDE_DIRS` 是相应的include路径,可以用如下指令获得:
```sh
${PYTHON_EXECUTABLE} -c "from distutils.sysconfig import get_python_inc;print(get_python_inc())"
```
之后就可以执行编译和安装的指令
```sh
cmake ../ -DPYTHON_EXECUTABLE=${PYTHON_EXECUTABLE} -DPYTHON_INCLUDE_DIRS=${python_include_dir} -DCMAKE_CXX_COMPILER=${g++_path}
make -j$(nproc)
```
安装对应的安装包
```sh
make install
cd /path/to/PaddleFL/python
${PYTHON_EXECUTABLE} setup.py sdist bdist_wheel
pip or pip3 install dist/***.whl -U
```
我们也提供了稳定的redis安装包, 可供下载。
```sh
wget --no-check-certificate https://paddlefl.bj.bcebos.com/redis-stable.tar
tar -xf redis-stable.tar
cd redis-stable && make
```
## PaddleFL框架设计
### 横向联邦方案
<img src='images/FL-training.png' width = "1300" height = "450" align="middle"/>
在PaddeFL中,用于定义联邦学习任务和联邦学习训练工作的组件如下:
#### 编译时
#### A. 编译时
- **FL-Strategy**: 用户可以使用FL-Strategy定义联邦学习策略,例如Fed-Avg[2]。
......@@ -43,7 +119,7 @@ PaddleFL是一个基于PaddlePaddle的开源联邦学习框架。研究人员可
- **FL-Job-Generator**: 给定FL-Strategy, User-Defined Program 和 Distributed Training Config,联邦参数的Server端和Worker端的FL-Job将通过FL Job Generator生成。FL-Jobs 被发送到组织和联邦参数服务器以进行联合训练。
#### 运行时
#### B. 运行时
- **FL-Server**: 在云或第三方集群中运行的联邦参数服务器。
......@@ -51,36 +127,105 @@ PaddleFL是一个基于PaddlePaddle的开源联邦学习框架。研究人员可
- **FL-Scheduler**: 训练过程中起到调度Worker的作用,在每个更新周期前,决定哪些Worker可以参与训练。
## 安装指南和快速入门
请参考更多的[例子](./python/paddle_fl/paddle_fl/examples), 获取更多的信息。
### Federated Learning with MPC
Paddle FL MPC 中的安全训练和推理任务是基于高效的多方计算协议实现的,如ABY3[11]
在ABY3[11]中,参与方可分为:输入方、计算方和结果方。输入方为训练数据及模型的持有方,负责加密数据和模型,并将其发送到计算方。计算方为训练的执行方,基于特定的多方安全计算协议完成训练任务。计算方只能得到加密后的数据及模型,以保证数据隐私。计算结束后,结果方会拿到计算结果并恢复出明文数据。每个参与方可充当多个角色,如一个数据拥有方也可以作为计算方参与训练。
PFM的整个训练及推理过程主要由三个部分组成:数据准备,训练/推理,结果解析。
#### A. 数据准备
##### 1. 私有数据对齐
PFM允许数据拥有方(数据方)在不泄露自己数据的情况下,找出多方共有的样本集合。此功能在纵向联邦学习中非常必要,因为其要求多个数据方在训练前进行数据对齐,并且保护用户的数据隐私。凭借PSI算法,PFM可以在一秒内完成6万条数据的对齐。
请参考[快速开始](https://paddlefl.readthedocs.io/en/latest/instruction.html)
##### 2. 数据加密及分发
在PFM中,数据方将数据和模型用秘密共享[10]的方法加密,然后用直接传输或者数据库存储的方式传到计算方。每个计算方只会拿到数据的一部分,因此计算方无法还原真实数据。
#### B. 训练及推理
<img src='images/PFM-design.png' width = "1000" height = "622" align="middle"/>
像PaddlePaddle一样,训练和推理任务可以分为编译阶段和运行阶段。
##### 1. 编译时
* **确定MPC环境**:用户需要指定用到的MPC协议,并配置网络环境。现有版本的Paddle Encrypted只支持"ABY3"协议。更多的协议将在后续版本中支持。
* **用户定义训练任务**:用户可以根据PFM提供的安全接口,定义集齐学习网络以及训练策略。
##### 2. 运行时
一个Paddle Encrypted程序实际上就是一个PaddlePaddle程序。在运行时,PFM的程序将会转变为PaddlePaddle中的ProgramDesc,并交给Executor运行。以下是运行阶段的主要概念:
* **运算节点**:计算节点是与计算方相对应的实体。在实际部署中,它可以是裸机、云虚拟机、docker甚至进程。PFM在每次运行中只需要三个计算节点,这由底层ABY3协议决定。Paddle Encrypted程序将在所有三个计算节点上并行部署和运行。
* **基于MPC的算子**:PFM 为操作加密数据提供了特殊的算子,这些算子在PaddlePaddle框架中实现,基于像ABY3一样的MPC协议。像PaddlePaddle中一样,在运行时PFM的算子将被创建并按照顺序执行。
#### C. 结果重构
安全训练和推理工作完成后,模型(或预测结果)将由计算方以加密形式输出。结果方可以收集加密的结果,使用PFM中的工具对其进行解密,并将明文结果传递给用户。
请参考[MPC的例子](./python/paddle_fl/mpc/examples),以获取更多的信息。
## Kubernetes简单部署
### 横向联邦方案
```sh
kubectl apply -f ./paddle_fl/examples/k8s_deployment/master.yaml
kubectl apply -f ./python/paddle_fl/paddle_fl/examples/k8s_deployment/master.yaml
```
请参考[K8S部署实例](./paddle_fl/examples/k8s_deployment/README.md)
请参考[K8S部署实例](./python/paddle_fl/paddle_fl/examples/k8s_deployment/README.md)
也可以参考[K8S集群申请及kubectl安装](./python/paddle_fl/paddle_fl/examples/k8s_deployment/deploy_instruction.md) 配置自己的K8S集群
### Federated Learning with MPC
也可以参考[K8S集群申请及kubectl安装](./paddle_fl/examples/k8s_deployment/deploy_instruction.md) 配置自己的K8S集群
会在后续版本中发布。
## 性能测试
### 横向联邦方案
Gru4Rec [9] 在基于会话的推荐中引入了递归神经网络模型。PaddlePaddle的GRU4RC实现代码在 https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/gru4rec. 一个基于联邦学习训练Gru4Rec模型的示例请参考[Gru4Rec in Federated Learning](https://paddlefl.readthedocs.io/en/latest/examples/gru4rec_examples.html)
## 版本更新
- v0.2.0 发布
- 支持 Kubernetes 简易部署
- 添加在联邦学习设定下的[LEAF](https://arxiv.org/abs/1812.01097) 公开数据集接口,支持基准的设定
- 添加 FL-scheduler, 在训练过程中充当中心控制器的角色
- 添加 FL-Submitter 功能,支持集群任务部署
- 添加 secure aggregation 算法
- 支持更多的机器学习优化器,例如:Adam
- 增加更多的实际应用例子
### Federated Learning with MPC
#### A. 精度测试
##### 1. 训练参数
- 数据集:波士顿房价预测。
- 训练轮数: 20
- Batch Size:10
##### 2. 实验结果
| Epoch/Step | paddle_fl.mpc | Paddle |
| ---------- | ------------- | ------ |
| Epoch=0, Step=0 | 738.39491 | 738.46204 |
| Epoch=1, Step=0 | 630.68834 | 629.9071 |
| Epoch=2, Step=0 | 539.54683 | 538.1757 |
| Epoch=3, Step=0 | 462.41159 | 460.64722 |
| Epoch=4, Step=0 | 397.11516 | 395.11017 |
| Epoch=5, Step=0 | 341.83102 | 339.69815 |
| Epoch=6, Step=0 | 295.01114 | 292.83597 |
| Epoch=7, Step=0 | 255.35141 | 253.19429 |
| Epoch=8, Step=0 | 221.74739 | 219.65132 |
| Epoch=9, Step=0 | 193.26459 | 191.25981 |
| Epoch=10, Step=0 | 169.11423 | 167.2204 |
| Epoch=11, Step=0 | 148.63138 | 146.85835 |
| Epoch=12, Step=0 | 131.25081 | 129.60391 |
| Epoch=13, Step=0 | 116.49708 | 114.97599 |
| Epoch=14, Step=0 | 103.96669 | 102.56854 |
| Epoch=15, Step=0 | 93.31706 | 92.03858 |
| Epoch=16, Step=0 | 84.26219 | 83.09653 |
| Epoch=17, Step=0 | 76.55664 | 75.49785 |
| Epoch=18, Step=0 | 69.99673 | 69.03561 |
| Epoch=19, Step=0 | 64.40562 | 63.53539 |
## 正在进行的工作
- 垂直联合学习策略和更多的水平联合学习策略将是开源的。
- 纵向联合学习支持更多的模型。
- 发布纵向联邦学习K8S部署方案。
## 参考文献
......@@ -92,7 +237,7 @@ Gru4Rec [9] 在基于会话的推荐中引入了递归神经网络模型。Paddl
[4]. Qiang Yang, Yang Liu, Tianjian Chen, Yongxin Tong. **Federated Machine Learning: Concept and Applications.** 2019
[5]. Kai He, Liu Yang, Jue Hong, Jinghua Jiang, Jieming Wu, Xu Dong et al. **PrivC - A framework for efficient Secure Two-Party Computation. In Proceedings of 15th EAI International Conference on Security and Privacy in Communication Networks.** SecureComm 2019
[5]. Kai He, Liu Yang, Jue Hong, Jinghua Jiang, Jieming Wu, Xu Dong et al. **PrivC - A framework for efficient Secure Two-Party Computation.** In Proc. of SecureComm 2019
[6]. Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, Li Zhang. **Deep Learning with Differential Privacy.** 2016
......@@ -101,3 +246,7 @@ Gru4Rec [9] 在基于会话的推荐中引入了递归神经网络模型。Paddl
[8]. Yang Liu, Tianjian Chen, Qiang Yang. **Secure Federated Transfer Learning.** 2018
[9]. Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, Domonkos Tikk. **Session-based Recommendations with Recurrent Neural Networks.** 2016
[10]. https://en.wikipedia.org/wiki/Secret_sharing
[11]. Payman Mohassel and Peter Rindal. **ABY3: A Mixed Protocol Framework for Machine Learning.** In Proc. of CCS 2018
......@@ -4,14 +4,14 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy:
### Dependencies
- paddlepaddle>=1.6
- paddlepaddle>=1.8
### How to install PaddleFL
Please use python which has paddlepaddle installed
Please use pip which has paddlepaddle installed
```
python setup.py install
pip install paddle_fl
```
### Model
......@@ -98,15 +98,28 @@ job_generator.generate_fl_job(
strategy, server_endpoints=endpoints, worker_num=2, output=output)
```
How to work in RunTime
#### How to work in RunTime
```
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 data/ >trainer0.log &
python -u fl_trainer.py 1 data/ >trainer1.log &
python -u fl_trainer.py 0 >trainer0.log &
python -u fl_trainer.py 1 >trainer1.log &
python -u fl_trainer.py 2 >trainer2.log &
python -u fl_trainer.py 3 >trainer3.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```
worker_num = 4
server_num = 1
#Define number of worker/server and the port for scheduler
scheduler = FLScheduler(worker_num, server_num, port=9091)
scheduler.set_sample_worker_num(4)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```
......@@ -115,7 +128,9 @@ server_id = 0
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_server_job(job_path, server_id)
job._scheduler_ep = "127.0.0.1:9091" # IP address for scheduler
server.set_server_job(job)
server._current_ep = "127.0.0.1:8181" # IP address for server
server.start()
```
......
......@@ -3,12 +3,13 @@
This document introduces how to use PaddleFL to train a model with Fl Strategy.
### Dependencies
- paddlepaddle>=1.6
- paddlepaddle>=1.8
### How to install PaddleFL
Please use python which has paddlepaddle installed
Please use pip which has paddlepaddle installed
```sh
python setup.py install
pip install paddle_fl
```
### Model
......@@ -20,7 +21,6 @@ Public Dataset [Rsc15](https://2015.recsyschallenge.com)
```sh
#download data
cd example/gru4rec_demo
sh download.sh
```
......@@ -63,7 +63,7 @@ strategy = build_strategy.create_fl_strategy()
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
job_generator.generate_fl_job(
strategy, server_endpoints=endpoints, worker_num=2, output=output)
strategy, server_endpoints=endpoints, worker_num=4, output=output)
```
......@@ -72,8 +72,10 @@ job_generator.generate_fl_job(
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 data/ >trainer0.log &
python -u fl_trainer.py 1 data/ >trainer1.log &
python -u fl_trainer.py 0 >trainer0.log &
python -u fl_trainer.py 1 >trainer1.log &
python -u fl_trainer.py 2 >trainer2.log &
python -u fl_trainer.py 3 >trainer3.log &
```
fl_trainer.py can define its own reader according to the data.
```python
......
# Example in Recognize Digits with DPSGD
# Example in Recognize Digits with Secure Aggregation
This document introduces how to use PaddleFL to train a model with Fl Strategy: Secure Aggregation. Using Secure Aggregation strategy, the server can aggregate the model parameters without learning the value of the parameters.
### Dependencies
- paddlepaddle>=1.6
- paddlepaddle>=1.8
### How to install PaddleFL
Please use python which has paddlepaddle installed
Please use pip which has paddlepaddle installed
```
python setup.py install
pip install paddle_fl
```
### Model
......@@ -96,7 +96,7 @@ job_generator.generate_fl_job(
```
How to work in RunTime
#### How to work in RunTime
```shell
python3 fl_master.py
......@@ -107,6 +107,18 @@ python3 -u fl_trainer.py 0 >log/trainer0.log &
sleep 2
python3 -u fl_trainer.py 1 >log/trainer1.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```
worker_num = 2
server_num = 1
#Define number of worker/server and the port for scheduler
scheduler = FLScheduler(worker_num, server_num, port=9091)
scheduler.set_sample_worker_num(2)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
......
##Instructions for PaddleFL-MPC MNIST Demo
## Instructions for PaddleFL-MPC MNIST Demo
([简体中文](./README_CN.md)|English)
This document introduces how to run the MNIST demo based on Paddle-MPC, which can be run in two ways: on a single machine or on multiple machines.
###1. Running on Single Machine
### 1. Running on Single Machine
####(1). Prepare Data
#### (1). Prepare Data
Generate encrypted training and testing data using `generate_encrypted_data()` and `generate_encrypted_test_data()` in the `process_data.py` script. For example, users can write the following code into a python script named `prepare.py`, and then run the script with the command `python prepare.py`.
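A minimal `prepare.py` sketch based on the two functions named above:

```python
# Minimal prepare.py sketch; assumes process_data.py from this demo is importable.
import process_data

process_data.generate_encrypted_data()
process_data.generate_encrypted_test_data()
```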
......@@ -18,7 +18,7 @@ process_data.generate_encrypted_test_data()
Encrypted feature and label data files will be generated and saved in the `/tmp` directory. Different suffix names are used for these files to indicate the ownership of different computation parties. For instance, a file named `mnist2_feature.part0` means it is a feature file of party 0.
####(2). Launch Demo with A Shell Script
#### (2). Launch Demo with A Shell Script
Launch the demo with the `run_standalone.sh` script. The concrete command is:
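For the MNIST demo the command is:

```sh
bash run_standalone.sh mnist_demo.py
```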
......@@ -30,7 +30,7 @@ The information of current epoch and step will be displayed on screen while trai
Besides, predictions will be made in this demo once training is finished. The predictions, in ciphertext format, will be saved in the `/tmp` directory, and the file name format is similar to what is described in Step 1.
####(3). Decrypt Data
#### (3). Decrypt Data
Decrypt the saved prediction data and save the decrypted prediction results into a specified file using `decrypt_data_to_file()` in the `process_data.py` script. For example, users can write the following code into a python script named `decrypt_save.py` and then run it with the command `python decrypt_save.py`. The decrypted prediction results will be saved into `mpc_label`.
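A `decrypt_save.py` sketch with an assumed argument list (the real signature of `decrypt_data_to_file()` in the demo may differ):

```python
# Hypothetical decrypt_save.py sketch; the arguments of decrypt_data_to_file()
# are an assumption and may differ from the demo's process_data.py.
import process_data

process_data.decrypt_data_to_file("/tmp/mnist_output_prediction", "mpc_label")
```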
......@@ -51,17 +51,17 @@ fi
###2. Running on Multi Machines
### 2. Running on Multi Machines
####(1). Prepare Data
#### (1). Prepare Data
Data owner encrypts data. Concrete operations are consistent with “Prepare Data” in “Running on Single Machine”.
####(2). Distribute Encrypted Data
#### (2). Distribute Encrypted Data
According to the suffix of the file name, distribute the encrypted data files to the `/tmp` directories of all 3 computation parties. For example, send `mnist2_feature.part0` and `mnist2_label.part0` to `/tmp` of party 0 with the `scp` command, as in the sketch below.
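For example (the user name and host are placeholders for party 0's actual account and IP):

```sh
# Placeholders: replace user/party0-host with the actual account and IP of party 0.
scp /tmp/mnist2_feature.part0 /tmp/mnist2_label.part0 user@party0-host:/tmp/
```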
####(3). Modify mnist_demo.py
#### (3). Modify mnist_demo.py
Each computation party modifies `localhost` in the following code to the IP address of its own machine.
......@@ -69,7 +69,7 @@ Each computation party modifies `localhost` in the following code as the IP addr
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
####(4). Launch Demo on Each Party
#### (4). Launch Demo on Each Party
**Note** that a Redis service is necessary for running the demo. Remember to clear the Redis server's cache before launching the demo on each computation party, in order to avoid any negative influence caused by cached records in Redis. The following command can be used to clear Redis, where REDIS_BIN is the redis-cli executable, and SERVER and PORT represent the IP and port of the Redis server respectively.
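With redis-cli, the command is typically of the form:

```sh
$REDIS_BIN -h $SERVER -p $PORT flushall
```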
......@@ -87,9 +87,9 @@ where PYTHON_EXECUTABLE is the python which installs PaddleFL, PARTY_ID is the I
Similarly, predictions in ciphertext format will be saved in the `/tmp` directory, for example a file named `mnist_output_prediction.part0` for party 0.
**Note** that remember to delete the precidtion files in `/tmp` directory generated in last running, in case of any influence on the decrypted results of current running.
**Note** that remember to delete the prediction files in `/tmp` directory generated in last running, in case of any influence on the decrypted results of current running.
####(5). Decrypt Prediction Data
#### (5). Decrypt Prediction Data
Each computation party sends `mnist_output_prediction.part` file in `/tmp` directory to the `/tmp` directory of data owner. Data owner decrypts the prediction data and saves the decrypted prediction results into a specified file using `decrypt_data_to_file()` in `process_data.py` script. For example, users can write the following code into a python script named `decrypt_save.py`, and then run the script with command `python decrypt_save.py`. The decrypted prediction results would be saved into file `mpc_label`.
......
##PaddleFL-MPC MNIST Demo运行说明
## PaddleFL-MPC MNIST Demo运行说明
(简体中文|[English](./README.md))
本示例介绍基于PaddleFL-MPC进行MNIST数据集模型训练和预测的使用说明,分为单机运行和多机运行两种方式。
###一. 单机运行
### 一. 单机运行
####1. 准备数据
#### 1. 准备数据
使用`process_data.py`脚本中的`generate_encrypted_data()``generate_encrypted_test_data()`产生加密训练数据和测试数据,比如将如下内容写到一个`prepare.py`脚本中,然后`python prepare.py`
......@@ -18,7 +18,7 @@ process_data.generate_encrypted_test_data()
将在/tmp目录下生成对应于3个计算party的feature和label的加密数据文件,以后缀名区分属于不同party的数据。比如,`mnist2_feature.part0`表示属于party0的feature数据。
####2. 使用shell脚本启动demo
#### 2. 使用shell脚本启动demo
使用`run_standalone.sh`脚本,启动并运行demo,命令如下:
......@@ -30,7 +30,7 @@ bash run_standalone.sh mnist_demo.py
此外,在完成训练之后,demo会继续进行预测,并将预测密文结果保存到/tmp目录下的文件中,文件命名格式类似于步骤1中所述。
####3. 解密数据
#### 3. 解密数据
使用`process_data.py`脚本中的`decrypt_data_to_file()`,将保存的密文预测结果进行解密,并且将解密得到的明文预测结果保存到指定文件中。例如,将下面的内容写到一个`decrypt_save.py`脚本中,然后`python decrypt_save.py`,将把明文预测结果保存在`mpc_label`文件中。
......@@ -51,19 +51,19 @@ fi
###二. 多机运行
### 二. 多机运行
####1. 准备数据
#### 1. 准备数据
数据方对数据进行加密处理。具体操作和单机运行中的准备数据步骤一致。
####2. 分发数据
#### 2. 分发数据
按照后缀名,将步骤1中准备好的数据分别发送到对应的计算party的/tmp目录下。比如,使用scp命令,将
`mnist2_feature.part0``mnist2_label.part0`发送到party0的/tmp目录下。
####3. 计算party修改mnist_demo.py脚本
#### 3. 计算party修改mnist_demo.py脚本
各计算party根据自己的机器环境,将脚本如下内容中的`localhost`修改为自己的IP地址:
......@@ -71,7 +71,7 @@ fi
pfl_mpc.init("aby3", int(role), "localhost", server, int(port))
```
####4. 各计算party启动demo
#### 4. 各计算party启动demo
**注意**:运行需要用到redis服务。为了确保redis中已保存的数据不会影响demo的运行,请在各计算party启动demo之前,使用如下命令清空redis。其中,REDIS_BIN表示redis-cli可执行程序,SERVER和PORT分别表示redis server的IP地址和端口号。
......@@ -91,7 +91,7 @@ $PYTHON_EXECUTABLE mnist_demo.py $PARTY_ID $SERVER $PORT
**注意**:再次启动运行demo之前,请先将上次在`/tmp`保存的prediction文件删除,以免影响本次密文数据的恢复结果。
####5. 解密预测数据
#### 5. 解密预测数据
各计算party将`/tmp`目录下的`mnist_output_prediction.part`文件发送到数据方的/tmp目录下。数据方使用`process_data.py`脚本中的`decrypt_data_to_file()`,将密文预测结果进行解密,并且将解密得到的明文预测结果保存到指定文件中。例如,将下面的内容写到一个`decrypt_save.py`脚本中,然后`python decrypt_save.py`,将把明文预测结果保存在`mpc_label`文件中。
......
##Instructions for PaddleFL-MPC UCI Housing Demo
## Instructions for PaddleFL-MPC UCI Housing Demo
([简体中文](./README_CN.md)|English)
This document introduces how to run the UCI Housing demo based on Paddle-MPC, which can be run in two ways: on a single machine or on multiple machines.
###1. Running on Single Machine
### 1. Running on Single Machine
####(1). Prepare Data
#### (1). Prepare Data
Generate encrypted data utilizing `generate_encrypted_data()` in `process_data.py` script. For example, users can write the following code into a python script named `prepare.py`, and then run the script with command `python prepare.py`.
......@@ -17,7 +17,7 @@ process_data.generate_encrypted_data()
Encrypted data files of feature and label would be generated and saved in `/tmp` directory. Different suffix names are used for these files to indicate the ownership of different computation parties. For instance, a file named `house_feature.part0` means it is a feature file of party 0.
####(2). Launch Demo with A Shell Script
#### (2). Launch Demo with A Shell Script
Launch demo with the `run_standalone.sh` script. The concrete command is:
......@@ -48,17 +48,17 @@ fi
###2. Running on Multi Machines
### 2. Running on Multi Machines
####(1). Prepare Data
#### (1). Prepare Data
Data owner encrypts data. Concrete operations are consistent with “Prepare Data” in “Running on Single Machine”.
####(2). Distribute Encrypted Data
#### (2). Distribute Encrypted Data
According to the suffix of file name, distribute encrypted data files to `/tmp ` directories of all 3 computation parties. For example, send `house_feature.part0` and `house_label.part0` to `/tmp` of party 0 with `scp` command.
####(3). Modify uci_housing_demo.py
#### (3). Modify uci_housing_demo.py
Each computation party makes the following modifications on `uci_housing_demo.py` according to the environment of machine.
......@@ -82,7 +82,7 @@ Each computation party makes the following modifications on `uci_housing_demo.py
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
####(4). Launch Demo on Each Party
#### (4). Launch Demo on Each Party
**Note** that a Redis service is necessary for running the demo. Remember to clear the Redis server's cache before launching the demo on each computation party, in order to avoid any negative influence caused by cached records in Redis. The following command can be used to clear Redis, where REDIS_BIN is the redis-cli executable, and SERVER and PORT represent the IP and port of the Redis server respectively.
......@@ -100,9 +100,9 @@ where PYTHON_EXECUTABLE is the python which installs PaddleFL, PARTY_ID is the I
Similarly, the training loss in ciphertext format will be printed on the screen of each computation party, and at the same time the loss and predictions will be saved in the `/tmp` directory.
**Note** that remember to delete the loss and precidtion files in `/tmp` directory generated in last running, in case of any influence on the decrypted results of current running.
**Note** that remember to delete the loss and prediction files in `/tmp` directory generated in last running, in case of any influence on the decrypted results of current running.
####(5). Decrypt Loss and Prediction Data
#### (5). Decrypt Loss and Prediction Data
Each computation party sends the `uci_loss.part` and `uci_prediction.part` files in its `/tmp` directory to the `/tmp` directory of the data owner. The data owner decrypts and obtains the plaintext loss and predictions with `load_decrypt_data()` in `process_data.py`, as sketched below.
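A sketch of that step, reusing the `load_decrypt_data()` call shown above for predictions; the `(1,)` shape for the loss file and the batch size are assumptions that must match the training configuration:

```python
# Sketch only: decrypt the collected loss and prediction shares on the data owner side.
# The (1,) shape for the loss file is an assumption; BATCH_SIZE must match training.
import process_data

BATCH_SIZE = 10
process_data.load_decrypt_data("/tmp/uci_loss", (1,))
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```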
......
##PaddleFL-MPC UCI Housing Demo运行说明
## PaddleFL-MPC UCI Housing Demo运行说明
(简体中文|[English](./README.md))
本示例介绍基于PaddleFL-MPC进行UCI房价预测模型训练和预测的使用说明,分为单机运行和多机运行两种方式。
###一. 单机运行
### 一. 单机运行
####1. 准备数据
#### 1. 准备数据
使用`process_data.py`脚本中的`generate_encrypted_data()`产生加密数据,比如将如下内容写到一个`prepare.py`脚本中,然后`python prepare.py`
......@@ -17,7 +17,7 @@ process_data.generate_encrypted_data()
将在/tmp目录下生成对应于3个计算party的feature和label的加密数据文件,以后缀名区分属于不同party的数据。比如,`house_feature.part0`表示属于party0的feature数据。
####2. 使用shell脚本启动demo
#### 2. 使用shell脚本启动demo
使用`run_standalone.sh`脚本,启动并运行demo,命令如下:
......@@ -48,19 +48,19 @@ fi
###二. 多机运行
### 二. 多机运行
####1. 准备数据
#### 1. 准备数据
数据方对数据进行加密处理。具体操作和单机运行中的准备数据步骤一致。
####2. 分发数据
#### 2. 分发数据
按照后缀名,将步骤1中准备好的数据分别发送到对应的计算party的/tmp目录下。比如,使用scp命令,将
`house_feature.part0``house_label.part0`发送到party0的/tmp目录下。
####3. 计算party修改uci_housing_demo.py脚本
#### 3. 计算party修改uci_housing_demo.py脚本
各计算party根据自己的机器环境,对uci_housing_demo.py做如下改动:
......@@ -84,7 +84,7 @@ fi
process_data.load_decrypt_data("/tmp/uci_prediction", (BATCH_SIZE,))
```
####4. 各计算party启动demo
#### 4. 各计算party启动demo
**注意**:运行需要用到redis服务。为了确保redis中已保存的数据不会影响demo的运行,请在各计算party启动demo之前,使用如下命令清空redis。其中,REDIS_BIN表示redis-cli可执行程序,SERVER和PORT分别表示redis server的IP地址和端口号。
......@@ -104,7 +104,7 @@ $PYTHON_EXECUTABLE uci_housing_demo.py $PARTY_ID $SERVER $PORT
**注意**:再次启动运行demo之前,请先将上次在`/tmp`保存的loss和prediction文件删除,以免影响本次密文数据的恢复结果。
####5. 数据方解密loss和prediction
#### 5. 数据方解密loss和prediction
各计算party将`/tmp`目录下的`uci_loss.part``uci_prediction.part`文件发送到数据方的/tmp目录下。数据方使用process_data.py脚本中的load_decrypt_data()解密恢复出loss数据和prediction数据。
......
# Example in CTR(Click-Through-Rate) with FedAvg
This document introduces how to use PaddleFL to train a model with Fl Strategy.
### Dependencies
- paddlepaddle>=1.8
### How to install PaddleFL
Please use the pip that belongs to the Python environment where PaddlePaddle is installed
```sh
pip install paddle_fl
```
### How to work in PaddleFL
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed by fl_server and fl_trainer in distributed clusters.
```sh
sh run.sh
```
#### How to work in CompileTime
In this example, we implement compile time programs in fl_master.py
```sh
python fl_master.py
```
In fl_master.py, we first define FL-Strategy, User-Defined-Program and Distributed-Config. Then FL-Job-Generator generates FL-Jobs for the federated server and workers.
```python
import paddle.fluid as fluid
import paddle_fl.paddle_fl as fl
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
class Model(object):
def __init__(self):
pass
def mlp(self, inputs, label, hidden_size=128):
self.concat = fluid.layers.concat(inputs, axis=1)
self.fc1 = fluid.layers.fc(input=self.concat, size=256, act='relu')
self.fc2 = fluid.layers.fc(input=self.fc1, size=128, act='relu')
self.predict = fluid.layers.fc(input=self.fc2, size=2, act='softmax')
self.sum_cost = fluid.layers.cross_entropy(
input=self.predict, label=label)
self.accuracy = fluid.layers.accuracy(input=self.predict, label=label)
self.loss = fluid.layers.reduce_mean(self.sum_cost)
self.startup_program = fluid.default_startup_program()
inputs = [fluid.layers.data( \
name=str(slot_id), shape=[5],
dtype="float32")
for slot_id in range(3)]
label = fluid.layers.data( \
name="label",
shape=[1],
dtype='int64')
model = Model()
model.mlp(inputs, label)
job_generator = JobGenerator()
optimizer = fluid.optimizer.SGD(learning_rate=0.1)
job_generator.set_optimizer(optimizer)
job_generator.set_losses([model.loss])
job_generator.set_startup_program(model.startup_program)
job_generator.set_infer_feed_and_target_names([x.name for x in inputs],
[model.predict.name])
build_strategy = FLStrategyFactory()
build_strategy.fed_avg = True
build_strategy.inner_step = 10
strategy = build_strategy.create_fl_strategy()
# endpoints will be collected through the cluster
# in this example, we suppose endpoints have been collected
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
job_generator.generate_fl_job(
strategy, server_endpoints=endpoints, worker_num=2, output=output)
# fl_job_config will be dispatched to workers
```
#### How to work in RunTime
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 >trainer0.log &
python -u fl_trainer.py 1 >trainer1.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```python
from paddle_fl.paddle_fl.core.scheduler.agent_master import FLScheduler
worker_num = 2
server_num = 1
# Define the number of worker/server and the port for scheduler
scheduler = FLScheduler(worker_num, server_num, port=9091)
scheduler.set_sample_worker_num(worker_num)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```python
import paddle_fl.paddle_fl as fl
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.server.fl_server import FLServer
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
server = FLServer()
server_id = 0
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_server_job(job_path, server_id)
job._scheduler_ep = "127.0.0.1:9091" # IP address for scheduler
server.set_server_job(job)
server._current_ep = "127.0.0.1:8181" # IP address for server
server.start()
```
In fl_trainer.py, we load and run the FL trainer job. The dataset is randomly generated.
```python
import sys
import time
import logging
import numpy as np
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.trainer.fl_trainer import FLTrainerFactory
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
def reader():
for i in range(1000):
data_dict = {}
for i in range(3):
data_dict[str(i)] = np.random.rand(1, 5).astype('float32')
data_dict["label"] = np.random.randint(2, size=(1, 1)).astype('int64')
yield data_dict
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_trainer_job(job_path, trainer_id)
job._scheduler_ep = "127.0.0.1:9091" # Inform the scheduler IP to trainer
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer._current_ep = "127.0.0.1:{}".format(9000 + trainer_id)
place = fluid.CPUPlace()
trainer.start(place)
print(trainer._scheduler_ep, trainer._current_ep)
output_folder = "fl_model"
epoch_id = 0
while not trainer.stop():
print("batch %d start train" % (epoch_id))
train_step = 0
for data in reader():
trainer.run(feed=data, fetch=[])
train_step += 1
if train_step == trainer._step:
break
epoch_id += 1
if epoch_id % 5 == 0:
trainer.save_inference_program(output_folder)
```
# Example in Recognize Digits with DPSGD
This document introduces how to use PaddleFL to train a model with Fl Strategy: DPSGD.
### Dependencies
- paddlepaddle>=1.8
### How to install PaddleFL
Please use the pip that belongs to the Python environment where PaddlePaddle is installed
```sh
pip install paddle_fl
```
### Model
The simplest softmax regression model passes the input layer through a fully connected layer to extract features, and then computes and outputs the probabilities of multiple classes directly via the softmax function [[PaddlePaddle tutorial: recognize digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits#references)].
### Datasets
Public Dataset [MNIST](http://yann.lecun.com/exdb/mnist/)
The dataset will be downloaded automatically by the API and will be located under `/home/username/.cache/paddle/dataset/mnist`:
| filename | note |
| ----------------------- | ------------------------------- |
| train-images-idx3-ubyte | training images, 60,000 samples |
| train-labels-idx1-ubyte | training labels, 60,000 samples |
| t10k-images-idx3-ubyte | test images, 10,000 samples |
| t10k-labels-idx1-ubyte | test labels, 10,000 samples |
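For reference, the dataset can be obtained through PaddlePaddle's built-in reader; this snippet is illustrative only and is not part of the demo scripts:

```python
import paddle

# Downloads MNIST on first use and caches it under ~/.cache/paddle/dataset/mnist
train_reader = paddle.batch(paddle.dataset.mnist.train(), batch_size=64)
test_reader = paddle.batch(paddle.dataset.mnist.test(), batch_size=64)
```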
### How to work in PaddleFL
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed by fl_server and fl_trainer in distributed clusters.
```sh
sh run.sh
```
#### How to work in CompileTime
In this example, we implement compile time programs in fl_master.py
```sh
python fl_master.py
```
In fl_master.py, we first define FL-Strategy, User-Defined-Program and Distributed-Config. Then FL-Job-Generator generates FL-Jobs for the federated server and workers.
```python
import paddle.fluid as fluid
import paddle_fl.paddle_fl as fl
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
import math
class Model(object):
def __init__(self):
pass
def lr_network(self):
self.inputs = fluid.layers.data(name='img', shape=[1, 28, 28], dtype="float32")
self.label = fluid.layers.data(name='label', shape=[1],dtype='int64')
self.predict = fluid.layers.fc(input=self.inputs, size=10, act='softmax')
self.sum_cost = fluid.layers.cross_entropy(input=self.predict, label=self.label)
self.accuracy = fluid.layers.accuracy(input=self.predict, label=self.label)
self.loss = fluid.layers.mean(self.sum_cost)
self.startup_program = fluid.default_startup_program()
model = Model()
model.lr_network()
STEP_EPSILON = 0.1
DELTA = 0.00001
SIGMA = math.sqrt(2.0 * math.log(1.25/DELTA)) / STEP_EPSILON
CLIP = 4.0
batch_size = 64
job_generator = JobGenerator()
optimizer = fluid.optimizer.SGD(learning_rate=0.1)
job_generator.set_optimizer(optimizer)
job_generator.set_losses([model.loss])
job_generator.set_startup_program(model.startup_program)
job_generator.set_infer_feed_and_target_names(
[model.inputs.name, model.label.name], [model.loss.name, model.accuracy.name])
build_strategy = FLStrategyFactory()
build_strategy.dpsgd = True
build_strategy.inner_step = 1
strategy = build_strategy.create_fl_strategy()
strategy.learning_rate = 0.1
strategy.clip = CLIP
strategy.batch_size = float(batch_size)
strategy.sigma = CLIP * SIGMA
# endpoints will be collected through the cluster
# in this example, we suppose endpoints have been collected
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
job_generator.generate_fl_job(
strategy, server_endpoints=endpoints, worker_num=2, output=output)
```
#### How to work in RunTime
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 >trainer0.log &
python -u fl_trainer.py 1 >trainer1.log &
python -u fl_trainer.py 2 >trainer2.log &
python -u fl_trainer.py 3 >trainer3.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```python
from paddle_fl.paddle_fl.core.scheduler.agent_master import FLScheduler
worker_num = 4
server_num = 1
#Define number of worker/server and the port for scheduler
scheduler = FLScheduler(worker_num, server_num, port=9091)
scheduler.set_sample_worker_num(4)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```python
import paddle_fl.paddle_fl as fl
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.server.fl_server import FLServer
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
server = FLServer()
server_id = 0
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_server_job(job_path, server_id)
job._scheduler_ep = "127.0.0.1:9091" # IP address for scheduler
server.set_server_job(job)
server._current_ep = "127.0.0.1:8181" # IP address for server
server.start()
```
In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy with test data and compute the privacy budget.
```python
from paddle_fl.paddle_fl.core.trainer.fl_trainer import FLTrainerFactory
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
import numpy
import sys
import paddle
import paddle.fluid as fluid
import logging
import math
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_trainer_job(job_path, trainer_id)
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer.start()
def train_test(train_test_program, train_test_feed, train_test_reader):
acc_set = []
for test_data in train_test_reader():
acc_np = trainer.exe.run(
program=train_test_program,
feed=train_test_feed.feed(test_data),
fetch_list=["accuracy_0.tmp_0"])
acc_set.append(float(acc_np[0]))
acc_val_mean = numpy.array(acc_set).mean()
return acc_val_mean
def compute_privacy_budget(sample_ratio, epsilon, step, delta):
E = 2 * epsilon * math.sqrt(step * sample_ratio)
print("({0}, {1})-DP".format(E, delta))
output_folder = "model_node%d" % trainer_id
epoch_id = 0
step = 0
while not trainer.stop():
epoch_id += 1
if epoch_id > 40:
break
print("epoch %d start train" % (epoch_id))
for step_id, data in enumerate(train_reader()):
acc = trainer.run(feeder.feed(data), fetch=["accuracy_0.tmp_0"])
step += 1
# print("acc:%.3f" % (acc[0]))
acc_val = train_test(
train_test_program=test_program,
train_test_reader=test_reader,
train_test_feed=feeder)
print("Test with epoch %d, accuracy: %s" % (epoch_id, acc_val))
compute_privacy_budget(sample_ratio=0.001, epsilon=0.1, step=step, delta=0.00001)
save_dir = (output_folder + "/epoch_%d") % epoch_id
trainer.save_inference_program(output_folder)
```
### Simulated experiments on public dataset MNIST
To show the effectiveness of DPSGD-based federated learning with PaddleFL, a simulated experiment was conducted on the open source MNIST dataset. As the figure below shows, model evaluation results are similar between DPSGD-based federated learning and traditional parameter-server training when the overall privacy budget *epsilon* is 1.3 or 0.13.
<img src="../../../../../docs/source/examples/md/fl_dpsgd_benchmark.png" height=400 width=600 hspace='10'/> <br />
# Example in LEAF Dataset with FedAvg
This document introduces how to use PaddleFL to train a model with Fl Strategy: FedAvg.
### Dependencies
- paddlepaddle>=1.8
### How to install PaddleFL
Please use the pip that belongs to the Python environment where PaddlePaddle is installed
```sh
pip install paddle_fl
```
### Model
A CNN model that extracts features with two convolution layers and one fully connected layer, and then computes and outputs the probabilities of multiple classes directly via the softmax function.
### Datasets
Public Dataset FEMNIST in [LEAF](https://github.com/TalwalkarLab/leaf)
### How to work in PaddleFL
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed by fl_server and fl_trainer in distributed clusters.
```sh
sh run.sh
```
#### How to work in CompileTime
In this example, we implement compile time programs in fl_master.py
```sh
python fl_master.py
```
In fl_master.py, we first define FL-Strategy, User-Defined-Program and Distributed-Config. Then FL-Job-Generator generates FL-Jobs for the federated server and workers.
```python
import paddle.fluid as fluid
import paddle_fl.paddle_fl as fl
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
class Model(object):
def __init__(self):
pass
def cnn(self):
self.inputs = fluid.layers.data(
name='img', shape=[1, 28, 28], dtype="float32")
self.label = fluid.layers.data(name='label', shape=[1], dtype='int64')
self.conv_pool_1 = fluid.nets.simple_img_conv_pool(
input=self.inputs,
num_filters=20,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
self.conv_pool_2 = fluid.nets.simple_img_conv_pool(
input=self.conv_pool_1,
num_filters=50,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
self.predict = fluid.layers.fc(input=self.conv_pool_2,
size=62,
act='softmax')
self.cost = fluid.layers.cross_entropy(
input=self.predict, label=self.label)
self.accuracy = fluid.layers.accuracy(
input=self.predict, label=self.label)
self.loss = fluid.layers.mean(self.cost)
self.startup_program = fluid.default_startup_program()
model = Model()
model.cnn()
job_generator = JobGenerator()
optimizer = fluid.optimizer.Adam(learning_rate=0.1)
job_generator.set_optimizer(optimizer)
job_generator.set_losses([model.loss])
job_generator.set_startup_program(model.startup_program)
job_generator.set_infer_feed_and_target_names(
[model.inputs.name, model.label.name],
[model.loss.name, model.accuracy.name])
build_strategy = FLStrategyFactory()
build_strategy.fed_avg = True
build_strategy.inner_step = 1
strategy = build_strategy.create_fl_strategy()
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
job_generator.generate_fl_job(
strategy, server_endpoints=endpoints, worker_num=4, output=output)
```
#### How to work in RunTime
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
for ((i=0;i<4;i++))
do
python -u fl_trainer.py $i >trainer$i.log &
done
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```python
from paddle_fl.paddle_fl.core.scheduler.agent_master import FLScheduler
worker_num = 4
server_num = 1
# Define the number of worker/server and the port for scheduler
scheduler = FLScheduler(worker_num, server_num, port=9091)
scheduler.set_sample_worker_num(4)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```python
import paddle_fl.paddle_fl as fl
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.server.fl_server import FLServer
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
server = FLServer()
server_id = 0
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_server_job(job_path, server_id)
job._scheduler_ep = "127.0.0.1:9091" # IP address for scheduler
server.set_server_job(job)
server._current_ep = "127.0.0.1:8181" # IP address for server
server.start()
```
In fl_trainer.py, we load and run the FL trainer job.
```python
from paddle_fl.paddle_fl.core.trainer.fl_trainer import FLTrainerFactory
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
import paddle_fl.paddle_fl.dataset.femnist as femnist
import numpy
import sys
import paddle
import paddle.fluid as fluid
import logging
import math
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_trainer_job(job_path, trainer_id)
job._scheduler_ep = "127.0.0.1:9091" # Inform the scheduler IP to trainer
print(job._target_names)
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer._current_ep = "127.0.0.1:{}".format(9000 + trainer_id)
place = fluid.CPUPlace()
trainer.start(place)
print(trainer._step)
test_program = trainer._main_program.clone(for_test=True)
img = fluid.layers.data(name='img', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[img, label], place=fluid.CPUPlace())
def train_test(train_test_program, train_test_feed, train_test_reader):
acc_set = []
for test_data in train_test_reader():
acc_np = trainer.exe.run(program=train_test_program,
feed=train_test_feed.feed(test_data),
fetch_list=["accuracy_0.tmp_0"])
acc_set.append(float(acc_np[0]))
acc_val_mean = numpy.array(acc_set).mean()
return acc_val_mean
epoch_id = 0
step = 0
epoch = 3000
count_by_step = False
if count_by_step:
output_folder = "model_node%d" % trainer_id
else:
output_folder = "model_node%d_epoch" % trainer_id
while not trainer.stop():
count = 0
epoch_id += 1
if epoch_id > epoch:
break
print("epoch %d start train" % (epoch_id))
#train_data,test_data= data_generater(trainer_id,inner_step=trainer._step,batch_size=64,count_by_step=count_by_step)
train_reader = paddle.batch(
paddle.reader.shuffle(
femnist.train(
trainer_id,
inner_step=trainer._step,
batch_size=64,
count_by_step=count_by_step),
buf_size=500),
batch_size=64)
test_reader = paddle.batch(
femnist.test(
trainer_id,
inner_step=trainer._step,
batch_size=64,
count_by_step=count_by_step),
batch_size=64)
if count_by_step:
for step_id, data in enumerate(train_reader()):
acc = trainer.run(feeder.feed(data), fetch=["accuracy_0.tmp_0"])
step += 1
count += 1
print(count)
if count % trainer._step == 0:
break
# print("acc:%.3f" % (acc[0]))
else:
trainer.run_with_epoch(
train_reader, feeder, fetch=["accuracy_0.tmp_0"], num_epoch=1)
acc_val = train_test(
train_test_program=test_program,
train_test_reader=test_reader,
train_test_feed=feeder)
print("Test with epoch %d, accuracy: %s" % (epoch_id, acc_val))
if trainer_id == 0:
save_dir = (output_folder + "/epoch_%d") % epoch_id
trainer.save_inference_program(output_folder)
```
......@@ -9,6 +9,6 @@ python -u fl_server.py >server0.log &
sleep 2
for ((i=0;i<4;i++))
do
python -u fl_trainer.py $i >trainer$i.log &
sleep 2
python -u fl_trainer.py $i >trainer$i.log &
sleep 2
done
# Example to Load Program from a Pre-defined Model
This document introduces how to load a pre-defined model and transfer it into a program that is usable by PaddleFL.
### Dependencies
- paddlepaddle>=1.8
- paddle_fl>=1.0
Please use the pip that belongs to the Python environment where PaddlePaddle is installed
```sh
pip install paddle_fl
```
### Compile Time
#### How to save a program
In program_saver.py, you can define a model and save the program into 'load_file'.
```python
import os
import json
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
input = fluid.layers.data(name='input', shape=[1, 28, 28], dtype="float32")
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[input, label], place=fluid.CPUPlace())
predict = fluid.layers.fc(input=input, size=10, act='softmax')
sum_cost = fluid.layers.cross_entropy(input=predict, label=label)
accuracy = fluid.layers.accuracy(input=predict, label=label)
avg_cost = fluid.layers.mean(sum_cost, name="loss")
startup_program = fluid.default_startup_program()
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(startup_program)
job_generator = JobGenerator()
program_path = './load_file'
job_generator.save_program(program_path, [input, label],
                           [['predict', predict], ['accuracy', accuracy]],
                           avg_cost)
```
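Running the script above (saved as program_saver.py, as named in this section) writes the serialized program to `./load_file`, which fl_master.py below consumes:
```sh
python program_saver.py
```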
#### How to load a program
In fl_master.py, you can load the program from 'load_file' and transfer it into an FL program.
```python
import paddle_fl.paddle_fl as fl
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
build_strategy = FLStrategyFactory()
build_strategy.fed_avg = True
build_strategy.inner_step = 10
strategy = build_strategy.create_fl_strategy()
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
program_file = "./load_file"
job_generator = JobGenerator()
job_generator.generate_fl_job_from_program(
    strategy=strategy,
    endpoints=endpoints,
    worker_num=2,
    program_input=program_file,
    output=output)
```
#### How to work in RunTime
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 >trainer0.log &
python -u fl_trainer.py 1 >trainer1.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```python
from paddle_fl.paddle_fl.core.scheduler.agent_master import FLScheduler
worker_num = 2
server_num = 1
#Define number of worker/server and the port for scheduler
scheduler = FLScheduler(worker_num, server_num, port=9091)
scheduler.set_sample_worker_num(2)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```python
import paddle_fl.paddle_fl as fl
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.server.fl_server import FLServer
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
server = FLServer()
server_id = 0
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_server_job(job_path, server_id)
job._scheduler_ep = "127.0.0.1:9091" # IP address for scheduler
server.set_server_job(job)
server._current_ep = "127.0.0.1:8181" # IP address for server
server.start()
```
In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy with test data.
```python
from paddle_fl.paddle_fl.core.trainer.fl_trainer import FLTrainerFactory
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
import numpy
import sys
import paddle
import paddle.fluid as fluid
import logging
import math
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_trainer_job(job_path, trainer_id)
job._scheduler_ep = "127.0.0.1:9091" # Inform scheduler IP address to trainer
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer._current_ep = "127.0.0.1:{}".format(9000 + trainer_id)
place = fluid.CPUPlace()
trainer.start(place)
test_program = trainer._main_program.clone(for_test=True)
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.mnist.train(), buf_size=500),
    batch_size=64)
test_reader = paddle.batch(paddle.dataset.mnist.test(), batch_size=64)

input = fluid.layers.data(name='input', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[input, label], place=fluid.CPUPlace())


def train_test(train_test_program, train_test_feed, train_test_reader):
    acc_set = []
    for test_data in train_test_reader():
        acc_np = trainer.exe.run(program=train_test_program,
                                 feed=train_test_feed.feed(test_data),
                                 fetch_list=["accuracy_0.tmp_0"])
        acc_set.append(float(acc_np[0]))
    acc_val_mean = numpy.array(acc_set).mean()
    return acc_val_mean


output_folder = "model_node%d" % trainer_id
epoch_id = 0
step = 0

while not trainer.stop():
    epoch_id += 1
    if epoch_id > 40:
        break
    print("epoch %d start train" % (epoch_id))
    for step_id, data in enumerate(train_reader()):
        acc = trainer.run(feeder.feed(data), fetch=["accuracy_0.tmp_0"])
        step += 1
    acc_val = train_test(
        train_test_program=test_program,
        train_test_reader=test_reader,
        train_test_feed=feeder)
    print("Test with epoch %d, accuracy: %s" % (epoch_id, acc_val))
    save_dir = (output_folder + "/epoch_%d") % epoch_id
    trainer.save_inference_program(output_folder)
```
......@@ -35,7 +35,8 @@ job.load_trainer_job(job_path, trainer_id)
job._scheduler_ep = "127.0.0.1:9091" # Inform scheduler IP address to trainer
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer._current_ep = "127.0.0.1:{}".format(9000 + trainer_id)
trainer.start()
place = fluid.CPUPlace()
trainer.start(place)
test_program = trainer._main_program.clone(for_test=True)
train_reader = paddle.batch(
......
# Example in recommendation with FedAvg
This document introduces how to use PaddleFL to train a model with the FedAvg FL strategy.
### Dependencies
- paddlepaddle>=1.8
### How to install PaddleFL
Please install with pip in an environment where PaddlePaddle is already installed:
```sh
pip install paddle_fl
```
### Model
[Gru4rec](https://arxiv.org/abs/1511.06939) is a classical session-based recommendation model. A detailed implementation with PaddlePaddle is available [here](https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/gru4rec).
### Datasets
Public Dataset [Rsc15](https://2015.recsyschallenge.com)
```sh
#download data
sh download.sh
```
### How to work in PaddleFL
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
```sh
sh run.sh
```
### How to work in CompileTime
In this example, we implement the compile-time program in fl_master.py.
```sh
# please run fl_master to generate fl_job
python fl_master.py
```
In fl_master.py, we first define the FL-Strategy, User-Defined-Program and Distributed-Config. Then the FL-Job-Generator generates FL-Jobs for the federated server and workers.
```python
import paddle.fluid as fluid
import paddle_fl.paddle_fl as fl
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
# define model
model = Model()
model.gru4rec_network()
# define JobGenerator and set model config
# feed_name and target_name are config for save model.
job_generator = JobGenerator()
optimizer = fluid.optimizer.SGD(learning_rate=2.0)
job_generator.set_optimizer(optimizer)
job_generator.set_losses([model.loss])
job_generator.set_startup_program(model.startup_program)
job_generator.set_infer_feed_and_target_names(
    [x.name for x in model.inputs], [model.loss.name, model.recall.name])

# define FL-Strategy. We currently support two strategies: fed_avg and dpsgd.
# inner_step means that fl_trainer locally trains inner_step mini-batches.
build_strategy = FLStrategyFactory()
build_strategy.fed_avg = True
build_strategy.inner_step = 1
strategy = build_strategy.create_fl_strategy()

# define Distributed-Config and generate fl_job
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
job_generator.generate_fl_job(
    strategy, server_endpoints=endpoints, worker_num=4, output=output)
```
### How to work in RunTime
```sh
python -u fl_scheduler.py >scheduler.log &
python -u fl_server.py >server0.log &
python -u fl_trainer.py 0 >trainer0.log &
python -u fl_trainer.py 1 >trainer1.log &
python -u fl_trainer.py 2 >trainer2.log &
python -u fl_trainer.py 3 >trainer3.log &
```
In fl_trainer.py, you can define your own reader according to your data; this example uses the built-in Gru4rec_Reader (a sketch of a fully custom reader follows the snippet below).
```python
r = Gru4rec_Reader()
train_reader = r.reader(train_file_dir, place, batch_size=10)
```
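If your data does not fit the built-in reader, a plain Python reader works as well. The sketch below is only an illustration: it assumes a hypothetical data format of one space-separated sequence of item ids per line, with the last id used as the label, and the file name `train.txt` is made up; the actual Rsc15 preprocessing used by Gru4rec_Reader may differ.
```python
import paddle


def custom_reader(data_file):
    # Sample-level reader: yields (input_sequence, next_item_label) tuples.
    def reader():
        with open(data_file) as f:
            for line in f:
                ids = [int(x) for x in line.split()]
                if len(ids) < 2:
                    continue
                yield ids[:-1], ids[-1]
    return reader


# Wrap the sample-level reader into shuffled mini-batches, as the built-in
# readers elsewhere in this document do.
train_reader = paddle.batch(
    paddle.reader.shuffle(custom_reader("train.txt"), buf_size=500),
    batch_size=10)
```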
### Simulated experiments on a real-world dataset
To show the concept and effectiveness of horizontal federated learning with PaddleFL, a simulated experiment is conducted on an open-source dataset with a real-world task. In horizontal federated learning, a group of organizations perform similar tasks on their private datasets and are willing to collaborate on a certain task. The goal of the collaboration is to improve task accuracy with federated learning.

The simulated experiment assumes that all organizations have homogeneous datasets and a homogeneous task, which is an ideal case. The whole dataset is a small part of [Rsc15] and each organization holds a subset as its private dataset. To show the performance improvement under federated learning, a model is trained on each organization's private dataset alone, and another model is trained with distributed federated learning. A model based on traditional parameter-server training, where the whole dataset is owned by a single organization, is also trained as a reference.

From the table and the figure given below, model evaluation results are similar between federated learning and traditional parameter-server training. Compared with the models trained only on private datasets, each organization's model performance improves significantly with federated learning.
```sh
# download code and readme
wget https://paddle-zwh.bj.bcebos.com/gru4rec_paddlefl_benchmark/gru4rec_benchmark.tar
```
| Dataset | training methods | FL Strategy | recall@20|
| --- | --- | --- |---|
| the whole dataset | private training | - | 0.504 |
| the whole dataset | federated learning | FedAvg | 0.504 |
| 1/4 of the whole dataset | private training | - | 0.286 |
| 1/4 of the whole dataset | private training | - | 0.277 |
| 1/4 of the whole dataset | private training | - | 0.269 |
| 1/4 of the whole dataset | private training | - | 0.282 |
<img src="../../../../../docs/source/examples/md/fl_benchmark.png" height=300 width=500 hspace='10'/> <br />
# PaddleFL deployment example with Kubernetes
## How to run in PaddleFL
```sh
kubectl apply -f master.yaml
```
## Compile time
#### Master
......
# Example in Recognize Digits with Secure Aggregation
This document introduces how to use PaddleFL to train a model with the FL strategy Secure Aggregation. With the Secure Aggregation strategy, the server can aggregate model parameters without learning their values.
### Dependencies
- paddlepaddle>=1.8
### How to install PaddleFL
Please install with pip in an environment where PaddlePaddle is already installed:
```sh
pip install paddle_fl
```
### Model
The simplest softmax regression model passes the input features through a fully connected layer and then directly computes and outputs the probabilities of the classes via the softmax function [[PaddlePaddle tutorial: recognize digits](https://github.com/PaddlePaddle/book/tree/develop/02.recognize_digits#references)].
### Datasets
Public Dataset [MNIST](http://yann.lecun.com/exdb/mnist/)
The dataset will be downloaded automatically by the API and will be located under `/home/username/.cache/paddle/dataset/mnist`:
| filename | note |
| ----------------------- | ----------------------------------- |
| train-images-idx3-ubyte | training set images, 60,000 samples |
| train-labels-idx1-ubyte | training set labels, 60,000 samples |
| t10k-images-idx3-ubyte | test set images, 10,000 samples |
| t10k-labels-idx1-ubyte | test set labels, 10,000 samples |
### How to work in PaddleFL
PaddleFL has two phases: CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
```sh
sh run.sh
```
#### How to work in CompileTime
In this example, we implement the compile-time program in fl_master.py.
```sh
python fl_master.py
```
In fl_master.py, we first define the FL-Strategy, User-Defined-Program and Distributed-Config. Then the FL-Job-Generator generates FL-Jobs for the federated server and workers.
```python
import paddle.fluid as fluid
import paddle_fl.paddle_fl as fl
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
class Model(object):
    def linear_regression(self, inputs, label):
        param_attrs = fluid.ParamAttr(
            name="fc_0.b_0",
            initializer=fluid.initializer.ConstantInitializer(0.0))
        param_attrs = fluid.ParamAttr(
            name="fc_0.w_0",
            initializer=fluid.initializer.ConstantInitializer(0.0))
        self.predict = fluid.layers.fc(
            input=inputs, size=10, act='softmax', param_attr=param_attrs)
        self.sum_cost = fluid.layers.cross_entropy(
            input=self.predict, label=label)
        self.loss = fluid.layers.mean(self.sum_cost)
        self.accuracy = fluid.layers.accuracy(input=self.predict, label=label)
        self.startup_program = fluid.default_startup_program()


inputs = fluid.layers.data(name='x', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='y', shape=[1], dtype='int64')
model = Model()
model.linear_regression(inputs, label)
job_generator = JobGenerator()
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
job_generator.set_optimizer(optimizer)
job_generator.set_losses([model.loss])
job_generator.set_startup_program(model.startup_program)
job_generator.set_infer_feed_and_target_names(
    [inputs.name, label.name], [model.loss.name])
build_strategy = FLStrategyFactory()
build_strategy.sec_agg = True
param_name_list = []
param_name_list.append("fc_0.w_0.opti.trainer_") # need trainer_id when running
param_name_list.append("fc_0.b_0.opti.trainer_")
build_strategy.param_name_list = param_name_list
build_strategy.inner_step = 10
strategy = build_strategy.create_fl_strategy()
# endpoints will be collected through the cluster
# in this example, we suppose endpoints have been collected
endpoints = ["127.0.0.1:8181"]
output = "fl_job_config"
job_generator.generate_fl_job(
    strategy, server_endpoints=endpoints, worker_num=2, output=output)
```
#### How to work in RunTime
```sh
python3 fl_master.py
sleep 2
python3 -u fl_server.py >log/server0.log &
sleep 2
python3 -u fl_trainer.py 0 >log/trainer0.log &
sleep 2
python3 -u fl_trainer.py 1 >log/trainer1.log &
```
In fl_scheduler.py, we let the server and trainers register with the scheduler.
```python
from paddle_fl.paddle_fl.core.scheduler.agent_master import FLScheduler
worker_num = 2
server_num = 1
#Define number of worker/server and the port for scheduler
scheduler = FLScheduler(worker_num, server_num, port=9091)
scheduler.set_sample_worker_num(2)
scheduler.init_env()
print("init env done.")
scheduler.start_fl_training()
```
In fl_server.py, we load and run the FL server job.
```python
import paddle_fl.paddle_fl as fl
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.server.fl_server import FLServer
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
server = FLServer()
server_id = 0
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_server_job(job_path, server_id)
server.set_server_job(job)
server.start()
```
In fl_trainer.py, we prepare the MNIST dataset, load and run the FL trainer job, then evaluate the accuracy. Before training, each party prepares its own private key and the other parties' public keys. Each pair of parties then derives a shared random noise using the Diffie-Hellman key agreement protocol from its private key and the other party's public key [1]. If the other party's id is larger than this party's id, the noise is added to the model parameters; if the other party's id is smaller, the noise is subtracted. As a result, the model parameters are masked before being uploaded to the server, and the random noises cancel out when the masked parameters from all parties are aggregated.
```python
import numpy
import sys
import logging
import time
import datetime
import math
import hashlib
import hmac
import paddle
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.trainer.fl_trainer import FLTrainerFactory
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
logging.basicConfig(filename="log/test.log", filemode="w", format="%(asctime)s %(name)s:%(levelname)s:%(message)s", datefmt="%d-%M-%Y %H:%M:%S", level=logging.DEBUG)
logger = logging.getLogger("FLTrainer")
BATCH_SIZE = 64
train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.mnist.train(), buf_size=500),
    batch_size=BATCH_SIZE)
test_reader = paddle.batch(
    paddle.dataset.mnist.test(), batch_size=BATCH_SIZE)
trainer_num = 2
trainer_id = int(sys.argv[1]) # trainer id for each guest
job_path = "fl_job_config"
job = FLRunTimeJob()
job.load_trainer_job(job_path, trainer_id)
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer.trainer_id = trainer_id
trainer.trainer_num = trainer_num
trainer.key_dir = "./keys/"
trainer.start()
output_folder = "fl_model"
epoch_id = 0
step_i = 0
inputs = fluid.layers.data(name='x', shape=[1, 28, 28], dtype='float32')
label = fluid.layers.data(name='y', shape=[1], dtype='int64')
feeder = fluid.DataFeeder(feed_list=[inputs, label], place=fluid.CPUPlace())
# for test
test_program = trainer._main_program.clone(for_test=True)
def train_test(train_test_program, train_test_feed, train_test_reader):
    acc_set = []
    avg_loss_set = []
    for test_data in train_test_reader():
        acc_np, avg_loss_np = trainer.exe.run(
            program=train_test_program,
            feed=train_test_feed.feed(test_data),
            fetch_list=["accuracy_0.tmp_0", "mean_0.tmp_0"])
        acc_set.append(float(acc_np))
        avg_loss_set.append(float(avg_loss_np))
    acc_val_mean = numpy.array(acc_set).mean()
    avg_loss_val_mean = numpy.array(avg_loss_set).mean()
    return avg_loss_val_mean, acc_val_mean
# for test

while not trainer.stop():
    epoch_id += 1
    print("epoch %d start train" % (epoch_id))
    for data in train_reader():
        step_i += 1
        trainer.step_id = step_i
        accuracy, = trainer.run(feed=feeder.feed(data),
                                fetch=["accuracy_0.tmp_0"])
        if step_i % 100 == 0:
            print("Epoch: {0}, step: {1}, accuracy: {2}".format(
                epoch_id, step_i, accuracy[0]))

    avg_loss_val, acc_val = train_test(
        train_test_program=test_program,
        train_test_reader=test_reader,
        train_test_feed=feeder)
    print("Test with Epoch %d, avg_cost: %s, acc: %s" % (epoch_id, avg_loss_val, acc_val))

    if epoch_id > 40:
        break
    if step_i % 100 == 0:
        trainer.save_inference_program(output_folder)
```
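The masking scheme described above can be illustrated with a small standalone sketch. This is a toy illustration only, not PaddleFL's implementation: the mask here is derived from a pre-shared string instead of a real Diffie-Hellman exchange, and all names in it are made up.
```python
import hashlib
import numpy as np


def pairwise_mask(shared_secret, step_id, shape):
    # Derive a deterministic pseudo-random mask from the pair's shared secret
    # and the current step, so both parties in a pair produce the same mask.
    seed = int(hashlib.sha256(
        "{}:{}".format(shared_secret, step_id).encode()).hexdigest(), 16) % (2 ** 32)
    return np.random.RandomState(seed).normal(size=shape)


trainer_ids = [0, 1, 2]
# One shared secret per pair; in the real protocol this comes from Diffie-Hellman.
secrets = {(i, j): "secret_{}_{}".format(i, j)
           for i in trainer_ids for j in trainer_ids if i < j}
params = {i: np.full((2, 2), float(i + 1)) for i in trainer_ids}  # fake parameters

masked = {}
for i in trainer_ids:
    p = params[i].copy()
    for j in trainer_ids:
        if j == i:
            continue
        pair = (min(i, j), max(i, j))
        mask = pairwise_mask(secrets[pair], step_id=0, shape=p.shape)
        p = p + mask if j > i else p - mask  # larger id adds, smaller id subtracts
    masked[i] = p

# The server only sees masked parameters, but their sum equals the true sum,
# so the per-pair masks cancel out during aggregation.
assert np.allclose(sum(masked.values()), sum(params.values()))
```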
[1] Aaron Segal, Antonio Marcedone, Benjamin Kreuter, Daniel Ramage, H. Brendan McMahan, Karn Seth, Keith Bonawitz, Sarvar Patel, Vladimir Ivanov. **Practical Secure Aggregation for Privacy-Preserving Machine Learning**, The 24th ACM Conference on Computer and Communications Security (**CCS**), 2017
# Example of submitting a job to an MPI cluster
This document introduces how to submit an FL job to an MPI cluster.
### Dependency
- paddlepaddle>=1.8
- paddle_fl==0.2.0
### How to install PaddleFL
Please install with pip in an environment where PaddlePaddle is already installed:
```sh
pip install paddle_fl==0.2.0
```
### How it works
#### Prepare packages
- An executable Python package that will be used in the cluster
- An installation package of PaddlePaddle
#### Submit the job
The cluster information is defined in config.txt and passed to client.py. A function called job_generator() then generates jobs for fl_server and fl_trainer, and finally the job is submitted.
train_program.py is the program executed on the cluster.
```sh
#use the python prepared above to generate fl job and submit the job to mpi cluster
python/bin/python client.py config.txt
```
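The exact format of config.txt is not shown here. As a rough sketch of the flow, assuming a simple `key=value` line format (hypothetical), client.py could read it and merge the values over its defaults before generating and submitting the job:
```python
import sys


def load_config(path, defaults):
    # Merge key=value lines over the default submission settings (hypothetical format).
    conf = dict(defaults)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or "=" not in line:
                continue
            key, value = line.split("=", 1)
            conf[key.strip()] = value.strip()
    return conf


# Keys mirror the default_dict shown in the client.py fragment below.
defaults = {"task_name": "", "hdfs_path": "", "ugi": "", "worker_nodes": 5,
            "server_nodes": 1, "queue": "", "server": ""}
print(load_config(sys.argv[1], defaults))
```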
......@@ -18,16 +18,16 @@ import random
import zmq
import time
import sys
from paddle_fl.paddle_fl.core.submitter.client_base import HPCClient
from paddle_fl.paddle_fl.core.scheduler.agent_master import FLScheduler
from paddle_fl.core.submitter.client_base import HPCClient
from paddle_fl.core.scheduler.agent_master import FLScheduler
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
from paddle_fl.core.master.job_generator import JobGenerator
from paddle_fl.core.strategy.fl_strategy_base import FLStrategyFactory
from model import Model
import tarfile
#random_port = random.randint(60001, 64001)
random_port = 60001
random_port = 64001
print(random_port)
current_ip = socket.gethostbyname(socket.gethostname())
endpoints = "{}:{}".format(current_ip, random_port)
......@@ -46,17 +46,17 @@ with open("package/scheduler.conf", "w") as fout:
# submit a job with current endpoint
default_dict = {
"task_name": "test_submit_job",
"hdfs_path": "afs://xingtian.afs.baidu.com:9902",
"task_name": "",
"hdfs_path": "",
"ugi": "",
"worker_nodes": 5,
"server_nodes": 5,
"hadoop_home": "/home/jingqinghe/hadoop-xingtian/hadoop",
"hpc_home": "/home/jingqinghe/mpi_feed4/smart_client",
"server_nodes": 1,
"hadoop_home": "/path/to/hadoop",
"hpc_home": "/path/to/hpc",
"package_path": "./package",
"priority": "high",
"queue": "paddle-dev-amd",
"server": "yq01-hpc-lvliang01-smart-master.dmop.baidu.com",
"queue": "",
"server": "",
"mpi_node_mem": 11000,
"pcpu": 180,
"python_tar": "./python.tar.gz",
......
unset http_proxy
unset https_proxy
/home/jingqinghe/tools/mpi_feed4/smart_client/bin/qdel $1".yq01-hpc-lvliang01-smart-master.dmop.baidu.com"
unset http_proxy
unset https_proxy
/path/to/qdel $1".<site/of/mpi/server>"
tar -xf python.tar.gz
python/bin/python scheduler_client.py config.txt
python/bin/python client.py config.txt
......@@ -17,17 +17,17 @@ import random
import zmq
import os
import tarfile
import paddle_fl.paddle_fl as fl
import paddle_fl as fl
import paddle.fluid as fluid
from paddle_fl.paddle_fl.core.server.fl_server import FLServer
from paddle_fl.paddle_fl.core.master.fl_job import FLRunTimeJob
from paddle_fl.paddle_fl.core.trainer.fl_trainer import FLTrainerFactory
from paddle_fl.core.server.fl_server import FLServer
from paddle_fl.core.master.fl_job import FLRunTimeJob
from paddle_fl.core.trainer.fl_trainer import FLTrainerFactory
import numpy as np
import sys
import logging
import time
random_port = 60001
random_port = 64001
scheduler_conf = {}
#connect to scheduler and get the role and id of the endpoint
......@@ -99,8 +99,7 @@ else:
job._scheduler_ep = scheduler_conf["ENDPOINT"]
trainer = FLTrainerFactory().create_fl_trainer(job)
trainer._current_ep = endpoint
place = fluid.CPUPlace()
trainer.start(place)
trainer.start()
print(trainer._scheduler_ep, trainer._current_ep)
output_folder = "fl_model"
epoch_id = 0
......