From f400b58dc21112f1506e88ffed8e66d22a641c19 Mon Sep 17 00:00:00 2001 From: jingqinghe Date: Mon, 18 May 2020 15:27:24 +0800 Subject: [PATCH] update document --- README.md | 62 +++++++++--------- README_cn.md | 65 +++++++++---------- .../paddle_fl/examples/ctr_demo/README.md | 14 ++-- .../paddle_fl/examples/dpsgd_demo/README.md | 19 +++--- .../paddle_fl/examples/femnist_demo/README.md | 14 ++-- .../generate_job_from_program/README.md | 16 +++-- .../paddle_fl/examples/gru4rec_demo/README.md | 2 +- .../paddle_fl/examples/secagg_demo/README.md | 12 ++-- .../examples/submitter_demo/README.md | 4 +- 9 files changed, 103 insertions(+), 105 deletions(-) diff --git a/README.md b/README.md index f094aee..a549364 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,36 @@ PaddleFL is an open source federated learning framework based on PaddlePaddle. R Data is becoming more and more expensive nowadays, and sharing of raw data is very hard across organizations. Federated Learning aims to solve the problem of data isolation and secure sharing of data knowledge among organizations. The concept of federated learning is proposed by researchers in Google [1, 2, 3]. +## Overview of PaddleFL + +### Horizontal Federated Learning + + + +In PaddleFL, horizontal and vertical federated learning strategies will be implemented according to the categorization given in [4]. Application demonstrations in natural language processing, computer vision and recommendation will be provided in PaddleFL. + +#### A. Federated Learning Strategy + +- **Vertical Federated Learning**: Logistic Regression with PrivC, Neural Network with third-party PrivC [5] + +- **Horizontal Federated Learning**: Federated Averaging [2], Differential Privacy [6], Secure Aggregation + +#### B. Training Strategy + +- **Multi Task Learning** [7] + +- **Transfer Learning** [8] + +- **Active Learning** + +### Paddle Encrypted + +Paddle Fluid Encrypted is a framework for privacy-preserving deep learning based on PaddlePaddle. 
It follows the same running mechanism and programming paradigm with PaddlePaddle, while using secure multi-party computation (MPC) to enable secure training and prediction. + +With Paddle Fluid Encrypted, it is easy to train models or conduct prediction as on PaddlePaddle over encrypted data, without the need for cryptography expertise. Furthermore, the rich industry-oriented models and algorithms built on PaddlePaddle can be smoothly migrated to secure versions on Paddle Fluid Encrypted with little effort. + +As a key product of PaddleFL, Paddle Fluid Encrypted intrinsically supports federated learning well, including horizontal, vertical and transfer learning scenarios. It provides both provable security (semantic security) and competitive performance. + ## Compilation and Installation ### Docker Installation @@ -55,7 +85,6 @@ Then you can put the directory in the following command and make: cmake ../ -DPYTHON_EXECUTABLE=${PYTHON_EXECUTABLE} -DPYTHON_INCLUDE_DIRS=${python_include_dir} -DCMAKE_CXX_COMPILER=${g++_path} make -j$(nproc) ``` - Install the package: ```sh @@ -72,36 +101,6 @@ tar -xf redis-stable.tar cd redis-stable && make ``` -## Overview of PaddleFL - -### Horizontal Federated Learning - - - -In PaddleFL, horizontal and vertical federated learning strategies will be implemented according to the categorization given in [4]. Application demonstrations in natural language processing, computer vision and recommendation will be provided in PaddleFL. - -#### A. Federated Learning Strategy - -- **Vertical Federated Learning**: Logistic Regression with PrivC, Neural Network with third-party PrivC [5] - -- **Horizontal Federated Learning**: Federated Averaging [2], Differential Privacy [6], Secure Aggregation - -#### B. Training Strategy - -- **Multi Task Learning** [7] - -- **Transfer Learning** [8] - -- **Active Learning** - -### Paddle Encrypted - -Paddle Fluid Encrypted is a framework for privacy-preserving deep learning based on PaddlePaddle. 
It follows the same running mechanism and programming paradigm with PaddlePaddle, while using secure multi-party computation (MPC) to enable secure training and prediction. - -With Paddle Fluid Encrypted, it is easy to train models or conduct prediction as on PaddlePaddle over encrypted data, without the need for cryptography expertise. Furthermore, the rich industry-oriented models and algorithms built on PaddlePaddle can be smoothly migrated to secure versions on Paddle Fluid Encrypted with little effort. - -As a key product of PaddleFL, Paddle Fluid Encrypted intrinsically supports federated learning well, including horizontal, vertical and transfer learning scenarios. It provides both provable security (semantic security) and competitive performance. - ## Framework design of PaddleFL ### Horizontal Federated Learning @@ -128,6 +127,7 @@ In PaddleFL, components for defining a federated learning task and training a fe - **FL-scheduler**: Decide which set of trainers can join the training before each updating cycle. For more instructions, please refer to the [examples](./python/paddle_fl/paddle_fl/examples) + ### Paddle Encrypted Paddle Fluid Encrypted implements secure training and inference tasks based on the underlying MPC protocol of ABY3, in which participants can be classified into roles of Input Party (IP), Computing Party (CP) and Result Party (RP). diff --git a/README_cn.md b/README_cn.md index c05d53e..0391654 100644 --- a/README_cn.md +++ b/README_cn.md @@ -6,6 +6,36 @@ PaddleFL是一个基于PaddlePaddle的开源联邦学习框架。研究人员可 如今,数据变得越来越昂贵,而且跨组织共享原始数据非常困难。联合学习旨在解决组织间数据隔离和数据知识安全共享的问题。联邦学习的概念是由谷歌的研究人员提出的[1,2,3]。 +## PaddleFL概述 + +### 横向联邦方案 + + + +在PaddleFL中,横向和纵向联邦学习策略将根据[4]中给出的分类来实现。PaddleFL也将提供在自然语言处理,计算机视觉和推荐算法等领域的应用示例。 + +#### A. 联邦学习策略 + +- **纵向联邦学习**: 带privc的逻辑回归,带第三方privc的神经网络[5] + +- **横向联邦学习**: 联邦平均 [2],差分隐私 [6],安全聚合 + +#### B. 
训练策略 + +- **多任务学习** [7] + +- **迁移学习** [8] + +- **主动学习** + +### Paddle Encrypted + +Paddle Encrypted 是一个基于PaddlePaddle的隐私保护深度学习框架。Paddle Encrypted基于多方计算(MPC)实现安全训练及预测,拥有与PaddlePaddle相同的运行机制及编程范式。 + +Paddle Encrypted 设计与PaddlePaddle相似,没有密码学相关背景的用户亦可简单的对加密的数据进行训练和预测。同时,PaddlePaddle中丰富的模型和算法可以轻易地迁移到Paddle Encrypted中。 + +作为PaddleFL的一个重要组成部分,Paddle Encrypted可以很好滴支持联邦学习,包括横向、纵向及联邦迁移学习等多个场景。既提供了可靠的安全性,也拥有可观的性能。 + ## 编译与安装 ### 使用docker安装 @@ -52,7 +82,6 @@ ${PYTHON_EXECUTABLE} -c "from distutils.sysconfig import get_python_inc;print(ge cmake ../ -DPYTHON_EXECUTABLE=${PYTHON_EXECUTABLE} -DPYTHON_INCLUDE_DIRS=${python_include_dir} -DCMAKE_CXX_COMPILER=${g++_path} make -j$(nproc) ``` - 安装对应的安装包 ```sh @@ -70,36 +99,6 @@ tar -xf redis-stable.tar cd redis-stable && make ``` -## PaddleFL概述 - -### 横向联邦方案 - - - -在PaddleFL中,横向和纵向联邦学习策略将根据[4]中给出的分类来实现。PaddleFL也将提供在自然语言处理,计算机视觉和推荐算法等领域的应用示例。 - -#### A. 联邦学习策略 - -- **纵向联邦学习**: 带privc的逻辑回归,带第三方privc的神经网络[5] - -- **横向联邦学习**: 联邦平均 [2],差分隐私 [6],安全聚合 - -#### B. 训练策略 - -- **多任务学习** [7] - -- **迁移学习** [8] - -- **主动学习** - -### Paddle Encrypted - -Paddle Encrypted 是一个基于PaddlePaddle的隐私保护深度学习框架。Paddle Encrypted基于多方计算(MPC)实现安全训练及预测,拥有与PaddlePaddle相同的运行机制及编程范式。 - -Paddle Encrypted 设计与PaddlePaddle相似,没有密码学相关背景的用户亦可简单的对加密的数据进行训练和预测。同时,PaddlePaddle中丰富的模型和算法可以轻易地迁移到Paddle Encrypted中。 - -作为PaddleFL的一个重要组成部分,Paddle Encrypted可以很好滴支持联邦学习,包括横向、纵向及联邦迁移学习等多个场景。既提供了可靠的安全性,也拥有可观的性能。 - ## PaddleFL框架设计 ### 横向联邦方案 @@ -131,7 +130,7 @@ Paddle Encrypted 设计与PaddlePaddle相似,没有密码学相关背景的用 Paddle Encrypted 中的安全训练和推理任务是基于底层的ABY3多方计算协议实现的。在ABY3中,参与方可分为:输入方、计算方和结果方。 -输入方为训练数据及模型的持有方,负责加密数据和模型,并将其发送到计算方。计算方为训练的执行方,基于特定的多方安全计算协议完成训练任务。计算方只能得到加密后的数据及模型,以保证数据隐>私。计算结束后,结果方会拿到计算结果并恢复出明文数据。每个参与方可充当多个角色,如一个数据拥有方也可以作为计算方参与训练。 +输入方为训练数据及模型的持有方,负责加密数据和模型,并将其发送到计算方。计算方为训练的执行方,基于特定的多方安全计算协议完成训练任务。计算方只能得到加密后的数据及模型,以保证数据隐私。计算结束后,结果方会拿到计算结果并恢复出明文数据。每个参与方可充当多个角色,如一个数据拥有方也可以作为计算方参与训练。 Paddle Encrypted的整个训练及推理过程主要由三个部分组成:数据准备,训练/推理,结果解析。 @@ -139,7 +138,7 @@ Paddle Encrypted的整个训练及推理过程主要由三个部分组成:数 
##### 1. 私有数据对齐 -Paddle Encrypted允许数据拥有方(数据方)在不泄露自己数据的情况下,找出多方共有的样本集合。此功能在纵向联邦学习中非常必要,因为其要求多个数据方在训练前进行数据对齐,并且保护用户的数>据隐私。凭借PSI算法,Paddle Encrypted可以在一秒内完成6万条数据的对齐。 +Paddle Encrypted允许数据拥有方(数据方)在不泄露自己数据的情况下,找出多方共有的样本集合。此功能在纵向联邦学习中非常必要,因为其要求多个数据方在训练前进行数据对齐,并且保护用户的数据隐私。凭借PSI算法,Paddle Encrypted可以在一秒内完成6万条数据的对齐。 ##### 2. 数据加密及分发 diff --git a/python/paddle_fl/paddle_fl/examples/ctr_demo/README.md b/python/paddle_fl/paddle_fl/examples/ctr_demo/README.md index e854bd9..dd8f89a 100644 --- a/python/paddle_fl/paddle_fl/examples/ctr_demo/README.md +++ b/python/paddle_fl/paddle_fl/examples/ctr_demo/README.md @@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy. Please use pip which has paddlepaddle installed -``` +```sh pip install paddle_fl ``` @@ -18,7 +18,7 @@ pip install paddle_fl PaddleFL has two phases , CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters. -``` +```sh sh run.sh ``` @@ -26,7 +26,7 @@ sh run.sh In this example, we implement compile time programs in fl_master.py -``` +```sh python fl_master.py ``` @@ -84,7 +84,7 @@ job_generator.generate_fl_job( #### How to work in RunTime -``` +```sh python -u fl_scheduler.py >scheduler.log & python -u fl_server.py >server0.log & python -u fl_trainer.py 0 >trainer0.log & @@ -92,7 +92,7 @@ python -u fl_trainer.py 1 >trainer1.log & ``` In fl_scheduler.py, we let server and trainers to do registeration. -``` +```python worker_num = 2 server_num = 1 # Define the number of worker/server and the port for scheduler @@ -104,7 +104,7 @@ scheduler.start_fl_training() ``` In fl_server.py, we load and run the FL server job. 
-```
+```python
 server = FLServer()
 server_id = 0
 job_path = "fl_job_config"
@@ -118,7 +118,7 @@ server.start()
 
 In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy with test data and compute the privacy budget. The dataset is randomly generated.
-```
+```python
 def reader():
     for i in range(1000):
         data_dict = {}
diff --git a/python/paddle_fl/paddle_fl/examples/dpsgd_demo/README.md b/python/paddle_fl/paddle_fl/examples/dpsgd_demo/README.md
index f3dd18f..ab5bc6d 100644
--- a/python/paddle_fl/paddle_fl/examples/dpsgd_demo/README.md
+++ b/python/paddle_fl/paddle_fl/examples/dpsgd_demo/README.md
@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy:
 
 Please use pip which has paddlepaddle installed
-```
+```sh
 pip install paddle_fl
 ```
 
@@ -35,7 +35,7 @@ The dataset will downloaded automatically in the API and will be located under `
 
 PaddleFL has two phases, CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
-```
+```sh
 sh run.sh
 ```
 
@@ -43,7 +43,7 @@ sh run.sh
 
 In this example, we implement compile time programs in fl_master.py
-```
+```sh
 python fl_master.py
 ```
 
@@ -100,7 +100,7 @@ job_generator.generate_fl_job(
 
 #### How to work in RunTime
-```
+```sh
 python -u fl_scheduler.py >scheduler.log &
 python -u fl_server.py >server0.log &
 python -u fl_trainer.py 0 >trainer0.log &
@@ -110,7 +110,7 @@ python -u fl_trainer.py 3 >trainer3.log &
 ```
 
 In fl_scheduler.py, we let the server and trainers do registration.
-```
+```python
 worker_num = 4
 server_num = 1
 #Define number of worker/server and the port for scheduler
@@ -122,7 +122,7 @@ scheduler.start_fl_training()
 ```
 
 In fl_server.py, we load and run the FL server job.
-```
+```python
 server = FLServer()
 server_id = 0
 job_path = "fl_job_config"
@@ -136,18 +136,15 @@ server.start()
 
 In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy with test data and compute the privacy budget.
-```
+```python
 trainer_id = int(sys.argv[1]) # trainer id for each guest
 job_path = "fl_job_config"
 job = FLRunTimeJob()
 job.load_trainer_job(job_path, trainer_id)
 trainer = FLTrainerFactory().create_fl_trainer(job)
 trainer.start()
-```
-
-```
 def train_test(train_test_program, train_test_feed, train_test_reader):
     acc_set = []
     for test_data in train_test_reader():
@@ -195,4 +192,4 @@ while not trainer.stop():
 
 To show the effectiveness of DPSGD-based federated learning with PaddleFL, a simulated experiment is conducted on an open source dataset MNIST. From the figure given below, model evaluation results are similar between DPSGD-based federated learning and traditional parameter server training when the overall privacy budget *epsilon* is 1.3 or 0.13.
 
-
+
diff --git a/python/paddle_fl/paddle_fl/examples/femnist_demo/README.md b/python/paddle_fl/paddle_fl/examples/femnist_demo/README.md
index d02605f..5bac5c7 100644
--- a/python/paddle_fl/paddle_fl/examples/femnist_demo/README.md
+++ b/python/paddle_fl/paddle_fl/examples/femnist_demo/README.md
@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy:
 
 Please use pip which has paddlepaddle installed
-```
+```sh
 pip install paddle_fl
 ```
 
@@ -26,7 +26,7 @@ Public Dataset FEMNIST in [LEAF](https://github.com/TalwalkarLab/leaf)
 
 PaddleFL has two phases, CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
-```
+```sh
 sh run.sh
 ```
 
@@ -34,7 +34,7 @@ sh run.sh
 
 In this example, we implement compile time programs in fl_master.py
-```
+```sh
 python fl_master.py
 ```
 
@@ -99,7 +99,7 @@ job_generator.generate_fl_job(
 
 #### How to work in RunTime
-```
+```sh
 python -u fl_scheduler.py >scheduler.log &
 python -u fl_server.py >server0.log &
 for ((i=0;i<4;i++))
 do
 python -u fl_trainer.py $i >trainer$i.log &
 done
 ```
 
 In fl_scheduler.py, we let the server and trainers do registration.
-```
+```python
 worker_num = 4
 server_num = 1
 # Define the number of worker/server and the port for scheduler
@@ -121,7 +121,7 @@ scheduler.start_fl_training()
 ```
 
 In fl_server.py, we load and run the FL server job.
-```
+```python
 server = FLServer()
 server_id = 0
 job_path = "fl_job_config"
@@ -135,7 +135,7 @@ server.start()
 
 In fl_trainer.py, we load and run the FL trainer job.
-```
+```python
 trainer_id = int(sys.argv[1]) # trainer id for each guest
 job_path = "fl_job_config"
 job = FLRunTimeJob()
diff --git a/python/paddle_fl/paddle_fl/examples/generate_job_from_program/README.md b/python/paddle_fl/paddle_fl/examples/generate_job_from_program/README.md
index 9460a7c..c407d48 100644
--- a/python/paddle_fl/paddle_fl/examples/generate_job_from_program/README.md
+++ b/python/paddle_fl/paddle_fl/examples/generate_job_from_program/README.md
@@ -9,7 +9,7 @@ This document introduces how to load a pre-defined model, and transfer into prog
 
 Please use pip which has paddlepaddle installed
-```
+```sh
 pip install paddle_fl
 ```
 
@@ -18,7 +18,8 @@ pip install paddle_fl
 #### How to save a program
 
 In program_saver.py, you can define a model and save the program into 'load_file'
-```
+
+```python
 input = fluid.layers.data(name='input', shape=[1, 28, 28], dtype="float32")
 label = fluid.layers.data(name='label', shape=[1], dtype='int64')
 feeder = fluid.DataFeeder(feed_list=[input, label], place=fluid.CPUPlace())
@@ -42,7 +43,7 @@ job_generator.save_program(program_path, [input, label],
 
 In fl_master.py, you can load the program in 'load_file' and transfer it into an fl program.
-```
+```python
 build_strategy = FLStrategyFactory()
 build_strategy.fed_avg = True
 build_strategy.inner_step = 10
@@ -62,7 +63,7 @@ job_generator.generate_fl_job_from_program(
 
 #### How to work in RunTime
-```
+```sh
 python -u fl_scheduler.py >scheduler.log &
 python -u fl_server.py >server0.log &
 python -u fl_trainer.py 0 >trainer0.log &
@@ -70,7 +71,7 @@ python -u fl_trainer.py 1 >trainer1.log &
 ```
 
 In fl_scheduler.py, we let the server and trainers do registration.
-```
+```python
 worker_num = 2
 server_num = 1
 #Define number of worker/server and the port for scheduler
@@ -82,7 +83,7 @@ scheduler.start_fl_training()
 ```
 
 In fl_server.py, we load and run the FL server job.
-```
+```python
 server = FLServer()
 server_id = 0
 job_path = "fl_job_config"
@@ -95,7 +96,8 @@ server.start()
 ```
 
 In fl_trainer.py, we load and run the FL trainer job, then evaluate the accuracy with test data.
-```
+
+```python
 trainer_id = int(sys.argv[1]) # trainer id for each guest
 job_path = "fl_job_config"
 job = FLRunTimeJob()
diff --git a/python/paddle_fl/paddle_fl/examples/gru4rec_demo/README.md b/python/paddle_fl/paddle_fl/examples/gru4rec_demo/README.md
index 33d4493..fea89bb 100644
--- a/python/paddle_fl/paddle_fl/examples/gru4rec_demo/README.md
+++ b/python/paddle_fl/paddle_fl/examples/gru4rec_demo/README.md
@@ -104,4 +104,4 @@ wget https://paddle-zwh.bj.bcebos.com/gru4rec_paddlefl_benchmark/gru4rec_benchma
 | 1/4 of the whole dataset | private training | - | 0.269 |
 | 1/4 of the whole dataset | private training | - | 0.282 |
 
-
+
diff --git a/python/paddle_fl/paddle_fl/examples/secagg_demo/README.md b/python/paddle_fl/paddle_fl/examples/secagg_demo/README.md
index 08d9751..7997960 100644
--- a/python/paddle_fl/paddle_fl/examples/secagg_demo/README.md
+++ b/python/paddle_fl/paddle_fl/examples/secagg_demo/README.md
@@ -10,7 +10,7 @@ This document introduces how to use PaddleFL to train a model with Fl Strategy:
 
 Please use pip which has paddlepaddle installed
-```
+```sh
 pip install paddle_fl
 ```
 
@@ -35,7 +35,7 @@ The dataset will downloaded automatically in the API and will be located under `
 
 PaddleFL has two phases, CompileTime and RunTime. In CompileTime, a federated learning task is defined by fl_master. In RunTime, a federated learning job is executed on fl_server and fl_trainer in distributed clusters.
-```
+```sh
 sh run.sh
 ```
 
@@ -43,7 +43,7 @@ sh run.sh
 
 In this example, we implement compile time programs in fl_master.py
-```
+```sh
 python fl_master.py
 ```
 
@@ -98,7 +98,7 @@ job_generator.generate_fl_job(
 
 #### How to work in RunTime
-```shell
+```sh
 python3 fl_master.py
 sleep 2
 python3 -u fl_server.py >log/server0.log &
 python3 -u fl_trainer.py 0 >log/trainer0.log &
 python3 -u fl_trainer.py 1 >log/trainer1.log &
 ```
 
 In fl_scheduler.py, we let the server and trainers do registration.
-```
+```python
 worker_num = 2
 server_num = 1
 #Define number of worker/server and the port for scheduler
@@ -122,7 +122,7 @@ scheduler.start_fl_training()
 
 In fl_server.py, we load and run the FL server job.
-```
+```python
 server = FLServer()
 server_id = 0
 job_path = "fl_job_config"
diff --git a/python/paddle_fl/paddle_fl/examples/submitter_demo/README.md b/python/paddle_fl/paddle_fl/examples/submitter_demo/README.md
index 576c9c8..9efba8f 100644
--- a/python/paddle_fl/paddle_fl/examples/submitter_demo/README.md
+++ b/python/paddle_fl/paddle_fl/examples/submitter_demo/README.md
@@ -10,7 +10,7 @@ This document introduces how to submit an FL job to mpi cluster
 
 Please use pip which has paddlepaddle installed
-```
+```sh
 pip install paddle_fl
 ```
 
@@ -26,7 +26,7 @@ pip install paddle_fl
 
 The information of the cluster is defined in config.txt and is passed to client.py. A function called job_generator() then generates jobs for fl_server and fl_trainer, and the job is submitted. train_program.py is the program executed on the cluster.
-```
+```sh
 #use the python prepared above to submit job
 python/bin/python client.py config.txt
 ```
--
GitLab