PaddlePaddle / PaddleFL
Commit 57c82ab5, authored Sep 16, 2020 by He, Kai

add mean normalize demo

Parent: e8240167

Showing 5 changed files with 302 additions and 0 deletions
- python/paddle_fl/mpc/examples/mean_normalize_demo/README.md (+59, -0)
- python/paddle_fl/mpc/examples/mean_normalize_demo/mean_normalize_demo.py (+86, -0)
- python/paddle_fl/mpc/examples/mean_normalize_demo/prepare.py (+41, -0)
- python/paddle_fl/mpc/examples/mean_normalize_demo/process_data.py (+68, -0)
- python/paddle_fl/mpc/examples/mean_normalize_demo/verify.py (+48, -0)
python/paddle_fl/mpc/examples/mean_normalize_demo/README.md (new file, mode 100644)
## Instructions for PaddleFL-MPC Mean Normalize Demo
This document explains how to run the Mean Normalize demo based on Paddle-MPC.
It is a single-machine demo.
### Running on Single Machine
#### (1). Prepare Data
Create an empty directory for the data and set `data_path` in `process_data.py` to point to it. The default directory path is `./data`.
Then run `python prepare.py` to generate random data for the demo. Alternatively, generate your own data, move it to `data_path`, and modify the corresponding meta info in `prepare.py`.
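If you bring your own data, the meta info in `prepare.py` (`sample_nums` and `feat_width`) must match the matrices the data owners actually hold. The sketch below derives both values from a list of per-owner NumPy matrices. `derive_meta_info` is a hypothetical helper and is not part of PaddleFL. It assumes each matrix has shape `(num_samples, feat_width)`, like the matrices saved by `gen_random_data()`.

```python
import numpy as np


def derive_meta_info(owner_mats):
    """Return (sample_nums, feat_width) for a list of per-owner feature matrices.

    Hypothetical helper: each matrix is assumed to be 2-D with shape
    (num_samples_of_owner, feat_width).
    """
    widths = {m.shape[1] for m in owner_mats}
    if len(widths) != 1:
        raise ValueError("all data owners must share the same feature width")
    sample_nums = [m.shape[0] for m in owner_mats]
    return sample_nums, widths.pop()


# Example: three data owners holding 5, 2 and 7 samples of 100 features each.
mats = [np.random.rand(n, 100) for n in (5, 2, 7)]
sample_nums, feat_width = derive_meta_info(mats)
print(sample_nums, feat_width)
```

The resulting values would replace the hard-coded `sample_nums` and `feat_width` in `prepare.py`.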
Encrypted data files of feature statistics will be generated and saved in the `data_path` directory. The file name suffixes identify which data owner a file comes from and which computing party it belongs to. For instance, a file named `feature_max.1.part2` contains the max feature values from data owner 1 and must be fed to computing party 2.
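The naming convention can be captured in a small helper. This sketch is inferred from the example file name above, which follows the pattern `<stat>.<data owner>.part<computing party>`. `share_file_name` and `parse_share_file_name` are hypothetical names. The `sample_num` shares have no data-owner suffix, so they intentionally do not match this pattern.

```python
import re

# Pattern for per-owner share files such as "feature_max.1.part2":
# <stat name>.<data owner id>.part<computing party id>
_SHARE_NAME = re.compile(r"^(?P<stat>\w+)\.(?P<owner>\d+)\.part(?P<party>\d+)$")


def share_file_name(stat, owner, party):
    """Build the share file name for one statistic, data owner and computing party."""
    return "{}.{}.part{}".format(stat, owner, party)


def parse_share_file_name(name):
    """Split a per-owner share file name into (stat, owner, party)."""
    m = _SHARE_NAME.match(name)
    if m is None:
        raise ValueError("not a per-owner share file: " + name)
    return m.group("stat"), int(m.group("owner")), int(m.group("party"))


print(parse_share_file_name("feature_max.1.part2"))  # ('feature_max', 1, 2)
```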
#### (2). Launch Demo with A Shell Script
Set the following environment variables:
```bash
export PYTHON=/your/python
export PATH_TO_REDIS_BIN=/path/to/redis_bin
export LOCALHOST=/your/localhost
export REDIS_PORT=/your/redis/port
```
Launch the demo with the `run_standalone.sh` script:
```bash
bash ../run_standalone.sh mean_normalize_demo.py
```
The ciphertext results for the global feature range and feature mean will be saved in the `data_path` directory as `result.part{i}`, where `i` is the computing party.
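Each `result.part{i}` file contains raw bytes with no header. In `mean_normalize_demo.py`, the two fetched share tensors (`f_range`, `f_mean`) are stacked, transposed so that the share dimension comes first, and then written out as bytes. The sketch below reproduces that layout with small stand-in arrays (not real ABY3 shares) and reads it back. `feat_num = 4` is illustrative only, and the `int64` dtype is an assumption taken from the `dtype='int64'` used for the demo's inputs.

```python
import numpy as np

feat_num = 4   # illustrative; the demo uses prepare.feat_width (100)
share_num = 2  # each ABY3 computing party holds two shares

# Stand-ins for one party's fetched tensors, each of shape (share_num, feat_num).
f_range = np.arange(share_num * feat_num, dtype=np.int64).reshape(share_num, feat_num)
f_mean = -f_range

# Same layout as the demo: (stat, share, feat) -> (share, stat, feat).
result = np.transpose(np.array([f_range, f_mean]), axes=[1, 0, 2])
raw = result.tobytes()  # C-order bytes of the transposed view

# Reading the bytes back requires knowing the dtype and the shape.
restored = np.frombuffer(raw, dtype=np.int64).reshape(share_num, 2, feat_num)
assert (restored[:, 0] == f_range).all() and (restored[:, 1] == f_mean).all()
```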
#### (3). Decrypt Data
Finally, call `decrypt_data()` in `process_data.py` to decrypt the result and return it. Each data owner can then use this result to rescale its own local feature data.
```python
import prepare
import process_data

# 0 for f_range, 1 for f_mean
# use decrypted global f_range and f_mean to rescale local feature data
res = process_data.decrypt_data(prepare.data_path + 'result', (2, prepare.feat_width, ))
```
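The README does not show the rescaling step itself. The sketch below assumes the usual mean-normalization formula x' = (x - mean) / (max - min) and shows how a data owner could apply the decrypted `res`. `rescale_local` is a hypothetical helper. The guard against zero ranges is our own choice and not part of the demo.

```python
import numpy as np


def rescale_local(local_mat, res, eps=1e-12):
    """Mean-normalize one owner's features with the decrypted global stats.

    res[0] is the global feature range and res[1] the global feature mean,
    as returned by process_data.decrypt_data(..., (2, feat_width)).
    """
    f_range, f_mean = res[0], res[1]
    # Avoid dividing by zero for constant features (range close to 0).
    safe_range = np.where(np.abs(f_range) < eps, 1.0, f_range)
    return (local_mat - f_mean) / safe_range


local = np.array([[1.0, 5.0], [3.0, 5.0]])
res = np.array([[4.0, 0.0],   # global range per feature
                [2.0, 5.0]])  # global mean per feature
print(rescale_local(local, res))
```

Decrypted MPC values are fixed-point approximations, so a constant feature may come back with a tiny non-zero range. In that case, `eps` may need to be larger than the default.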
`verify.py` can also be used to compute the error between a direct plaintext NumPy calculation and the MPC mean normalize result.
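Each data owner contributes only its own max, min, mean and sample count, so the global statistics have to be aggregated. The global range is the largest max minus the smallest min, and the global mean is the sample-weighted average of the local means. This is presumably what `pfl_mpc.layers.mean_normalize` evaluates under MPC, given the inputs it receives. The plaintext sketch below is our own illustration, not code from the demo. It checks that this aggregation matches the direct computation on the stacked data, which is the reference `verify.py` compares against.

```python
import numpy as np


def aggregate_stats(f_max, f_min, f_mean, sample_nums):
    """Combine per-owner stats (rows = owners) into the global range and mean."""
    f_max, f_min, f_mean = map(np.asarray, (f_max, f_min, f_mean))
    n = np.asarray(sample_nums, dtype=float)
    f_range = f_max.max(axis=0) - f_min.min(axis=0)
    # Weight by sample count: a plain average of local means would be wrong
    # whenever owners hold different numbers of samples.
    g_mean = (n[:, None] * f_mean).sum(axis=0) / n.sum()
    return f_range, g_mean


rng = np.random.default_rng(0)
sample_nums, feat_width = [1, 2, 3, 4], 5
mats = [rng.random((num, feat_width)) for num in sample_nums]

f_range, g_mean = aggregate_stats(
    [m.max(axis=0) for m in mats],
    [m.min(axis=0) for m in mats],
    [m.mean(axis=0) for m in mats],
    sample_nums,
)
stacked = np.vstack(mats)
assert np.allclose(f_range, stacked.max(axis=0) - stacked.min(axis=0))
assert np.allclose(g_mean, stacked.mean(axis=0))
```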
python/paddle_fl/mpc/examples/mean_normalize_demo/mean_normalize_demo.py (new file, mode 100644)
```python
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Mean normalize demo.
"""
import sys

import numpy as np
import paddle.fluid as fluid
import paddle_fl.mpc as pfl_mpc
import paddle_fl.mpc.data_utils.aby3 as aby3

import prepare
import process_data

# Arguments passed in by run_standalone.sh: party role, redis server, redis port.
role, server, port = sys.argv[1], sys.argv[2], sys.argv[3]
role, port = int(role), int(port)

share_num = aby3.ABY3_SHARE_DIM       # shares held by each computing party
party_num = len(prepare.sample_nums)  # number of data owners
feat_num = prepare.feat_width
data_path = prepare.data_path


def get_shares(path):
    '''
    collect encrypted feature stats from all data owners
    '''
    data = []
    for i in range(party_num):
        reader = aby3.load_aby3_shares(path + '.' + str(i), id=role, shape=(feat_num,))
        data.append([x for x in reader()])
    data = np.array(data).reshape([party_num, share_num, feat_num])
    # Move the share dimension to the front: (share, owner, feature).
    return np.transpose(data, axes=[1, 0, 2])


def get_sample_num(path):
    '''
    get encrypted sample nums
    '''
    reader = aby3.load_aby3_shares(path, id=role, shape=(party_num,))
    for n in reader():
        return n


f_max = get_shares(data_path + 'feature_max')
f_min = get_shares(data_path + 'feature_min')
f_mean = get_shares(data_path + 'feature_mean')
sample_num = get_sample_num(data_path + 'sample_num')

pfl_mpc.init("aby3", role, "localhost", server, port)

shape = [party_num, feat_num]

mi = pfl_mpc.data(name='mi', shape=shape, dtype='int64')
ma = pfl_mpc.data(name='ma', shape=shape, dtype='int64')
me = pfl_mpc.data(name='me', shape=shape, dtype='int64')
sn = pfl_mpc.data(name='sn', shape=shape[:-1], dtype='int64')

# out0: global feature range, out1: global feature mean (both secret-shared).
out0, out1 = pfl_mpc.layers.mean_normalize(f_min=mi, f_max=ma, f_mean=me, sample_num=sn)

exe = fluid.Executor(place=fluid.CPUPlace())

f_range, f_mean = exe.run(feed={'mi': f_min, 'ma': f_max, 'me': f_mean, 'sn': sample_num},
                          fetch_list=[out0, out1])

# Layout written to disk: (share, stat, feature), stat 0 = range, 1 = mean.
result = np.transpose(np.array([f_range, f_mean]), axes=[1, 0, 2])

result_file = data_path + "result.part{}".format(role)
with open(result_file, 'wb') as f:
    # tobytes() replaces the deprecated ndarray.tostring() and writes the same bytes.
    f.write(result.tobytes())
```
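`get_shares` stacks one share tensor per data owner and then moves the share axis to the front. This is presumably because the `pfl_mpc.data` inputs are fed with the share dimension leading. The toy arrays below contain shapes only, no real shares, and show what the reshape and transpose do. `party_num`, `share_num` and `feat_num` take small illustrative values.

```python
import numpy as np

party_num, share_num, feat_num = 4, 2, 3

# One (share_num, feat_num) array per data owner, wrapped in a list as
# get_shares does; values encode owner*100 + share*10 + feature for inspection.
data = [[np.array([[100 * i + 10 * s + f for f in range(feat_num)]
                   for s in range(share_num)])]
        for i in range(party_num)]

stacked = np.array(data).reshape([party_num, share_num, feat_num])
fed = np.transpose(stacked, axes=[1, 0, 2])  # -> (share, owner, feature)

print(fed.shape)  # (2, 4, 3)
print(fed[1, 2])  # share 1 of owner 2
```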
python/paddle_fl/mpc/examples/mean_normalize_demo/prepare.py (new file, mode 100644)
```python
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Prepare data for mean normalize demo.
"""
import numpy as np
import process_data
from paddle_fl.mpc.data_utils import aby3

data_path = process_data.data_path

feat_width = 100
# assume data owner i has sample_nums[i] samples
sample_nums = [1, 2, 3, 4]


def gen_random_data():
    for i, num in enumerate(sample_nums):
        suffix = '.' + str(i)

        # Plaintext features are kept on disk so that verify.py can check the result.
        f_mat = np.random.rand(num, feat_width)
        np.save(data_path + 'feature_data' + suffix, f_mat)

        # Secret-share this owner's max/min/mean statistics.
        process_data.generate_encrypted_data(i, f_mat)

    # Secret-share the sample count of every data owner.
    aby3.save_aby3_shares(process_data.encrypted_data(np.array(sample_nums)),
                          data_path + 'sample_num')


if __name__ == "__main__":
    gen_random_data()
```
python/paddle_fl/mpc/examples/mean_normalize_demo/process_data.py (new file, mode 100644)
```python
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Process data for mean normalize demo.
"""
import numpy as np
import six
import paddle
from paddle_fl.mpc.data_utils import aby3

data_path = './data/'


def encrypted_data(data):
    """
    feature stat reader
    """
    def func():
        yield aby3.make_shares(data)

    return func


def generate_encrypted_data(party_id, f_mat):
    """
    generate encrypted data from feature matrix (np.array)
    """
    # Per-feature statistics over this data owner's local samples.
    f_max = np.amax(f_mat, axis=0)
    f_min = np.amin(f_mat, axis=0)
    f_mean = np.mean(f_mat, axis=0)

    suffix = '.' + str(party_id)

    aby3.save_aby3_shares(encrypted_data(f_max), data_path + "feature_max" + suffix)
    aby3.save_aby3_shares(encrypted_data(f_min), data_path + "feature_min" + suffix)
    aby3.save_aby3_shares(encrypted_data(f_mean), data_path + "feature_mean" + suffix)


def decrypt_data(filepath, shape):
    """
    load the encrypted data and reconstruct
    """
    # One reader per computing party (parts 0, 1 and 2).
    part_readers = []
    for party in six.moves.range(3):
        part_readers.append(aby3.load_aby3_shares(filepath, id=party, shape=shape))
    aby3_share_reader = paddle.reader.compose(part_readers[0], part_readers[1], part_readers[2])

    for instance in aby3_share_reader():
        p = aby3.reconstruct(np.array(instance))
        return p
```
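`decrypt_data` reads the shares of all three computing parties and passes them to `aby3.reconstruct`. As a rough mental model only, the sketch below implements generic replicated secret sharing over Z_{2^64} with fixed-point encoding. A value is split into three additive shares, party i keeps shares i and (i + 1) mod 3, and the three shares are added back to recover the value. The 16-bit fractional precision and all helper names are assumptions made for illustration. This is not PaddleFL's `make_shares`/`reconstruct` code. Rounding to the fixed-point grid is likely one reason `verify.py` reports a small non-zero error.

```python
import random

MOD = 2 ** 64
SCALE = 2 ** 16  # assumed fixed-point precision (16 fractional bits)


def encode(x):
    """Map a real number to a fixed-point element of Z_{2^64}."""
    return int(round(x * SCALE)) % MOD


def decode(v):
    """Interpret v as a signed 64-bit fixed-point number."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE


def make_shares(x):
    """Split x into three random additive shares summing to encode(x) mod 2^64."""
    s0, s1 = random.randrange(MOD), random.randrange(MOD)
    s2 = (encode(x) - s0 - s1) % MOD
    return [s0, s1, s2]


def party_view(shares, i):
    """Replicated sharing: party i holds shares i and (i + 1) mod 3."""
    return shares[i], shares[(i + 1) % 3]


def reconstruct(views):
    """Rebuild the value from all three parties' views (first share of each)."""
    return decode(sum(view[0] for view in views) % MOD)


shares = make_shares(-3.25)
views = [party_view(shares, i) for i in range(3)]
print(reconstruct(views))
```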
python/paddle_fl/mpc/examples/mean_normalize_demo/verify.py (new file, mode 100644)
```python
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Verification for mean normalize demo.
"""
import prepare
import process_data
import numpy as np

# 0 for f_range, 1 for f_mean
# use decrypted global f_range and f_mean to rescale local feature data
res = process_data.decrypt_data(prepare.data_path + 'result', (2, prepare.feat_width, ))

# reconstruct plaintext global data to verify
row, col = sum(prepare.sample_nums), prepare.feat_width
plain_mat = np.empty((row, col))

row = 0
for i, num in enumerate(prepare.sample_nums):
    m = np.load(prepare.data_path + 'feature_data.' + str(i) + '.npy')
    plain_mat[row:row + num] = m
    row += num


def mean_normalize(f_mat):
    '''
    get plain text f_range & f_mean
    '''
    ma = np.amax(f_mat, axis=0)
    mi = np.amin(f_mat, axis=0)
    return ma - mi, np.mean(f_mat, axis=0)


plain_range, plain_mean = mean_normalize(plain_mat)

print("max error in feature range:", np.max(np.abs(res[0] - plain_range)))
print("max error in feature mean:", np.max(np.abs(res[1] - plain_mean)))
```