Primihub / PrimiHub
Commit 8e965485 (unverified)
Authored Aug 21, 2023 by Xuefeng Xu
Committed by GitHub, Aug 21, 2023
add HFL preprocessing pipeline (#604)
Parent: 1b70f701
Changes: 3

Showing 3 changed files with 93 additions and 17 deletions (+93 -17)

example/FL/preprocessing/hfl_pipeline.json  +63 -0
example/FL/preprocessing/vfl_pipeline.json  +5 -1
python/primihub/FL/preprocessing/pipeline.py  +25 -16
example/FL/preprocessing/hfl_pipeline.json
0 → 100644
View file @ 8e965485
{
    "party_info": {
        "task_manager": "127.0.0.1:50050"
    },
    "component_params": {
        "roles": {
            "server": "Alice",
            "client": [
                "Bob",
                "Charlie"
            ]
        },
        "common_params": {
            "model": "FL_Preprocess",
            "process": "fit_transform",
            "FL_type": "H",
            "task_name": "HFL_pipeline_fit_transform",
            "task": "classification",
            "selected_column": null,
            "id": "id",
            "label": "y",
            "preprocess_column": null,
            "preprocess_module": {
                "LabelEncoder": {
                    "column": "y"
                },
                "SimpleImputer_string": {
                    "column": null,
                    "missing_values": "np.nan",
                    "strategy": "most_frequent"
                },
                "SimpleImputer_numeric": {
                    "column": null,
                    "missing_values": "np.nan",
                    "strategy": "mean"
                },
                "OrdinalEncoder": {
                    "column": null,
                    "handle_unknown": "use_encoded_value",
                    "unknown_value": -1
                },
                "StandardScaler": {
                    "column": null
                }
            }
        },
        "role_params": {
            "Bob": {
                "data_set": "preprocess_hfl_train_client1",
                "preprocess_dataset_path": "data/result/Bob_train_dataset.csv",
                "preprocess_module_path": "data/result/Bob_preprocess_module.pkl"
            },
            "Charlie": {
                "data_set": "preprocess_hfl_train_client2",
                "preprocess_dataset_path": "data/result/Charlie_train_dataset.csv",
                "preprocess_module_path": "data/result/Charlie_preprocess_module.pkl"
            },
            "Alice": {
                "data_set": "fl_fake_data"
            }
        }
    }
}
\ No newline at end of file
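The config above separates settings shared by every party (`common_params`) from per-party settings (`role_params`), with `roles` mapping each role to one or more party names. As a rough illustration of how such a file could be consumed, the sketch below resolves the role and effective parameters for each party. The helper name `party_params` and the merge rule (party values override common ones) are assumptions made for illustration, not PrimiHub's actual loader, and the embedded config is abbreviated.

```python
import json

# Abbreviated copy of hfl_pipeline.json; only the keys needed for the sketch.
CONFIG = json.loads("""
{
  "component_params": {
    "roles": {"server": "Alice", "client": ["Bob", "Charlie"]},
    "common_params": {"model": "FL_Preprocess", "FL_type": "H", "label": "y"},
    "role_params": {
      "Bob": {"data_set": "preprocess_hfl_train_client1"},
      "Charlie": {"data_set": "preprocess_hfl_train_client2"},
      "Alice": {"data_set": "fl_fake_data"}
    }
  }
}
""")


def party_params(config, party):
    """Hypothetical helper: return (role, params) for one party.

    Assumes party-specific values override common ones.
    """
    comp = config["component_params"]
    role = None
    for role_name, members in comp["roles"].items():
        # A role maps to a single party name or to a list of names.
        names = members if isinstance(members, list) else [members]
        if party in names:
            role = role_name
    params = {**comp["common_params"], **comp["role_params"].get(party, {})}
    return role, params


for name in ("Alice", "Bob", "Charlie"):
    role, params = party_params(CONFIG, name)
    print(name, role, params["data_set"])
```

In this config the server (Alice) only references `fl_fake_data`, while the two clients each name a training dataset and output paths.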
example/FL/preprocessing/vfl_pipeline.json
View file @ 8e965485
...
@@ -11,7 +11,7 @@
             "model": "FL_Preprocess",
             "process": "fit_transform",
             "FL_type": "V",
-            "task_name": "VFL_preprocess_fit_transform",
+            "task_name": "VFL_pipeline_fit_transform",
             "task": "classification"
         },
         "role_params": {
...
@@ -29,10 +29,12 @@
                     },
                     "SimpleImputer_string": {
                         "column": null,
                         "missing_values": "np.nan",
                         "strategy": "most_frequent"
                     },
                     "SimpleImputer_numeric": {
                         "column": null,
                         "missing_values": "np.nan",
                         "strategy": "mean"
                     },
                     "OrdinalEncoder": {
...
@@ -55,10 +57,12 @@
                 "preprocess_module": {
                     "SimpleImputer_string": {
                         "column": null,
                         "missing_values": "np.nan",
                         "strategy": "most_frequent"
                     },
                     "SimpleImputer_numeric": {
                         "column": null,
                         "missing_values": "np.nan",
                         "strategy": "mean"
                     },
                     "OrdinalEncoder": {
...
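Both configs store the missing-value marker as the string "np.nan", since standard JSON has no NaN literal. How pipeline.py turns that string into an actual NaN is not shown in this diff. The sketch below is one possible way to do it; `resolve_missing_values` is a hypothetical helper, not part of PrimiHub.

```python
import math


def resolve_missing_values(value):
    """Hypothetical helper: map the config string "np.nan" to a float NaN.

    Any other marker (for example 0 or "?") is returned unchanged.
    """
    if isinstance(value, str) and value == "np.nan":
        return float("nan")
    return value


# Imputer settings as they appear in the config.
imputer_cfg = {"column": None, "missing_values": "np.nan", "strategy": "mean"}
marker = resolve_missing_values(imputer_cfg["missing_values"])
print(math.isnan(marker))
```

Resolving the marker explicitly avoids calling `eval` on config strings.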
python/primihub/FL/preprocessing/pipeline.py
View file @ 8e965485
...
@@ -8,6 +8,7 @@ from primihub.FL.preprocessing import *
 import pickle
 import numpy as np
 import pandas as pd
+from itertools import chain


 class Pipeline(BaseModel):
...
@@ -109,23 +110,31 @@ class Pipeline(BaseModel):
                 raise RuntimeError(error_msg)

             column = params.get('column')
-            if column is None and preprocess_column is not None:
+            if column is None:
                 column = preprocess_column
-            if column is None and role != 'server':
-                if 'SimpleImputer' in module_name:
-                    nan_column = preprocess_column[data[preprocess_column].isna().any()]
-                    if 'string' in module_name:
-                        column = data[nan_column].select_dtypes(exclude=num_type).columns
-                    elif 'numeric' in module_name:
-                        column = data[nan_column].select_dtypes(include=num_type).columns
-                    else:
-                        column = nan_column
-                elif 'Encoder' in module_name:
-                    column = data[preprocess_column].select_dtypes(exclude=num_type).columns
-                elif 'Scaler' in module_name:
-                    column = data[preprocess_column].select_dtypes(include=num_type).columns
-                else:
-                    column = preprocess_column
+                if role != 'server':
+                    if 'SimpleImputer' in module_name:
+                        nan_column = column[data[column].isna().any()]
+                        if 'string' in module_name:
+                            column = data[nan_column].select_dtypes(exclude=num_type).columns
+                        elif 'numeric' in module_name:
+                            column = data[nan_column].select_dtypes(include=num_type).columns
+                        else:
+                            column = nan_column
+                    elif module_name in ['OrdinalEncoder', 'OneHotEncoder']:
+                        column = data[column].select_dtypes(exclude=num_type).columns
+                    elif 'Scaler' in module_name:
+                        column = data[column].select_dtypes(include=num_type).columns
+                if role == 'client':
+                    channel.send('column', column)
+                    column = channel.recv('column')
+                if role == 'server':
+                    client_column = channel.recv_all('column')
+                    column = list(set(chain.from_iterable(client_column)))
+                    channel.send_all('column', column)
             if column is not None:
                 if isinstance(column, pd.Index):
                     column = column.tolist()
...
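In the horizontal setting each client appears to pick columns from its own data (for example, only those that contain NaN locally), so the selections can differ between clients. The new code has the server collect every client's list, take the union, and send the result back to all clients. The sketch below reproduces just the union step with plain lists. The column names are made up, and sorting the result is an addition of this sketch (the diff uses `list(set(...))`, whose order is not guaranteed).

```python
from itertools import chain


def merge_client_columns(client_columns):
    """Union the column lists reported by all clients.

    Mirrors list(set(chain.from_iterable(...))) from the diff, but sorts the
    result so the order is reproducible across runs.
    """
    return sorted(set(chain.from_iterable(client_columns)))


# Hypothetical per-client selections, e.g. columns that contain NaN locally.
bob_columns = ["age", "income"]
charlie_columns = ["income", "height"]

merged = merge_client_columns([bob_columns, charlie_columns])
print(merged)  # ['age', 'height', 'income']
```

Because the server sends one merged list to every client, all parties end up using the same columns and the same order even without sorting. Sorting only makes repeated runs comparable.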