Greenplum / Pytorch Widedeep
Commit 5f5bb84b
Authored July 13, 2020 by jrzaurin

minor adjustment to setup.py

Parent: dc2f5892

Showing 2 changed files with 1 addition and 162 deletions

docs/pypiREADME.md  +0 -161
setup.py  +1 -1

docs/pypiREADME.md (deleted, 100644 → 0, shown as of dc2f5892)

# pytorch-widedeep
A flexible package to combine tabular data with text and images using wide and
deep models.
### Introduction
`pytorch-widedeep` is based on Google's Wide and Deep Algorithm. Details of
the original algorithm can be found
[here](https://www.tensorflow.org/tutorials/wide_and_deep), and the nice
research paper can be found [here](https://arxiv.org/abs/1606.07792).

In general terms, `pytorch-widedeep` is a package to use deep learning with
tabular data. In particular, it is intended to facilitate the combination of
text and images with corresponding tabular data using wide and deep models.
With that in mind there are two architectures that can be implemented with
just a few lines of code. For details on these architectures please visit the
[repo](https://github.com/jrzaurin/pytorch-widedeep).

### Installation
Install using pip:
```bash
pip install pytorch-widedeep
```

Or install directly from github

```bash
pip install git+https://github.com/jrzaurin/pytorch-widedeep.git
```

#### Developer Install

```bash
# Clone the repository
git clone https://github.com/jrzaurin/pytorch-widedeep
cd pytorch-widedeep

# Install in dev mode
pip install -e .
```

### Examples
There are a number of notebooks in the `examples` folder plus some additional
files. These notebooks cover most of the utilities of this package and can
also act as documentation. In the case that github does not render the
notebooks, or it renders them missing some parts, they are saved as markdown
files in the `docs` folder.

### Quick start
Binary classification with the
[adult dataset](https://www.kaggle.com/wenruliu/adult-income-dataset)
using `Wide` and `DeepDense` and default settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

from pytorch_widedeep.preprocessing import WidePreprocessor, DeepPreprocessor
from pytorch_widedeep.models import Wide, DeepDense, WideDeep
from pytorch_widedeep.metrics import BinaryAccuracy

# these next 4 lines are not directly related to pytorch-widedeep. I assume
# you have downloaded the dataset and place it in a dir called data/adult/
df = pd.read_csv("data/adult/adult.csv.zip")
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
df_train, df_test = train_test_split(df, test_size=0.2, stratify=df.income_label)

# prepare wide, crossed, embedding and continuous columns
wide_cols = [
    "education",
    "relationship",
    "workclass",
    "occupation",
    "native-country",
    "gender",
]
cross_cols = [("education", "occupation"), ("native-country", "occupation")]
embed_cols = [
    ("education", 16),
    ("workclass", 16),
    ("occupation", 16),
    ("native-country", 32),
]
cont_cols = ["age", "hours-per-week"]
target_col = "income_label"

# target
target = df_train[target_col].values

# wide
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(df_train)
wide = Wide(wide_dim=X_wide.shape[1], output_dim=1)

# deepdense
preprocess_deep = DeepPreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_deep = preprocess_deep.fit_transform(df_train)
deepdense = DeepDense(
    hidden_layers=[64, 32],
    deep_column_idx=preprocess_deep.deep_column_idx,
    embed_input=preprocess_deep.embeddings_input,
    continuous_cols=cont_cols,
)

# build, compile and fit
model = WideDeep(wide=wide, deepdense=deepdense)
model.compile(method="binary", metrics=[BinaryAccuracy])
model.fit(
    X_wide=X_wide,
    X_deep=X_deep,
    target=target,
    n_epochs=5,
    batch_size=256,
    val_split=0.1,
)

# predict
X_wide_te = preprocess_wide.transform(df_test)
X_deep_te = preprocess_deep.transform(df_test)
preds = model.predict(X_wide=X_wide_te, X_deep=X_deep_te)
```

Of course, one can do much more, such as using different initializations,
optimizers or learning rate schedulers for each component of the overall
model, adding FC-Heads to the Text and Image components, using the
[Focal Loss](https://arxiv.org/abs/1708.02002), warming up individual
components before joint training, etc. See the `examples` or the `docs`
folders for a better understanding of the content of the package and its
functionalities.

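As a rough illustration of the Focal Loss mentioned above, the sketch below computes its binary form, as defined in the linked paper, in plain Python. The function name `binary_focal_loss` is hypothetical and the defaults `alpha=0.25`, `gamma=2.0` are only the setting the paper reports as working well. None of this is the package's own API.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for a single binary prediction (hypothetical helper).

    p: predicted probability of the positive class, in [0, 1]
    y: true label, 0 or 1
    """
    # p_t is the probability the model assigned to the true class
    p_t = p if y == 1 else 1.0 - p
    # alpha_t weights positives by alpha and negatives by 1 - alpha
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # clamp to avoid log(0)
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t) ** gamma shrinks the loss of examples that are already
    # classified well, so hard examples dominate the total
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.10, 1)  # confident and wrong
print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted binary cross-entropy, which is a quick way to sanity-check an implementation.
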
### Testing
```
pytest tests
```
### Acknowledgments
This library takes from a series of other libraries, so I think it is just
fair to mention them here in the README (specific mentions are also included
in the code).
The `Callbacks` and `Initializers` structure and code is inspired by the
[`torchsample`](https://github.com/ncullen93/torchsample) library, which is
itself partially inspired by [`Keras`](https://keras.io/).

The `TextProcessor` class in this library uses the
[`fastai`](https://docs.fast.ai/text.transform.html#BaseTokenizer.tokenizer)'s
`Tokenizer` and `Vocab`. The code at `utils.fastai_transforms` is a minor
adaptation of their code so it functions within this library. In my experience
their `Tokenizer` is the best in class.

The `ImageProcessor` class in this library uses code from the fantastic
[Deep Learning for Computer Vision](https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/)
(DL4CV) book by Adrian Rosebrock.
\ No newline at end of file

setup.py (shown as of 5f5bb84b)

...
@@ -26,7 +26,7 @@ setup_kwargs = {
     "name": "pytorch-widedeep",
     "version": version,
     "description": "Combine tabular data with text and images using Wide and Deep models in Pytorch",
-    "long_description": open("docs/pypiREADME.md", "r", encoding="utf-8").read(),
+    "long_description": open("pypi_README.md", "r", encoding="utf-8").read(),
     "long_description_content_type": "text/markdown",
     # "long_description": long_description,
     "author": "Javier Rodriguez Zaurin",
...
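
The change above swaps one relative path for another, so the `open(...)` call still resolves against whatever directory `setup.py` is run from. A minimal sketch of one way to make the read independent of the working directory, assuming the README sits next to `setup.py` under the name `pypi_README.md`. The helper `read_long_description` is hypothetical and not part of this commit.

```python
from pathlib import Path


def read_long_description(filename="pypi_README.md", base_dir=None):
    """Return the README text used as the PyPI long description.

    Hypothetical helper: resolves `filename` against `base_dir`
    (default: the directory containing this file) instead of the
    current working directory.
    """
    if base_dir is None:
        base_dir = Path(__file__).resolve().parent
    return (Path(base_dir) / filename).read_text(encoding="utf-8")
```

In `setup.py` this would be used as `"long_description": read_long_description()`, with `"long_description_content_type": "text/markdown"` left unchanged.
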