...
 
Commits (18)

- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/0fdbfdcecba0e61cc083faa7ad2d00d9c88ebce4 (2023-07-24T17:01:50+02:00, Javier <jrzaurin@gmail.com>): first step towards adding an example on how to reproduce a Kaggle notebook (details in the code) with this library
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/21759eefe4fe56513ffec70f500511fe3cca165b (2023-07-27T10:50:04+02:00, Javier <jrzaurin@gmail.com>): Added scripts on how to use the library for recsys in response to issue #133. Also added a simple/basic transformer model for the text component before integrating with HF. Also added the option of specifying the dimension of the feed-forward network
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/d09446e53cb5ccb041b897d16e71290f44cc96c1 (2023-07-27T12:55:45+02:00, Javier <jrzaurin@gmail.com>): Added unit tests. Need to write a notebook. Test is on GPU and ready to merge
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/5b5e680871de6f151022bafca2447de5480e8890 (2023-07-28T16:53:38+02:00, Javier <jrzaurin@gmail.com>): Added the 1st of two notebooks to illustrate the use of the library in the context of recommendation systems
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/04e9d38b500c37dc03dce13de0431d20ff39da5a (2023-07-28T22:25:31+02:00, Javier <jrzaurin@gmail.com>): Added the notebooks to illustrate how to use the library to build recommendation algos. A couple of bugs to fix and ready to merge and publish
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/8813ceeabaf4a9b37a5b8824a761c9fe12719715 (2023-07-30T21:24:14+02:00, Pavol Mulinka <mulinka.pavol@gmail.com>): added movielens dataset and tests
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/b6a1033639089c12352d64a2879d04921a03eabc (2023-07-30T21:32:29+01:00, Javier <jrzaurin@gmail.com>): Merge pull request #182 from jrzaurin/recsys_movielens_dataset (added movielens dataset and tests)
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/d30203a0a02a75b5bf2a421ff0aac35c8b1c93b5 (2023-07-31T13:11:32+01:00, Javier <jrzaurin@gmail.com>): Fixed a bug related to the padding idx and the fastai transforms. Also adjusted the scripts to show how one can use the 'load_movielens100k' function in the library
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/801c597ae6d5d9224af43d9d495d59e81062f3b1 (2023-07-31T13:59:17+01:00, Javier <jrzaurin@gmail.com>): Adjusted the notebooks to show how one can use the 'load_movielens100k' funct...
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/e75c119073312e6bee047e52ad89a900125f524b (2023-07-31T17:36:48+01:00, Javier <jrzaurin@gmail.com>): bump version to 1.3.1
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/cd1ff79ae327ac025fe65e6dafe76f305bfd2cbb (2023-07-31T18:01:57+01:00, Javier <jrzaurin@gmail.com>): Merge pull request #183 from jrzaurin/wide_deep_recsys (Wide deep recsys)
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/52ae96b5829b60ce1f119db047574c96c950a571 (2023-08-02T10:42:06+01:00, Javier <jrzaurin@gmail.com>): Merge remote-tracking branch 'origin/master' into flash_attention
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/eb02f25f07908c0285fb3a0be7cfdb5186781576 (2023-08-02T13:12:53+01:00, Javier <jrzaurin@gmail.com>): Added linear attention from the paper 'Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention'. This now needs to be turned into an encoder and offered as an optional model
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/7a80f3e7068032ee1b57ef07900bcbdd6f188ed7 (2023-08-02T15:20:57+01:00, Javier <jrzaurin@gmail.com>): implemented linear and 'standard' attention in a functional way so they are available via parameters passed to the main multi-head attention class
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/2a74d34f6855db80c32a37ddaa346b1da4721aa1 (2023-08-02T18:18:36+01:00, Javier <jrzaurin@gmail.com>): tests passed. Need to increase test coverage a bit for the tabtransformer and attention_layers, and review the docs
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/b6362d1d31e6ac71557539421c3875b4286d57dd (2023-08-03T18:20:53+01:00, Javier <jrzaurin@gmail.com>): Added some docs. Only thing left is to test the new attention mechanism on GPU
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/d3657c32d9a352940eab1d4436794b57a1432751 (2023-08-04T13:12:37+01:00, Javier <jrzaurin@gmail.com>): Added an example of flash and linear attention. Fixed some small bugs in one example. Adjusted all new functionality to GPU usage
- https://gitcode.net/greenplum/pytorch-widedeep/-/commit/67439c4220fa6fdf979b32f7c0c9ea4d9f748437 (2023-08-04T15:17:32+01:00, Javier <jrzaurin@gmail.com>): Bumped to version 1.3.2
......@@ -21,6 +21,7 @@ tmp_dir/
weights/
pretrained_weights/
model_weights/
prepared_data/
# Unit Tests/Coverage
.coverage
......
cff-version: "1.2.0"
authors:
- family-names: Zaurin
  given-names: Javier Rodriguez
  orcid: "https://orcid.org/0000-0002-1082-1107"
- family-names: Mulinka
  given-names: Pavol
  orcid: "https://orcid.org/0000-0002-9394-8794"
doi: 10.5281/zenodo.7908172
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Zaurin
    given-names: Javier Rodriguez
    orcid: "https://orcid.org/0000-0002-1082-1107"
  - family-names: Mulinka
    given-names: Pavol
    orcid: "https://orcid.org/0000-0002-9394-8794"
  date-published: 2023-06-24
  doi: 10.21105/joss.05027
  issn: 2475-9066
  issue: 86
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5027
  title: "pytorch-widedeep: A flexible package for multimodal deep
    learning"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05027"
  volume: 8
title: "pytorch-widedeep: A flexible package for multimodal deep
  learning"
\ No newline at end of file
......@@ -12,6 +12,7 @@
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/jrzaurin/pytorch-widedeep/graphs/commit-activity)
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/jrzaurin/pytorch-widedeep/issues)
[![Slack](https://img.shields.io/badge/slack-chat-green.svg?logo=slack)](https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-soss7stf-iXpVuLeKZz8lGTnxxtHtTw)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.05027/status.svg)](https://doi.org/10.21105/joss.05027)
# pytorch-widedeep
......@@ -38,6 +39,9 @@ The content of this document is organized as follows:
- [How to Contribute](#how-to-contribute)
- [Acknowledgments](#acknowledgments)
- [License](#license)
- [Cite](#cite)
- [BibTex](#bibtex)
- [APA](#apa)
### Introduction
......@@ -82,7 +86,7 @@ without a ``deephead`` component can be formulated as:
Where σ is the sigmoid function, *'W'* are the weight matrices applied to the wide model and to the final
activations of the deep models, *'a'* are these final activations,
φ(x) are the cross product transformations of the original features *'x'*, and *'b'* is the bias term.
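In equation form (a reconstruction assembled from the definitions above, where $a_{deep}$ denotes the final activations of each deep component):

$$
\text{pred} = \sigma\left(W_{wide}^{T}\,[x,\ \phi(x)] \;+\; \sum W_{deep}^{T}\, a_{deep} \;+\; b\right)
$$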
In case you are wondering what *"cross product transformations"* are: for binary features, a cross product transformation is 1 if and only if all the constituent features are 1, and 0 otherwise. A minimal sketch (with hypothetical feature names):
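```python
# Cross product transformation of binary features, e.g.
# AND(gender=female, language=en): 1 only when every feature is 1.
def cross_product(*binary_feats: int) -> int:
    return int(all(f == 1 for f in binary_feats))

gender_female, language_en = 1, 1
print(cross_product(gender_female, language_en))  # 1
print(cross_product(gender_female, 0))  # 0
```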
......@@ -126,26 +130,33 @@ passed through a series of ResNet blocks built with dense layers.
3. **TabNet**: details on TabNet can be found in
[TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442)
Two simpler attention based models that we call:
4. **ContextAttentionMLP**: MLP with an attention mechanism "on top" that is based on
[Hierarchical Attention Networks for Document Classification](https://www.cs.cmu.edu/~./hovy/papers/16HLT-hierarchical-attention-networks.pdf)
5. **SelfAttentionMLP**: MLP with an attention mechanism that is a simplified
version of a transformer block that we refer to as "query-key self-attention".
The ``Tabformer`` family, i.e. Transformers for Tabular data:
4. **TabTransformer**: details on the TabTransformer can be found in
6. **TabTransformer**: details on the TabTransformer can be found in
[TabTransformer: Tabular Data Modeling Using Contextual Embeddings](https://arxiv.org/pdf/2012.06678.pdf).
5. **SAINT**: Details on SAINT can be found in
7. **SAINT**: Details on SAINT can be found in
[SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training](https://arxiv.org/abs/2106.01342).
6. **FT-Transformer**: details on the FT-Transformer can be found in
8. **FT-Transformer**: details on the FT-Transformer can be found in
[Revisiting Deep Learning Models for Tabular Data](https://arxiv.org/abs/2106.11959).
7. **TabFastFormer**: adaptation of the FastFormer for tabular data. Details
9. **TabFastFormer**: adaptation of the FastFormer for tabular data. Details
on the FastFormer can be found in
[FastFormers: Highly Efficient Transformer Models for Natural Language Understanding](https://arxiv.org/abs/2010.13382)
8. **TabPerceiver**: adaptation of the Perceiver for tabular data. Details on
10. **TabPerceiver**: adaptation of the Perceiver for tabular data. Details on
the Perceiver can be found in
[Perceiver: General Perception with Iterative Attention](https://arxiv.org/abs/2103.03206)
And probabilistic DL models for tabular data based on
[Weight Uncertainty in Neural Networks](https://arxiv.org/abs/1505.05424):
9. **BayesianWide**: Probabilistic adaptation of the `Wide` model.
10. **BayesianTabMlp**: Probabilistic adaptation of the `TabMlp` model
11. **BayesianWide**: Probabilistic adaptation of the `Wide` model.
12. **BayesianTabMlp**: Probabilistic adaptation of the `TabMlp` model
Note that while there are scientific publications for the TabTransformer,
SAINT and FT-Transformer, the TabFastFormer and TabPerceiver are our own
adaptations of those algorithms for tabular data.
......@@ -192,7 +203,6 @@ using `Wide` and `DeepDense` and defaults settings.
Building a wide (linear) and deep model with ``pytorch-widedeep``:
```python
import pandas as pd
import numpy as np
import torch
from sklearn.model_selection import train_test_split
......@@ -331,4 +341,31 @@ Vision](https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/
This work is dual-licensed under Apache 2.0 and MIT (or any later version).
You can choose between one of them if you use this work.
`SPDX-License-Identifier: Apache-2.0 AND MIT`
\ No newline at end of file
`SPDX-License-Identifier: Apache-2.0 AND MIT`
### Cite
#### BibTex
```
@article{Zaurin_pytorch-widedeep_A_flexible_2023,
author = {Zaurin, Javier Rodriguez and Mulinka, Pavol},
doi = {10.21105/joss.05027},
journal = {Journal of Open Source Software},
month = jun,
number = {86},
pages = {5027},
title = {{pytorch-widedeep: A flexible package for multimodal deep learning}},
url = {https://joss.theoj.org/papers/10.21105/joss.05027},
volume = {8},
year = {2023}
}
```
#### APA
```
Zaurin, J. R., & Mulinka, P. (2023). pytorch-widedeep: A flexible package for
multimodal deep learning. Journal of Open Source Software, 8(86), 5027.
https://doi.org/10.21105/joss.05027
```
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the second of the two notebooks where we aim to illustrate how one could use this library to build recommendation algorithms using the example in this [Kaggle notebook](https://www.kaggle.com/code/matanivanov/wide-deep-learning-for-recsys-with-pytorch) as guidance. In the previous notebook we used `pytorch-widedeep` to build a model that replicated almost exactly that in the notebook. In this, shorter notebook we will show how one could use the library to explore other models, following the same problem formulation, this is: given a state of a user at a certain point in time having watched a series of movies, our goal is to predict which movie the user will watch next. \n",
"\n",
"Assuming that one has read (and run) the previous notebook, the required data will be stored in a local dir called `prepared_data`, so let's read it:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"import numpy as np\n",
"import torch\n",
"import pandas as pd\n",
"from torch import nn\n",
"\n",
"from pytorch_widedeep import Trainer\n",
"from pytorch_widedeep.utils import pad_sequences\n",
"from pytorch_widedeep.models import TabMlp, WideDeep, Transformer\n",
"from pytorch_widedeep.preprocessing import TabPreprocessor"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"save_path = Path(\"prepared_data\")\n",
"\n",
"PAD_IDX = 0\n",
"\n",
"id_cols = [\"user_id\", \"movie_id\"]\n",
"\n",
"df_train = pd.read_pickle(save_path / \"df_train.pkl\")\n",
"df_valid = pd.read_pickle(save_path / \"df_valid.pkl\")\n",
"df_test = pd.read_pickle(save_path / \"df_test.pkl\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...remember that in the previous notebook we explained that we are not going to use a validation set here (in a real-world example, or simply a more realistic example, one should always use it).\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"df_test = pd.concat([df_valid, df_test], ignore_index=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also remember that, in the previous notebook we discussed that the `'maxlen'` and `'max_movie_index'` parameters should be computed using only the train set. In particular, to properly do the tokenization, one would have to use ONLY train tokens and add a token for new 'unknown'/'unseen' movies in the test set. This can also be done with this library or manually, so I will leave it to the reader to implement that tokenzation appraoch."
]
},
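{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch of what that could look like (hypothetical names, not library API, and not used in the rest of this notebook): build the vocabulary from the training sequences only and map movies that never appear in training to a reserved `UNK` index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch: train-only tokenization with an explicit\n",
"# index for movies unseen during training\n",
"UNK_IDX = 1  # assuming PAD_IDX = 0, reserve 1 for unknown movies\n",
"\n",
"train_vocab = {m for seq in df_train.prev_movies for m in seq}\n",
"token2idx = {m: i + 2 for i, m in enumerate(sorted(train_vocab))}\n",
"\n",
"def encode(seq):\n",
"    return [token2idx.get(m, UNK_IDX) for m in seq]\n",
"\n",
"encoded_test_sequences = df_test.prev_movies.apply(encode)"
]
},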
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"maxlen = max(\n",
" df_train.prev_movies.apply(lambda x: len(x)).max(),\n",
" df_test.prev_movies.apply(lambda x: len(x)).max(),\n",
")\n",
"\n",
"max_movie_index = max(df_train.movie_id.max(), df_test.movie_id.max())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From now one things are pretty simple, moreover bearing in mind that in this example we are not going to use a wide component since, in pple, one would believe that the information in that component is also 'carried' by the movie sequences (However in the previous notebook, if one performs ablation studies, these suggest that most of the prediction power comes from the linear, wide model).\n",
"\n",
"In the example here we are going to explore one (of many) possibilities. We are simply going to encode the triplet `(user, item, rating)` and use it as a `deeptabular` component and the sequences of previously watched movies as the `deeptext` component. For the `deeptext` component we are going to use a basic encoder-only transformer model.\n",
"\n",
"Let's start with the tabular data preparation\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"df_train_user_item = df_train[[\"user_id\", \"movie_id\", \"rating\"]]\n",
"train_movies_sequences = df_train.prev_movies.apply(\n",
" lambda x: [int(el) for el in x]\n",
").to_list()\n",
"y_train = df_train.target.values.astype(int)\n",
"\n",
"df_test_user_item = df_train[[\"user_id\", \"movie_id\", \"rating\"]]\n",
"test_movies_sequences = df_test.prev_movies.apply(\n",
" lambda x: [int(el) for el in x]\n",
").to_list()\n",
"y_test = df_test.target.values.astype(int)\n",
"\n",
"tab_preprocessor = tab_preprocessor = TabPreprocessor(\n",
" cat_embed_cols=[\"user_id\", \"movie_id\", \"rating\"],\n",
")\n",
"X_train_tab = tab_preprocessor.fit_transform(df_train_user_item)\n",
"X_test_tab = tab_preprocessor.transform(df_test_user_item)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And not the text component, simply padding the sequences:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"X_train_text = np.array(\n",
" [\n",
" pad_sequences(\n",
" s,\n",
" maxlen=maxlen,\n",
" pad_first=False,\n",
" pad_idx=PAD_IDX,\n",
" )\n",
" for s in train_movies_sequences\n",
" ]\n",
")\n",
"X_test_text = np.array(\n",
" [\n",
" pad_sequences(\n",
" s,\n",
" maxlen=maxlen,\n",
" pad_first=False,\n",
" pad_idx=0,\n",
" )\n",
" for s in test_movies_sequences\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now define the model components and the wide and deep model."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"tab_mlp = TabMlp(\n",
" column_idx=tab_preprocessor.column_idx,\n",
" cat_embed_input=tab_preprocessor.cat_embed_input,\n",
" mlp_hidden_dims=[1024, 512, 256],\n",
" mlp_activation=\"relu\",\n",
")\n",
"\n",
"# plenty of options here, see the docs\n",
"transformer = Transformer(\n",
" vocab_size=max_movie_index + 1,\n",
" embed_dim=32,\n",
" n_heads=2,\n",
" n_blocks=2,\n",
" seq_length=maxlen,\n",
")\n",
"\n",
"wide_deep_model = WideDeep(\n",
" deeptabular=tab_mlp, deeptext=transformer, pred_dim=max_movie_index + 1\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"WideDeep(\n",
" (deeptabular): Sequential(\n",
" (0): TabMlp(\n",
" (cat_and_cont_embed): DiffSizeCatAndContEmbeddings(\n",
" (cat_embed): DiffSizeCatEmbeddings(\n",
" (embed_layers): ModuleDict(\n",
" (emb_layer_user_id): Embedding(749, 65, padding_idx=0)\n",
" (emb_layer_movie_id): Embedding(1612, 100, padding_idx=0)\n",
" (emb_layer_rating): Embedding(6, 4, padding_idx=0)\n",
" )\n",
" (embedding_dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" )\n",
" (encoder): MLP(\n",
" (mlp): Sequential(\n",
" (dense_layer_0): Sequential(\n",
" (0): Dropout(p=0.1, inplace=False)\n",
" (1): Linear(in_features=169, out_features=1024, bias=True)\n",
" (2): ReLU(inplace=True)\n",
" )\n",
" (dense_layer_1): Sequential(\n",
" (0): Dropout(p=0.1, inplace=False)\n",
" (1): Linear(in_features=1024, out_features=512, bias=True)\n",
" (2): ReLU(inplace=True)\n",
" )\n",
" (dense_layer_2): Sequential(\n",
" (0): Dropout(p=0.1, inplace=False)\n",
" (1): Linear(in_features=512, out_features=256, bias=True)\n",
" (2): ReLU(inplace=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (1): Linear(in_features=256, out_features=1683, bias=True)\n",
" )\n",
" (deeptext): Sequential(\n",
" (0): Transformer(\n",
" (embedding): Embedding(1683, 32)\n",
" (pos_encoder): PositionalEncoding(\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" )\n",
" (encoder): Sequential(\n",
" (transformer_block0): TransformerEncoder(\n",
" (attn): MultiHeadedAttention(\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (q_proj): Linear(in_features=32, out_features=32, bias=False)\n",
" (kv_proj): Linear(in_features=32, out_features=64, bias=False)\n",
" (out_proj): Linear(in_features=32, out_features=32, bias=False)\n",
" )\n",
" (ff): FeedForward(\n",
" (w_1): Linear(in_features=32, out_features=128, bias=True)\n",
" (w_2): Linear(in_features=128, out_features=32, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (activation): GELU(approximate='none')\n",
" )\n",
" (attn_addnorm): AddNorm(\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
" )\n",
" (ff_addnorm): AddNorm(\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
" )\n",
" )\n",
" (transformer_block1): TransformerEncoder(\n",
" (attn): MultiHeadedAttention(\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (q_proj): Linear(in_features=32, out_features=32, bias=False)\n",
" (kv_proj): Linear(in_features=32, out_features=64, bias=False)\n",
" (out_proj): Linear(in_features=32, out_features=32, bias=False)\n",
" )\n",
" (ff): FeedForward(\n",
" (w_1): Linear(in_features=32, out_features=128, bias=True)\n",
" (w_2): Linear(in_features=128, out_features=32, bias=True)\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (activation): GELU(approximate='none')\n",
" )\n",
" (attn_addnorm): AddNorm(\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
" )\n",
" (ff_addnorm): AddNorm(\n",
" (dropout): Dropout(p=0.1, inplace=False)\n",
" (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (1): Linear(in_features=23552, out_features=1683, bias=True)\n",
" )\n",
")"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wide_deep_model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And as in the previous notebook, let's train (you will need a GPU for this)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"trainer = Trainer(\n",
" model=wide_deep_model,\n",
" objective=\"multiclass\",\n",
" custom_loss_function=nn.CrossEntropyLoss(ignore_index=PAD_IDX),\n",
" optimizers=torch.optim.Adam(wide_deep_model.parameters(), lr=1e-3),\n",
")\n",
"\n",
"trainer.fit(\n",
" X_train={\n",
" \"X_tab\": X_train_tab,\n",
" \"X_text\": X_train_text,\n",
" \"target\": y_train,\n",
" },\n",
" X_val={\n",
" \"X_tab\": X_test_tab,\n",
" \"X_text\": X_test_text,\n",
" \"target\": y_test,\n",
" },\n",
" n_epochs=10,\n",
" batch_size=521,\n",
" shuffle=False,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
from time import time
from sklearn.model_selection import train_test_split
from pytorch_widedeep import Trainer
from pytorch_widedeep.models import WideDeep, TabTransformer
from pytorch_widedeep.metrics import Accuracy
from pytorch_widedeep.datasets import load_adult
from pytorch_widedeep.preprocessing import TabPreprocessor
# use_cuda = torch.cuda.is_available()
df = load_adult(as_frame=True)
df.columns = [c.replace("-", "_") for c in df.columns]
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
target_colname = "income_label"
cat_embed_cols = []
for col in df.columns:
    if (df[col].dtype == "O" or df[col].nunique() < 200) and col != target_colname:
        cat_embed_cols.append(col)

train, test = train_test_split(
    df, test_size=0.1, random_state=1, stratify=df[[target_colname]]
)
with_cls_token = True
tab_preprocessor = TabPreprocessor(
    cat_embed_cols=cat_embed_cols, with_attention=True, with_cls_token=with_cls_token
)
X_tab_train = tab_preprocessor.fit_transform(train)
X_tab_test = tab_preprocessor.transform(test)
target = train[target_colname].values
tab_transformer = TabTransformer(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    input_dim=16,
    n_heads=2,
    n_blocks=2,
)
linear_tab_transformer = TabTransformer(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    input_dim=16,
    n_heads=2,
    n_blocks=2,
    use_linear_attention=True,
)
flash_tab_transformer = TabTransformer(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    input_dim=16,
    n_heads=2,
    n_blocks=2,
    use_flash_attention=True,
)
s_model = WideDeep(deeptabular=tab_transformer)
l_model = WideDeep(deeptabular=linear_tab_transformer)
f_model = WideDeep(deeptabular=flash_tab_transformer)
for name, model in [("standard", s_model), ("linear", l_model), ("flash", f_model)]:
    trainer = Trainer(
        model,
        objective="binary",
        metrics=[Accuracy],
    )
    s = time()
    trainer.fit(
        X_tab=X_tab_train,
        target=target,
        n_epochs=1,
        batch_size=64,
        val_split=0.2,
    )
    e = time() - s
    print(f"{name} attention time: {round(e, 3)} secs")
# This script is mostly a copy/paste from the Kaggle notebook
# https://www.kaggle.com/code/matanivanov/wide-deep-learning-for-recsys-with-pytorch.
# It is a response to the issue:
# https://github.com/jrzaurin/pytorch-widedeep/issues/133.
# In this script we run the exact same model used in that Kaggle notebook
from pathlib import Path
import numpy as np
import torch
import pandas as pd
from torch import nn, cat, mean
from scipy.sparse import coo_matrix
device = "cuda" if torch.cuda.is_available() else "cpu"
save_path = Path("prepared_data")
def get_coo_indexes(lil):
    rows = []
    cols = []
    for i, el in enumerate(lil):
        if type(el) != list:
            el = [el]
        for j in el:
            rows.append(i)
            cols.append(j)
    return rows, cols


def get_sparse_features(series, shape):
    coo_indexes = get_coo_indexes(series.tolist())
    sparse_df = coo_matrix(
        (np.ones(len(coo_indexes[0])), (coo_indexes[0], coo_indexes[1])), shape=shape
    )
    return sparse_df


def sparse_to_idx(data, pad_idx=-1):
    indexes = data.nonzero()
    indexes_df = pd.DataFrame()
    indexes_df["rows"] = indexes[0]
    indexes_df["cols"] = indexes[1]
    mdf = indexes_df.groupby("rows").apply(lambda x: x["cols"].tolist())
    max_len = mdf.apply(lambda x: len(x)).max()
    return mdf.apply(lambda x: pd.Series(x + [pad_idx] * (max_len - len(x)))).values
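# Worked example (hypothetical input): for a 2 x 4 sparse matrix with ones at
# (0, 1), (0, 3) and (1, 2), sparse_to_idx returns [[1, 3], [2, pad_idx]]:
# each row becomes the list of its non-zero column indices, right-padded to
# the length of the longest row.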
def idx_to_sparse(idx, sparse_dim):
    sparse = np.zeros(sparse_dim)
    sparse[int(idx)] = 1
    return pd.Series(sparse, dtype=int)
def process_cats_as_kaggle_notebook(df):
    df["gender"] = (df["gender"] == "M").astype(int)
    df = pd.concat(
        [
            df.drop("occupation", axis=1),
            pd.get_dummies(df["occupation"]).astype(int),
        ],
        axis=1,
    )
    df.drop("other", axis=1, inplace=True)
    df.drop("zip_code", axis=1, inplace=True)
    return df
id_cols = ["user_id", "movie_id"]
df_train = pd.read_pickle(save_path / "df_train.pkl")
df_valid = pd.read_pickle(save_path / "df_valid.pkl")
df_test = pd.read_pickle(save_path / "df_test.pkl")
df_test = pd.concat([df_valid, df_test], ignore_index=True)
df_train = process_cats_as_kaggle_notebook(df_train)
df_test = process_cats_as_kaggle_notebook(df_test)
# here is another caveat, using all dataset to build 'train_movies_watched'
# when in reality one should use only the training
max_movie_index = max(df_train.movie_id.max(), df_test.movie_id.max())
X_train = df_train.drop(id_cols + ["prev_movies", "target"], axis=1)
y_train = df_train.target.values
train_movies_watched = get_sparse_features(
    df_train["prev_movies"], (len(df_train), max_movie_index + 1)
)
X_test = df_test.drop(id_cols + ["prev_movies", "target"], axis=1)
y_test = df_test.target.values
test_movies_watched = get_sparse_features(
    df_test["prev_movies"], (len(df_test), max_movie_index + 1)
)
PAD_IDX = 0
X_train_tensor = torch.Tensor(X_train.fillna(0).values).to(device)
train_movies_watched_tensor = (
    torch.sparse_coo_tensor(
        indices=train_movies_watched.nonzero(),
        values=[1] * len(train_movies_watched.nonzero()[0]),
        size=train_movies_watched.shape,
    )
    .to_dense()
    .to(device)
)
movies_train_sequences = (
    torch.Tensor(
        sparse_to_idx(train_movies_watched, pad_idx=PAD_IDX),
    )
    .long()
    .to(device)
)
target_train = torch.Tensor(y_train).long().to(device)
X_test_tensor = torch.Tensor(X_test.fillna(0).values).to(device)
test_movies_watched_tensor = (
    torch.sparse_coo_tensor(
        indices=test_movies_watched.nonzero(),
        values=[1] * len(test_movies_watched.nonzero()[0]),
        size=test_movies_watched.shape,
    )
    .to_dense()
    .to(device)
)
movies_test_sequences = (
    torch.Tensor(
        sparse_to_idx(test_movies_watched, pad_idx=PAD_IDX),
    )
    .long()
    .to(device)
)
target_test = torch.Tensor(y_test).long().to(device)
class WideAndDeep(nn.Module):
    def __init__(
        self,
        continuous_feature_shape,  # number of continuous features
        embed_size,  # size of embedding for binary features
        embed_dict_len,  # number of unique binary features
        pad_idx,  # padding index
    ):
        super(WideAndDeep, self).__init__()
        self.embed = nn.Embedding(embed_dict_len, embed_size, padding_idx=pad_idx)
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(embed_size + continuous_feature_shape, 1024),
            nn.ReLU(),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(embed_dict_len + 256, embed_dict_len),
        )

    def forward(self, continuous, binary, binary_idx):
        # get embeddings for the sequence of indexes
        binary_embed = self.embed(binary_idx)
        binary_embed_mean = mean(binary_embed, dim=1)
        # get logits for the "deep" part: continuous features + binary embeddings
        deep_logits = self.linear_relu_stack(
            cat((continuous, binary_embed_mean), dim=1)
        )
        # get final softmax logits for the "deep" part and raw binary features
        total_logits = self.head(cat((deep_logits, binary), dim=1))
        return total_logits
model = WideAndDeep(X_train.shape[1], 16, max_movie_index + 1, PAD_IDX).to(device)
print(model)
EPOCHS = 10
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for t in range(EPOCHS):
    model.train()
    pred_train = model(
        X_train_tensor, train_movies_watched_tensor, movies_train_sequences
    )
    loss_train = loss_fn(pred_train, target_train)

    # Backpropagation
    optimizer.zero_grad()
    loss_train.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        pred_test = model(
            X_test_tensor, test_movies_watched_tensor, movies_test_sequences
        )
        loss_test = loss_fn(pred_test, target_test)

    print(f"Epoch {t}")
    print(f"Train loss: {loss_train:>7f}")
    print(f"Test loss: {loss_test:>7f}")
# This script is mostly a copy/paste from the Kaggle notebook
# https://www.kaggle.com/code/matanivanov/wide-deep-learning-for-recsys-with-pytorch.
# It is a response to the issue:
# https://github.com/jrzaurin/pytorch-widedeep/issues/133. In this script we
# simply prepare the data that will later be used for a custom Wide and Deep
# model and for Wide and Deep models created using this library
from pathlib import Path
from sklearn.model_selection import train_test_split
from pytorch_widedeep.datasets import load_movielens100k
data, users, items = load_movielens100k(as_frame=True)
# Alternatively, as specified in the docs: 'The last 19 fields are the genres' so:
# list_of_genres = items.columns.tolist()[-19:]
list_of_genres = [
    "unknown",
    "Action",
    "Adventure",
    "Animation",
    "Children's",
    "Comedy",
    "Crime",
    "Documentary",
    "Drama",
    "Fantasy",
    "Film-Noir",
    "Horror",
    "Musical",
    "Mystery",
    "Romance",
    "Sci-Fi",
    "Thriller",
    "War",
    "Western",
]
# adding a column with the number of movies watched per users
dataset = data.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
dataset["one"] = 1
dataset["num_watched"] = dataset.groupby("user_id")["one"].cumsum()
dataset.drop("one", axis=1, inplace=True)
# adding a column with the mean rating at a point in time per user
dataset["mean_rate"] = (
dataset.groupby("user_id")["rating"].cumsum() / dataset["num_watched"]
)
# In this particular exercise the problem is formulated as predicting the
# next movie that will be watched (in consequence the last interactions will be discarded)
dataset["target"] = dataset.groupby("user_id")["movie_id"].shift(-1)
# Here the author builds the sequences
dataset["prev_movies"] = dataset["movie_id"].apply(lambda x: str(x))
dataset["prev_movies"] = (
dataset.groupby("user_id")["prev_movies"]
.apply(lambda x: (x + " ").cumsum().str.strip())
.reset_index(drop=True)
)
dataset["prev_movies"] = dataset["prev_movies"].apply(lambda x: x.split())
# Adding user feats
dataset = dataset.merge(users, on="user_id", how="left")
# Adding a genre_rate as the mean of all movies rated for a given genre per
# user
dataset = dataset.merge(items[["movie_id"] + list_of_genres], on="movie_id", how="left")
for genre in list_of_genres:
    dataset[f"{genre}_rate"] = dataset[genre] * dataset["rating"]
    dataset[genre] = dataset.groupby("user_id")[genre].cumsum()
    dataset[f"{genre}_rate"] = (
        dataset.groupby("user_id")[f"{genre}_rate"].cumsum() / dataset[genre]
    )
dataset[list_of_genres] = dataset[list_of_genres].apply(
    lambda x: x / dataset["num_watched"]
)
# Again, we use the same settings as those in the Kaggle notebook,
# but 'COLD_START_THRESH' is pretty aggressive
COLD_START_THRESH = 5
filtered_data = dataset[
    (dataset["num_watched"] >= COLD_START_THRESH) & ~(dataset["target"].isna())
].sort_values("timestamp")
train_data, _test_data = train_test_split(filtered_data, test_size=0.2, shuffle=False)
valid_data, test_data = train_test_split(_test_data, test_size=0.5, shuffle=False)
cols_to_drop = [
    # "rating",
    "timestamp",
    "num_watched",
]
df_train = train_data.drop(cols_to_drop, axis=1)
df_valid = valid_data.drop(cols_to_drop, axis=1)
df_test = test_data.drop(cols_to_drop, axis=1)
save_path = Path("prepared_data")
if not save_path.exists():
    save_path.mkdir(parents=True, exist_ok=True)
df_train.to_pickle(save_path / "df_train.pkl")
df_valid.to_pickle(save_path / "df_valid.pkl")
df_test.to_pickle(save_path / "df_test.pkl")
# In this script I illustrate how one could use our library to reproduce
# almost exactly the same model used in the Kaggle notebook
from pathlib import Path
import numpy as np
import torch
import pandas as pd
from torch import nn
from scipy.sparse import coo_matrix
from pytorch_widedeep import Trainer
from pytorch_widedeep.models import TabMlp, BasicRNN, WideDeep
from pytorch_widedeep.preprocessing import TabPreprocessor
device = "cuda" if torch.cuda.is_available() else "cpu"
save_path = Path("prepared_data")
PAD_IDX = 0
def get_coo_indexes(lil):
    rows = []
    cols = []
    for i, el in enumerate(lil):
        if type(el) != list:
            el = [el]
        for j in el:
            rows.append(i)
            cols.append(j)
    return rows, cols


def get_sparse_features(series, shape):
    coo_indexes = get_coo_indexes(series.tolist())
    sparse_df = coo_matrix(
        (np.ones(len(coo_indexes[0])), (coo_indexes[0], coo_indexes[1])), shape=shape
    )
    return sparse_df


def sparse_to_idx(data, pad_idx=-1):
    indexes = data.nonzero()
    indexes_df = pd.DataFrame()
    indexes_df["rows"] = indexes[0]
    indexes_df["cols"] = indexes[1]
    mdf = indexes_df.groupby("rows").apply(lambda x: x["cols"].tolist())
    max_len = mdf.apply(lambda x: len(x)).max()
    return mdf.apply(lambda x: pd.Series(x + [pad_idx] * (max_len - len(x)))).values
id_cols = ["user_id", "movie_id"]
df_train = pd.read_pickle(save_path / "df_train.pkl")
df_valid = pd.read_pickle(save_path / "df_valid.pkl")
df_test = pd.read_pickle(save_path / "df_test.pkl")
df_test = pd.concat([df_valid, df_test], ignore_index=True)
# here is another caveat, using all dataset to build 'train_movies_watched'
# when in reality one should use only the training
max_movie_index = max(df_train.movie_id.max(), df_test.movie_id.max())
X_train = df_train.drop(id_cols + ["rating", "prev_movies", "target"], axis=1)
y_train = np.array(df_train.target.values, dtype="int64")
train_movies_watched = get_sparse_features(
    df_train["prev_movies"], (len(df_train), max_movie_index + 1)
)
X_test = df_test.drop(id_cols + ["rating", "prev_movies", "target"], axis=1)
y_test = np.array(df_test.target.values, dtype="int64")
test_movies_watched = get_sparse_features(
    df_test["prev_movies"], (len(df_test), max_movie_index + 1)
)
cat_cols = ["gender", "occupation", "zip_code"]
cont_cols = [c for c in X_train if c not in cat_cols]
tab_preprocessor = TabPreprocessor(
    cat_embed_cols=cat_cols,
    continuous_cols=cont_cols,
)
# The sparse matrices need to be turned into dense, whether at the array or
# tensor stage. This is one of the reasons why the wide component in our
# library is implemented as Embeddings. However, our implementation is still
# not suitable for the type of pre-processing that the author of the Kaggle
# notebook did to come up with what would be the wide component (a sparse
# matrix with 1s at those locations corresponding to the movies that a user
# has seen at a point in time). Therefore, we will have to code a Wide model
# (fairly simple since it is a linear layer)
X_train_wide = np.array(train_movies_watched.todense())
X_test_wide = np.array(test_movies_watched.todense())
# Here our tabular component is a bit more elaborate than the one in the
# notebook, just a bit...
X_train_tab = tab_preprocessor.fit_transform(X_train.fillna(0))
X_test_tab = tab_preprocessor.transform(X_test.fillna(0))
# The text component is the sequences of movies watched. There is an element
# of information redundancy here in my opinion. This is because the wide and
# text components implicitly carry the same information, but in different
# form. Anyway, we want to reproduce the Kaggle notebook as closely as
# possible.
X_train_text = sparse_to_idx(train_movies_watched, pad_idx=PAD_IDX)
X_test_text = sparse_to_idx(test_movies_watched, pad_idx=PAD_IDX)
class Wide(nn.Module):
    def __init__(self, input_dim: int, pred_dim: int):
        super().__init__()
        self.input_dim = input_dim
        self.pred_dim = pred_dim
        # The way I coded the library I never thought that someone would ever
        # want to code their own wide component. However, if you do, the
        # wide component must have a 'wide_linear' attribute. In other words,
        # the linear layer must be called 'wide_linear'
        self.wide_linear = nn.Linear(input_dim, pred_dim)

    def forward(self, X):
        out = self.wide_linear(X.type(torch.float32))
        return out
wide = Wide(X_train_wide.shape[1], max_movie_index + 1)
class SimpleEmbed(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, pad_idx: int):
        super().__init__()
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.pad_idx = pad_idx
        # The sequences of movies watched are simply embedded in the Kaggle
        # notebook. No RNN, Transformer or any other model is used
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)

    def forward(self, X):
        embed = self.embed(X)
        embed_mean = torch.mean(embed, dim=1)
        return embed_mean

    @property
    def output_dim(self) -> int:
        return self.embed_dim
# In the notebook the author simply uses embeddings
simple_embed = SimpleEmbed(max_movie_index + 1, 16, 0)
# but maybe one would like to use an RNN to account for the sequence nature of
# the problem formulation
basic_rnn = BasicRNN(
    vocab_size=max_movie_index + 1,
    embed_dim=16,
    hidden_dim=32,
    n_layers=2,
    rnn_type="gru",
)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=tab_preprocessor.continuous_cols,
    cont_norm_layer=None,
    mlp_hidden_dims=[1024, 512, 256],
    mlp_activation="relu",
)
# The main difference between this wide and deep model and the Wide and Deep
# model in the Kaggle notebook is that in that notebook, the author
# concatenates the embeddings and the tabular features (which he refers to
# as 'continuous'), then passes this concatenation through a stack of
# linear + ReLU layers. He then concatenates this output with the binary
# features and connects this concatenation with the final linear layer. Our
# implementation follows the notation of the original paper and, instead of
# concatenating the tabular, text and wide components, we first compute their
# outputs, and then add them (see here: https://arxiv.org/pdf/1606.07792.pdf,
# their Eq 3). Note that this is effectively the same, with the caveat that
# while in one case we initialise one big weight matrix at once, in our
# implementation we initialise different matrices for different components.
# Anyway, let's give it a go.
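# In symbols (a hedged paraphrase of the comment above, not library code):
# the Kaggle notebook computes
#     logits = W [deep_out ; wide_in] + b
# while pytorch-widedeep computes
#     logits = W_wide wide_in + W_deep deep_out + b
# i.e. the same affine map, with the single weight matrix split into one
# matrix per component.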
wide_deep_model = WideDeep(
    wide=wide, deeptabular=tab_mlp, deeptext=simple_embed, pred_dim=max_movie_index + 1
)
# # To use an RNN, simply
# wide_deep_model = WideDeep(
# wide=wide, deeptabular=tab_mlp, deeptext=basic_rnn, pred_dim=max_movie_index + 1
# )
trainer = Trainer(
    model=wide_deep_model,
    objective="multiclass",
    custom_loss_function=nn.CrossEntropyLoss(ignore_index=PAD_IDX),
    optimizers=torch.optim.Adam(wide_deep_model.parameters(), lr=1e-3),
)

trainer.fit(
    X_train={
        "X_wide": X_train_wide,
        "X_tab": X_train_tab,
        "X_text": X_train_text,
        "target": y_train,
    },
    X_val={
        "X_wide": X_test_wide,
        "X_tab": X_test_tab,
        "X_text": X_test_text,
        "target": y_test,
    },
    n_epochs=10,
    batch_size=512,
    shuffle=False,
)
from pathlib import Path
import numpy as np
import torch
import pandas as pd
from torch import nn
from pytorch_widedeep import Trainer
from pytorch_widedeep.utils import pad_sequences
from pytorch_widedeep.models import TabMlp, WideDeep, Transformer
from pytorch_widedeep.preprocessing import TabPreprocessor
save_path = Path("prepared_data")
PAD_IDX = 0
id_cols = ["user_id", "movie_id"]
df_train = pd.read_pickle(save_path / "df_train.pkl")
df_valid = pd.read_pickle(save_path / "df_valid.pkl")
df_test = pd.read_pickle(save_path / "df_test.pkl")
df_test = pd.concat([df_valid, df_test], ignore_index=True)
# sequence length. Shorter sequences will be padded to this length. This is
# identical to the Kaggle notebook's implementation
maxlen = max(
    df_train.prev_movies.apply(lambda x: len(x)).max(),
    df_test.prev_movies.apply(lambda x: len(x)).max(),
)
# Here there is a caveat. In principle, we are using (as in the Kaggle
# notebook) all indexes to compute the number of tokens in the dataset. To do
# this properly, one would have to use ONLY train tokens and add a token for
# new unknown/unseen movies in the test set. This can also be done with this
# library or manually, so I will leave it to the reader to implement that
# tokenization approach
max_movie_index = max(df_train.movie_id.max(), df_test.movie_id.max())
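# A hedged sketch of that train-only tokenization (illustrative names, not
# library API; the result is not used below): build the vocabulary from the
# training sequences only and map movies unseen in training to a reserved
# UNK index.
UNK_IDX = 1  # assuming PAD_IDX = 0 is kept for padding

train_vocab = {m for seq in df_train.prev_movies for m in seq}
token2idx = {m: i + 2 for i, m in enumerate(sorted(train_vocab))}


def encode_with_unk(seq):
    return [token2idx.get(m, UNK_IDX) for m in seq]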
# From now on things are pretty simple, moreover bearing in mind that in this
# example we are not going to use a wide component since, in principle, I
# believe the information in that component is also 'carried' by the movie
# sequences (also in previous scripts one can see that most of the prediction
# power comes from the linear, wide model)
df_train_user_item = df_train[["user_id", "movie_id", "rating"]]
train_movies_sequences = df_train.prev_movies.apply(
    lambda x: [int(el) for el in x]
).to_list()
y_train = df_train.target.values.astype(int)
df_test_user_item = df_test[["user_id", "movie_id", "rating"]]
test_movies_sequences = df_test.prev_movies.apply(
    lambda x: [int(el) for el in x]
).to_list()
y_test = df_test.target.values.astype(int)
# As a tabular component we are simply going to encode the triplets
# (user, item, rating)
tab_preprocessor = TabPreprocessor(
    cat_embed_cols=["user_id", "movie_id", "rating"],
)
X_train_tab = tab_preprocessor.fit_transform(df_train_user_item)
X_test_tab = tab_preprocessor.transform(df_test_user_item)
# And here we pad the sequences and define a transformer model for the text
# component that is, in this case, the sequences of movies watched
X_train_text = np.array(
    [
        pad_sequences(
            s,
            maxlen=maxlen,
            pad_first=False,
            pad_idx=PAD_IDX,
        )
        for s in train_movies_sequences
    ]
)
X_test_text = np.array(
    [
        pad_sequences(
            s,
            maxlen=maxlen,
            pad_first=False,
            pad_idx=PAD_IDX,
        )
        for s in test_movies_sequences
    ]
)
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    mlp_hidden_dims=[512, 256],
    mlp_activation="relu",
)
# plenty of options here, see the docs
transformer = Transformer(
    vocab_size=max_movie_index + 1,
    embed_dim=16,
    n_heads=2,
    n_blocks=2,
    seq_length=maxlen,
)

wide_deep_model = WideDeep(
    deeptabular=tab_mlp, deeptext=transformer, pred_dim=max_movie_index + 1
)
trainer = Trainer(
    model=wide_deep_model,
    objective="multiclass",
    custom_loss_function=nn.CrossEntropyLoss(ignore_index=PAD_IDX),
    optimizers=torch.optim.Adam(wide_deep_model.parameters(), lr=1e-3),
)

trainer.fit(
    X_train={
        "X_tab": X_train_tab,
        "X_text": X_train_text,
        "target": y_train,
    },
    X_val={
        "X_tab": X_test_tab,
        "X_text": X_test_text,
        "target": y_test,
    },
    n_epochs=2,
    batch_size=32,
    shuffle=False,
)
......@@ -56,6 +56,9 @@ nav:
- 16_Self-Supervised Pre-Training pt 1: examples/16_Self_Supervised_Pretraning_pt1.ipynb
- 16_Self-Supervised Pre-Training pt 2: examples/16_Self_Supervised_Pretraning_pt2.ipynb
- 17_Using_a_huggingface_model: examples/17_Usign_a_hugging_face_model.ipynb
- 18_feature_importance_via_attention_weights: examples/18_feature_importance_via_attention_weights.ipynb
- 19_wide_and_deep_for_recsys_pt1: examples/19_wide_and_deep_for_recsys_pt1.ipynb
- 19_wide_and_deep_for_recsys_pt2: examples/19_wide_and_deep_for_recsys_pt2.ipynb
- Contributing: contributing.md
theme:
......
......@@ -739,6 +739,12 @@
......@@ -1012,6 +1018,48 @@
<li class="md-nav__item">
<a href="/examples/18_feature_importance_via_attention_weights.html" class="md-nav__link">
18_feature_importance_via_attention_weights
</a>
</li>
<li class="md-nav__item">
<a href="/examples/19_wide_and_deep_for_recsys_pt1.html" class="md-nav__link">
19_wide_and_deep_for_recsys_pt1
</a>
</li>
<li class="md-nav__item">
<a href="/examples/19_wide_and_deep_for_recsys_pt2.html" class="md-nav__link">
19_wide_and_deep_for_recsys_pt2
</a>
</li>
</ul>
</nav>
</li>
......
......@@ -743,6 +743,12 @@
......@@ -1016,6 +1022,48 @@
<li class="md-nav__item">
<a href="examples/18_feature_importance_via_attention_weights.html" class="md-nav__link">
18_feature_importance_via_attention_weights
</a>
</li>
<li class="md-nav__item">
<a href="examples/19_wide_and_deep_for_recsys_pt1.html" class="md-nav__link">
19_wide_and_deep_for_recsys_pt1
</a>
</li>
<li class="md-nav__item">
<a href="examples/19_wide_and_deep_for_recsys_pt2.html" class="md-nav__link">
19_wide_and_deep_for_recsys_pt2
</a>
</li>
</ul>
</nav>
</li>
......@@ -1095,7 +1143,7 @@
<nav class="md-footer__inner md-grid" aria-label="Footer" >
<a href="examples/17_Usign_a_hugging_face_model.html" class="md-footer__link md-footer__link--prev" aria-label="Previous: 17_Using_a_huggingface_model" rel="prev">
<a href="examples/19_wide_and_deep_for_recsys_pt2.html" class="md-footer__link md-footer__link--prev" aria-label="Previous: 19_wide_and_deep_for_recsys_pt2" rel="prev">
<div class="md-footer__button md-icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24"><path d="M20 11v2H8l5.5 5.5-1.42 1.42L4.16 12l7.92-7.92L13.5 5.5 8 11h12Z"/></svg>
</div>
......@@ -1104,7 +1152,7 @@
<span class="md-footer__direction">
Previous
</span>
17_Using_a_huggingface_model
19_wide_and_deep_for_recsys_pt2
</div>
</div>
</a>
......
......@@ -789,6 +789,12 @@
......@@ -1062,6 +1068,48 @@
<li class="md-nav__item">
<a href="examples/18_feature_importance_via_attention_weights.html" class="md-nav__link">
18_feature_importance_via_attention_weights
</a>
</li>
<li class="md-nav__item">
<a href="examples/19_wide_and_deep_for_recsys_pt1.html" class="md-nav__link">
19_wide_and_deep_for_recsys_pt1
</a>
</li>
<li class="md-nav__item">
<a href="examples/19_wide_and_deep_for_recsys_pt2.html" class="md-nav__link">
19_wide_and_deep_for_recsys_pt2
</a>
</li>
</ul>
</nav>
</li>
......@@ -1121,10 +1169,10 @@ pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </s
</code></pre></div>
<h2 id="dependencies">Dependencies<a class="headerlink" href="#dependencies" title="Permanent link">&para;</a></h2>
<ul>
<li>pandas</li>
<li>numpy</li>
<li>scipy</li>
<li>scikit-learn</li>
<li>pandas&gt;=1.3.5</li>
<li>numpy&gt;=1.21.6</li>
<li>scipy&gt;=1.7.3</li>
<li>scikit-learn&gt;=1.0.2</li>
<li>gensim</li>
<li>spacy</li>
<li>opencv-contrib-python</li>
......@@ -1135,6 +1183,8 @@ pip<span class="w"> </span>install<span class="w"> </span>-e<span class="w"> </s
<li>einops</li>
<li>wrapt</li>
<li>torchmetrics</li>
<li>pyarrow</li>
<li>fastparquet&gt;=0.8.1</li>
</ul>
......
......@@ -27,10 +27,10 @@ pip install -e .
## Dependencies
* pandas
* numpy
* scipy
* scikit-learn
* pandas>=1.3.5
* numpy>=1.21.6
* scipy>=1.7.3
* scikit-learn>=1.0.2
* gensim
* spacy
* opencv-contrib-python
......@@ -41,3 +41,5 @@ pip install -e .
* einops
* wrapt
* torchmetrics
* pyarrow
* fastparquet>=0.8.1
\ No newline at end of file