Unverified    Commit f430864e authored by: J Javier    Committed by: GitHub

Merge pull request #33 from jrzaurin/tabtransformer

Tabtransformer
......@@ -13,8 +13,8 @@
# pytorch-widedeep
A flexible package to combine tabular data with text and images using wide and
deep models.
A flexible package to use Deep Learning with tabular data, text and images
using wide and deep models.
**Documentation:** [https://pytorch-widedeep.readthedocs.io](https://pytorch-widedeep.readthedocs.io/en/latest/index.html)
......@@ -22,35 +22,40 @@ deep models.
### Introduction
`pytorch-widedeep` is based on Google's Wide and Deep Algorithm. Details of
the original algorithm can be found
[here](https://www.tensorflow.org/tutorials/wide_and_deep), and the nice
research paper can be found [here](https://arxiv.org/abs/1606.07792).
`pytorch-widedeep` is based on Google's Wide and Deep Algorithm, [Wide & Deep
Learning for Recommender Systems](https://arxiv.org/abs/1606.07792).
In general terms, `pytorch-widedeep` is a package to use deep learning with
tabular data. In particular, it is intended to facilitate the combination of text
and images with corresponding tabular data using wide and deep models. With
that in mind there are two architectures that can be implemented with just a
few lines of code.
that in mind there are a number of architectures that can be implemented with
just a few lines of code. The main components of those architectures are shown
in the Figure below:
### Architectures
**Architecture 1**:
<p align="center">
<img width="750" src="docs/figures/architecture_1.png">
<img width="750" src="docs/figures/widedeep_arch.png">
</p>
Architecture 1 combines the `Wide`, Linear model with the outputs from the
`DeepDense` or `DeepDenseResnet`, `DeepText` and `DeepImage` components
connected to a final output neuron or neurons, depending on whether we are
performing a binary classification or regression, or a multi-class
classification. The components within the faded-pink rectangles are
concatenated.
The dashed boxes in the figure represent optional components of the overall model, and the
dashed lines/arrows indicate the corresponding connections, depending on
whether or not certain components are present. For example, the dashed,
blue-lines indicate that the ``deeptabular``, ``deeptext`` and ``deepimage``
components are connected directly to the output neuron or neurons (depending
on whether we are performing a binary classification or regression, or a
multi-class classification) if the optional ``deephead`` is not present.
Finally, the components within the faded-pink rectangle are concatenated.
Note that it is not possible to illustrate the full range of possible
architectures and components available in ``pytorch-widedeep`` in one Figure.
Therefore, for more details on possible architectures (and more) please see
the
[documentation](https://pytorch-widedeep.readthedocs.io/en/latest/index.html),
or the Examples folder and the notebooks there.
In math terms, and following the notation in the
[paper](https://arxiv.org/abs/1606.07792), Architecture 1 can be formulated
as:
[paper](https://arxiv.org/abs/1606.07792), the expression for the architecture
without a ``deephead`` component can be formulated as:
<p align="center">
<img width="500" src="docs/figures/architecture_1_math.png">
......@@ -67,43 +72,47 @@ the constituent features (“gender=female” and “language=en”) are all 1,
otherwise".*
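For reference, and since the image may not render in every viewer, here is the
same expression written out, following the paper's notation (a reconstruction
of the formula in the image above, where `a^(l_f)` denotes the final
activations of the deep components):

```latex
P(Y=1|\mathbf{x}) = \sigma\left(\mathbf{w}_{wide}^{T}\,[\mathbf{x}; \phi(\mathbf{x})] + \mathbf{w}_{deep}^{T}\,a^{(l_f)} + b\right)
```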
**Architecture 2**
<p align="center">
<img width="750" src="docs/figures/architecture_2.png">
</p>
Architecture 2 combines the `Wide`, Linear model with the Deep components of
the model connected to the output neuron(s), after the different Deep
components have been themselves combined through an FC-Head (that I refer to
as `deephead`).
In math terms, and following the notation in the
[paper](https://arxiv.org/abs/1606.07792), Architecture 2 can be formulated
as:
While if there is a ``deephead`` component, the previous expression turns
into:
<p align="center">
<img width="300" src="docs/figures/architecture_2_math.png">
</p>
Note that each individual component, `wide`, `deepdense` (either `DeepDense`
or `DeepDenseResnet`), `deeptext` and `deepimage`, can be used independently
and in isolation. For example, one could use only `wide`, which is simply a
linear model.
On the other hand, while I recommend using the `Wide` and `DeepDense` (or
`DeepDenseResnet`) classes in `pytorch-widedeep` to build the `wide` and
`deepdense` component, it is very likely that users will want to use their own
models in the case of the `deeptext` and `deepimage` components. That is
perfectly possible as long as the custom models have an attribute called
`output_dim` with the size of the last layer of activations, so that
`WideDeep` can be constructed
`pytorch-widedeep` includes standard text (stack of LSTMs) and image
It is important to emphasize that **each individual component, `wide`,
`deeptabular`, `deeptext` and `deepimage`, can be used independently** and in
isolation. For example, one could use only `wide`, which is simply a linear
model. In fact, one of the most interesting functionalities
in ``pytorch-widedeep`` is the ``deeptabular`` component. Currently,
``pytorch-widedeep`` offers 3 models for that component:
1. ``TabMlp``: this is almost identical to the [tabular
model](https://docs.fast.ai/tutorial.tabular.html) in the fantastic
[fastai](https://docs.fast.ai/) library, and consists simply of embeddings
representing the categorical features, concatenated with the continuous
features, which are then passed through an MLP.
2. ``TabResnet``: this is similar to the previous model, but the embeddings are
passed through a series of ResNet blocks built with dense layers.
3. ``TabTransformer``: Details on the TabTransformer can be found in:
[TabTransformer: Tabular Data Modeling Using Contextual
Embeddings](https://arxiv.org/pdf/2012.06678.pdf)
For details on these 3 models and their options please see the examples in the
Examples folder and the documentation.
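As an illustration, here is a minimal sketch of swapping the ``deeptabular``
component from ``TabMlp`` to ``TabResnet`` (mirroring the commented examples in
the repo; ``tab_preprocessor`` and ``cont_cols`` are the objects defined in the
quick-start snippet below, and ``blocks_dims`` plays the role that
``mlp_hidden_dims`` plays for ``TabMlp``):

```python
from pytorch_widedeep.models import TabResnet

# hedged sketch: assumes tab_preprocessor and cont_cols exist as in the
# quick-start example further down this README
deeptabular = TabResnet(
    blocks_dims=[64, 32],
    column_idx=tab_preprocessor.column_idx,
    embed_input=tab_preprocessor.embeddings_input,
    continuous_cols=cont_cols,
)
```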
Finally, while I recommend using the ``wide`` and ``deeptabular`` models in
``pytorch-widedeep`` it is very likely that users will want to use their own
models for the ``deeptext`` and ``deepimage`` components. That is perfectly
possible as long as the custom models have an attribute called
``output_dim`` with the size of the last layer of activations, so that
``WideDeep`` can be constructed. Again, examples on how to use custom
components can be found in the Examples folder. Just in case,
``pytorch-widedeep`` includes standard text (stack of LSTMs) and image
(pre-trained ResNets or stack of CNNs) models.
See the examples folder or the docs for more information.
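For instance, a minimal (hypothetical) custom ``deeptext`` component could look
like the sketch below; the only requirement imposed by ``WideDeep`` is the
``output_dim`` attribute. The class name and layer sizes here are illustrative:

```python
import torch
from torch import nn

class MyDeepText(nn.Module):
    """Toy custom text component. Any nn.Module can be used as deeptext
    as long as it exposes `output_dim` (size of the last activation layer)."""

    def __init__(self, vocab_size: int, embed_dim: int = 32, hidden_dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.output_dim = hidden_dim  # required by WideDeep

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # return the last hidden state as the text representation
        _, h = self.rnn(self.embed(X.long()))
        return h[-1]
```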
### Installation
......@@ -130,8 +139,8 @@ cd pytorch-widedeep
pip install -e .
```
**Important note for Mac users**: at the time of writing (Dec-2020) the latest
`torch` release is `1.7`. This release has some
**Important note for Mac users**: at the time of writing (Feb-2021) the latest
`torch` release is `1.7.1`. This release has some
[issues](https://stackoverflow.com/questions/64772335/pytorch-w-parallelnative-cpp206)
when running on Mac and the data-loaders will not run in parallel. In
addition, since `python 3.8`, [the `multiprocessing` library start method
......@@ -158,17 +167,26 @@ Binary classification with the [adult
dataset](https://www.kaggle.com/wenruliu/adult-income-dataset)
using `Wide` and `TabMlp` and default settings.
Building a wide (linear) and deep model with ``pytorch-widedeep``:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.models import Wide, DeepDense, WideDeep
from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
from pytorch_widedeep.models import Wide, TabMlp, WideDeep
from pytorch_widedeep.metrics import Accuracy
# the following 4 lines are not directly related to ``pytorch-widedeep``. I
# assume you have downloaded the dataset and placed it in a dir called
# data/adult/
df = pd.read_csv("data/adult/adult.csv.zip")
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
......@@ -197,34 +215,28 @@ target_col = "income_label"
target = df_train[target_col].values
# wide
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(df_train)
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = wide_preprocessor.fit_transform(df_train)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
# deepdense
preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_deep = preprocess_deep.fit_transform(df_train)
deepdense = DeepDense(
hidden_layers=[64, 32],
deep_column_idx=preprocess_deep.deep_column_idx,
embed_input=preprocess_deep.embeddings_input,
# deeptabular
tab_preprocessor = TabPreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_tab = tab_preprocessor.fit_transform(df_train)
deeptabular = TabMlp(
mlp_hidden_dims=[64, 32],
column_idx=tab_preprocessor.column_idx,
embed_input=tab_preprocessor.embeddings_input,
continuous_cols=cont_cols,
)
# # To use DeepDenseResnet as the deepdense component simply:
# from pytorch_widedeep.models import DeepDenseResnet
# deepdense = DeepDenseResnet(
# blocks=[64, 32],
# deep_column_idx=preprocess_deep.deep_column_idx,
# embed_input=preprocess_deep.embeddings_input,
# continuous_cols=cont_cols,
# )
# build, compile and fit
model = WideDeep(wide=wide, deepdense=deepdense)
model.compile(method="binary", metrics=[Accuracy])
model.fit(
# wide and deep
model = WideDeep(wide=wide, deeptabular=deeptabular)
# train the model
trainer = Trainer(model, objective="binary", metrics=[Accuracy])
trainer.fit(
X_wide=X_wide,
X_deep=X_deep,
X_tab=X_tab,
target=target,
n_epochs=5,
batch_size=256,
......@@ -232,26 +244,17 @@ model.fit(
)
# predict
X_wide_te = preprocess_wide.transform(df_test)
X_deep_te = preprocess_deep.transform(df_test)
preds = model.predict(X_wide=X_wide_te, X_deep=X_deep_te)
# # save and load
# torch.save(model, "model_weights/model.t")
# model = torch.load("model_weights/model.t")
# # or via state dictionaries
# torch.save(model.state_dict(), PATH)
# model = WideDeep(*args)
# model.load_state_dict(torch.load(PATH))
X_wide_te = wide_preprocessor.transform(df_test)
X_tab_te = tab_preprocessor.transform(df_test)
preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)
# save and load
trainer.save_model("model_weights/model.t")
```
Of course, one can do much more, such as using different initializations,
optimizers or learning rate schedulers for each component of the overall
model; adding FC-Heads to the Text and Image components; using the [Focal
Loss](https://arxiv.org/abs/1708.02002); or warming up individual components
before joint training. See the `examples` or the `docs` folders for a
better understanding of the content of the package and its functionalities.
Of course, one can do **much more**. See the Examples folder, the
documentation or the companion posts for a better understanding of the content
of the package and its functionalities.
### Testing
......
0.4.7
\ No newline at end of file
0.4.8
\ No newline at end of file
......@@ -39,3 +39,7 @@ div.ethical-rtd {
.wy-nav-content {
max-width: none !important;
}
div.container a.header-logo {
background-image: url("../figures/widedeep_logo.png");
}
Callbacks
=========
Here are the 4 callbacks available in ``pytorch-widedeep``: ``History``,
``LRHistory``, ``ModelCheckpoint`` and ``EarlyStopping``.
.. note:: ``History`` runs by default, so it should not be passed
to the ``Trainer``
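As a minimal sketch (mirroring the example scripts in the repo), callbacks are
passed to the ``Trainer`` as a list; ``model`` is assumed to be a ``WideDeep``
instance built as in the Quick Start:

.. code-block:: python

    from pytorch_widedeep import Trainer
    from pytorch_widedeep.callbacks import LRHistory, EarlyStopping, ModelCheckpoint

    callbacks = [
        LRHistory(n_epochs=10),
        EarlyStopping(patience=5),
        ModelCheckpoint(filepath="model_weights/wd_out"),
    ]
    trainer = Trainer(model, objective="binary", callbacks=callbacks)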
.. autoclass:: pytorch_widedeep.callbacks.History
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.callbacks.LRHistory
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.callbacks.ModelCheckpoint
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.callbacks.EarlyStopping
:members:
:undoc-members:
:show-inheritance:
\ No newline at end of file
......@@ -16,6 +16,8 @@ import os
import re
import sys
from sphinx.ext.napoleon.docstring import GoogleDocstring
# this adds the equivalent of "../../" to the python path
PACKAGEDIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, PACKAGEDIR)
......@@ -24,7 +26,7 @@ sys.path.insert(0, PACKAGEDIR)
# -- Project information -----------------------------------------------------
project = "pytorch-widedeep"
copyright = "2020, Javier Rodriguez Zaurin"
copyright = "2021, Javier Rodriguez Zaurin"
author = "Javier Rodriguez Zaurin"
# # The full version, including alpha/beta/rc tags
......@@ -37,7 +39,7 @@ author = "Javier Rodriguez Zaurin"
# String
# Of the form "<major>.<minor>.<micro>", in which "major", "minor" and "micro" are numbers
# """
# """
# with open("../pytorch_widedeep/VERSION") as f:
# return f.read().strip()
# release = get_version()
......@@ -45,8 +47,8 @@ author = "Javier Rodriguez Zaurin"
with open(os.path.join(PACKAGEDIR, "pytorch_widedeep", "version.py")) as f:
version = re.search(r"__version__ \= \"(\d+\.\d+\.\d+)\"", f.read())
assert version is not None, "can't parse __version__ from version.py"
version = version.groups()[0]
assert len(version.split(".")) == 3, "bad version spec"
version = version.groups()[0] # type: ignore[assignment]
assert len(version.split(".")) == 3, "bad version spec" # type: ignore[attr-defined]
release = version
# -- General configuration ---------------------------------------------------
......@@ -77,6 +79,8 @@ extensions = [
autosummary_generate = True
napoleon_use_ivar = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
......@@ -107,6 +111,10 @@ pygments_style = "sphinx"
# Remove the prompt when copying examples
copybutton_prompt_text = ">>> "
autoclass_content = "init" # 'both'
autodoc_member_order = "bysource"
# autodoc_default_flags = ["show-inheritance"]
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
......@@ -118,7 +126,7 @@ html_theme = "sphinx_rtd_theme"
# further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {"analytics_id": "UA-83738774-2"}
# html_theme_options = {"analytics_id": "UA-83738774-2"}
# html_theme_options = {
# "canonical_url": "",
......@@ -154,7 +162,15 @@ html_static_path = ["_static"]
# directory) that is the favicon of the docs. Modern browsers use this as
# the icon for tabs, windows and bookmarks. It should be a Windows-style
# icon file (.ico).
html_favicon = "infinitoml.ico"
html_favicon = "_static/img/widedeep_logo_docs.ico"
html_logo = "figures/widedeep_logo.png"
html_theme_options = {
"canonical_url": "https://pytorch-widedeep.readthedocs.io/en/latest/",
"collapse_navigation": False,
"logo_only": False,
"display_version": True,
}
# html_favicon = "_static/img/widedeep_logo_docs.ico"
# -- Options for HTMLHelp output ---------------------------------------------
......@@ -165,7 +181,7 @@ htmlhelp_basename = "pytorch_widedeepdoc"
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
latex_elements = { # type: ignore[var-annotated]
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
......@@ -244,3 +260,38 @@ todo_include_todos = True
def setup(app):
app.add_css_file("custom.css")
# -- Extensions to the Napoleon GoogleDocstring class ---------------------
# first, we define new methods for any new sections and add them to the class
def parse_keys_section(self, section):
return self._format_fields("Keys", self._consume_fields())
GoogleDocstring._parse_keys_section = parse_keys_section # type: ignore[attr-defined]
def parse_attributes_section(self, section):
return self._format_fields("Attributes", self._consume_fields())
GoogleDocstring._parse_attributes_section = parse_attributes_section # type: ignore[assignment]
def parse_class_attributes_section(self, section):
return self._format_fields("Class Attributes", self._consume_fields())
GoogleDocstring._parse_class_attributes_section = parse_class_attributes_section # type: ignore[attr-defined]
# we now patch the parse method to guarantee that the above methods are
# assigned to the _section dict
def patched_parse(self):
self._sections["keys"] = self._parse_keys_section
self._sections["class attributes"] = self._parse_class_attributes_section
self._unpatched_parse()
GoogleDocstring._unpatched_parse = GoogleDocstring._parse # type: ignore[attr-defined]
GoogleDocstring._parse = patched_parse # type: ignore[assignment]
......@@ -6,8 +6,10 @@ understand the functionalities within ``pytorch-widedeep`` and how to use
them to address different problems.
* `Preprocessors and Utils <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/01_Preprocessors_and_utils.ipynb>`__
* `Model Components <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/02_Model_Components.ipynb>`__
* `Model Components <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/02_1_Model_Components.ipynb>`__
* `deeptabular Models <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/02_2_deeptabular_models.ipynb>`__
* `Binary Classification with default parameters <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/03_Binary_Classification_with_Defaults.ipynb>`__
* `Binary Classification with varying parameters <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/04_Binary_Classification_Varying_Parameters.ipynb>`__
* `Regression with Images and Text <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/05_Regression_with_Images_and_Text.ipynb>`__
* `Warm up routines <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/06_WarmUp_Model_Components.ipynb>`__
* `FineTune routines <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/06_FineTune_and_WarmUp_Model_Components.ipynb>`__
* `Custom Components <https://github.com/jrzaurin/pytorch-widedeep/blob/master/examples/07_Custom_Components.ipynb>`__
docs/figures/architecture_1_math.png (binary image changed: 24.9 KB → 46.7 KB)
docs/figures/resnet_block.png (binary image changed: 92.8 KB → 73.0 KB)
......@@ -3,9 +3,9 @@ pytorch-widedeep
*A flexible package to combine tabular data with text and images using wide and deep models*
Below there is an introduction to the two architectures available in
Below there is an introduction to the architectures one can build using
``pytorch-widedeep``. If you prefer to learn about the utilities and
components go straight to the Documentation.
components go straight to corresponding sections in the Documentation.
Documentation
-------------
......@@ -19,51 +19,56 @@ Documentation
Preprocessing <preprocessing>
Model Components <model_components>
Metrics <metrics>
Losses <losses>
Callbacks <callbacks>
Focal Loss <losses>
Wide and Deep Models <wide_deep>
The Trainer <trainer>
Examples <examples>
Introduction
------------
``pytorch-widedeep`` is based on Google's Wide and Deep Algorithm. Details of
the original algorithm can be found in this nice `tutorial
<https://www.tensorflow.org/tutorials/wide_and_deep>`_, and the `research
paper <https://arxiv.org/abs/1606.07792>`_ [1].
``pytorch-widedeep`` is based on Google's `Wide and Deep Algorithm
<https://arxiv.org/abs/1606.07792>`_.
In general terms, ``pytorch-widedeep`` is a package to use deep learning with
tabular data. In particular, it is intended to facilitate the combination of text
and images with corresponding tabular data using wide and deep models. With
that in mind there are two architectures that can be implemented with just a
few lines of code.
that in mind there are a number of architectures that can be implemented with
just a few lines of code. The main components of those architectures are shown
in the Figure below:
Architectures
-------------
**Architecture 1**
.. image:: figures/architecture_1.png
:width: 600px
.. image:: figures/widedeep_arch.png
:width: 700px
:align: center
Architecture 1 combines the `Wide`, Linear model with the outputs from the
`DeepDense` or `DeepDenseResnet`, `DeepText` and `DeepImage` components
connected to a final output neuron or neurons, depending on whether we are
performing a binary classification or regression, or a multi-class
classification. The components within the faded-pink rectangles are
concatenated.
The dashed boxes in the figure represent optional components of the overall
model, and the dashed lines indicate the corresponding connections, depending
on whether or
not certain components are present. For example, the dashed, blue-arrows
indicate that the ``deeptabular``, ``deeptext`` and ``deepimage`` components
are connected directly to the output neuron or neurons (depending on whether
we are performing a binary classification or regression, or a multi-class
classification) if the optional ``deephead`` is not present. The components
within the faded-pink rectangle are concatenated.
Note that it is not possible to illustrate the full range of possible
architectures and components available in ``pytorch-widedeep`` in one Figure.
Therefore, for more details on possible architectures (and more) please read
this documentation, or see the `Examples
<https://github.com/jrzaurin/pytorch-widedeep/tree/master/examples>`_ folder
in the repo.
In math terms, and following the notation in the `paper
<https://arxiv.org/abs/1606.07792>`_, Architecture 1 can be formulated as:
<https://arxiv.org/abs/1606.07792>`_, the expression for the architecture
without a ``deephead`` component can be formulated as:
.. image:: figures/architecture_1_math.png
:width: 500px
:width: 600px
:align: center
Where :math:`W` are the weight matrices applied to the wide model and to the
final activations of the deep models, :math:`a` are these final activations,
and :math:`\phi(x)` are the cross product transformations of the original
features :math:`x`. In case you are wondering what *"cross product
transformations"* are, here is a quote taken directly from the paper: *"For binary
......@@ -72,34 +77,47 @@ language=en)”) is 1 if and only if the constituent features (“gender=female
and “language=en”) are all 1, and 0 otherwise".* Finally, :math:`\sigma(\cdot)`
is the activation function.
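Written out (a reconstruction of the expression in the image above, in the
paper's notation):

.. math::

   P(Y=1|\mathbf{x}) = \sigma\left(\mathbf{w}_{wide}^{T}[\mathbf{x}; \phi(\mathbf{x})] + \mathbf{w}_{deep}^{T} a^{(l_f)} + b\right)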
While if there is a ``deephead`` component, the previous expression turns
into:
**Architecture 2**
.. image:: figures/architecture_2.png
:width: 600px
.. image:: figures/architecture_2_math.png
:width: 350px
:align: center
Architecture 2 combines the `Wide`, Linear model with the Deep components of
the model connected to the output neuron(s), after the different Deep
components have been themselves combined through an FC-Head (that I refer to
as `deephead`).
In math terms, and following the notation in the `paper
<https://arxiv.org/abs/1606.07792>`_, Architecture 2 can be formulated as:
It is important to emphasize that **each individual component, wide,
deeptabular, deeptext and deepimage, can be used independently** and in
isolation. For example, one could use only ``wide``, which is simply a
linear model. In fact, one of the most interesting offerings of
``pytorch-widedeep`` is the ``deeptabular`` component. Currently,
``pytorch-widedeep`` offers 3 models for that component:
.. image:: figures/architecture_2_math.png
:width: 300px
:align: center
1. ``TabMlp``: this is almost identical to the `tabular
model <https://docs.fast.ai/tutorial.tabular.html>`_ in the fantastic
`fastai <https://docs.fast.ai/>`_ library, and consists simply of embeddings
representing the categorical features, concatenated with the continuous
features, which are then passed through an MLP.
2. ``TabResnet``: this is similar to the previous model, but the embeddings are
passed through a series of ResNet blocks built with dense layers.
3. ``TabTransformer``: Details on the TabTransformer can be found in:
`TabTransformer: Tabular Data Modeling Using Contextual
Embeddings <https://arxiv.org/pdf/2012.06678.pdf>`_.
For details on these 3 models and their options please see the examples in the
Examples folder and the documentation.
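As a minimal sketch (following the TabTransformer example script in the repo),
the ``TabTransformer`` variant is built like the other two, with the
preprocessor run with ``for_tabtransformer=True``; ``df``, ``cat_embed_cols``
and ``continuous_cols`` are assumed to be defined as in the adult-dataset
examples:

.. code-block:: python

    from pytorch_widedeep.models import TabTransformer
    from pytorch_widedeep.preprocessing import TabPreprocessor

    tab_preprocessor = TabPreprocessor(
        embed_cols=cat_embed_cols,
        continuous_cols=continuous_cols,
        for_tabtransformer=True,
    )
    X_tab = tab_preprocessor.fit_transform(df)
    deeptabular = TabTransformer(
        column_idx=tab_preprocessor.column_idx,
        embed_input=tab_preprocessor.embeddings_input,
        continuous_cols=continuous_cols,
    )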
When using `pytorch-widedeep`, the assumption is that the so-called `Wide` and
`deep dense` (this can be either `DeepDense` or `DeepDenseResnet`; see the
documentation and examples folder for more details) components in the figures
are **always** present, while `DeepText` and `DeepImage` are optional.
`pytorch-widedeep` includes standard text (stack of LSTMs) and image
(pre-trained ResNets or stack of CNNs) models. However, the user can use any
custom model as long as it has an attribute called `output_dim` with the size
of the last layer of activations, so that `WideDeep` can be constructed. See
the examples folder or the docs for more information.
Finally, while I recommend using the ``wide`` and ``deeptabular`` models in
``pytorch-widedeep`` it is very likely that users will want to use their own
models for the ``deeptext`` and ``deepimage`` components. That is perfectly
possible as long as the custom models have an attribute called
``output_dim`` with the size of the last layer of activations, so that
``WideDeep`` can be constructed. Again, examples on how to use custom
components can be found in the Examples folder. Just in case,
``pytorch-widedeep`` includes standard text (stack of LSTMs) and image
(pre-trained ResNets or stack of CNNs) models.
References
----------
......@@ -110,5 +128,4 @@ Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
......@@ -39,3 +39,5 @@ Dependencies
* tqdm
* torch
* torchvision
* einops
* wrapt
\ No newline at end of file
Losses
======
``pytorch-widedeep`` accepts a number of losses and objectives that can be
passed to the ``Trainer`` class via the ``str`` parameter ``objective`` (see
``pytorch_widedeep.training.Trainer``). For most cases the loss function that
``pytorch-widedeep`` will use internally is already implemented in PyTorch.
In addition, ``pytorch-widedeep`` implements four "custom" loss functions.
These are described below for completeness since, as mentioned before, they
are used internally by the ``Trainer``. Of course, one could always use them
on their own; they can be imported as:
.. code-block:: python
from pytorch_widedeep.losses import FocalLoss
.. note:: Losses in this module expect the predictions and ground truth to have the
same dimensions for regression and binary classification problems (i.e.
:math:`(N_{samples}, 1)`). In the case of multiclass classification problems
the ground truth is expected to be a 1D tensor with the corresponding
classes. See Examples below.
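For instance, a minimal standalone sketch for a binary problem (assuming the
default ``alpha`` and ``gamma`` arguments, and raw scores as input):

.. code-block:: python

    import torch
    from pytorch_widedeep.losses import FocalLoss

    # predictions and ground truth share shape (N, 1), per the note above
    input = torch.randn(8, 1)
    target = torch.randint(0, 2, (8, 1)).float()
    loss = FocalLoss(alpha=0.25, gamma=1.0)(input, target)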
.. autoclass:: pytorch_widedeep.losses.FocalLoss
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.losses.MSLELoss
:members:
.. autoclass:: pytorch_widedeep.losses.RMSELoss
:members:
.. autoclass:: pytorch_widedeep.losses.RMSLELoss
:members:
Metrics
=======
.. note:: Metrics in this module expect the predictions and ground truth to have the
same dimensions for regression and binary classification problems (i.e.
:math:`(N_{samples}, 1)`). In the case of multiclass classification problems the
ground truth is expected to be a 1D tensor with the corresponding classes.
See Examples below.
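For instance, a minimal standalone sketch for binary classification (assuming
probabilities as predictions and an internal 0.5 decision threshold):

.. code-block:: python

    import torch
    from pytorch_widedeep.metrics import Accuracy

    acc = Accuracy()
    y_pred = torch.tensor([[0.2], [0.7], [0.9]])  # probabilities, shape (N, 1)
    y_true = torch.tensor([[0.0], [1.0], [1.0]])
    acc(y_pred, y_true)  # -> 1.0 under these assumptions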
.. autoclass:: pytorch_widedeep.metrics.Accuracy
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.metrics.Precision
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.metrics.Recall
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.metrics.FBetaScore
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.metrics.F1Score
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.metrics.R2Score
:members:
:undoc-members:
The ``models`` module
======================
This module contains the four main Wide and Deep model components. These are:
``Wide``, ``DeepDense`` or ``DeepDenseResnet``, ``DeepText`` and ``DeepImage``.
This module contains the four main components that will comprise a Wide and
Deep model, and the ``WideDeep`` "constructor" class. These four components
are: ``wide``, ``deeptabular``, ``deeptext``, ``deepimage``.
.. note:: ``DeepDense`` and ``DeepDenseResnet`` both correspond to what we
refer as the `"deep dense"` component of the model and simply represent
two different alternatives
.. note:: ``TabMlp``, ``TabResnet`` and ``TabTransformer`` can all be used
as the ``deeptabular`` component of the model and simply represent
different alternatives
.. autoclass:: pytorch_widedeep.models.wide.Wide
   :exclude-members: forward
   :members:
.. autoclass:: pytorch_widedeep.models.deep_dense.DeepDense
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.models.tab_mlp.TabMlp
:exclude-members: forward
:members:
.. autoclass:: pytorch_widedeep.models.deep_dense_resnet.DeepDenseResnet
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.models.tab_resnet.TabResnet
:exclude-members: forward
:members:
.. autoclass:: pytorch_widedeep.models.tab_transformer.TabTransformer
:exclude-members: forward
:members:
.. autoclass:: pytorch_widedeep.models.deep_text.DeepText
   :exclude-members: forward
   :members:
.. autoclass:: pytorch_widedeep.models.deep_image.DeepImage
   :exclude-members: forward
   :members:
.. autoclass:: pytorch_widedeep.models.wide_deep.WideDeep
:exclude-members: forward
:members:
......@@ -2,24 +2,21 @@ The ``preprocessing`` module
============================
This module contains the classes that are used to prepare the data before
being passed to the Wide and Deep `constructor` class
being passed to the models. There is one Preprocessor per model type or
component: ``wide``, ``deeptabular``, ``deepimage`` and ``deeptext``.
.. autoclass:: pytorch_widedeep.preprocessing.WidePreprocessor
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.preprocessing.DensePreprocessor
.. autoclass:: pytorch_widedeep.preprocessing.TabPreprocessor
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.preprocessing.TextPreprocessor
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.preprocessing.ImagePreprocessor
:members:
:undoc-members:
:show-inheritance:
......@@ -3,15 +3,13 @@ Quick Start
This is an example of a binary classification with the `adult census
<https://www.kaggle.com/wenruliu/adult-income-dataset?select=adult.csv>`__
dataset using a combination of a wide and deep model (in this case a so-called
``deeptabular`` model) with default settings.
Read and split the dataset
--------------------------
The following code snippet is not directly related to ``pytorch-widedeep``.
.. code-block:: python
import pandas as pd
......@@ -30,8 +28,9 @@ Prepare the wide and deep columns
.. code-block:: python
from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.models import Wide, DeepDense, WideDeep
from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
from pytorch_widedeep.models import Wide, TabMlp, WideDeep
from pytorch_widedeep.metrics import Accuracy
# prepare wide, crossed, embedding and continuous columns
......@@ -56,39 +55,40 @@ Prepare the wide and deep columns
# target
target = df_train[target_col].values
Preprocessing and model components definition
---------------------------------------------
.. code-block:: python
# wide
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(df_train)
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = wide_preprocessor.fit_transform(df_train)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
# deepdense
preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_deep = preprocess_deep.fit_transform(df_train)
deepdense = DeepDense(
hidden_layers=[64, 32],
deep_column_idx=preprocess_deep.deep_column_idx,
embed_input=preprocess_deep.embeddings_input,
# deeptabular
tab_preprocessor = TabPreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_tab = tab_preprocessor.fit_transform(df_train)
deeptabular = TabMlp(
mlp_hidden_dims=[64, 32],
column_idx=tab_preprocessor.column_idx,
embed_input=tab_preprocessor.embeddings_input,
continuous_cols=cont_cols,
)
# wide and deep
model = WideDeep(wide=wide, deeptabular=deeptabular)
Fit and predict
---------------
.. code-block:: python
# build, compile and fit
model = WideDeep(wide=wide, deepdense=deepdense)
model.compile(method="binary", metrics=[Accuracy])
model.fit(
# train the model
trainer = Trainer(model, objective="binary", metrics=[Accuracy])
trainer.fit(
X_wide=X_wide,
X_deep=X_deep,
X_tab=X_tab,
target=target,
n_epochs=5,
batch_size=256,
......@@ -96,15 +96,13 @@ Build, compile, fit and predict
)
# predict
X_wide_te = preprocess_wide.transform(df_test)
X_deep_te = preprocess_deep.transform(df_test)
preds = model.predict(X_wide=X_wide_te, X_deep=X_deep_te)
Of course, one can do much more, such as using different initializations,
optimizers or learning rate schedulers for each component of the overall
model; adding FC-Heads to the Text and Image components; using the Focal Loss;
or warming up individual components before joint training. See the
`examples
<https://github.com/jrzaurin/pytorch-widedeep/tree/build_docs/examples>`__
directory for a better understanding of the content of the package and its
functionalities.
X_wide_te = wide_preprocessor.transform(df_test)
X_tab_te = tab_preprocessor.transform(df_test)
preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)
# save and load
trainer.save_model("model_weights/model.t")
Of course, one can do **much more**. See the Examples folder in the repo, this
documentation or the companion posts for a better understanding of the content
of the package and its functionalities.
sphinx==3.1.1
sphinx_rtd_theme==0.5.0
recommonmark==0.6.0
sphinx-markdown-tables==0.0.15
sphinx-copybutton==0.2.12
sphinx-autodoc-typehints==1.11.0
sphinx
sphinx_rtd_theme
recommonmark
sphinx-markdown-tables
sphinx-copybutton
sphinx-autodoc-typehints
pandas
numpy
scipy
......@@ -15,3 +15,5 @@ imutils
tqdm
torch
torchvision
einops
wrapt
\ No newline at end of file
Training wide and deep models for tabular data
==============================================
`...` or just deep learning models for tabular data.
Here is the documentation for the ``Trainer`` class, which will do all the heavy lifting.
The ``Trainer`` is also available from ``pytorch_widedeep`` directly; for example, one could do:
.. code-block:: python
from pytorch_widedeep.training import Trainer
or also:
.. code-block:: python
from pytorch_widedeep import Trainer
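And a minimal usage sketch, assuming ``model``, ``X_wide``, ``X_tab`` and
``target`` have been built as in the Quick Start:

.. code-block:: python

    trainer = Trainer(model, objective="binary")
    trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=5, batch_size=256)
    preds = trainer.predict(X_wide=X_wide, X_tab=X_tab)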
.. autoclass:: pytorch_widedeep.training.Trainer
:exclude-members: forward
:members:
:undoc-members:
deeptabular utils
=================
.. autoclass:: pytorch_widedeep.utils.deeptabular_utils.LabelEncoder
:members:
:undoc-members:
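A minimal usage sketch (the toy dataframe and column names below are
hypothetical, for illustration only):

.. code-block:: python

    import pandas as pd
    from pytorch_widedeep.utils import LabelEncoder

    df = pd.DataFrame({"color": ["r", "b", "g"], "size": ["s", "m", "l"]})
    encoder = LabelEncoder(columns_to_encode=["color", "size"])
    df_enc = encoder.fit_transform(df)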
Dense Utils
===========
.. autoclass:: pytorch_widedeep.utils.dense_utils.LabelEncoder
:members:
:undoc-members:
:show-inheritance:
Fastai transforms
=================
I have directly copied and pasted part of the ``transforms.py`` module from
the ``fastai`` library. The reason to do such a thing is because
``pytorch_widedeep`` only needs the ``Tokenizer`` and the ``Vocab`` classes
there. This way I avoid extra dependencies. Credit for all the code in the
``fastai_transforms`` module in this ``pytorch-widedeep`` package goes to
Jeremy Howard and the `fastai` team. I only include the documentation here for
completeness, but I strongly advise the user to read the ``fastai``
`documentation <https://docs.fast.ai/>`_.
.. autoclass:: pytorch_widedeep.utils.fastai_transforms.Tokenizer
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.utils.fastai_transforms.Vocab
:members:
:undoc-members:
:show-inheritance:
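A minimal usage sketch (the ``max_vocab`` and ``min_freq`` values are
illustrative, and the API follows the fastai v1 conventions these classes were
copied from):

.. code-block:: python

    from pytorch_widedeep.utils.fastai_transforms import Tokenizer, Vocab

    tok = Tokenizer()
    tokens = tok.process_all(["the cat sat on the mat", "the dog barked"])
    vocab = Vocab.create(tokens, max_vocab=100, min_freq=1)
    ids = vocab.numericalize(tokens[0])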
Image utils
===========
:class:`SimplePreprocessor
<pytorch_widedeep.utils.image_utils.SimplePreprocessor>` and
:class:`AspectAwarePreprocessor
<pytorch_widedeep.utils.image_utils.AspectAwarePreprocessor>` are directly
taken from the great series of Books `Deep Learning for Computer Vision
<https://www.pyimagesearch.com/deep-learning-computer-vision-python-book/>`_
by `Adrian <https://www.pyimagesearch.com/>`_. Therefore, all credit for the
code in the ``image_utils`` module goes to `Adrian Rosebrock
<https://www.pyimagesearch.com/>`_.
.. autoclass:: pytorch_widedeep.utils.image_utils.AspectAwarePreprocessor
:members:
:undoc-members:
:show-inheritance:
.. autoclass:: pytorch_widedeep.utils.image_utils.SimplePreprocessor
:members:
:undoc-members:
:show-inheritance:
The ``utils`` module
====================
Initially the intention was for the ``utils`` module to be hidden from the
user. However, there are a series of utilities there that might be useful for
a number of preprocessing tasks. All the classes and functions discussed here
are available directly from the ``utils`` module. For example, the
``LabelEncoder`` within the ``dense_utils`` submodule can be imported as:
These are a series of utilities that might be useful for a number of
preprocessing tasks, even not directly related to ``pytorch-widedeep``. All
the classes and functions discussed here are available directly from the
``utils`` module. For example, the ``LabelEncoder`` within the
``deeptabular_utils`` submodule can be imported as:
.. code-block:: python
......@@ -17,7 +17,7 @@ Objects
.. toctree::
dense_utils
image_utils
deeptabular_utils
fastai_transforms
image_utils
text_utils
\ No newline at end of file
Text utils
==========
Collection of helper functions that facilitate processing text.
......
Building Wide and Deep Models
=============================
Here is the documentation to build the two architectures, and the different
options available in ``pytorch-widedeep`` as one builds the model.
:class:`pytorch_widedeep.models.wide_deep.WideDeep` is the main class. It will
collect all model components and build one of the two possible architectures
with a series of optional parameters.
.. autoclass:: pytorch_widedeep.models.wide_deep.WideDeep
:exclude-members: forward
:members:
:undoc-members:
:show-inheritance:
This diff is collapsed.
This diff is collapsed.
......@@ -2,8 +2,14 @@ import numpy as np
import torch
import pandas as pd
from pytorch_widedeep import Trainer
from pytorch_widedeep.optim import RAdam
from pytorch_widedeep.models import Wide, WideDeep, DeepDense, DeepDenseResnet
from pytorch_widedeep.models import ( # noqa: F401
Wide,
TabMlp,
WideDeep,
TabResnet,
)
from pytorch_widedeep.metrics import Accuracy, Precision
from pytorch_widedeep.callbacks import (
LRHistory,
......@@ -11,7 +17,7 @@ from pytorch_widedeep.callbacks import (
ModelCheckpoint,
)
from pytorch_widedeep.initializers import XavierNormal, KaimingNormal
from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.preprocessing import TabPreprocessor, WidePreprocessor
use_cuda = torch.cuda.is_available()
......@@ -48,39 +54,39 @@ if __name__ == "__main__":
target = df[target].values
prepare_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = prepare_wide.fit_transform(df)
prepare_deep = DensePreprocessor(
prepare_deep = TabPreprocessor(
embed_cols=cat_embed_cols, continuous_cols=continuous_cols # type: ignore[arg-type]
)
X_deep = prepare_deep.fit_transform(df)
X_tab = prepare_deep.fit_transform(df)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
deepdense = DeepDense(
hidden_layers=[64, 32],
dropout=[0.2, 0.2],
deep_column_idx=prepare_deep.deep_column_idx,
deeptabular = TabMlp(
mlp_hidden_dims=[200, 100],
mlp_dropout=[0.2, 0.2],
column_idx=prepare_deep.column_idx,
embed_input=prepare_deep.embeddings_input,
continuous_cols=continuous_cols,
)
# # To use DeepDenseResnet as the deepdense component simply:
# deepdense = DeepDenseResnet(
# blocks=[64, 32],
# deep_column_idx=prepare_deep.deep_column_idx,
# # To use TabResnet as the deeptabular component simply:
# deeptabular = TabResnet(
# blocks_dims=[200, 100],
# column_idx=prepare_deep.column_idx,
# embed_input=prepare_deep.embeddings_input,
# continuous_cols=continuous_cols,
# )
model = WideDeep(wide=wide, deepdense=deepdense)
model = WideDeep(wide=wide, deeptabular=deeptabular)
wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.01)
deep_opt = RAdam(model.deepdense.parameters())
deep_opt = RAdam(model.deeptabular.parameters())
wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=3)
deep_sch = torch.optim.lr_scheduler.StepLR(deep_opt, step_size=5)
optimizers = {"wide": wide_opt, "deepdense": deep_opt}
schedulers = {"wide": wide_sch, "deepdense": deep_sch}
initializers = {"wide": KaimingNormal, "deepdense": XavierNormal}
optimizers = {"wide": wide_opt, "deeptabular": deep_opt}
schedulers = {"wide": wide_sch, "deeptabular": deep_sch}
initializers = {"wide": KaimingNormal, "deeptabular": XavierNormal}
callbacks = [
LRHistory(n_epochs=10),
EarlyStopping(patience=5),
......@@ -88,8 +94,9 @@ if __name__ == "__main__":
]
metrics = [Accuracy, Precision]
model.compile(
method="binary",
trainer = Trainer(
model,
objective="binary",
optimizers=optimizers,
lr_schedulers=schedulers,
initializers=initializers,
......@@ -97,20 +104,20 @@ if __name__ == "__main__":
metrics=metrics,
)
model.fit(
trainer.fit(
X_wide=X_wide,
X_deep=X_deep,
X_tab=X_tab,
target=target,
n_epochs=4,
batch_size=64,
val_split=0.2,
)
# # to save/load the model
# torch.save(model, "model_weights/model.t")
# model = torch.load("model_weights/model.t")
# # to save/load the model
# trainer.save_model("model_weights/model.t")
# # ... days after
# model = Trainer.load_model("model_weights/model.t")
# # or via state dictionaries
# torch.save(model.state_dict(), "model_weights/model_dict.t")
# model = WideDeep(wide=wide, deepdense=deepdense)
# model.load_state_dict(torch.load("model_weights/model_dict.t"))
# # <All keys matched successfully>
# trainer.save_model_state_dict("model_weights/model_dict.t")
# # ... days after, with an instantiated class of Trainer
# trainer.load_model_state_dict("model_weights/model_dict.t")
import numpy as np
import torch
import pandas as pd
from pytorch_widedeep import Trainer
from pytorch_widedeep.optim import RAdam
from pytorch_widedeep.models import Wide, WideDeep, TabTransformer
from pytorch_widedeep.metrics import Accuracy, Precision
from pytorch_widedeep.callbacks import (
LRHistory,
EarlyStopping,
ModelCheckpoint,
)
from pytorch_widedeep.initializers import XavierNormal, KaimingNormal
from pytorch_widedeep.preprocessing import TabPreprocessor, WidePreprocessor
use_cuda = torch.cuda.is_available()
if __name__ == "__main__":
df = pd.read_csv("data/adult/adult.csv.zip")
df.columns = [c.replace("-", "_") for c in df.columns]
df["age_buckets"] = pd.cut(
df.age, bins=[16, 25, 30, 35, 40, 45, 50, 55, 60, 91], labels=np.arange(9)
)
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
df.head()
wide_cols = [
"age_buckets",
"education",
"relationship",
"workclass",
"occupation",
"native_country",
"gender",
]
crossed_cols = [("education", "occupation"), ("native_country", "occupation")]
cat_embed_cols = [
"education",
"relationship",
"workclass",
"occupation",
"native_country",
]
continuous_cols = ["age", "hours_per_week"]
target = "income_label"
target = df[target].values
prepare_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = prepare_wide.fit_transform(df)
prepare_deep = TabPreprocessor(
embed_cols=cat_embed_cols, continuous_cols=continuous_cols, for_tabtransformer=True # type: ignore[arg-type]
)
X_tab = prepare_deep.fit_transform(df)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
deeptabular = TabTransformer(
column_idx=prepare_deep.column_idx,
embed_input=prepare_deep.embeddings_input,
continuous_cols=continuous_cols,
)
model = WideDeep(wide=wide, deeptabular=deeptabular)
wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.01)
deep_opt = RAdam(model.deeptabular.parameters())
wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=3)
deep_sch = torch.optim.lr_scheduler.StepLR(deep_opt, step_size=5)
optimizers = {"wide": wide_opt, "deeptabular": deep_opt}
schedulers = {"wide": wide_sch, "deeptabular": deep_sch}
initializers = {"wide": KaimingNormal, "deeptabular": XavierNormal}
callbacks = [
LRHistory(n_epochs=10),
EarlyStopping(patience=5),
ModelCheckpoint(filepath="model_weights/wd_out"),
]
metrics = [Accuracy, Precision]
trainer = Trainer(
model,
objective="binary",
optimizers=optimizers,
lr_schedulers=schedulers,
initializers=initializers,
callbacks=callbacks,
metrics=metrics,
)
trainer.fit(
X_wide=X_wide,
X_tab=X_tab,
target=target,
n_epochs=2,
batch_size=128,
val_split=0.2,
)
......@@ -3,21 +3,16 @@ import torch
import pandas as pd
from torchvision.transforms import ToTensor, Normalize
import pytorch_widedeep as wd
from pytorch_widedeep.optim import RAdam
from pytorch_widedeep.models import (
Wide,
DeepText,
WideDeep,
DeepDense,
DeepImage,
DeepDenseResnet,
)
from pytorch_widedeep.models import TabResnet # noqa: F401
from pytorch_widedeep.models import Wide, TabMlp, DeepText, WideDeep, DeepImage
from pytorch_widedeep.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_widedeep.initializers import KaimingNormal
from pytorch_widedeep.preprocessing import (
TabPreprocessor,
TextPreprocessor,
WidePreprocessor,
DensePreprocessor,
ImagePreprocessor,
)
......@@ -27,7 +22,7 @@ if __name__ == "__main__":
df = pd.read_csv("data/airbnb/airbnb_sample.csv")
crossed_cols = (["property_type", "room_type"],)
crossed_cols = [("property_type", "room_type")]
already_dummies = [c for c in df.columns if "amenity" in c] + ["has_house_rules"]
wide_cols = [
"is_location_exact",
......@@ -50,15 +45,15 @@ if __name__ == "__main__":
target = df[target].values
prepare_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = prepare_wide.fit_transform(df)
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)
prepare_deep = DensePreprocessor(
tab_preprocessor = TabPreprocessor(
embed_cols=cat_embed_cols, # type: ignore[arg-type]
continuous_cols=continuous_cols,
already_standard=already_standard,
)
X_deep = prepare_deep.fit_transform(df)
X_tab = tab_preprocessor.fit_transform(df)
text_processor = TextPreprocessor(
word_vectors_path=word_vectors_path, text_col=text_col
......@@ -69,19 +64,19 @@ if __name__ == "__main__":
X_images = image_processor.fit_transform(df)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
deepdense = DeepDense(
hidden_layers=[64, 32],
dropout=[0.2, 0.2],
deep_column_idx=prepare_deep.deep_column_idx,
embed_input=prepare_deep.embeddings_input,
deepdense = TabMlp(
mlp_hidden_dims=[64, 32],
mlp_dropout=[0.2, 0.2],
column_idx=tab_preprocessor.column_idx,
embed_input=tab_preprocessor.embeddings_input,
continuous_cols=continuous_cols,
)
# # To use DeepDenseResnet as the deepdense component simply:
# deepdense = DeepDenseResnet(
# blocks=[64, 32],
# # To use TabResnet as the deepdense component simply:
# deepdense = TabResnet(
# blocks_dims=[64, 32],
# dropout=0.2,
# deep_column_idx=prepare_deep.deep_column_idx,
# embed_input=prepare_deep.embeddings_input,
# column_idx=tab_preprocessor.column_idx,
# embed_input=tab_preprocessor.embeddings_input,
# continuous_cols=continuous_cols,
# )
deeptext = DeepText(
......@@ -90,15 +85,15 @@ if __name__ == "__main__":
n_layers=3,
rnn_dropout=0.5,
padding_idx=1,
embedding_matrix=text_processor.embedding_matrix,
embed_matrix=text_processor.embedding_matrix,
)
deepimage = DeepImage(pretrained=True, head_layers=None)
deepimage = DeepImage(pretrained=True, head_hidden_dims=None)
model = WideDeep(
wide=wide, deepdense=deepdense, deeptext=deeptext, deepimage=deepimage
wide=wide, deeptabular=deepdense, deeptext=deeptext, deepimage=deepimage
)
wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.01)
deep_opt = torch.optim.Adam(model.deepdense.parameters())
deep_opt = torch.optim.Adam(model.deeptabular.parameters())
text_opt = RAdam(model.deeptext.parameters())
img_opt = RAdam(model.deepimage.parameters())
......@@ -109,19 +104,19 @@ if __name__ == "__main__":
optimizers = {
"wide": wide_opt,
"deepdense": deep_opt,
"deeptabular": deep_opt,
"deeptext": text_opt,
"deepimage": img_opt,
}
schedulers = {
"wide": wide_sch,
"deepdense": deep_sch,
"deeptabular": deep_sch,
"deeptext": text_sch,
"deepimage": img_sch,
}
initializers = {
"wide": KaimingNormal,
"deepdense": KaimingNormal,
"deeptabular": KaimingNormal,
"deeptext": KaimingNormal,
"deepimage": KaimingNormal,
}
......@@ -130,8 +125,9 @@ if __name__ == "__main__":
transforms = [ToTensor, Normalize(mean=mean, std=std)]
callbacks = [EarlyStopping, ModelCheckpoint(filepath="model_weights/wd_out.pt")]
model.compile(
method="regression",
trainer = wd.Trainer(
model,
objective="regression",
initializers=initializers,
optimizers=optimizers,
lr_schedulers=schedulers,
......@@ -139,9 +135,9 @@ if __name__ == "__main__":
transforms=transforms,
)
model.fit(
trainer.fit(
X_wide=X_wide,
X_deep=X_deep,
X_tab=X_tab,
X_text=X_text,
X_img=X_images,
target=target,
......@@ -151,11 +147,25 @@ if __name__ == "__main__":
)
# # With warm_up
# child = list(model.deepimage.children())[0]
# img_layers = list(child.backbone.children())[4:8] + [list(model.deepimage.children())[1]]
# child = list(trainer.model.deepimage.children())[0]
# img_layers = list(child.backbone.children())[4:8] + [
# list(trainer.model.deepimage.children())[1]
# ]
# img_layers = img_layers[::-1]
# model.fit(X_wide=X_wide, X_deep=X_deep, X_text=X_text, X_img=X_images,
# target=target, n_epochs=1, batch_size=32, val_split=0.2, warm_up=True,
# warm_epochs=1, warm_deepimage_gradual=True, warm_deepimage_layers=img_layers,
# warm_deepimage_max_lr=0.01, warm_routine='howard')
# trainer.fit(
# X_wide=X_wide,
# X_tab=X_tab,
# X_text=X_text,
# X_img=X_images,
# target=target,
# n_epochs=1,
# batch_size=32,
# val_split=0.2,
# warm_up=True,
# warm_epochs=1,
# warm_deepimage_gradual=True,
# warm_deepimage_layers=img_layers,
# warm_deepimage_max_lr=0.01,
# warm_routine="howard",
# )
......@@ -2,9 +2,10 @@ import numpy as np
import torch
import pandas as pd
from pytorch_widedeep.models import Wide, WideDeep, DeepDense
import pytorch_widedeep as wd
from pytorch_widedeep.models import Wide, TabMlp, WideDeep
from pytorch_widedeep.metrics import F1Score, Accuracy
from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.preprocessing import TabPreprocessor, WidePreprocessor
use_cuda = torch.cuda.is_available()
......@@ -12,7 +13,7 @@ if __name__ == "__main__":
df = pd.read_csv("data/airbnb/airbnb_sample.csv")
crossed_cols = (["property_type", "room_type"],)
crossed_cols = [("property_type", "room_type")]
already_dummies = [c for c in df.columns if "amenity" in c] + ["has_house_rules"]
wide_cols = [
"is_location_exact",
......@@ -32,31 +33,32 @@ if __name__ == "__main__":
target = "yield_cat"
target = np.array(df[target].values)
prepare_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = prepare_wide.fit_transform(df)
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)
prepare_deep = DensePreprocessor(
embed_cols=cat_embed_cols, continuous_cols=continuous_cols
tab_preprocessor = TabPreprocessor(
embed_cols=cat_embed_cols, continuous_cols=continuous_cols # type: ignore[arg-type]
)
X_deep = prepare_deep.fit_transform(df)
X_deep = tab_preprocessor.fit_transform(df)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=3)
deepdense = DeepDense(
hidden_layers=[64, 32],
dropout=[0.2, 0.2],
deep_column_idx=prepare_deep.deep_column_idx,
embed_input=prepare_deep.embeddings_input,
deepdense = TabMlp(
mlp_hidden_dims=[64, 32],
mlp_dropout=[0.2, 0.2],
column_idx=tab_preprocessor.column_idx,
embed_input=tab_preprocessor.embeddings_input,
continuous_cols=continuous_cols,
)
model = WideDeep(wide=wide, deepdense=deepdense, pred_dim=3)
model = WideDeep(wide=wide, deeptabular=deepdense, pred_dim=3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.03)
model.compile(
method="multiclass", metrics=[Accuracy, F1Score], optimizers=optimizer
trainer = wd.Trainer(
model, objective="multiclass", metrics=[Accuracy, F1Score], optimizers=optimizer
)
model.fit(
trainer.fit(
X_wide=X_wide,
X_deep=X_deep,
X_tab=X_deep,
target=target,
n_epochs=1,
batch_size=32,
......
......@@ -4,27 +4,28 @@
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/jrzaurin/pytorch-widedeep/graphs/commit-activity)
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/jrzaurin/pytorch-widedeep/issues)
[![codecov](https://codecov.io/gh/jrzaurin/pytorch-widedeep/branch/master/graph/badge.svg)](https://codecov.io/gh/jrzaurin/pytorch-widedeep)
[![Python 3.6 3.7 3.8](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue.svg)](https://www.python.org/)
# pytorch-widedeep
A flexible package to combine tabular data with text and images using wide and
deep models.
A flexible package to use Deep Learning with tabular data, text and images
using wide and deep models.
**Documentation:** [https://pytorch-widedeep.readthedocs.io](https://pytorch-widedeep.readthedocs.io/en/latest/index.html)
**Companion posts:** [infinitoml](https://jrzaurin.github.io/infinitoml/)
### Introduction
`pytorch-widedeep` is based on Google's Wide and Deep Algorithm. Details of
the original algorithm can be found
[here](https://www.tensorflow.org/tutorials/wide_and_deep), and the nice
research paper can be found [here](https://arxiv.org/abs/1606.07792).
`pytorch-widedeep` is based on Google's Wide and Deep Algorithm, [Wide & Deep
Learning for Recommender Systems](https://arxiv.org/abs/1606.07792).
In general terms, `pytorch-widedeep` is a package to use deep learning with
tabular data. In particular, it is intended to facilitate the combination of text
and images with corresponding tabular data using wide and deep models. With
that in mind there are two architectures that can be implemented with just a
few lines of code. For details on these architectures please visit the
that in mind there are a number of architectures that can be implemented with
just a few lines of code. For details on the main components of those
architectures please visit the
[repo](https://github.com/jrzaurin/pytorch-widedeep).
......@@ -81,17 +82,26 @@ Binary classification with the [adult
dataset](https://www.kaggle.com/wenruliu/adult-income-dataset)
using `Wide` and `TabMlp` and default settings.
Building a wide (linear) and deep model with ``pytorch-widedeep``:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
from pytorch_widedeep.models import Wide, TabMlp, WideDeep
from pytorch_widedeep.metrics import Accuracy
# the following 4 lines are not directly related to ``pytorch-widedeep``. I
# assume you have downloaded the dataset and placed it in a dir called
# data/adult/
df = pd.read_csv("data/adult/adult.csv.zip")
df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop("income", axis=1, inplace=True)
......@@ -120,34 +130,28 @@ target_col = "income_label"
target = df_train[target_col].values
# wide
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = wide_preprocessor.fit_transform(df_train)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
# deeptabular
tab_preprocessor = TabPreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_tab = tab_preprocessor.fit_transform(df_train)
deeptabular = TabMlp(
    mlp_hidden_dims=[64, 32],
    column_idx=tab_preprocessor.column_idx,
    embed_input=tab_preprocessor.embeddings_input,
    continuous_cols=cont_cols,
)
# wide and deep
model = WideDeep(wide=wide, deeptabular=deeptabular)

# train the model
trainer = Trainer(model, objective="binary", metrics=[Accuracy])
trainer.fit(
    X_wide=X_wide,
    X_tab=X_tab,
    target=target,
    n_epochs=5,
    batch_size=256,
......@@ -155,17 +159,17 @@ model.fit(
)
# predict
X_wide_te = wide_preprocessor.transform(df_test)
X_tab_te = tab_preprocessor.transform(df_test)
preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)
# save and load
trainer.save_model("model_weights/model.t")
```
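As a hedged follow-up to the snippet above, one can also get class
probabilities for the hold-out set; that ``predict_proba`` mirrors the
signature of ``predict`` is an assumption to verify against the Trainer docs
of your installed version:

```python
# hypothetical continuation of the quick-start snippet above
from sklearn.metrics import accuracy_score

# assumption: predict_proba mirrors predict and returns an (n_samples, 2) array
probs = trainer.predict_proba(X_wide=X_wide_te, X_tab=X_tab_te)
print(probs.shape)
print(accuracy_score(df_test[target_col].values, preds))
```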
Of course, one can do **much more**. See the Examples folder, the
documentation or the companion posts for a better understanding of the content
of the package and its functionalities.
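For instance, and only as a sketch of what "much more" looks like, a different
optimizer can be assigned to each model component. That the dict keys must
match the names of the ``WideDeep`` children (``"wide"``, ``"deeptabular"``)
is an assumption based on this release's docs, so verify it against your
installed version:

```python
# hedged sketch: per-component optimizers, reusing ``model`` from the
# quick-start snippet above
import torch

from pytorch_widedeep import Trainer
from pytorch_widedeep.metrics import Accuracy

wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.01)
deep_opt = torch.optim.Adam(model.deeptabular.parameters(), lr=0.001)

trainer = Trainer(
    model,
    objective="binary",
    # assumption: dict keys match the names of the WideDeep children
    optimizers={"wide": wide_opt, "deeptabular": deep_opt},
    metrics=[Accuracy],
)
```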
### Testing
......
......@@ -3,9 +3,14 @@
##################################################
import os.path
###############################################################
# utils module accessible directly from pytorch-widedeep.<util>
###############################################################
from pytorch_widedeep.utils import (
    text_utils,
    image_utils,
    deeptabular_utils,
    fastai_transforms,
)

from pytorch_widedeep.version import __version__

from pytorch_widedeep.training import Trainer
This diff is collapsed.
......@@ -3,7 +3,9 @@ import warnings
from torch import nn
from pytorch_widedeep.wdtypes import *  # noqa: F403
warnings.filterwarnings("default")
class Initializer(object):
......@@ -27,9 +29,11 @@ class MultipleInitializer(object):
        for name, child in model.named_children():
            try:
                self._initializers[name](child)
            except KeyError:
                if self.verbose:
                    warnings.warn(
                        "No initializer found for {}".format(name), UserWarning
                    )
class Normal(Initializer):
......@@ -103,7 +107,7 @@ class XavierUniform(Initializer):
            elif p.requires_grad:
                try:
                    nn.init.xavier_uniform_(p, gain=self.gain)
                except Exception:
                    pass
......@@ -121,7 +125,7 @@ class XavierNormal(Initializer):
            elif p.requires_grad:
                try:
                    nn.init.xavier_normal_(p, gain=self.gain)
                except Exception:
                    pass
......@@ -143,7 +147,7 @@ class KaimingUniform(Initializer):
                    # fixed: KaimingUniform should call kaiming_uniform_, not kaiming_normal_
                    nn.init.kaiming_uniform_(
                        p, a=self.a, mode=self.mode, nonlinearity=self.nonlinearity
                    )
                except Exception:
                    pass
......@@ -165,7 +169,7 @@ class KaimingNormal(Initializer):
                    nn.init.kaiming_normal_(
                        p, a=self.a, mode=self.mode, nonlinearity=self.nonlinearity
                    )
                except Exception:
                    pass
......@@ -183,5 +187,5 @@ class Orthogonal(Initializer):
            elif p.requires_grad:
                try:
                    nn.init.orthogonal_(p, gain=self.gain)
                except Exception:
                    pass
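A minimal sketch of how these initializer classes are meant to be used:
passed to the ``Trainer`` as a dict keyed by component name. The
``initializers`` keyword and the dict keys are assumptions based on this
release; check the docs of your installed version:

```python
# hedged sketch, reusing the ``model`` built in the README example above
from pytorch_widedeep import Trainer
from pytorch_widedeep.initializers import KaimingNormal, XavierNormal

trainer = Trainer(
    model,
    objective="binary",
    # assumption: one initializer per WideDeep component, keyed by name
    initializers={"wide": KaimingNormal, "deeptabular": XavierNormal},
)
```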
......@@ -2,25 +2,30 @@ import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_widedeep.wdtypes import *  # noqa: F403
use_cuda = torch.cuda.is_available()
class FocalLoss(nn.Module):
r"""Implementation of the `focal loss
<https://arxiv.org/pdf/1708.02002.pdf>`_ for both binary and
multiclass classification
Parameters
----------
alpha: float
Focal Loss ``alpha`` parameter
gamma: float
Focal Loss ``gamma`` parameter
"""
def __init__(self, alpha: float = 0.25, gamma: float = 1.0):
r"""Implementation of the `focal loss
<https://arxiv.org/pdf/1708.02002.pdf>`_ for both binary and
multiclass classification
:math:`FL(p_t) = \alpha (1 - p_t)^{\gamma} log(p_t)`
where, for a case of a binary classification problem
:math:`\begin{equation} p_t= \begin{cases}p, & \text{if $y=1$}.\\1-p, & \text{otherwise}. \end{cases} \end{equation}`
Parameters
----------
alpha: float
Focal Loss ``alpha`` parameter
gamma: float
Focal Loss ``gamma`` parameter
"""
super().__init__()
self.alpha = alpha
self.gamma = gamma
......@@ -30,9 +35,8 @@ class FocalLoss(nn.Module):
        w = self.alpha * t + (1 - self.alpha) * (1 - t)  # type: ignore
        return (w * (1 - pt).pow(self.gamma)).detach()  # type: ignore

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        r"""
        Parameters
        ----------
        input: Tensor
......@@ -47,13 +51,13 @@ class FocalLoss(nn.Module):
        >>> from pytorch_widedeep.losses import FocalLoss
        >>>
        >>> # BINARY
        >>> target = torch.tensor([0, 1, 0, 1]).view(-1, 1)
        >>> input = torch.tensor([[0.6, 0.7, 0.3, 0.8]]).t()
        >>> FocalLoss()(input, target)
        tensor(0.1762)
        >>>
        >>> # MULTICLASS
        >>> target = torch.tensor([1, 0, 2]).view(-1, 1)
        >>> input = torch.tensor([[0.2, 0.5, 0.3], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1]])
        >>> FocalLoss()(input, target)
        tensor(0.2573)
......@@ -64,7 +68,7 @@ class FocalLoss(nn.Module):
            num_class = 2
        else:
            num_class = input_prob.size(1)
        binary_target = torch.eye(num_class)[target.squeeze().long()]
        if use_cuda:
            binary_target = binary_target.cuda()
        binary_target = binary_target.contiguous()
......@@ -72,3 +76,87 @@ class FocalLoss(nn.Module):
        return F.binary_cross_entropy(
            input_prob, binary_target, weight, reduction="mean"
        )
class MSLELoss(nn.Module):
    def __init__(self):
        r"""Mean squared log error"""
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        r"""
        Parameters
        ----------
        input: Tensor
            input tensor with predictions (not probabilities)
        target: Tensor
            target tensor with the actual values

        Examples
        --------
        >>> import torch
        >>> from pytorch_widedeep.losses import MSLELoss
        >>>
        >>> target = torch.tensor([1, 1.2, 0, 2]).view(-1, 1)
        >>> input = torch.tensor([0.6, 0.7, 0.3, 0.8]).view(-1, 1)
        >>> MSLELoss()(input, target)
        tensor(0.1115)
        """
        return self.mse(torch.log(input + 1), torch.log(target + 1))


class RMSELoss(nn.Module):
    def __init__(self):
        r"""Root mean squared error"""
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        r"""
        Parameters
        ----------
        input: Tensor
            input tensor with predictions (not probabilities)
        target: Tensor
            target tensor with the actual values

        Examples
        --------
        >>> import torch
        >>> from pytorch_widedeep.losses import RMSELoss
        >>>
        >>> target = torch.tensor([1, 1.2, 0, 2]).view(-1, 1)
        >>> input = torch.tensor([0.6, 0.7, 0.3, 0.8]).view(-1, 1)
        >>> RMSELoss()(input, target)
        tensor(0.6964)
        """
        return torch.sqrt(self.mse(input, target))


class RMSLELoss(nn.Module):
    def __init__(self):
        r"""Root mean squared log error"""
        super().__init__()
        self.mse = nn.MSELoss()

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        r"""
        Parameters
        ----------
        input: Tensor
            input tensor with predictions (not probabilities)
        target: Tensor
            target tensor with the actual values

        Examples
        --------
        >>> import torch
        >>> from pytorch_widedeep.losses import RMSLELoss
        >>>
        >>> target = torch.tensor([1, 1.2, 0, 2]).view(-1, 1)
        >>> input = torch.tensor([0.6, 0.7, 0.3, 0.8]).view(-1, 1)
        >>> RMSLELoss()(input, target)
        tensor(0.3339)
        """
        return torch.sqrt(self.mse(torch.log(input + 1), torch.log(target + 1)))
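A short, standalone usage note on the three losses above: the log-based ones
call ``torch.log(x + 1)``, which is only defined for values greater than -1,
so they implicitly assume non-negative predictions and targets (counts,
prices and the like):

```python
import torch

from pytorch_widedeep.losses import MSLELoss, RMSELoss, RMSLELoss

target = torch.tensor([1.0, 1.2, 0.0, 2.0]).view(-1, 1)
preds = torch.tensor([0.6, 0.7, 0.3, 0.8]).view(-1, 1)

# all three share the same (input, target) signature
for loss_fn in (MSLELoss(), RMSELoss(), RMSLELoss()):
    print(loss_fn.__class__.__name__, round(loss_fn(preds, target).item(), 4))
```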
This diff is collapsed.
from pytorch_widedeep.models.wide import Wide
from pytorch_widedeep.models.tab_mlp import TabMlp
from pytorch_widedeep.models.deep_text import DeepText
from pytorch_widedeep.models.wide_deep import WideDeep
from pytorch_widedeep.models.deep_image import DeepImage
from pytorch_widedeep.models.tab_resnet import TabResnet
from pytorch_widedeep.models.tab_transformer import TabTransformer
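The import list above exposes the new ``TabTransformer``. A hedged sketch of
dropping it in as the ``deeptabular`` component in place of ``TabMlp``,
reusing the objects from the README example; the constructor arguments shown
(and any extra preprocessing the attention blocks may require) are
assumptions to check against the release documentation:

```python
from pytorch_widedeep.models import TabTransformer, WideDeep

# assumption: TabTransformer accepts the same column_idx/embed_input/
# continuous_cols triplet produced by TabPreprocessor
tab_transformer = TabTransformer(
    column_idx=tab_preprocessor.column_idx,
    embed_input=tab_preprocessor.embeddings_input,
    continuous_cols=cont_cols,
)
model = WideDeep(wide=wide, deeptabular=tab_transformer)
```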
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -3,41 +3,41 @@ import math
import torch
from torch import nn
from pytorch_widedeep.wdtypes import *  # noqa: F403


class Wide(nn.Module):
    def __init__(self, wide_dim: int, pred_dim: int = 1):
        r"""Wide (linear) component

        Linear model implemented via an Embedding layer connected to the
        output neuron(s).

        Parameters
        ----------
        wide_dim: int
            size of the Embedding layer. `wide_dim` is the summation of all the
            individual values for all the features that go through the wide
            component. For example, if the wide component receives 2 features with
            5 individual values each, `wide_dim = 10`
        pred_dim: int, default = 1
            size of the output tensor containing the predictions

        Attributes
        ----------
        wide_linear: :obj:`nn.Module`
            the linear layer that comprises the wide branch of the model

        Examples
        --------
        >>> import torch
        >>> from pytorch_widedeep.models import Wide
        >>> X = torch.empty(4, 4).random_(6)
        >>> wide = Wide(wide_dim=X.unique().size(0), pred_dim=1)
        >>> out = wide(X)
        """
        super(Wide, self).__init__()
        # Embeddings: val + 1 because 0 is reserved for padding/unseen categories
        self.wide_linear = nn.Embedding(wide_dim + 1, pred_dim, padding_idx=0)
        # (Sum(Embedding) + bias) is equivalent to (OneHotVector + Linear)
        self.bias = nn.Parameter(torch.zeros(pred_dim))
......
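The comment in ``__init__`` above states that summing Embedding rows plus a
bias is the same linear map as a one-hot encoding followed by a Linear layer.
A standalone sketch (not library code) that checks this numerically:

```python
import torch
from torch import nn

torch.manual_seed(0)

wide_dim, pred_dim = 6, 1
embedding = nn.Embedding(wide_dim + 1, pred_dim, padding_idx=0)  # 0 = padding
linear = nn.Linear(wide_dim, pred_dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(embedding.weight[1:].t())  # share the same weights

# two samples, each with two active features (1-based indices, 0 is padding)
X = torch.tensor([[1, 3], [2, 5]])
one_hot = torch.zeros(2, wide_dim).scatter_(1, X - 1, 1.0)

sum_of_embeddings = embedding(X).sum(dim=1)
assert torch.allclose(sum_of_embeddings, linear(one_hot), atol=1e-6)
```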
from pytorch_widedeep.optim.radam import RAdam
This diff is collapsed.
from pytorch_widedeep.preprocessing.preprocessors import *
from pytorch_widedeep.training.trainer import Trainer
This diff is collapsed.
from pytorch_widedeep.wdtypes import *  # noqa: F403
class MultipleLRScheduler(object):
......
from pytorch_widedeep.wdtypes import *  # noqa: F403
class MultipleOptimizer(object):
......
from torchvision.transforms import Compose
from pytorch_widedeep.wdtypes import *  # noqa: F403
class MultipleTransforms(object):
......
This diff is collapsed.
from pytorch_widedeep.utils.text_utils import *
from pytorch_widedeep.utils.image_utils import *
from pytorch_widedeep.utils.deeptabular_utils import *
from pytorch_widedeep.utils.fastai_transforms import *
This diff is collapsed.
__version__ = "0.4.7"
__version__ = "0.4.8"
......@@ -62,6 +62,8 @@ setup_kwargs = {
"tqdm",
"torch",
"torchvision",
"einops",
"wrapt",
],
"extras_require": extras,
"python_requires": ">=3.6.0",
......
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.