Commit 3227f159 authored by Javier Rodriguez Zaurin

Review docs. Ready to check installations

Parent c23fed86
......@@ -50,3 +50,9 @@ from pytorch_widedeep.losses import FocalLoss
::: pytorch_widedeep.losses.FocalR_RMSELoss
::: pytorch_widedeep.losses.HuberLoss
::: pytorch_widedeep.losses.InfoNCELoss
::: pytorch_widedeep.losses.DenoisingLoss
::: pytorch_widedeep.losses.EncoderDecoderLoss
......@@ -3,7 +3,9 @@
This module contains the models that can be used as the four main components
that will comprise a Wide and Deep model (``wide``, ``deeptabular``,
``deeptext``, ``deepimage``), as well as the ``WideDeep`` "constructor"
class. Note that each of the four components can be used independently.
class. Note that each of the four components can be used independently. It
also contains all the documentation for the models that can be used for
self-supervised pre-training with tabular data.
::: pytorch_widedeep.models.tabular.linear.wide.Wide
......@@ -82,16 +84,15 @@ class. Note that each of the four components can be used independently.
:information_source: **NOTE**: when we started developing the library we
thought that combining Deep Learning architectures for tabular data, with
CNN-based architectures (pretrained or not) for images and Transformer-based
architectures for text would be an _'overkill'_(also, pretrained
architectures for text would be an _'overkill'_ (also, pretrained
transformer-based models were not as readily available as they are today).
Therefore, at that time we decided to include in the library
simple RNN-based architectures for the text dataset. A lot of time has passed since
then and it is our intention to integrate this library with the
[Hugging Face's Transformers library]
(https://huggingface.co/docs/transformers/main/en/index) in the near future.
Nonetheless, note that it is still possible to use any custom model as the
`deeptext` component using this library. Please, see the example section in
this documentation for details
[Hugging Face's Transformers library](https://huggingface.co/docs/transformers/main/en/index)
in the near future. Nonetheless, note that it is still possible to use any
custom model as the `deeptext` component using this library. Please see the
examples section in this documentation for details.
::: pytorch_widedeep.models.text.attentive_rnn.BasicRNN
selection:
......
......@@ -1078,7 +1078,9 @@
user to use self-supervised pre-training for all tabular models in the library
with the exception of the <code>TabPerceiver</code> (this is a particular model and
self-supervised pre-training requires some adjustments that will be
implemented in future versions).</p>
implemented in future versions). Please see the examples folder in the repo
or the examples section in the docs for details on how to use self-supervised
pre-training with this library.</p>
<p>The two routines implemented are illustrated in the figures below. The first
is from <a href="https://arxiv.org/abs/1908.07442">TabNet: Attentive Interpretable Tabular Learning</a>.
It is a <em>'standard'</em> encoder-decoder architecture and is designed here for
......
......@@ -4,7 +4,10 @@ In this library we have implemented two methods or routines that allow the
user to use self-supervised pre-training for all tabular models in the library
with the exception of the `TabPerceiver` (this is a particular model and
self-supervised pre-training requires some adjustments that will be
implemented in future versions).
implemented in future versions). Please see the examples folder in the repo
or the examples section in the docs for details on how to use self-supervised
pre-training with this library.
The two routines implemented are illustrated in the figures below. The first
is from [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442).
......
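As a hedged, minimal sketch of the encoder-decoder routine (the `EncoderDecoderTrainer` import path, its `pretrain` method and the preprocessor attributes used below are assumptions based on the library's example notebooks; the data is a synthetic placeholder):

```python
import numpy as np
import pandas as pd

from pytorch_widedeep.models import TabMlp, WideDeep
from pytorch_widedeep.preprocessing import TabPreprocessor
from pytorch_widedeep.self_supervised_training import EncoderDecoderTrainer

# synthetic placeholder data
df = pd.DataFrame(
    {
        "education": np.random.choice(["hs", "college", "phd"], 1000),
        "relationship": np.random.choice(["single", "married"], 1000),
        "age": np.random.randint(18, 90, 1000),
    }
)

tab_preprocessor = TabPreprocessor(
    cat_embed_cols=["education", "relationship"], continuous_cols=["age"]
)
X_tab = tab_preprocessor.fit_transform(df)

tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=["age"],
)

# encoder-decoder routine: pre-train the encoder on the unlabelled tabular data
# (if no decoder is passed, one is assumed to be built internally)
ed_trainer = EncoderDecoderTrainer(encoder=tab_mlp)
ed_trainer.pretrain(X_tab, n_epochs=2, batch_size=256)

# the pre-trained encoder can then be used as the deeptabular component
model = WideDeep(deeptabular=tab_mlp)
```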
......@@ -1035,6 +1035,8 @@ the classes and functions discussed here are available directly from the
<code>deeptabular_utils</code> submodule can be imported as:</p>
<div class="highlight"><pre><span></span><code>from pytorch_widedeep.utils import LabelEncoder
</code></pre></div>
<p>These are classes and functions that are internally used in the library. We
include them here in case the user finds them useful for other purposes.</p>
</article>
......
......@@ -10,3 +10,6 @@ the classes and functions discussed here are available directly from the
```
from pytorch_widedeep.utils import LabelEncoder
```
These are classes and functions that are internally used in the library. We
include them here in case the user finds them useful for other purposes.
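For instance, a hedged sketch of using the `LabelEncoder` on its own (the `columns_to_encode` argument and the `inverse_transform` method are assumptions that may differ between versions):

```python
import pandas as pd

from pytorch_widedeep.utils import LabelEncoder

df = pd.DataFrame({"education": ["hs", "phd", "hs"], "age": [23, 48, 35]})

# encode only the chosen categorical column(s)
encoder = LabelEncoder(columns_to_encode=["education"])
df_enc = encoder.fit_transform(df)

# the mapping is stored in the encoder and, assuming the method exists in this
# version, can be reversed
df_back = encoder.inverse_transform(df_enc)
```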
......@@ -2,182 +2,182 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://pytorch-widedeep.readthedocs.io/index.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/contributing.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/installation.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/quick_start.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/01_Preprocessors_and_utils.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/02_model_components.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/03_Binary_Classification_with_Defaults.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/04_regression_with_images_and_text.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/05_save_and_load_model_and_artifacts.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/06_fineTune_and_warmup.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/07_Custom_Components.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/08_custom_dataLoader_imbalanced_dataset.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/09_extracting_embeddings.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/11_auc_multiclass.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/12_ZILNLoss_origkeras_vs_pytorch_multimodal.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/13_Model_Uncertainty_prediction.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/14_bayesian_models.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/15_DIR-LDS_and_FDS.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/16_Self_Supervised_Pretraning_pt1.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/examples/16_Self_Supervised_Pretraning_pt2.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/bayesian_models.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/bayesian_trainer.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/callbacks.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/dataloaders.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/losses.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/metrics.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/model_components.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/preprocessing.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/self_supervised_pretraining.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/tab2vec.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/trainer.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/utils/index.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/utils/deeptabular_utils.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/utils/fastai_transforms.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/utils/image_utils.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://pytorch-widedeep.readthedocs.io/pytorch-widedeep/utils/text_utils.html</loc>
<lastmod>2022-08-17</lastmod>
<lastmod>2022-08-20</lastmod>
<changefreq>daily</changefreq>
</url>
</urlset>
\ No newline at end of file
......@@ -798,14 +798,18 @@ class HuberLoss(nn.Module):
class InfoNCELoss(nn.Module):
r"""InfoNCE Loss
r"""InfoNCE Loss. Loss applied during the Contrastive Denoising Self
Supervised Pre-training routine available in this library
See `SAINT: Improved Neural Networks for Tabular Data via Row Attention
and Contrastive Pre-Training <https://arxiv.org/abs/2106.01342>`_ and
:information_source: **NOTE**: This loss is in principle not exposed to
the user, as it is used internally in the library, but it is included
here for completeness.
See [SAINT: Improved Neural Networks for Tabular Data via Row Attention
and Contrastive Pre-Training](https://arxiv.org/abs/2106.01342) and
references therein
Partially inspired by the code in this `repo
<https://github.com/RElbers/info-nce-pytorch>`_
Partially inspired by the code in this [repo](https://github.com/RElbers/info-nce-pytorch)
Parameters:
-----------
......@@ -857,10 +861,15 @@ class InfoNCELoss(nn.Module):
class DenoisingLoss(nn.Module):
r"""Denoising Loss
r"""Denoising Loss. Loss applied during the Contrastive Denoising Self
Supervised Pre-training routine available in this library
:information_source: **NOTE**: This loss is in principle not exposed to
the user, as it is used internally in the library, but it is included
here for completeness.
See `SAINT: Improved Neural Networks for Tabular Data via Row Attention
and Contrastive Pre-Training <https://arxiv.org/abs/2106.01342>`_ and
See [SAINT: Improved Neural Networks for Tabular Data via Row Attention
and Contrastive Pre-Training](https://arxiv.org/abs/2106.01342) and
references therein
Parameters:
......@@ -898,12 +907,12 @@ class DenoisingLoss(nn.Module):
----------
x_cat_and_cat_: tuple of Tensors or lists of tuples
Tuple of tensors containing the raw input features and their
encodings, referred in the SAINT paper as :math:x` and :math:x''`
encodings, referred to in the SAINT paper as $x$ and $x''$
respectively. If one denoising MLP is used per categorical
feature 'x_cat_and_cat_' will be a list of tuples, one per
feature `x_cat_and_cat_` will be a list of tuples, one per
categorical feature
x_cont_and_cont_: tuple of Tensors or lists of tuples
same as 'x_cat_and_cat_' but for continuous columns
same as `x_cat_and_cat_` but for continuous columns
Examples
--------
......@@ -928,7 +937,9 @@ class DenoisingLoss(nn.Module):
return self.lambda_cat * loss_cat + self.lambda_cont * loss_cont
def _compute_cat_loss(self, x_cat_and_cat_):
def _compute_cat_loss(
self, x_cat_and_cat_: Union[List[Tuple[Tensor, Tensor]], Tuple[Tensor, Tensor]]
) -> Tensor:
loss_cat = torch.tensor(0.0)
if isinstance(x_cat_and_cat_, list):
......@@ -940,7 +951,7 @@ class DenoisingLoss(nn.Module):
return loss_cat
def _compute_cont_loss(self, x_cont_and_cont_):
def _compute_cont_loss(self, x_cont_and_cont_) -> Tensor:
loss_cont = torch.tensor(0.0)
if isinstance(x_cont_and_cont_, list):
......@@ -954,12 +965,17 @@ class DenoisingLoss(nn.Module):
class EncoderDecoderLoss(nn.Module):
r"""Loss applied for the Endoder-Decoder Self Supervised pretraining process
r"""'_Standard_' Encoder Decoder Loss. Loss applied during the Endoder-Decoder
Self-Supervised Pre-Training routine available in this library
:information_source: **NOTE**: This loss is in principle not exposed to
the user, as it is used internally in the library, but it is included
here for completeness.
The implementation of this loss is based on the one in the
https://github.com/dreamquark-ai/tabnet repo, which is in itself an
adaptation of that in the original TabNet paper: `TabNet: Attentive
Interpretable Tabular Learning <https://arxiv.org/abs/1908.07442>`_.
[tabnet repo](https://github.com/dreamquark-ai/tabnet), which is in itself an
adaptation of that in the original paper [TabNet: Attentive
Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442).
Parameters:
-----------
......
......@@ -38,6 +38,11 @@ class FDSLayer(nn.Module):
:information_source: **NOTE**: Feature Distribution Smoothing is
available when using ONLY a `deeptabular` component
:information_source: **NOTE**: We consider this feature absolutely
experimental and we recommend not using it unless the
corresponding [publication](https://arxiv.org/abs/2102.09554) is
well understood
The code here is based on the code at the
[official repo](https://github.com/YyzHarry/imbalanced-regression)
......
......@@ -62,8 +62,8 @@ class Vision(nn.Module):
used. Alternatively, since Torchvision 0.13 one can use pretrained
models with different weights. Therefore, `pretrained_model_setup` can
also be a dictionary with the name of the model and the weights (e.g.
{'resnet50': ResNet50_Weights.DEFAULT} or
{'resnet50': "IMAGENET1K_V2"}). Aliased as `pretrained_model_name`.
`{'resnet50': ResNet50_Weights.DEFAULT}` or
`{'resnet50': "IMAGENET1K_V2"}`). <br/> Aliased as `pretrained_model_name`.
n_trainable: Optional, int, default = None
Number of trainable layers starting from the layer closest to the
output neuron(s). Note that this number DOES NOT take into account
......@@ -108,9 +108,6 @@ class Vision(nn.Module):
----------
features: nn.Module
The pretrained model or Standard CNN plus the optional head
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -188,6 +185,9 @@ class Vision(nn.Module):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.head_hidden_dims[-1]
if self.head_hidden_dims is not None
......
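A brief, hedged sketch of the `pretrained_model_setup` and `n_trainable` parameters documented above (the import path and the 224x224 input size are assumptions based on the standard torchvision setup):

```python
import torch

from pytorch_widedeep.models import Vision

# pretrained backbone specified as {model name: weights}, as documented above
vision = Vision(
    pretrained_model_setup={"resnet50": "IMAGENET1K_V2"},
    n_trainable=0,  # freeze the pretrained layers
)

x = torch.rand(2, 3, 224, 224)  # dummy batch of images
features = vision(x)            # activations of the last layer before any head
print(features.shape, vision.output_dim)
```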
......@@ -88,11 +88,8 @@ class ContextAttentionMLP(BaseTabularModelWithAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
attention_blks: nn.Sequential
encoder: nn.Module
Sequence of attention encoders.
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -181,6 +178,9 @@ class ContextAttentionMLP(BaseTabularModelWithAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.input_dim
if self.with_cls_token
......
......@@ -17,7 +17,10 @@ class SelfAttentionMLP(BaseTabularModelWithAttention):
are then passed through a series of attention blocks. Each attention
block is composed of what we would refer to as a simplified
`SelfAttentionEncoder`. See
`pytorch_widedeep.models.tabular.mlp._attention_layers` for details.
`pytorch_widedeep.models.tabular.mlp._attention_layers` for details. The
reason for using a simplified version of self-attention is that we
observed that the '_standard_' attention mechanism used in the
TabTransformer has a notable tendency to overfit.
Parameters
----------
......@@ -89,11 +92,8 @@ class SelfAttentionMLP(BaseTabularModelWithAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
attention_blks: nn.Sequential
encoder: nn.Module
Sequence of attention encoders.
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the WideDeep class
Examples
--------
......@@ -188,6 +188,9 @@ class SelfAttentionMLP(BaseTabularModelWithAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the WideDeep class
"""
return (
self.input_dim
if self.with_cls_token
......
......@@ -71,12 +71,9 @@ class TabMlp(BaseTabularModelWithoutAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
tab_mlp: nn.Sequential
encoder: nn.Module
mlp model that will receive the concatenation of the embeddings and
the continuous columns
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -152,7 +149,9 @@ class TabMlp(BaseTabularModelWithoutAttention):
return self.encoder(x)
@property
def output_dim(self):
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class"""
return self.mlp_hidden_dims[-1]
......@@ -170,8 +169,7 @@ class TabMlpDecoder(nn.Module):
This class is designed to be used with the `EncoderDecoderTrainer` when
using self-supervised pre-training (see the corresponding section in the
docs). The `TabMlpDecoder` will receive the output from the MLP
and '_reconstruct_' the embeddings in the embeddings layer in the
`TabMlp` model.
and '_reconstruct_' the embeddings.
Parameters
----------
......
......@@ -88,16 +88,13 @@ class TabResnet(BaseTabularModelWithoutAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
tab_resnet_blks: nn.Sequential
encoder: nn.Module
deep dense Resnet model that will receive the concatenation of the
embeddings and the continuous columns
tab_resnet_mlp: nn.Sequential
mlp: nn.Module
if `mlp_hidden_dims` is not `None`, this attribute will be an MLP
model that will receive the results of the concatenation of the
embeddings and the continuous columns -- if present --.
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -200,6 +197,9 @@ class TabResnet(BaseTabularModelWithoutAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.mlp_hidden_dims[-1]
if self.mlp_hidden_dims is not None
......@@ -214,8 +214,7 @@ class TabResnetDecoder(nn.Module):
This class is designed to be used with the `EncoderDecoderTrainer` when
using self-supervised pre-training (see the corresponding section in the
docs). This class will receive the output from the ResNet blocks or the
MLP (if present) and '_reconstruct_' the embeddings in the embeddings
layer in the `TabResnet` model.
MLP (if present) and '_reconstruct_' the embeddings.
Parameters
----------
......
......@@ -95,11 +95,8 @@ class TabNet(BaseTabularModelWithoutAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
tabnet_encoder: nn.Module
encoder: nn.Module
the TabNet encoder. For details see the [original publication](https://arxiv.org/abs/1908.07442).
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -202,6 +199,9 @@ class TabNet(BaseTabularModelWithoutAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return self.step_dim
......@@ -234,7 +234,7 @@ class TabNetDecoder(nn.Module):
using self-supervised pre-training (see the corresponding section in the
docs). This class will receive the output from the `TabNet` encoder
(i.e. the output from the so-called 'steps') and '_reconstruct_' the
embeddings from the embeddings layer in the `TabNet` encoder.
embeddings.
Parameters
----------
......
......@@ -120,13 +120,10 @@ class FTTransformer(BaseTabularModelWithAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
fttransformer_blks: nn.Sequential
encoder: nn.Module
Sequence of FTTransformer blocks
fttransformer_mlp: nn.Module
mlp: nn.Module
MLP component in the model
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -268,6 +265,9 @@ class FTTransformer(BaseTabularModelWithAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.mlp_hidden_dims[-1]
if self.mlp_hidden_dims is not None
......
......@@ -106,13 +106,10 @@ class SAINT(BaseTabularModelWithAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
saint_blks: nn.Sequential
encoder: nn.Module
Sequence of SAINT-Transformer blocks
saint_mlp: nn.Module
mlp: nn.Module
MLP component in the model
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -241,6 +238,9 @@ class SAINT(BaseTabularModelWithAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.mlp_hidden_dims[-1]
if self.mlp_hidden_dims is not None
......
......@@ -119,13 +119,10 @@ class TabFastFormer(BaseTabularModelWithAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
fastformer_blks: nn.Sequential
encoder: nn.Module
Sequence of FastFormer blocks.
fastformer_mlp: nn.Module
mlp: nn.Module
MLP component in the model
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -274,6 +271,9 @@ class TabFastFormer(BaseTabularModelWithAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.mlp_hidden_dims[-1]
if self.mlp_hidden_dims is not None
......
......@@ -134,15 +134,12 @@ class TabPerceiver(BaseTabularModelWithAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
perceiver_blks: nn.ModuleDict
encoder: nn.ModuleDict
ModuleDict with the Perceiver blocks
latents: nn.Parameter
Latents that will be used for prediction
perceiver_mlp: nn.Module
mlp: nn.Module
MLP component in the model
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -289,6 +286,9 @@ class TabPerceiver(BaseTabularModelWithAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.mlp_hidden_dims[-1]
if self.mlp_hidden_dims is not None
......
......@@ -112,13 +112,10 @@ class TabTransformer(BaseTabularModelWithAttention):
----------
cat_and_cont_embed: nn.Module
This is the module that processes the categorical and continuous columns
transformer_blks: nn.Sequential
encoder: nn.Module
Sequence of Transformer blocks
transformer_mlp: nn.Module
mlp: nn.Module
MLP component in the model
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -279,6 +276,9 @@ class TabTransformer(BaseTabularModelWithAttention):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.mlp_hidden_dims[-1]
if self.mlp_hidden_dims is not None
......
......@@ -77,12 +77,9 @@ class AttentiveRNN(BasicRNN):
word embedding matrix
rnn: nn.Module
Stack of RNNs
rnn_mlp: nn.Sequential
rnn_mlp: nn.Module
Stack of dense layers on top of the RNN. This will only exist if
`head_layers_dim` is not `None`
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......
......@@ -69,12 +69,9 @@ class BasicRNN(nn.Module):
word embedding matrix
rnn: nn.Module
Stack of RNNs
rnn_mlp: nn.Sequential
rnn_mlp: nn.Module
Stack of dense layers on top of the RNN. This will only exist if
`head_layers_dim` is not None
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -193,6 +190,9 @@ class BasicRNN(nn.Module):
@property
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.head_hidden_dims[-1]
if self.head_hidden_dims is not None
......
......@@ -75,12 +75,9 @@ class StackedAttentiveRNN(nn.Module):
word embedding matrix
rnn: nn.Module
Stack of RNNs
rnn_mlp: nn.Sequential
rnn_mlp: nn.Module
Stack of dense layers on top of the RNN. This will only exist if
`head_layers_dim` is not `None`
output_dim: int
The output dimension of the model. This is a required attribute
necessary to build the `WideDeep` class
Examples
--------
......@@ -235,6 +232,9 @@ class StackedAttentiveRNN(nn.Module):
return self.rnn_mlp(x)
def output_dim(self) -> int:
r"""The output dimension of the model. This is a required property
necessary to build the `WideDeep` class
"""
return (
self.head_hidden_dims[-1]
if self.head_hidden_dims is not None
......
......@@ -28,6 +28,12 @@ class WideDeep(nn.Module):
r"""Main collector class that combines all `wide`, `deeptabular`
`deeptext` and `deepimage` models.
Note that all models described so far in this library must be passed to
the `WideDeep` class once constructed. This is because these models return
the activations of the last layer before the prediction layer. The prediction
layer itself is added by the `WideDeep` class as it collects the components
for every data mode.
There are two options to combine these models that correspond to the
two main architectures that `pytorch-widedeep` can build.
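A minimal sketch of this collector pattern (dimensions and column names are illustrative, and the `Wide` signature used below is an assumption):

```python
from pytorch_widedeep.models import TabMlp, Wide, WideDeep

# each component outputs the activations of its last layer...
wide = Wide(input_dim=100, pred_dim=1)
deeptabular = TabMlp(
    column_idx={"age": 0, "hours_per_week": 1},
    continuous_cols=["age", "hours_per_week"],
)

# ...and WideDeep collects them and adds the final prediction layer on top
model = WideDeep(wide=wide, deeptabular=deeptabular)
```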
......@@ -100,7 +106,8 @@ class WideDeep(nn.Module):
Distribution Smoothing. Please, see the docs for the `FDSLayer`.
<br/>
:information_source: **NOTE**: Feature Distribution Smoothing
is available when using ONLY a `deeptabular` component
is available when using **ONLY** a `deeptabular` component
<br/>
:information_source: **NOTE**: We consider this feature absolutely
experimental and we recommend not using it unless the
corresponding [publication](https://arxiv.org/abs/2102.09554) is
......
......@@ -76,12 +76,14 @@ class TabPreprocessor(BasePreprocessor):
`False`.
with_attention: bool, default = False
Boolean indicating whether the preprocessed data will be passed to an
attention-based model. If `True`, the param `cat_embed_cols` must
just be a list containing just the categorical column names: e.g.
attention-based model (more precisely a model where all embeddings
must have the same dimensions). If `True`, the param `cat_embed_cols`
must simply be a list with the categorical column names:
e.g.
_['education', 'relationship', ...]_. This is because they will all be
encoded using embeddings of the same dim, which will be specified
later when the model is defined. <br/>
Param alias: `for_transformer`
encoded using embeddings of the same dim, which will be specified
later when the model is defined. <br/> Param alias:
`for_transformer`
with_cls_token: bool, default = False
Boolean indicating if a `'[CLS]'` token will be added to the dataset
when using attention-based models. The final hidden state
......@@ -144,10 +146,10 @@ class TabPreprocessor(BasePreprocessor):
cat_embed_cols: Union[List[str], List[Tuple[str, int]]] = None,
continuous_cols: List[str] = None,
scale: bool = True,
already_standard: List[str] = None,
auto_embed_dim: bool = True,
embedding_rule: Literal["google", "fastai_old", "fastai_new"] = "fastai_new",
default_embed_dim: int = 16,
already_standard: List[str] = None,
with_attention: bool = False,
with_cls_token: bool = False,
shared_embed: bool = False,
......@@ -157,10 +159,10 @@ class TabPreprocessor(BasePreprocessor):
self.continuous_cols = continuous_cols
self.scale = scale
self.already_standard = already_standard
self.auto_embed_dim = auto_embed_dim
self.embedding_rule = embedding_rule
self.default_embed_dim = default_embed_dim
self.already_standard = already_standard
self.with_attention = with_attention
self.with_cls_token = with_cls_token
self.shared_embed = shared_embed
......
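A hedged sketch of the `with_attention` behaviour documented above (column names are illustrative; `for_transformer` is the documented alias):

```python
import pandas as pd

from pytorch_widedeep.preprocessing import TabPreprocessor

df = pd.DataFrame(
    {
        "education": ["hs", "phd", "college"],
        "relationship": ["single", "married", "single"],
        "age": [23, 48, 35],
    }
)

# non attention-based models: per-column embedding dims can be given explicitly
tab_prep = TabPreprocessor(
    cat_embed_cols=[("education", 16), ("relationship", 8)],
    continuous_cols=["age"],
)
X_tab = tab_prep.fit_transform(df)

# attention-based models: just the column names, since a single embedding dim
# is set later, when the model is defined
tab_prep_attn = TabPreprocessor(
    cat_embed_cols=["education", "relationship"],
    continuous_cols=["age"],
    with_attention=True,
)
X_tab_attn = tab_prep_attn.fit_transform(df)
```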
......@@ -27,7 +27,7 @@ class WidePreprocessor(BasePreprocessor):
List of Tuples with the name of the columns that will be `'crossed'`
and then label encoded. e.g. _[('education', 'occupation'), ...]_. For
binary features, a cross-product transformation is 1 if and only if
the constituent features are all 1, and 0 otherwise".
the constituent features are all 1, and 0 otherwise.
Attributes
----------
......@@ -36,6 +36,8 @@ class WidePreprocessor(BasePreprocessor):
encoding_dict: Dict
Dictionary where the keys are the result of pasting `colname + '_' +
column value` and the values are the corresponding mapped integer.
inverse_encoding_dict: Dict
The inverse encoding dictionary
wide_dim: int
Dimension of the wide model (i.e. dim of the linear layer)
......
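A short, hedged sketch of the cross-product transformation described above (column names are illustrative):

```python
import pandas as pd

from pytorch_widedeep.preprocessing import WidePreprocessor

df = pd.DataFrame(
    {"education": ["hs", "phd"], "occupation": ["sales", "tech"], "age": [23, 48]}
)

# 'education' x 'occupation' becomes a single crossed, label-encoded feature
wide_prep = WidePreprocessor(
    wide_cols=["education", "occupation"],
    crossed_cols=[("education", "occupation")],
)
X_wide = wide_prep.fit_transform(df)
```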
......@@ -50,10 +50,10 @@ class Trainer(BaseTrainer):
model: `WideDeep`
An object of class `WideDeep`
objective: str
Defines the objective, loss or cost function.
Defines the objective, loss or cost function. <br/>
Param aliases: `loss_function`, `loss_fn`, `loss`,
`cost_function`, `cost_fn`, `cost`
`cost_function`, `cost_fn`, `cost`. <br/>
Possible values are:
......@@ -312,12 +312,10 @@ class Trainer(BaseTrainer):
predefined dataloaders are in `pytorch-widedeep.dataloaders`. If
`None`, a standard torch `DataLoader` is used.
finetune: bool, default=False
param alias: `warmup`
fine-tune individual model components. This functionality can also
be used to 'warm-up' individual components before the joined
training starts, and hence its alias. See the Examples folder in
the repo for more details
be used to 'warm-up' (and hence the alias `warmup`) individual
components before the joint training starts. See the Examples folder in
the repo for more details
`pytorch_widedeep` implements 3 fine-tune routines.
......@@ -341,7 +339,14 @@ class Trainer(BaseTrainer):
[ULMfit paper](https://arxiv.org/abs/1801.06146).
For details on how these routines work, please see the Examples
section in this documentation and the Examples folder in the repo.
section in this documentation and the Examples folder in the repo. <br/>
Param Alias: `warmup`
with_lds: bool, default=False
Boolean indicating if Label Distribution Smoothing will be used. <br/>
:information_source: **NOTE**: We consider this feature absolutely
experimental and we recommend not using it unless the
corresponding [publication](https://arxiv.org/abs/2102.09554) is
well understood
Other Parameters
----------------
......@@ -355,12 +360,24 @@ class Trainer(BaseTrainer):
for details.
- **Label Distribution Smoothing related parameters**:<br/>
see the source code at `pytorch_widedeep._wd_dataset` for some details
>:information_source: **NOTE**: We consider this feature absolutely
experimental and we recommend not using it unless the
corresponding [publication](https://arxiv.org/abs/2102.09554) is
well understood
- lds_kernel (`Literal['gaussian', 'triang', 'laplace']`):
choice of kernel for Label Distribution Smoothing
- lds_ks (`int`):
LDS kernel window size
- lds_sigma (`float`):
standard deviation of ['gaussian','laplace'] kernel for LDS
- lds_granularity (`int`):
number of bins in the histogram used in LDS to count occurrence of sample values
- lds_reweight (`bool`):
option to reweight bin frequency counts in LDS
- lds_y_max (`Optional[float]`):
option to restrict LDS bins by upper label limit
- lds_y_min (`Optional[float]`):
option to restrict LDS bins by lower label limit
See `pytorch_widedeep.trainer._wd_dataset` for more details on
the implications of these parameters
- **Finetune related parameters**:<br/>
see the source code at `pytorch_widedeep._finetune`. Namely, these are:
......
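A hedged, minimal sketch combining the `objective` and `finetune`/`warmup` parameters discussed above (synthetic data; the preprocessor attribute names used below are assumptions):

```python
import numpy as np
import pandas as pd

from pytorch_widedeep import Trainer
from pytorch_widedeep.metrics import Accuracy
from pytorch_widedeep.models import TabMlp, WideDeep
from pytorch_widedeep.preprocessing import TabPreprocessor

# synthetic placeholder data
df = pd.DataFrame(
    {
        "education": np.random.choice(["hs", "college", "phd"], 256),
        "age": np.random.randint(18, 90, 256),
        "target": np.random.randint(0, 2, 256),
    }
)

tab_prep = TabPreprocessor(cat_embed_cols=["education"], continuous_cols=["age"])
X_tab = tab_prep.fit_transform(df)

model = WideDeep(
    deeptabular=TabMlp(
        column_idx=tab_prep.column_idx,
        cat_embed_input=tab_prep.cat_embed_input,
        continuous_cols=["age"],
    )
)

# 'objective' also accepts the documented aliases (loss_fn, cost_fn, ...)
trainer = Trainer(model, objective="binary", metrics=[Accuracy()])

# finetune=True (alias 'warmup') warms up the individual components before the
# joint training starts
trainer.fit(
    X_tab=X_tab,
    target=df["target"].values,
    n_epochs=2,
    batch_size=64,
    finetune=True,
)
```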
......@@ -227,6 +227,9 @@ class Tokenizer:
r"""Class to combine a series of rules and a tokenizer function to tokenize
text with multiprocessing.
Setting some of the parameters of this class may require some
familiarity with the source code.
Parameters
----------
tok_func: Callable, default = ``SpacyTokenizer``
......@@ -234,11 +237,13 @@ class Tokenizer:
lang: str, default = "en"
Text's Language
pre_rules: ListRules, Optional, default = None
Custom type: ``Collection[Callable[[str], str]]``.
see `pytorch_widedeep.wdtypes`. Preprocessing Rules
Custom type: ``Collection[Callable[[str], str]]``. These are
`Callable` objects that will be applied to the text (str) directly as
`rule(tok)` before being tokenized.
post_rules: ListRules, Optional, default = None
Custom type: ``Collection[Callable[[str], str]]``.
see `pytorch_widedeep.wdtypes`. Postprocessing Rules
Custom type: ``Collection[Callable[[str], str]]``. These are
`Callable` objects that will be applied to the tokens as
`rule(tokens)` after the text has been tokenized.
special_cases: Collection, Optional, default= None
special cases to be added to the tokenizer via ``Spacy``'s
``add_special_case`` method
......@@ -272,7 +277,7 @@ class Tokenizer:
return res
def process_text(self, t: str, tok: BaseTokenizer) -> List[str]:
"""Process and tokenize one text ``t`` with tokenizer ``tok``.
r"""Process and tokenize one text ``t`` with tokenizer ``tok``.
Parameters
----------
......
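A hedged sketch of the pre/post rules described above (the import path and the `process_all` method are assumptions based on the fastai-inspired implementation, and spaCy's English model must be installed):

```python
from pytorch_widedeep.utils import Tokenizer


def lowercase_rule(t: str) -> str:
    # pre-rule: applied to the raw text before tokenization
    return t.lower()


def drop_short_tokens(tokens):
    # post-rule: applied to the list of tokens after tokenization
    return [tok for tok in tokens if len(tok) > 1]


tok = Tokenizer(pre_rules=[lowercase_rule], post_rules=[drop_short_tokens])
tokens = tok.process_all(["The movie was GREAT!", "Not my cup of tea."])
```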