Unverified commit e1e6a2ac authored by J Javier, committed by GitHub

Merge pull request #56 from jrzaurin/pmulinka/uncertainty

Embedding, MC and draft requests
......@@ -11,6 +11,7 @@ on:
jobs:
codestyle:
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' || !github.event.pull_request.draft }}
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
......@@ -32,6 +33,7 @@ jobs:
test:
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' || !github.event.pull_request.draft }}
strategy:
fail-fast: true
matrix:
......@@ -59,6 +61,7 @@ jobs:
finish:
needs: test
runs-on: ubuntu-latest
if: ${{ github.event_name == 'push' || !github.event.pull_request.draft }}
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.9
......
......@@ -11,7 +11,7 @@ __pycache__*
Untitled*.ipynb
# data related dirs
data/
tmp_data/
model_weights/
tmp_dir/
weights/
......
Pytorch-widedeep is being developed and used by many active community members. Your help is very valuable to make it better for everyone.
- **[TBA]** Check the [Roadmap](https://github.com/jrzaurin/pytorch-widedeep/projects/1) or [open an issue](https://github.com/jrzaurin/pytorch-widedeep/issues) to report problems or recommend new features, and submit a draft pull request, which will be changed to a pull request after initial review
- Contribute to the [tests](https://github.com/jrzaurin/pytorch-widedeep/tree/master/tests) to make it more reliable.
- Contribute to the [documentation](https://github.com/jrzaurin/pytorch-widedeep/tree/master/docs) to make it clearer for everyone.
- Contribute to the [examples](https://github.com/jrzaurin/pytorch-widedeep/tree/master/examples) to share your experience with other users.
- Join the discussion on [slack](https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-soss7stf-iXpVuLeKZz8lGTnxxtHtTw)
\ No newline at end of file
......@@ -24,8 +24,6 @@ using wide and deep models.
**Experiments and comparison with `LightGBM`**: [TabularDL vs LightGBM](https://github.com/jrzaurin/tabulardl-benchmark)
**Slack**: if you want to contribute or just want to chat with us, join [slack](https://join.slack.com/t/pytorch-widedeep/shared_invite/zt-soss7stf-iXpVuLeKZz8lGTnxxtHtTw)
The content of this document is organized as follows:
1. [introduction](#introduction)
......@@ -307,6 +305,10 @@ of the package and its functionalities.
pytest tests
```
### How to Contribute
Check the [CONTRIBUTING](https://github.com/jrzaurin/pytorch-widedeep/CONTRIBUTING.MD) page.
### Acknowledgments
This library takes from a series of other libraries, so I think it is just
......
1.0.11
\ No newline at end of file
1.0.12
\ No newline at end of file
......@@ -2,27 +2,15 @@
"cells": [
{
"cell_type": "markdown",
"id": "731975e2",
"metadata": {},
"source": [
"# Hyperparameter tuning and using Raytune and visulization using Tensorboard"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* In this notebook we will use the higly imbalanced Protein Homology Dataset from [KDD cup 2004](https://www.kdd.org/kdd-cup/view/kdd-cup-2004/Data)\n",
"\n",
"```\n",
"* The first element of each line is a BLOCK ID that denotes to which native sequence this example belongs. There is a unique BLOCK ID for each native sequence. BLOCK IDs are integers running from 1 to 303 (one for each native sequence, i.e. for each query). BLOCK IDs were assigned before the blocks were split into the train and test sets, so they do not run consecutively in either file.\n",
"* The second element of each line is an EXAMPLE ID that uniquely describes the example. You will need this EXAMPLE ID and the BLOCK ID when you submit results.\n",
"* The third element is the class of the example. Proteins that are homologous to the native sequence are denoted by 1, non-homologous proteins (i.e. decoys) by 0. Test examples have a \"?\" in this position.\n",
"* All following elements are feature values. There are 74 feature values in each line. The features describe the match (e.g. the score of a sequence alignment) between the native protein sequence and the sequence that is tested for homology.\n",
"```"
"# Hyperparameter tuning with Raytune and visulization using Tensorboard and Weights & Biases"
]
},
{
"cell_type": "markdown",
"id": "ee745c58",
"metadata": {},
"source": [
"## Initial imports"
......@@ -30,19 +18,12 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 61,
"id": "fdab94eb",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/javier/.pyenv/versions/3.7.7/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject\n",
" return f(*args, **kwds)\n"
]
}
],
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import pandas as pd\n",
"import torch\n",
......@@ -51,7 +32,6 @@
"from pytorch_widedeep import Trainer\n",
"from pytorch_widedeep.preprocessing import TabPreprocessor\n",
"from pytorch_widedeep.models import TabMlp, WideDeep\n",
"from pytorch_widedeep.dataloaders import DataLoaderImbalanced, DataLoaderDefault\n",
"from torchmetrics import F1 as F1_torchmetrics\n",
"from torchmetrics import Accuracy as Accuracy_torchmetrics\n",
"from torchmetrics import Precision as Precision_torchmetrics\n",
......@@ -61,22 +41,21 @@
"from pytorch_widedeep.callbacks import (\n",
" EarlyStopping,\n",
" ModelCheckpoint,\n",
" LRHistory,\n",
" RayTuneReporter,\n",
")\n",
"from pytorch_widedeep.datasets import load_bio_kdd04\n",
"\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import classification_report\n",
"\n",
"import time\n",
"import datetime\n",
"\n",
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n",
"\n",
"from ray import tune\n",
"from ray.tune.schedulers import AsyncHyperBandScheduler\n",
"from ray.tune import JupyterNotebookReporter\n",
"from ray.tune.integration.wandb import WandbLoggerCallback, wandb_mixin\n",
"import wandb\n",
"\n",
"import tracemalloc\n",
"\n",
"tracemalloc.start()\n",
......@@ -88,7 +67,8 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 64,
"id": "07c75f0c",
"metadata": {},
"outputs": [
{
......@@ -647,20 +627,20 @@
"4 0.68 -0.59 2.0 -36.0 -6.9 2.02 0.14 -0.23 "
]
},
"execution_count": 2,
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"header_list = ['EXAMPLE_ID', 'BLOCK_ID', 'target'] + [str(i) for i in range(4,78)]\n",
"df = pd.read_csv('data/kddcup04/bio_train.dat', sep='\\t', names=header_list)\n",
"df = load_bio_kdd04(as_frame=True)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 65,
"id": "1e3f8efc",
"metadata": {},
"outputs": [
{
......@@ -671,7 +651,7 @@
"Name: target, dtype: int64"
]
},
"execution_count": 3,
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
......@@ -683,7 +663,8 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 35,
"id": "214b3071",
"metadata": {},
"outputs": [],
"source": [
......@@ -693,7 +674,8 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 36,
"id": "168c81f1",
"metadata": {},
"outputs": [],
"source": [
......@@ -703,6 +685,7 @@
},
{
"cell_type": "markdown",
"id": "87e7b8f0",
"metadata": {},
"source": [
"## Preparing the data"
......@@ -710,7 +693,8 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 37,
"id": "3a7b246b",
"metadata": {},
"outputs": [],
"source": [
......@@ -719,7 +703,8 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 38,
"id": "7a2dac24",
"metadata": {},
"outputs": [],
"source": [
......@@ -737,6 +722,7 @@
},
{
"cell_type": "markdown",
"id": "7b9f63e2",
"metadata": {},
"source": [
"## Define the model"
......@@ -744,7 +730,8 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 39,
"id": "81bfda03",
"metadata": {},
"outputs": [],
"source": [
......@@ -757,7 +744,8 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 40,
"id": "511198d4",
"metadata": {},
"outputs": [
{
......@@ -804,7 +792,7 @@
")"
]
},
"execution_count": 9,
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
......@@ -821,7 +809,8 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 41,
"id": "2d76f463",
"metadata": {},
"outputs": [],
"source": [
......@@ -834,7 +823,8 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 42,
"id": "a5359b0f",
"metadata": {},
"outputs": [],
"source": [
......@@ -847,42 +837,19 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 60,
"id": "34a18ac0",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"== Status ==<br>Memory usage on this node: 10.5/16.0 GiB<br>Using FIFO scheduling algorithm.<br>Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/4.32 GiB heap, 0.0/2.16 GiB objects<br>Result logdir: /Users/javier/ray_results/training_function_2021-10-18_15-59-06<br>Number of trials: 2/2 (2 TERMINATED)<br><table>\n",
"<thead>\n",
"<tr><th>Trial name </th><th>status </th><th>loc </th><th style=\"text-align: right;\"> batch_size</th><th style=\"text-align: right;\"> iter</th><th style=\"text-align: right;\"> total time (s)</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>training_function_8f035_00000</td><td>TERMINATED</td><td> </td><td style=\"text-align: right;\"> 1000</td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 18.2589</td></tr>\n",
"<tr><td>training_function_8f035_00001</td><td>TERMINATED</td><td> </td><td style=\"text-align: right;\"> 5000</td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 18.0369</td></tr>\n",
"</tbody>\n",
"</table><br><br>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"2021-10-18 15:59:28,373\tINFO tune.py:617 -- Total run time: 22.07 seconds (21.91 seconds for the tuning loop).\n"
]
}
],
"outputs": [],
"source": [
"config = {\n",
" \"batch_size\": tune.grid_search([1000, 5000]),\n",
" \"wandb\": {\n",
" \"project\": \"test\",\n",
" \"api_key_file\": os.getcwd() + \"/wandb_api.key\",\n",
" },\n",
"}\n",
"\n",
"# Optimizers\n",
......@@ -890,16 +857,17 @@
"# LR Scheduler\n",
"deep_sch = lr_scheduler.StepLR(deep_opt, step_size=3)\n",
"\n",
"early_stopping = EarlyStopping()\n",
"\n",
"\n",
"@wandb_mixin\n",
"def training_function(config, X_train, X_val):\n",
" early_stopping = EarlyStopping()\n",
" model_checkpoint = ModelCheckpoint(save_best_only=True, \n",
" wb=wandb)\n",
" # Hyperparameters\n",
" batch_size = config[\"batch_size\"]\n",
" trainer = Trainer(\n",
" model,\n",
" objective=\"binary_focal_loss\",\n",
" callbacks=[RayTuneReporter, LRHistory(n_epochs=10), early_stopping],\n",
" callbacks=[RayTuneReporter, early_stopping, model_checkpoint],\n",
" lr_schedulers={\"deeptabular\": deep_sch},\n",
" initializers={\"deeptabular\": XavierNormal},\n",
" optimizers={\"deeptabular\": deep_opt},\n",
......@@ -912,121 +880,67 @@
"\n",
"X_train = {\"X_tab\": X_tab_train, \"target\": y_train}\n",
"X_val = {\"X_tab\": X_tab_valid, \"target\": y_valid}\n",
"\n",
"asha_scheduler = AsyncHyperBandScheduler(\n",
" time_attr=\"training_iteration\",\n",
" metric=\"_metric/val_loss\",\n",
" mode=\"min\",\n",
" max_t=100,\n",
" grace_period=10,\n",
" reduction_factor=3,\n",
" brackets=1,\n",
")\n",
"\n",
"analysis = tune.run(\n",
" tune.with_parameters(training_function, X_train=X_train, X_val=X_val),\n",
" resources_per_trial={\"cpu\": 1, \"gpu\": 0},\n",
" progress_reporter=JupyterNotebookReporter(overwrite=True),\n",
" scheduler=asha_scheduler,\n",
" config=config,\n",
" callbacks=[WandbLoggerCallback(\n",
" project=config[\"wandb\"][\"project\"],\n",
" api_key_file=config[\"wandb\"][\"api_key_file\"],\n",
" log_config=True)],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 56,
"id": "fac74d5f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'8f035_00000': {'_metric': {'train_loss': 0.007156053689332345,\n",
" 'train_Accuracy_0': 1.0,\n",
" 'train_Accuracy_1': 0.008678881451487541,\n",
" 'train_Precision': 0.9911835193634033,\n",
" 'train_Recall_0': 1.0,\n",
" 'train_Recall_1': 0.008678881451487541,\n",
" 'train_F1_0': 0.9955719113349915,\n",
" 'train_F1_1': 0.017208412289619446,\n",
" 'val_loss': 0.006261252580831448,\n",
" 'val_Accuracy_0': 1.0,\n",
" 'val_Accuracy_1': 0.023255813866853714,\n",
" 'val_Precision': 0.9913550615310669,\n",
" 'val_Recall_0': 1.0,\n",
" 'val_Recall_1': 0.023255813866853714,\n",
" 'val_F1_0': 0.9956578612327576,\n",
" 'val_F1_1': 0.045454543083906174,\n",
" 'lr_deeptabular_0': 0.010000000000000002},\n",
" 'time_this_iter_s': 3.5364139080047607,\n",
" 'done': True,\n",
" 'timesteps_total': None,\n",
" 'episodes_total': None,\n",
" 'training_iteration': 5,\n",
" 'experiment_id': 'f62bb9c9c32a45b9af85a6a3b0e30d94',\n",
" 'date': '2021-10-18_15-59-28',\n",
" 'timestamp': 1634565568,\n",
" 'time_total_s': 18.25888180732727,\n",
" 'pid': 15457,\n",
" 'hostname': 'infinito.bbrouter',\n",
" 'node_ip': '192.168.18.34',\n",
" 'config': {'batch_size': 1000},\n",
" 'time_since_restore': 18.25888180732727,\n",
" 'timesteps_since_restore': 0,\n",
" 'iterations_since_restore': 5,\n",
" 'trial_id': '8f035_00000',\n",
" 'experiment_tag': '0_batch_size=1000'},\n",
" '8f035_00001': {'_metric': {'train_loss': 0.019367387828727562,\n",
" 'train_Accuracy_0': 0.9999827146530151,\n",
" 'train_Accuracy_1': 0.01157184224575758,\n",
" 'train_Precision': 0.991192102432251,\n",
" 'train_Recall_0': 0.9999827146530151,\n",
" 'train_Recall_1': 0.01157184224575758,\n",
" 'train_F1_0': 0.9955761432647705,\n",
" 'train_F1_1': 0.022835396230220795,\n",
" 'val_loss': 0.01834123209118843,\n",
" 'val_Accuracy_0': 1.0,\n",
" 'val_Accuracy_1': 0.0,\n",
" 'val_Precision': 0.9911492466926575,\n",
" 'val_Recall_0': 1.0,\n",
" 'val_Recall_1': 0.0,\n",
" 'val_F1_0': 0.9955549836158752,\n",
" 'val_F1_1': 0.0,\n",
" 'lr_deeptabular_0': 0.010000000000000002},\n",
" 'time_this_iter_s': 3.42478084564209,\n",
" 'done': True,\n",
" 'timesteps_total': None,\n",
" 'episodes_total': None,\n",
" 'training_iteration': 5,\n",
" 'experiment_id': 'd7379f1debc14e41bc971bf4c27b6793',\n",
" 'date': '2021-10-18_15-59-27',\n",
" 'timestamp': 1634565567,\n",
" 'time_total_s': 18.036858797073364,\n",
" 'pid': 15456,\n",
" 'hostname': 'infinito.bbrouter',\n",
" 'node_ip': '192.168.18.34',\n",
" 'config': {'batch_size': 5000},\n",
" 'time_since_restore': 18.036858797073364,\n",
" 'timesteps_since_restore': 0,\n",
" 'iterations_since_restore': 5,\n",
" 'trial_id': '8f035_00001',\n",
" 'experiment_tag': '1_batch_size=5000'}}"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"analysis.results"
]
},
{
"cell_type": "code",
"execution_count": 16,
"cell_type": "markdown",
"id": "81450d98",
"metadata": {},
"source": [
"Using Weights and Biases logging you can create [parallel coordinates graphs](https://docs.wandb.ai/ref/app/features/panels/parallel-coordinates) that map parametr combinations to the best(lowest) loss achieved during the training of the networks\n",
"\n",
"![WNB](wnb.png \"parallel coordinates\")"
]
},
{
"cell_type": "markdown",
"id": "56fc4823",
"metadata": {},
"outputs": [],
"source": [
"# %load_ext tensorboard"
"local visualization of raytune reults using tensorboard"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"scrolled": true
},
"execution_count": 59,
"id": "e1719cc0",
"metadata": {},
"outputs": [],
"source": [
"# %tensorboard --logdir ~/ray_results"
"%load_ext tensorboard\n",
"%tensorboard --logdir ~/ray_results"
]
}
],
......@@ -1035,8 +949,7 @@
"hash": "3b99005fd577fa40f3cce433b2b92303885900e634b2b5344c07c59d06c8792d"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"display_name": "Python 3.8.5 64-bit ('base': conda)",
"name": "python3"
},
"language_info": {
......@@ -1049,7 +962,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.8.5"
},
"toc": {
"base_numbering": 1,
......
This diff is collapsed.
......@@ -338,12 +338,13 @@ class ModelCheckpoint(Callback):
Parameters
----------
filepath: str
filepath: str, default=None
Full path to save the output weights. It must contain only the root of
the filenames. Epoch number and ``.pt`` extension (for pytorch) will
be added. e.g. ``filepath="path/to/output_weights/weights_out"`` And
the saved files in that directory will be named: ``weights_out_1.pt,
weights_out_2.pt, ...``
If set to None, the class just reports the best metric and best_epoch.
monitor: str, default="val_loss"
quantity to monitor. Typically 'val_loss' or a metric name (e.g. 'val_acc')
verbose: int, default=0
......@@ -362,6 +363,15 @@ class ModelCheckpoint(Callback):
Interval (number of epochs) between checkpoints.
max_save: int, default=-1
Maximum number of outputs to save. If -1 will save all outputs
wb: obj, default=None
Weights & Biases API interface used to report the single best result, which makes it possible
to compare multiple parameter combinations with, e.g., parallel coordinates:
https://docs.wandb.ai/ref/app/features/panels/parallel-coordinates.
The result is reported through the W&B run summary, e.g. `wandb.run.summary["best"]`.
If an external early-stopping scheduler (e.g. from RayTune) is used in combination with W&B,
the scheduler may stop the training function before the summary is logged, so setting it only
after training via e.g. `wandb.run.summary["best"] = model_checkpoint.best` would not work;
passing `wb` here reports the best value as soon as the monitored metric improves.
Attributes
----------
......@@ -369,6 +379,8 @@ class ModelCheckpoint(Callback):
best metric
best_epoch: int
best epoch
best_state_dict: dict
state dictionary of the best model, which can be used to restore the model to its best
state via `trainer.model.load_state_dict(model_checkpoint.best_state_dict)`
Examples
--------
......@@ -386,13 +398,14 @@ class ModelCheckpoint(Callback):
def __init__(
self,
filepath: str,
filepath: Optional[str] = None,
monitor: str = "val_loss",
verbose: int = 0,
save_best_only: bool = False,
mode: str = "auto",
period: int = 1,
max_save: int = -1,
wb: Optional[object] = None,
):
super(ModelCheckpoint, self).__init__()
......@@ -403,18 +416,20 @@ class ModelCheckpoint(Callback):
self.mode = mode
self.period = period
self.max_save = max_save
self.wb = wb
self.epochs_since_last_save = 0
if len(self.filepath.split("/")[:-1]) == 0:
raise ValueError(
"'filepath' must be the full path to save the output weights,"
" including the root of the filenames. e.g. 'checkpoints/weights_out'"
)
if self.filepath:
if len(self.filepath.split("/")[:-1]) == 0:
raise ValueError(
"'filepath' must be the full path to save the output weights,"
" including the root of the filenames. e.g. 'checkpoints/weights_out'"
)
root_dir = ("/").join(self.filepath.split("/")[:-1])
if not os.path.exists(root_dir):
os.makedirs(root_dir)
root_dir = ("/").join(self.filepath.split("/")[:-1])
if not os.path.exists(root_dir):
os.makedirs(root_dir)
if self.max_save > 0:
self.old_files: List[str] = []
......@@ -447,7 +462,8 @@ class ModelCheckpoint(Callback):
self.epochs_since_last_save += 1
if self.epochs_since_last_save >= self.period:
self.epochs_since_last_save = 0
filepath = "{}_{}.p".format(self.filepath, epoch + 1)
if self.filepath:
filepath = "{}_{}.p".format(self.filepath, epoch + 1)
if self.save_best_only:
current = logs.get(self.monitor)
if current is None:
......@@ -459,35 +475,50 @@ class ModelCheckpoint(Callback):
else:
if self.monitor_op(current, self.best):
if self.verbose > 0:
print(
"\nEpoch %05d: %s improved from %0.5f to %0.5f,"
" saving model to %s"
% (
epoch + 1,
self.monitor,
self.best,
current,
filepath,
if self.filepath:
print(
"\nEpoch %05d: %s improved from %0.5f to %0.5f,"
" saving model to %s"
% (
epoch + 1,
self.monitor,
self.best,
current,
filepath,
)
)
)
else:
print(
"\nEpoch %05d: %s improved from %0.5f to %0.5f"
% (
epoch + 1,
self.monitor,
self.best,
current,
)
)
if self.wb is not None:
self.wb.run.summary["best"] = current # type: ignore[attr-defined]
self.best = current
self.best_epoch = epoch
torch.save(self.model.state_dict(), filepath)
if self.max_save > 0:
if len(self.old_files) == self.max_save:
try:
os.remove(self.old_files[0])
except FileNotFoundError:
pass
self.old_files = self.old_files[1:]
self.old_files.append(filepath)
self.best_state_dict = self.model.state_dict()
if self.filepath:
torch.save(self.best_state_dict, filepath)
if self.max_save > 0:
if len(self.old_files) == self.max_save:
try:
os.remove(self.old_files[0])
except FileNotFoundError:
pass
self.old_files = self.old_files[1:]
self.old_files.append(filepath)
else:
if self.verbose > 0:
print(
"\nEpoch %05d: %s did not improve from %0.5f"
% (epoch + 1, self.monitor, self.best)
)
else:
if not self.save_best_only and self.filepath:
if self.verbose > 0:
print("\nEpoch %05d: saving model to %s" % (epoch + 1, filepath))
torch.save(self.model.state_dict(), filepath)
......
from ._base import load_adult, load_bio_kdd04
__all__ = ["load_bio_kdd04", "load_adult"]
from importlib import resources
import pandas as pd
def load_bio_kdd04(as_frame: bool = False):
"""Load and return the higly imbalanced Protein Homology
Dataset from [KDD cup 2004](https://www.kdd.org/kdd-cup/view/kdd-cup-2004/Data.
This datasets include only bio_train.dat part of the dataset
* The first element of each line is a BLOCK ID that denotes to which native sequence
this example belongs. There is a unique BLOCK ID for each native sequence.
BLOCK IDs are integers running from 1 to 303 (one for each native sequence,
i.e. for each query). BLOCK IDs were assigned before the blocks were split
into the train and test sets, so they do not run consecutively in either file.
* The second element of each line is an EXAMPLE ID that uniquely describes
the example. You will need this EXAMPLE ID and the BLOCK ID when you submit results.
* The third element is the class of the example. Proteins that are homologous to
the native sequence are denoted by 1, non-homologous proteins (i.e. decoys) by 0.
Test examples have a "?" in this position.
* All following elements are feature values. There are 74 feature values in each line.
The features describe the match (e.g. the score of a sequence alignment) between
the native protein sequence and the sequence that is tested for homology.
"""
header_list = ["EXAMPLE_ID", "BLOCK_ID", "target"] + [str(i) for i in range(4, 78)]
with resources.path("pytorch_widedeep.datasets.data", "bio_train.dat") as fpath:
df = pd.read_csv(fpath, sep="\t", names=header_list)
if as_frame:
return df
else:
return df.to_numpy()
def load_adult(as_frame: bool = False):
"""Load and return the [adult income datatest](http://www.cs.toronto.edu/~delve/data/adult/desc.html).
you may find detailed description [here](http://www.cs.toronto.edu/~delve/data/adult/adultDetail.html)
"""
with resources.path("pytorch_widedeep.datasets.data", "adult.csv.zip") as fpath:
df = pd.read_csv(fpath)
if as_frame:
return df
else:
return df.to_numpy()
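As a quick orientation for the new loaders above, here is a minimal usage sketch; it only assumes the two functions defined in this module and the bundled data files, and the shapes noted in the comments match those asserted in the tests further below.

```python
# Minimal sketch of the new dataset loaders added in this PR.
# Assumes the packaged data files are installed (see "package_data" in setup.py).
from pytorch_widedeep.datasets import load_adult, load_bio_kdd04

bio_df = load_bio_kdd04(as_frame=True)   # pandas DataFrame, shape (145751, 77)
adult_arr = load_adult(as_frame=False)   # numpy ndarray, shape (48842, 15)

print(bio_df["target"].value_counts())   # highly imbalanced binary target
print(adult_arr.shape)
```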
This diff is collapsed.
......@@ -4,6 +4,7 @@ https://github.com/awslabs/autogluon/tree/master/tabular/src/autogluon/tabular/m
"""
import math
import warnings
import torch
from torch import nn
......@@ -18,10 +19,13 @@ class FullEmbeddingDropout(nn.Module):
self.dropout = dropout
def forward(self, X: Tensor) -> Tensor:
mask = X.new().resize_((X.size(1), 1)).bernoulli_(1 - self.dropout).expand_as(
X
) / (1 - self.dropout)
return mask * X
if self.training:
mask = X.new().resize_((X.size(1), 1)).bernoulli_(
1 - self.dropout
).expand_as(X) / (1 - self.dropout)
return mask * X
else:
return X
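A small sketch of the behaviour change above, namely that the dropout mask is now only applied in training mode; it assumes the `FullEmbeddingDropout` class defined above is in scope and that its constructor simply takes the dropout probability.

```python
# Sketch: FullEmbeddingDropout is now a no-op in eval mode.
import torch

drop = FullEmbeddingDropout(0.5)   # dropout probability (constructor signature assumed)
x = torch.ones(4, 6, 8)            # (batch, n_categorical_cols, embed_dim)

drop.train()
out_train = drop(x)                # whole embedding columns zeroed, rest rescaled by 1/(1-p)

drop.eval()
out_eval = drop(x)                 # identity: dropout is skipped at inference
assert torch.equal(out_eval, x)
```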
DropoutLayers = Union[nn.Dropout, FullEmbeddingDropout]
......@@ -125,13 +129,16 @@ class CategoricalEmbeddings(nn.Module):
self.categorical_cols = [ei[0] for ei in embed_input]
self.cat_idx = [self.column_idx[col] for col in self.categorical_cols]
self.bias = (
nn.Parameter(torch.Tensor(len(self.categorical_cols), embed_dim))
if use_bias
else None
)
if self.bias is not None:
if use_bias:
self.bias = nn.Parameter(
torch.Tensor(len(self.categorical_cols), embed_dim)
)
nn.init.kaiming_uniform_(self.bias, a=math.sqrt(5))
if shared_embed:
warnings.warn(
"The current implementation of 'SharedEmbeddings' does not use bias",
UserWarning,
)
# Categorical: val + 1 because 0 is reserved for padding/unseen categories.
if self.shared_embed:
......@@ -167,11 +174,10 @@ class CategoricalEmbeddings(nn.Module):
x = torch.cat(cat_embed, 1)
else:
x = self.embed(X[:, self.cat_idx].long())
if self.bias is not None:
x = x + self.bias.unsqueeze(0)
x = self.dropout(x)
if self.bias is not None:
x = x + self.bias.unsqueeze(0)
return x
......
......@@ -12,10 +12,24 @@ from pytorch_widedeep.preprocessing.base_preprocessor import (
)
def embed_sz_rule(n_cat):
r"""Rule of thumb to pick embedding size corresponding to ``n_cat``. Taken
from fastai's Tabular API"""
return min(600, round(1.6 * n_cat ** 0.56))
def embed_sz_rule(n_cat: int, embedding_rule: str = "fastai_new") -> int:
r"""Rule of thumb to pick embedding size corresponding to ``n_cat``. Default rule is taken
from recent fastai's Tabular API. The function also includes previously used rule by fastai
and rule included in the Google's Tensorflow documentation
Parameters
----------
n_cat: int
number of unique categorical values in a feature
embedding_rule: str, default = 'fastai_new'
rule of thumb to be used for the embedding vector size
"""
if embedding_rule == "google":
return int(round(n_cat ** 0.25))
elif embedding_rule == "fastai_old":
return int(min(50, (n_cat // 2) + 1))
else:
return int(min(600, round(1.6 * n_cat ** 0.56)))
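For intuition, a quick sketch of what the three rules above return for a few (arbitrary) cardinalities, using only the function defined above:

```python
# Compare the three embedding-size rules for a few cardinalities.
for n_cat in (10, 100, 1000):
    sizes = {
        rule: embed_sz_rule(n_cat, embedding_rule=rule)
        for rule in ("fastai_new", "fastai_old", "google")
    }
    print(n_cat, sizes)

# For n_cat=100 this gives roughly: fastai_new -> 21, fastai_old -> 50, google -> 3
```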
class TabPreprocessor(BasePreprocessor):
......@@ -38,8 +52,16 @@ class TabPreprocessor(BasePreprocessor):
:obj:`pytorch_widedeep.models.transformers._embedding_layers`
auto_embed_dim: bool, default = True
Boolean indicating whether the embedding dimensions will be
automatically defined via fastai's rule of thumb':
:math:`min(600, int(1.6 \times n_{cat}^{0.56}))`
automatically defined via a rule of thumb
embedding_rule: str, default = 'fastai_new'
choice of embedding rule of thumb; options are:
- 'fastai_new' -- :math:`min(600, round(1.6 \times n_{cat}^{0.56}))`
- 'fastai_old' -- :math:`min(50, (n_{cat}//{2})+1)`
- 'google' -- :math:`round(n_{cat}^{0.25})`
default_embed_dim: int, default=16
Dimension for the embeddings used for the ``deeptabular``
component if the embed_dim is not provided in the ``embed_cols``
......@@ -118,6 +140,7 @@ class TabPreprocessor(BasePreprocessor):
continuous_cols: List[str] = None,
scale: bool = True,
auto_embed_dim: bool = True,
embedding_rule: str = "fastai_new",
default_embed_dim: int = 16,
already_standard: List[str] = None,
for_transformer: bool = False,
......@@ -131,6 +154,7 @@ class TabPreprocessor(BasePreprocessor):
self.continuous_cols = continuous_cols
self.scale = scale
self.auto_embed_dim = auto_embed_dim
self.embedding_rule = embedding_rule
self.default_embed_dim = default_embed_dim
self.already_standard = already_standard
self.for_transformer = for_transformer
......@@ -250,7 +274,10 @@ class TabPreprocessor(BasePreprocessor):
embed_colname = [emb[0] for emb in self.embed_cols]
elif self.auto_embed_dim:
n_cats = {col: df[col].nunique() for col in self.embed_cols}
self.embed_dim = {col: embed_sz_rule(n_cat) for col, n_cat in n_cats.items()} # type: ignore[misc]
self.embed_dim = {
col: embed_sz_rule(n_cat, self.embedding_rule) # type: ignore[misc]
for col, n_cat in n_cats.items()
}
embed_colname = self.embed_cols # type: ignore
else:
self.embed_dim = {e: self.default_embed_dim for e in self.embed_cols} # type: ignore
......
import os
import json
import warnings
from copy import deepcopy
from pathlib import Path
import numpy as np
......@@ -20,7 +19,6 @@ from pytorch_widedeep.callbacks import (
History,
Callback,
MetricCallback,
RayTuneReporter,
CallbackContainer,
LRShedulerCallback,
)
......@@ -685,8 +683,14 @@ class Trainer:
If a trainer is used to predict after having trained a model, the
``batch_size`` needs to be defined here, as it is not set when
the :obj:`Trainer` is instantiated
uncertainty: bool, default = False
If set to True, the model activates the dropout layers and predicts
each sample N times (uncertainty_granularity times), returning
{max, min, mean, stdev} values for each sample
uncertainty_granularity: int, default = 1000
number of times the model predicts each sample if uncertainty
is set to True
"""
preds_l = self._predict(X_wide, X_tab, X_text, X_img, X_test, batch_size)
if self.method == "regression":
return np.vstack(preds_l).squeeze(1)
......@@ -697,6 +701,96 @@ class Trainer:
preds = np.vstack(preds_l)
return np.argmax(preds, 1) # type: ignore[return-value]
def predict_uncertainty( # type: ignore[return]
self,
X_wide: Optional[np.ndarray] = None,
X_tab: Optional[np.ndarray] = None,
X_text: Optional[np.ndarray] = None,
X_img: Optional[np.ndarray] = None,
X_test: Optional[Dict[str, np.ndarray]] = None,
batch_size: int = 256,
uncertainty_granularity=1000,
) -> np.ndarray:
r"""Returns the predicted ucnertainty of the model for the test dataset using a
Monte Carlo method during which dropout layers are activated in the evaluation/prediction
phase and each sample is predicted N times (uncertainty_granularity times). Based on [1].
[1] Gal Y. & Ghahramani Z., 2016, Dropout as a Bayesian Approximation: Representing Model
Uncertainty in Deep Learning, Proceedings of the 33rd International Conference on Machine Learning
Parameters
----------
X_wide: np.ndarray, Optional. default=None
Input for the ``wide`` model component.
See :class:`pytorch_widedeep.preprocessing.WidePreprocessor`
X_tab: np.ndarray, Optional. default=None
Input for the ``deeptabular`` model component.
See :class:`pytorch_widedeep.preprocessing.TabPreprocessor`
X_text: np.ndarray, Optional. default=None
Input for the ``deeptext`` model component.
See :class:`pytorch_widedeep.preprocessing.TextPreprocessor`
X_img : np.ndarray, Optional. default=None
Input for the ``deepimage`` model component.
See :class:`pytorch_widedeep.preprocessing.ImagePreprocessor`
X_test: Dict, Optional. default=None
The test dataset can also be passed in a dictionary. Keys are
`X_wide`, `'X_tab'`, `'X_text'`, `'X_img'` and `'target'`. Values
are the corresponding matrices.
batch_size: int, default = 256
If a trainer is used to predict after having trained a model, the
``batch_size`` needs to be defined here, as it is not set when
the :obj:`Trainer` is instantiated
uncertainty_granularity: int, default = 1000
number of times the model predicts each sample if uncertainty
is set to True
Returns
-------
method == regression : np.ndarray
{max, min, mean, stdev} values for each sample
method == binary : np.ndarray
{mean_cls_0_prob, mean_cls_1_prob, predicted_cls} values for each sample
method == multiclass : np.ndarray
{mean_cls_0_prob, mean_cls_1_prob, mean_cls_2_prob, ... , predicted_cls} values for each sample
"""
preds_l = self._predict(
X_wide,
X_tab,
X_text,
X_img,
X_test,
batch_size,
uncertainty_granularity,
uncertainty=True,
)
preds = np.vstack(preds_l)
samples_num = int(preds.shape[0] / uncertainty_granularity)
if self.method == "regression":
preds = preds.squeeze(1)
preds = preds.reshape((uncertainty_granularity, samples_num))
return np.array(
(
preds.max(axis=0),
preds.min(axis=0),
preds.mean(axis=0),
preds.std(axis=0),
)
).T
if self.method == "binary":
preds = preds.squeeze(1)
preds = preds.reshape((uncertainty_granularity, samples_num))
preds = preds.mean(axis=0)
probs = np.zeros([preds.shape[0], 3])
probs[:, 0] = 1 - preds
probs[:, 1] = preds
# third column: predicted class, as documented in the Returns section
probs[:, 2] = (preds > 0.5).astype("int")
return probs
if self.method == "multiclass":
preds = preds.reshape(uncertainty_granularity, samples_num, preds.shape[1])
preds = preds.mean(axis=0)
preds = np.hstack((preds, np.vstack(np.argmax(preds, 1))))
return preds
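For reference, a minimal usage sketch of ``predict_uncertainty`` as defined above; the fitted ``trainer`` and the preprocessed test matrix are assumed to exist already, following the usual pytorch-widedeep workflow.

```python
# Hedged sketch: assumes a fitted Trainer instance ("trainer") for a binary problem
# and a test matrix X_tab_test produced by the same TabPreprocessor used for training.
unc_preds = trainer.predict_uncertainty(
    X_tab=X_tab_test,
    uncertainty_granularity=100,  # number of MC-dropout forward passes per sample
)
# For a binary objective the columns are: mean_cls_0_prob, mean_cls_1_prob, predicted_cls
print(unc_preds.shape, unc_preds[:5])
```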
def predict_proba( # type: ignore[return]
self,
X_wide: Optional[np.ndarray] = None,
......@@ -944,14 +1038,11 @@ class Trainer:
for callback in self.callback_container.callbacks:
if callback.__class__.__name__ == "ModelCheckpoint":
if callback.save_best_only:
filepath = "{}_{}.p".format(
callback.filepath, callback.best_epoch + 1
)
if self.verbose:
print(
f"Model weights restored to best epoch: {callback.best_epoch + 1}"
)
self.model.load_state_dict(torch.load(filepath))
self.model.load_state_dict(callback.best_state_dict)
else:
if self.verbose:
print(
......@@ -1104,7 +1195,7 @@ class Trainer:
k: v for k, v in zip(tabnet_backbone.column_idx.keys(), feat_imp) # type: ignore[operator, union-attr]
}
def _predict(
def _predict( # noqa: C901
self,
X_wide: Optional[np.ndarray] = None,
X_tab: Optional[np.ndarray] = None,
......@@ -1112,6 +1203,8 @@ class Trainer:
X_img: Optional[np.ndarray] = None,
X_test: Optional[Dict[str, np.ndarray]] = None,
batch_size: int = 256,
uncertainty_granularity=1000,
uncertainty: bool = False,
) -> List:
r"""Private method to avoid code repetition in predict and
predict_proba. For parameter information, please, see the .predict()
......@@ -1144,20 +1237,41 @@ class Trainer:
self.model.eval()
preds_l = []
if uncertainty:
for m in self.model.modules():
if m.__class__.__name__.startswith("Dropout"):
m.train()
prediction_iters = uncertainty_granularity
else:
prediction_iters = 1
with torch.no_grad():
with trange(test_steps, disable=self.verbose != 1) as t:
for i, data in zip(t, test_loader):
t.set_description("predict")
X = {k: v.cuda() for k, v in data.items()} if use_cuda else data
preds = (
self.model(X) if not self.model.is_tabnet else self.model(X)[0]
)
if self.method == "binary":
preds = torch.sigmoid(preds)
if self.method == "multiclass":
preds = F.softmax(preds, dim=1)
preds = preds.cpu().data.numpy()
preds_l.append(preds)
with trange(uncertainty_granularity, disable=uncertainty is False) as t:
for i, k in zip(t, range(prediction_iters)):
t.set_description("predict_UncertaintyIter")
with trange(
test_steps, disable=self.verbose != 1 or uncertainty is True
) as tt:
for j, data in zip(tt, test_loader):
tt.set_description("predict")
X = (
{k: v.cuda() for k, v in data.items()}
if use_cuda
else data
)
preds = (
self.model(X)
if not self.model.is_tabnet
else self.model(X)[0]
)
if self.method == "binary":
preds = torch.sigmoid(preds)
if self.method == "multiclass":
preds = F.softmax(preds, dim=1)
preds = preds.cpu().data.numpy()
preds_l.append(preds)
self.model.train()
return preds_l
......
__version__ = "1.0.11"
__version__ = "1.0.12"
......@@ -81,6 +81,7 @@ setup_kwargs = {
"Topic :: Scientific/Engineering :: Artificial Intelligence",
],
"zip_safe": True,
"package_data": {"pytorch_widedeep": ["datasets/data/*"]},
"packages": setuptools.find_packages(exclude=["test_*.py"]),
}
......
......@@ -269,7 +269,15 @@ def test_notfittederror():
###############################################################################
def test_embed_sz_rule_of_thumb():
@pytest.mark.parametrize(
"rule",
[
("google"),
("fastai_old"),
("fastai_new"),
],
)
def test_embed_sz_rule_of_thumb(rule):
embed_cols = ["col1", "col2"]
df = pd.DataFrame(
......@@ -279,8 +287,8 @@ def test_embed_sz_rule_of_thumb():
}
)
n_cats = {c: df[c].nunique() for c in ["col1", "col2"]}
embed_szs = {c: embed_sz_rule(nc) for c, nc in n_cats.items()}
tab_preprocessor = TabPreprocessor(embed_cols=embed_cols)
embed_szs = {c: embed_sz_rule(nc, embedding_rule=rule) for c, nc in n_cats.items()}
tab_preprocessor = TabPreprocessor(embed_cols=embed_cols, embedding_rule=rule)
tdf = tab_preprocessor.fit_transform(df) # noqa: F841
out = [
tab_preprocessor.embed_dim[col] == embed_szs[col] for col in embed_szs.keys()
......
import numpy as np
import pandas as pd
import pytest
from pytorch_widedeep.datasets import load_adult, load_bio_kdd04
@pytest.mark.parametrize(
"as_frame",
[
(True),
(False),
],
)
def test_load_bio_kdd04(as_frame):
df = load_bio_kdd04(as_frame=as_frame)
if as_frame:
assert (df.shape, type(df)) == ((145751, 77), pd.DataFrame)
else:
assert (df.shape, type(df)) == ((145751, 77), np.ndarray)
@pytest.mark.parametrize(
"as_frame",
[
(True),
(False),
],
)
def test_load_adult(as_frame):
df = load_adult(as_frame=as_frame)
if as_frame:
assert (df.shape, type(df)) == ((48842, 15), pd.DataFrame)
else:
assert (df.shape, type(df)) == ((48842, 15), np.ndarray)
......@@ -168,9 +168,15 @@ def test_early_stop():
# Test that ModelCheckpoint behaves as expected
###############################################################################
@pytest.mark.parametrize(
"save_best_only, max_save, n_files", [(True, 2, 2), (False, 2, 2), (False, 0, 5)]
"fpath, save_best_only, max_save, n_files",
[
("tests/test_model_functioning/weights/test_weights", True, 2, 2),
("tests/test_model_functioning/weights/test_weights", False, 2, 2),
("tests/test_model_functioning/weights/test_weights", False, 0, 5),
(None, False, 0, 0),
],
)
def test_model_checkpoint(save_best_only, max_save, n_files):
def test_model_checkpoint(fpath, save_best_only, max_save, n_files):
wide = Wide(np.unique(X_wide).shape[0], 1)
deeptabular = TabMlp(
mlp_hidden_dims=[32, 16],
......@@ -185,7 +191,7 @@ def test_model_checkpoint(save_best_only, max_save, n_files):
objective="binary",
callbacks=[
ModelCheckpoint(
"tests/test_model_functioning/weights/test_weights",
filepath=fpath,
save_best_only=save_best_only,
max_save=max_save,
)
......@@ -193,10 +199,11 @@ def test_model_checkpoint(save_best_only, max_save, n_files):
verbose=0,
)
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=5, val_split=0.2)
n_saved = len(os.listdir("tests/test_model_functioning/weights/"))
shutil.rmtree("tests/test_model_functioning/weights/")
if fpath:
n_saved = len(os.listdir("tests/test_model_functioning/weights/"))
shutil.rmtree("tests/test_model_functioning/weights/")
else:
n_saved = 0
assert n_saved <= n_files
......@@ -340,6 +347,7 @@ def test_modelcheckpoint_mode_options():
model_checkpoint_2 = ModelCheckpoint(filepath=fpath, monitor="val_loss")
model_checkpoint_3 = ModelCheckpoint(filepath=fpath, monitor="acc", mode="max")
model_checkpoint_4 = ModelCheckpoint(filepath=fpath, monitor="acc")
model_checkpoint_5 = ModelCheckpoint(filepath=None, monitor="acc")
is_min = model_checkpoint_1.monitor_op is np.less
best_inf = model_checkpoint_1.best is np.Inf
......@@ -349,6 +357,8 @@ def test_modelcheckpoint_mode_options():
best_minus_inf = -model_checkpoint_3.best == np.Inf
auto_is_max = model_checkpoint_4.monitor_op is np.greater
auto_best_minus_inf = -model_checkpoint_4.best == np.Inf
auto_is_max = model_checkpoint_5.monitor_op is np.greater
auto_best_minus_inf = -model_checkpoint_5.best == np.Inf
shutil.rmtree("tests/test_model_functioning/modelcheckpoint/")
......@@ -478,6 +488,16 @@ def test_early_stopping_get_state():
def test_ray_tune_reporter():
rt_wide = Wide(np.unique(X_wide).shape[0], 1)
rt_deeptabular = TabMlp(
mlp_hidden_dims=[32, 16],
mlp_dropout=[0.5, 0.5],
column_idx=column_idx,
embed_input=embed_input,
continuous_cols=colnames[-5:],
)
rt_model = WideDeep(wide=rt_wide, deeptabular=rt_deeptabular)
config = {
"batch_size": tune.grid_search([8, 16]),
}
......@@ -486,7 +506,7 @@ def test_ray_tune_reporter():
batch_size = config["batch_size"]
trainer = Trainer(
model,
rt_model,
objective="binary",
callbacks=[RayTuneReporter],
verbose=0,
......@@ -503,7 +523,9 @@ def test_ray_tune_reporter():
analysis = tune.run(
tune.with_parameters(training_function),
config=config,
resources_per_trial={"cpu": 1, "gpu": 0},
resources_per_trial={"cpu": 1, "gpu": 0}
if not torch.cuda.is_available()
else {"cpu": 0, "gpu": 1},
verbose=0,
)
......
......@@ -43,14 +43,14 @@ X_test = {"X_wide": X_wide, "X_tab": X_tab}
# work well
##############################################################################
@pytest.mark.parametrize(
"X_wide, X_tab, target, objective, X_wide_test, X_tab_test, X_test, pred_dim, probs_dim",
"X_wide, X_tab, target, objective, X_test, pred_dim, probs_dim, uncertainties_pred_dim",
[
(X_wide, X_tab, target_regres, "regression", X_wide, X_tab, None, 1, None),
(X_wide, X_tab, target_binary, "binary", X_wide, X_tab, None, 1, 2),
(X_wide, X_tab, target_multic, "multiclass", X_wide, X_tab, None, 3, 3),
(X_wide, X_tab, target_regres, "regression", None, None, X_test, 1, None),
(X_wide, X_tab, target_binary, "binary", None, None, X_test, 1, 2),
(X_wide, X_tab, target_multic, "multiclass", None, None, X_test, 3, 3),
(X_wide, X_tab, target_regres, "regression", None, 1, None, 4),
(X_wide, X_tab, target_binary, "binary", None, 1, 2, 3),
(X_wide, X_tab, target_multic, "multiclass", None, 3, 3, 4),
(X_wide, X_tab, target_regres, "regression", X_test, 1, None, 4),
(X_wide, X_tab, target_binary, "binary", X_test, 1, 2, 3),
(X_wide, X_tab, target_multic, "multiclass", X_test, 3, 3, 4),
],
)
def test_fit_objectives(
......@@ -58,11 +58,10 @@ def test_fit_objectives(
X_tab,
target,
objective,
X_wide_test,
X_tab_test,
X_test,
pred_dim,
probs_dim,
uncertainties_pred_dim,
):
wide = Wide(np.unique(X_wide).shape[0], pred_dim)
deeptabular = TabMlp(
......@@ -76,11 +75,22 @@ def test_fit_objectives(
trainer = Trainer(model, objective=objective, verbose=0)
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, batch_size=16)
preds = trainer.predict(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
if objective == "binary":
pass
probs = trainer.predict_proba(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
unc_preds = trainer.predict_uncertainty(
X_wide=X_wide, X_tab=X_tab, X_test=X_test, uncertainty_granularity=5
)
if objective == "regression":
assert (preds.shape[0], probs, unc_preds.shape[1]) == (
32,
probs_dim,
uncertainties_pred_dim,
)
else:
probs = trainer.predict_proba(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
assert preds.shape[0] == 32, probs.shape[1] == probs_dim
assert (preds.shape[0], probs.shape[1], unc_preds.shape[1]) == (
32,
probs_dim,
uncertainties_pred_dim,
)
##############################################################################
......@@ -100,7 +110,10 @@ def test_fit_with_deephead():
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target_binary, batch_size=16)
preds = trainer.predict(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
probs = trainer.predict_proba(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
assert preds.shape[0] == 32, probs.shape[1] == 2
unc_preds = trainer.predict_uncertainty(
X_wide=X_wide, X_tab=X_tab, X_test=X_test, uncertainty_granularity=5
)
assert (preds.shape[0], probs.shape[1], unc_preds.shape[1]) == (32, 2, 3)
##############################################################################
......@@ -109,14 +122,14 @@ def test_fit_with_deephead():
@pytest.mark.parametrize(
"X_wide, X_tab, target, objective, X_wide_test, X_tab_test, X_test, pred_dim, probs_dim",
"X_wide, X_tab, target, objective, X_wide_test, X_tab_test, X_test, pred_dim, probs_dim, uncertainties_pred_dim",
[
(X_wide, X_tab, target_regres, "regression", X_wide, X_tab, None, 1, None),
(X_wide, X_tab, target_binary, "binary", X_wide, X_tab, None, 1, 2),
(X_wide, X_tab, target_multic, "multiclass", X_wide, X_tab, None, 3, 3),
(X_wide, X_tab, target_regres, "regression", None, None, X_test, 1, None),
(X_wide, X_tab, target_binary, "binary", None, None, X_test, 1, 2),
(X_wide, X_tab, target_multic, "multiclass", None, None, X_test, 3, 3),
(X_wide, X_tab, target_regres, "regression", X_wide, X_tab, None, 1, None, 4),
(X_wide, X_tab, target_binary, "binary", X_wide, X_tab, None, 1, 2, 3),
(X_wide, X_tab, target_multic, "multiclass", X_wide, X_tab, None, 3, 3, 4),
(X_wide, X_tab, target_regres, "regression", None, None, X_test, 1, None, 4),
(X_wide, X_tab, target_binary, "binary", None, None, X_test, 1, 2, 3),
(X_wide, X_tab, target_multic, "multiclass", None, None, X_test, 3, 3, 4),
],
)
def test_fit_objectives_tab_transformer(
......@@ -129,6 +142,7 @@ def test_fit_objectives_tab_transformer(
X_test,
pred_dim,
probs_dim,
uncertainties_pred_dim,
):
wide = Wide(np.unique(X_wide).shape[0], pred_dim)
tab_transformer = TabTransformer(
......@@ -140,11 +154,22 @@ def test_fit_objectives_tab_transformer(
trainer = Trainer(model, objective=objective, verbose=0)
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, batch_size=16)
preds = trainer.predict(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
if objective == "binary":
pass
probs = trainer.predict_proba(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
unc_preds = trainer.predict_uncertainty(
X_wide=X_wide, X_tab=X_tab, X_test=X_test, uncertainty_granularity=5
)
if objective == "regression":
assert (preds.shape[0], probs, unc_preds.shape[1]) == (
32,
probs_dim,
uncertainties_pred_dim,
)
else:
probs = trainer.predict_proba(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
assert preds.shape[0] == 32, probs.shape[1] == probs_dim
assert (preds.shape[0], probs.shape[1], unc_preds.shape[1]) == (
32,
probs_dim,
uncertainties_pred_dim,
)
##############################################################################
......@@ -153,14 +178,14 @@ def test_fit_objectives_tab_transformer(
@pytest.mark.parametrize(
"X_wide, X_tab, target, objective, X_wide_test, X_tab_test, X_test, pred_dim, probs_dim",
"X_wide, X_tab, target, objective, X_wide_test, X_tab_test, X_test, pred_dim, probs_dim, uncertainties_pred_dim",
[
(X_wide, X_tab, target_regres, "regression", X_wide, X_tab, None, 1, None),
(X_wide, X_tab, target_binary, "binary", X_wide, X_tab, None, 1, 2),
(X_wide, X_tab, target_multic, "multiclass", X_wide, X_tab, None, 3, 3),
(X_wide, X_tab, target_regres, "regression", None, None, X_test, 1, None),
(X_wide, X_tab, target_binary, "binary", None, None, X_test, 1, 2),
(X_wide, X_tab, target_multic, "multiclass", None, None, X_test, 3, 3),
(X_wide, X_tab, target_regres, "regression", X_wide, X_tab, None, 1, None, 4),
(X_wide, X_tab, target_binary, "binary", X_wide, X_tab, None, 1, 2, 3),
(X_wide, X_tab, target_multic, "multiclass", X_wide, X_tab, None, 3, 3, 4),
(X_wide, X_tab, target_regres, "regression", None, None, X_test, 1, None, 4),
(X_wide, X_tab, target_binary, "binary", None, None, X_test, 1, 2, 3),
(X_wide, X_tab, target_multic, "multiclass", None, None, X_test, 3, 3, 4),
],
)
def test_fit_objectives_tabnet(
......@@ -173,6 +198,7 @@ def test_fit_objectives_tabnet(
X_test,
pred_dim,
probs_dim,
uncertainties_pred_dim,
):
warnings.filterwarnings("ignore")
wide = Wide(np.unique(X_wide).shape[0], pred_dim)
......@@ -185,11 +211,22 @@ def test_fit_objectives_tabnet(
trainer = Trainer(model, objective=objective, verbose=0)
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, batch_size=16)
preds = trainer.predict(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
if objective == "binary":
pass
probs = trainer.predict_proba(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
unc_preds = trainer.predict_uncertainty(
X_wide=X_wide, X_tab=X_tab, X_test=X_test, uncertainty_granularity=5
)
if objective == "regression":
assert (preds.shape[0], probs, unc_preds.shape[1]) == (
32,
probs_dim,
uncertainties_pred_dim,
)
else:
probs = trainer.predict_proba(X_wide=X_wide, X_tab=X_tab, X_test=X_test)
assert preds.shape[0] == 32, probs.shape[1] == probs_dim
assert (preds.shape[0], probs.shape[1], unc_preds.shape[1]) == (
32,
probs_dim,
uncertainties_pred_dim,
)
##############################################################################
......