"""
During the development of the package I realised that there is a typing
inconsistency. The input components of a Wide and Deep model are of type
nn.Module. These change type internally to nn.Sequential. While nn.Sequential
is an instance of nn.Module, the opposite is, of course, not true. This does
not affect any functionality of the package, but it is something that needs
fixing. However, while the fix is simple (simply define new attributes that
are the nn.Sequential objects), its implications are quite wide within the
package (it involves changing a number of tests and tutorials). Therefore, I
will introduce that fix when I do a major release. For now, we live with it.
"""

import warnings

import torch
import torch.nn as nn

from pytorch_widedeep.wdtypes import *  # noqa: F403
from pytorch_widedeep.models.tab_mlp import MLP, get_activation_fn
from pytorch_widedeep.models.tabnet.tab_net import TabNetPredLayer
from pytorch_widedeep.models import fds_layer

warnings.filterwarnings("default", category=UserWarning)

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")


class WideDeep(nn.Module):
    r"""Main collector class that combines all ``wide``, ``deeptabular``
    (which can be a number of architectures), ``deeptext`` and
    ``deepimage`` models.

    There are two options to combine these models that correspond to the
    two main architectures that ``pytorch-widedeep`` can build.

        - Directly connecting the output of the model components to the output neuron(s).

        - Adding a `Fully-Connected Head` (FC-Head) on top of the deep models.
          This FC-Head will combine the output from the ``deeptabular``, ``deeptext`` and
          ``deepimage`` components and will then be connected to the output neuron(s)
          (see the second example in the Examples section below).

    Parameters
    ----------
    wide: ``nn.Module``, Optional, default = None
        ``Wide`` model. I recommend using the ``Wide`` class in this
        package. However, it is possible to use a custom model as long as
        it is consistent with the required architecture, see
        :class:`pytorch_widedeep.models.wide.Wide`
    deeptabular: ``nn.Module``, Optional, default = None
        Currently ``pytorch-widedeep`` implements a number of possible
        architectures for the ``deeptabular`` component. See the documentation
        of the package. I recommend using the ``deeptabular`` components in
        this package. However, it is possible to use a custom model as long
        as it is consistent with the required architecture.
    deeptext: ``nn.Module``, Optional, default = None
        Model for the text input. Must be an object of class ``DeepText``
        or a custom model as long as it is consistent with the required
        architecture. See
        :class:`pytorch_widedeep.models.deep_text.DeepText`
    deepimage: ``nn.Module``, Optional, default = None
        Model for the image input. Must be an object of class
        ``DeepImage`` or a custom model as long as it is consistent with the
        required architecture. See
        :class:`pytorch_widedeep.models.deep_image.DeepImage`
    deephead: ``nn.Module``, Optional, default = None
        Custom model by the user that will receive the output of the deep
        component. Typically a FC-Head (MLP)
    head_hidden_dims: List, Optional, default = None
        Alternatively, the ``head_hidden_dims`` param can be used to
        specify the sizes of the stacked dense layers in the FC-Head, e.g.:
        ``[128, 64]``. Use ``deephead`` or ``head_hidden_dims``, but not
        both.
    head_dropout: float, default = 0.1
        If ``head_hidden_dims`` is not None, dropout between the layers in
        ``head_hidden_dims``
    head_activation: str, default = "relu"
        If ``head_hidden_dims`` is not None, activation function of the head
        layers. One of ``tanh``, ``relu``, ``gelu`` or ``leaky_relu``
    head_batchnorm: bool, default = False
        If ``head_hidden_dims`` is not None, specifies if batch
        normalization should be included in the head layers
    head_batchnorm_last: bool, default = False
        If ``head_hidden_dims`` is not None, boolean indicating whether or
        not to apply batch normalization to the last of the dense layers
    head_linear_first: bool, default = False
        If ``head_hidden_dims`` is not None, boolean indicating the order
        of the operations in the dense layer. If ``True``:
        ``[LIN -> ACT -> BN -> DP]``. If ``False``: ``[BN -> DP -> LIN ->
        ACT]``
    enforce_positive: bool, default = False
        If ``True``, the output of the final layer is passed through an
        activation function that enforces positive values. Important if you
        are using loss functions with non-negative input restrictions, e.g.
        RMSLE, or if you know your predictions are limited to the range [0, inf)
    enforce_positive_activation: str, default = "softplus"
        Activation function used to enforce a positive output from the final
        layer. Use "softplus" or "relu".
    fds: bool, default = False
        Boolean indicating whether a Feature Distribution Smoothing (FDS) layer
        should be applied before the final prediction layer. Only available for
        objective='regressor'.
    fds_config: dict, default = None
        Dictionary defining specific values for the Feature Distribution
        Smoothing layer
    pred_dim: int, default = 1
        Size of the final wide and deep output layer containing the
        predictions. `1` for regression and binary classification or number
        of classes for multiclass classification.

    Examples
    --------

    >>> from pytorch_widedeep.models import TabResnet, DeepImage, DeepText, Wide, WideDeep
    >>> embed_input = [(u, i, j) for u, i, j in zip(["a", "b", "c"][:4], [4] * 3, [8] * 3)]
    >>> column_idx = {k: v for v, k in enumerate(["a", "b", "c"])}
    >>> wide = Wide(10, 1)
    >>> deeptabular = TabResnet(blocks_dims=[8, 4], column_idx=column_idx, embed_input=embed_input)
    >>> deeptext = DeepText(vocab_size=10, embed_dim=4, padding_idx=0)
    >>> deepimage = DeepImage(pretrained=False)
    >>> model = WideDeep(wide=wide, deeptabular=deeptabular, deeptext=deeptext, deepimage=deepimage)
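
    If a `Fully-Connected Head` is preferred over connecting each component
    directly to the output neuron(s), it can be added via ``head_hidden_dims``
    (an illustrative sketch reusing the components defined above; the layer
    sizes are arbitrary):

    >>> model_with_head = WideDeep(
    ...     wide=wide,
    ...     deeptabular=deeptabular,
    ...     deeptext=deeptext,
    ...     deepimage=deepimage,
    ...     head_hidden_dims=[128, 64],
    ... )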


    .. note:: While I recommend using the ``wide`` and ``deeptabular`` components
        within this package when building the corresponding model components,
        it is very likely that the user will want to use custom text and image
        models. That is perfectly possible. Simply build them and pass them
        as the corresponding parameters. Note that the custom models MUST
        return a last layer of activations (i.e. not the final prediction) so
        that these activations are collected by ``WideDeep`` and combined
        accordingly. In addition, the models MUST also contain an attribute
        ``output_dim`` with the size of these last layers of activations. See
        for example :class:`pytorch_widedeep.models.tab_mlp.TabMlp`
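
        A minimal, illustrative sketch of such a custom component (the class
        name and layer sizes below are made up for this example and are not
        part of the package):

        >>> import torch.nn as nn
        >>> class MyDeepText(nn.Module):
        ...     def __init__(self, vocab_size: int = 10, embed_dim: int = 4):
        ...         super().__init__()
        ...         self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        ...         self.rnn = nn.GRU(embed_dim, 8, batch_first=True)
        ...         # required: size of the last layer of activations
        ...         self.output_dim = 8
        ...     def forward(self, X):
        ...         # return the last layer of activations, NOT the final prediction
        ...         return self.rnn(self.embed(X))[0][:, -1, :]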

    """

    def __init__(
        self,
        wide: Optional[nn.Module] = None,
        deeptabular: Optional[nn.Module] = None,
        deeptext: Optional[nn.Module] = None,
        deepimage: Optional[nn.Module] = None,
        deephead: Optional[nn.Module] = None,
        head_hidden_dims: Optional[List[int]] = None,
        head_activation: str = "relu",
        head_dropout: float = 0.1,
        head_batchnorm: bool = False,
        head_batchnorm_last: bool = False,
        head_linear_first: bool = False,
        enforce_positive: bool = False,
        enforce_positive_activation: str = "softplus",
        pred_dim: int = 1,
        fds: bool = False,
        fds_config: Optional[dict] = None,
    ):
        super(WideDeep, self).__init__()

        self._check_model_components(
            wide,
            deeptabular,
            deeptext,
            deepimage,
            deephead,
            head_hidden_dims,
            pred_dim,
        )

        # required as attribute just in case we pass a deephead
        self.pred_dim = pred_dim

        # The main 5 components of the wide and deep assembly
        self.wide = wide
        self.deeptabular = deeptabular
        self.deeptext = deeptext
        self.deepimage = deepimage
        self.deephead = deephead
        self.enforce_positive = enforce_positive
        self.fds = fds

        if self.deeptabular is not None:
            self.is_tabnet = deeptabular.__class__.__name__ == "TabNet"
        else:
            self.is_tabnet = False

        if self.deephead is None and head_hidden_dims is not None:
            self._build_deephead(
                head_hidden_dims,
                head_activation,
                head_dropout,
                head_batchnorm,
                head_batchnorm_last,
                head_linear_first,
            )
        elif self.deephead is not None:
            pass
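        # Feature Distribution Smoothing (FDS): smooth the deeptabular output
        # features before the final prediction layer (regression only)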
        elif self.fds:
            if (
                not self.deeptabular
                or self.pred_dim != 1
                # or self.wide.pred_dim != self.deeptabular.output_dim
            ):
                raise ValueError(
                    """Feature Distribution Smoothing is only supported with a
                    deeptabular component, without a deephead and with a single
                    output neuron. If used, the wide component must have
                    pred_dim == deeptabular.output_dim"""
                )

            if fds_config:
                self.FDS = fds_layer.FDS(**fds_config)
            else:
                self.FDS = fds_layer.FDS(feature_dim=self.deeptabular.output_dim)
            self.FDS_dropout = nn.Dropout(p=self.deeptabular.mlp_dropout)
            self.pred_layer = nn.Linear(self.deeptabular.output_dim, self.pred_dim)
        else:
            self._add_pred_layer()

        if self.enforce_positive:
            self.enf_pos = get_activation_fn(enforce_positive_activation)

    def forward(
        self,
        X: Dict[str, Tensor],
        y: Optional[Tensor] = None,
        epoch: Optional[int] = None,
    ):
        y_pred = self._forward_wide(X)
        if self.deephead:
            y_pred = self._forward_deephead(X, y_pred)
        elif self.training and self.fds:
            y_pred, deep_features = self._forward_deep(X, y_pred, y, epoch)
            if self.enforce_positive:
                return self.enf_pos(y_pred), deep_features
            else:
                return y_pred, deep_features
        else:
            y_pred = self._forward_deep(X, y_pred)
        if self.enforce_positive:
            return self.enf_pos(y_pred)
        else:
            return y_pred

    def _build_deephead(
        self,
        head_hidden_dims,
        head_activation,
        head_dropout,
        head_batchnorm,
        head_batchnorm_last,
        head_linear_first,
    ):
        deep_dim = 0
        if self.deeptabular is not None:
            deep_dim += self.deeptabular.output_dim
        if self.deeptext is not None:
            deep_dim += self.deeptext.output_dim
        if self.deepimage is not None:
            deep_dim += self.deepimage.output_dim

        head_hidden_dims = [deep_dim] + head_hidden_dims
        self.deephead = MLP(
            head_hidden_dims,
            head_activation,
            head_dropout,
            head_batchnorm,
            head_batchnorm_last,
            head_linear_first,
        )

        self.deephead.add_module(
            "head_out", nn.Linear(head_hidden_dims[-1], self.pred_dim)
        )

    def _add_pred_layer(self):
        if self.deeptabular is not None:
            if self.is_tabnet:
                self.deeptabular = nn.Sequential(
                    self.deeptabular,
                    TabNetPredLayer(self.deeptabular.output_dim, self.pred_dim),
                )
            else:
                self.deeptabular = nn.Sequential(
                    self.deeptabular,
                    nn.Linear(self.deeptabular.output_dim, self.pred_dim),
                )
        if self.deeptext is not None:
            self.deeptext = nn.Sequential(
                self.deeptext, nn.Linear(self.deeptext.output_dim, self.pred_dim)
            )
        if self.deepimage is not None:
            self.deepimage = nn.Sequential(
                self.deepimage, nn.Linear(self.deepimage.output_dim, self.pred_dim)
            )

    def _forward_wide(self, X):
        if self.wide is not None:
            out = self.wide(X["wide"])
        else:
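            # no wide component: use a zero tensor of the right output shape so
            # the deep component outputs can simply be added onto it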
            batch_size = X[list(X.keys())[0]].size(0)
            out = torch.zeros(batch_size, self.pred_dim).to(device)

        return out

    def _forward_deephead(self, X, wide_out):
        if self.deeptabular is not None:
            if self.is_tabnet:
                tab_out = self.deeptabular(X["deeptabular"])
                deepside, M_loss = tab_out[0], tab_out[1]
            else:
                deepside = self.deeptabular(X["deeptabular"])
        else:
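            # no deeptabular component: start from an empty tensor so the text
            # and image activations can be concatenated onto it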
            deepside = torch.FloatTensor().to(device)
        if self.deeptext is not None:
            deepside = torch.cat([deepside, self.deeptext(X["deeptext"])], axis=1)
        if self.deepimage is not None:
            deepside = torch.cat([deepside, self.deepimage(X["deepimage"])], axis=1)

        deephead_out = self.deephead(deepside)
        deepside_out = nn.Linear(deephead_out.size(1), self.pred_dim).to(device)

        if self.is_tabnet:
            res = (wide_out.add_(deepside_out(deephead_out)), M_loss)
        else:
            res = wide_out.add_(deepside_out(deephead_out))

        return res

    def _forward_deep(self, X, wide_out, y=None, epoch=None):
        if self.deeptabular is not None:
            if self.is_tabnet:
                tab_out, M_loss = self.deeptabular(X["deeptabular"])
                wide_out.add_(tab_out)
            else:
                deeptab_features = self.deeptabular(X["deeptabular"])
                if self.training and self.fds:
                    deeptab_features = self.FDS.smooth(deeptab_features, y, epoch)
                    deeptab_features = self.FDS_dropout(deeptab_features)
                    wide_out.add_(self.pred_layer(deeptab_features))
                    return wide_out, deeptab_features
                elif self.fds:
                    wide_out.add_(self.pred_layer(deeptab_features))
                else:
                    wide_out.add_(deeptab_features)
        if self.deeptext is not None:
            wide_out.add_(self.deeptext(X["deeptext"]))
        if self.deepimage is not None:
            wide_out.add_(self.deepimage(X["deepimage"]))
        if self.is_tabnet:
            res = (wide_out, M_loss)
        else:
            res = wide_out

        return res

    @staticmethod  # noqa: C901
    def _check_model_components(  # noqa: C901
        wide,
        deeptabular,
        deeptext,
        deepimage,
        deephead,
        head_hidden_dims,
        pred_dim,
    ):

        if wide is not None:
            assert wide.wide_linear.weight.size(1) == pred_dim, (
                "the 'pred_dim' of the wide component ({}) must be equal to the 'pred_dim' "
                "of the deep component and the overall model itself ({})".format(
                    wide.wide_linear.weight.size(1), pred_dim
                )
            )
        if deeptabular is not None and not hasattr(deeptabular, "output_dim"):
            raise AttributeError(
                "deeptabular model must have an 'output_dim' attribute. "
                "See pytorch_widedeep.models.tab_mlp.TabMlp"
            )
        if deeptabular is not None:
            is_tabnet = deeptabular.__class__.__name__ == "TabNet"
            has_wide_text_or_image = (
                wide is not None or deeptext is not None or deepimage is not None
            )
            if is_tabnet and has_wide_text_or_image:
                warnings.warn(
                    "'WideDeep' is a model comprised by multiple components and the 'deeptabular'"
                    " component is 'TabNet'. We recommend using 'TabNet' in isolation."
383
                    " The reasons are: i)'TabNet' uses sparse regularization which partially losses"
384 385
                    " its purpose when used in combination with other components."
                    " If you still want to use a multiple component model with 'TabNet',"
386 387 388
                    " consider setting 'lambda_sparse' to 0 during training. ii) The feature"
                    " importances will be computed only for TabNet but the model will comprise multiple"
                    " components. Therefore, such importances will partially lose their 'meaning'.",
389 390
                    UserWarning,
                )
        if deeptext is not None and not hasattr(deeptext, "output_dim"):
            raise AttributeError(
                "deeptext model must have an 'output_dim' attribute. "
                "See pytorch_widedeep.models.deep_text.DeepText"
            )
        if deepimage is not None and not hasattr(deepimage, "output_dim"):
            raise AttributeError(
                "deepimage model must have an 'output_dim' attribute. "
                "See pytorch_widedeep.models.deep_image.DeepImage"
            )
        if deephead is not None and head_hidden_dims is not None:
            raise ValueError(
                "both 'deephead' and 'head_hidden_dims' are not None. Use one or the other, but not both"
            )
        if (
            head_hidden_dims is not None
            and not deeptabular
            and not deeptext
            and not deepimage
        ):
            raise ValueError(
                "if 'head_hidden_dims' is not None, at least one deep component must be used"
            )
        if deephead is not None:
            deephead_inp_feat = next(deephead.parameters()).size(1)
            output_dim = 0
            if deeptabular is not None:
                output_dim += deeptabular.output_dim
            if deeptext is not None:
                output_dim += deeptext.output_dim
            if deepimage is not None:
                output_dim += deepimage.output_dim
            assert deephead_inp_feat == output_dim, (
                "if a custom 'deephead' is used its input features ({}) must be equal to "
                "the output features of the deep component ({})".format(
                    deephead_inp_feat, output_dim
                )
            )