# Wide&Deep Recommendation Model
## Overview
This is an implementation of Wide&Deep as described in the paper [Wide & Deep Learning for Recommender Systems](https://arxiv.org/pdf/1606.07792.pdf).

The Wide&Deep model jointly trains a wide linear model and a deep neural network, combining the benefits of memorization and generalization for recommender systems.
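
The key idea is that the wide and deep parts each produce a logit, and the two logits are summed before a single sigmoid, so both parts are optimized against the same loss. A minimal NumPy sketch of the joint forward pass (hypothetical function and parameter names, not the repository's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_deep_predict(x_wide, x_deep, w_wide, deep_layers, bias):
    """Hypothetical joint forward pass of a Wide&Deep model."""
    # Wide component: a linear model over (typically crossed) sparse features.
    wide_logit = x_wide @ w_wide
    # Deep component: an MLP over dense embeddings of the sparse features.
    h = x_deep
    for W, b in deep_layers[:-1]:
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layers
    W_out, b_out = deep_layers[-1]
    deep_logit = h @ W_out + b_out      # output layer produces a scalar logit
    # Joint prediction: the two logits are summed before a single sigmoid,
    # so both components are trained jointly against the same log loss.
    return sigmoid(wide_logit + deep_logit + bias)
```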

## Dataset
The [Criteo datasets](http://labs.criteo.com/2014/02/download-kaggle-display-advertising-challenge-dataset/) are used for model training and evaluation.

## Running Code

### Download and preprocess dataset
To download and preprocess the dataset, install the Pandas package first, then issue the following command:
```
bash download.sh
```
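
For a rough idea of what preprocessing Criteo-style data involves (a hedged sketch only; the file name and column names below are assumptions, and the actual logic lives in `download.sh`): the raw file is tab-separated with a label, 13 integer features, and 26 categorical features, which are typically filled and mapped to integer ids.

```python
# Hypothetical sketch of Criteo-style preprocessing with Pandas;
# the real preprocessing is done by download.sh and may differ.
import pandas as pd

NUM_INT, NUM_CAT = 13, 26
cols = (["label"]
        + [f"I{i}" for i in range(1, NUM_INT + 1)]
        + [f"C{i}" for i in range(1, NUM_CAT + 1)])
df = pd.read_csv("train.txt", sep="\t", names=cols, nrows=100_000)

# Fill missing integer features with 0.
for c in [f"I{i}" for i in range(1, NUM_INT + 1)]:
    df[c] = df[c].fillna(0)
# Map each categorical string to an integer id.
for c in [f"C{i}" for i in range(1, NUM_CAT + 1)]:
    df[c] = df[c].fillna("").astype("category").cat.codes
```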

### Code Structure
The code structure is as follows:
```
|--- wide_and_deep/
    train_and_test.py            "Entry point for Wide&Deep training and evaluation"
    test.py                      "Entry point for Wide&Deep evaluation"
    train.py                     "Entry point for Wide&Deep training"
    train_and_test_multinpu.py   "Entry point for data-parallel Wide&Deep training and evaluation"
    |--- src/                    "Model source files"
        config.py                "Parameter configuration"
        dataset.py               "Dataset loader class"
        WideDeep.py              "Model structure"
        callbacks.py             "Callback classes for training and evaluation"
        metrics.py               "Metric class"
```

### Train and evaluate model
To train and evaluate the model, issue the following command:
```
python train_and_test.py
```
Arguments:
  * `--data_path`: The dataset directory; set this to the same directory used during data download.
  * `--epochs`: Total number of training epochs.
  * `--batch_size`: Training batch size.
  * `--eval_batch_size`: Evaluation batch size.
  * `--field_size`: The number of feature fields.
  * `--vocab_size`: The total number of features (vocabulary size) in the dataset.
  * `--emb_dim`: The dense embedding dimension of the sparse features.
  * `--deep_layers_dim`: The dimensions of the deep layers.
  * `--deep_layers_act`: The activation function of the deep layers.
  * `--keep_prob`: The keep rate of the dropout layers.
  * `--ckpt_path`: The location of the checkpoint file.
  * `--eval_file_name`: Evaluation output file.
  * `--loss_file_name`: Loss output file.
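
For example (the flag values below are illustrative, not the defaults):
```
python train_and_test.py --data_path=./data --epochs=15 --batch_size=16000
```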

To train the model, issue the following command:
```
python train.py
```
Arguments: the same as for `train_and_test.py` above.

To evaluate the model, issue the following command:
```
python test.py
```
Arguments: the same as for `train_and_test.py` above.
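
For example, pointing evaluation at a saved checkpoint (the paths below are illustrative):
```
python test.py --data_path=./data --ckpt_path=./ckpt/widedeep.ckpt
```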

There are additional arguments controlling the model and the training process. Use the `--help` or `-h` flag to get a full list of possible arguments with detailed descriptions.