# Sequence Semantic Retrieval Model

## Introduction
News recommendation differs from traditional systems that recommend entertainment items such as movies or music, and it raises several new problems.
- User profile features are very sparse: a user may log in to a news recommendation app anonymously, and is likely to read fresh news items.
- News items appear and disappear much faster than movies or music. Usually, thousands of news items are generated in a news recommendation app. News is also consumed quickly, since users care about recent events.
- User interests may change frequently in the news recommendation setting. The content of a news item strongly affects users' reading behavior, even when its category does not belong to a user's long-term interests. In news recommendation, reading behavior is determined by both the short-term and the long-term interests of users.

[GRU4Rec](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec) models a user's short-term and long-term interest by applying a gated recurrent unit (GRU) to the user's reading history. The generalization ability of recurrent neural networks captures similarities between users' reading sequences, which alleviates the user profile sparsity problem. However, GRU4Rec as described in its paper operates on a closed domain of items: the model predicts which item a user will be interested in by classification. In news recommendation, the set of news items changes over time, so a GRU4Rec model cannot predict items that do not exist in the training dataset.

The Sequence Semantic Retrieval (SSR) model shares a similar idea with Multi-Rate Deep Learning for Temporal Recommendation, SIGIR 2016. It has two components: a matching model and a retrieval part.
- The idea of SSR is to model a user's personalized interest in an item through a matching model structure. The representation of a news item can be computed online even if the item does not exist in the training dataset.
- With the representations of news items, we can build a vector indexing service online for news prediction; this is the retrieval part of SSR.
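
The two components can be illustrated with a minimal, pure-Python sketch. It is not the model in this repository: the toy `TOKEN_VECTORS` table, the averaging encoders (a stand-in for the learned recurrent user encoder), and the brute-force `retrieve` function (a stand-in for a real vector indexing service) are all assumptions made for illustration. The point it shows is that an item is encoded from its content, so an item unseen during training can still be scored and retrieved.

```python
import heapq
import math

# Hypothetical toy token embeddings; in SSR these would be learned
# by the matching model rather than written by hand.
TOKEN_VECTORS = {
    "football": [1.0, 0.0, 0.0],
    "match": [0.8, 0.2, 0.0],
    "stock": [0.0, 1.0, 0.0],
    "market": [0.0, 0.9, 0.1],
    "election": [0.0, 0.0, 1.0],
}


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (empty list in, empty list out)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_item(tokens):
    """Encode a news item from its content tokens, so new items need no retraining."""
    return mean_vector([TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS])


def encode_user(history):
    """Stand-in for the recurrent user encoder: average the items read so far."""
    return mean_vector([encode_item(tokens) for tokens in history])


def cosine(u, v):
    """Cosine similarity, used here as the matching score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(user_vec, item_index, k=2):
    """Brute-force top-k search over the index; returns (score, item_id) pairs."""
    scored = ((cosine(user_vec, vec), item_id) for item_id, vec in item_index.items())
    return heapq.nlargest(k, scored)


# Retrieval part: index the item representations, then query with a user vector.
item_index = {
    "news_sports": encode_item(["football", "match"]),
    "news_finance": encode_item(["stock", "market"]),
    "news_politics": encode_item(["election"]),
}
user_vec = encode_user([["football"], ["match", "football"]])
top = retrieve(user_vec, item_index, k=2)
print([item_id for _, item_id in top])
```

In this toy setup the sports item ranks first for a user whose history is about football. A production system would replace the linear scan with an approximate nearest-neighbor index.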

## Dataset
Dataset preprocessing follows the method of the [GRU4Rec project](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleRec/gru4rec). Note that you should reuse the scripts from the GRU4Rec project for data preprocessing.

## Training
Before training, set the `PYTHONPATH` environment variable:
```
export PYTHONPATH=./models/fluid:$PYTHONPATH
```

The command-line options for training can be listed with `python train.py -h`. To start training:
``` bash
python train.py --train_file rsc15_train_tr_paddle.txt
```

## Build Index
TBA

## Retrieval
TBA