# Personalized Recommendation

The source code for this tutorial is available [here](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system). For instructions on how to run it, please refer to [this guide](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).

## Background

Recommender systems are a key component of e-commerce, online video, and online reading services. There are several approaches that recommender systems use to learn from user behavior and product properties and to understand users' interests:

- User behavior-based approach. A well-known method of this approach is collaborative filtering, which assumes that if two users made similar purchases in the past, they share common interests and are likely to make similar decisions in the future. Variants of collaborative filtering include user-based[[3](#references)], item-based[[4](#references)], social network-based[[5](#references)], and model-based methods.

- Content-based approach[[1](#references)]. This approach represents product properties and user interests as feature vectors in the same space, so that a user's interest in a product can be measured by the distance between the two vectors.

- Hybrid approach[[2](#references)]. This approach combines the two approaches above so that each compensates for the other's weaknesses, for example the data sparsity problem[[6](#references)].

This tutorial explains a deep learning-based hybrid approach and its implementation in PaddlePaddle. We are going to train a model using a dataset that includes user information, movie information, and ratings. Once the model is trained, we can predict the rating for a given pair of user and movie IDs.

## Model Overview

To learn more about deep learning-based recommendation, let us first go over the YouTube recommender system[[7](#references)] before introducing our hybrid model.

### YouTube's Deep Learning Recommendation Model

YouTube is a video-sharing website with one of the largest user bases in the world. Its recommender system serves more than a billion users. The system is composed of two major parts: candidate generation and ranking. The former selects a few hundred candidates from millions of videos, and the latter ranks them and outputs the top 10.

<p align="center">
<img src="image/YouTube_Overview.en.png" width="70%" ><br/>
Figure 1. YouTube recommender system overview.
</p>

#### Candidate Generation Network

YouTube models candidate generation as a multi-class classification problem in which the number of classes equals the number of videos. The architecture of the model is as follows:

<p align="center">
<img src="image/Deep_candidate_generation_model_architecture.en.png" width="70%" ><br/>
Figure 2. Deep candidate generation model.
</p>

The first stage of this model maps the watch history and search queries into fixed-length representative features. Then an MLP (multi-layer perceptron, as described in the [Recognize Digits](https://github.com/PaddlePaddle/book/blob/develop/recognize_digits/README.md) tutorial) takes the concatenation of all representative vectors as input. The output of the MLP represents the user's *intrinsic interests*. At training time, it is used together with a softmax output layer to minimize the classification error. At serving time, it is used to compute the relevance of the user to all videos.

For a user $U$, the predicted watching probability of video $i$ is

$$P(\omega=i|u)=\frac{e^{v_{i}u}}{\sum_{j \in V}e^{v_{j}u}}$$

where $u$ is the representative vector of user $U$, $V$ is the corpus of all videos, and $v_i$ is the representative vector of the $i$-th video. Since $u$ and $v_i$ have the same length, their dot product can be computed with a fully connected layer.
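
The softmax above can be sketched in NumPy as follows. The corpus size, the vector length, and the random vectors are illustrative assumptions. In the real system, $u$ comes from the MLP and each $v_j$ is a learned video vector.

```python
import numpy as np

rng = np.random.default_rng(0)
num_videos, dim = 1000, 16              # hypothetical corpus size and vector length
V = rng.normal(size=(num_videos, dim))  # row j is the video vector v_j
u = rng.normal(size=dim)                # the user vector u

def watch_probabilities(u, V):
    """P(w = i | u) = exp(v_i . u) / sum_j exp(v_j . u) for every video i."""
    logits = V @ u                      # one dot product per video
    logits = logits - logits.max()      # shift by the max to avoid overflow; result unchanged
    exp = np.exp(logits)
    return exp / exp.sum()

p = watch_probabilities(u, V)
print(p.shape, p.sum())
```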

This model could have a performance issue, because the softmax output covers millions of classification labels. To improve efficiency at training time, the authors down-sample the negative classes, which reduces the effective number of classes to thousands. At serving time, the authors skip the normalization of the softmax outputs, because the results are only used for ranking.
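
Skipping the normalization is safe for ranking because the softmax is monotonic in its logits. Every score is divided by the same positive normalizer, so the videos with the largest raw dot products $v_j u$ are exactly the videos with the largest probabilities. Below is a minimal NumPy check, with random vectors standing in for learned ones (an assumption for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
V = rng.normal(size=(5000, 32))   # hypothetical video vectors
u = rng.normal(size=32)           # hypothetical user vector
k = 10

logits = V @ u
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()       # full softmax, computed here only for the comparison

# Serving shortcut: rank by raw score without ever computing the normalizer.
top_by_score = np.argsort(-logits)[:k]
top_by_prob = np.argsort(-probs)[:k]
print(top_by_score, np.array_equal(top_by_score, top_by_prob))
```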

#### Ranking Network

The architecture of the ranking network is similar to that of the candidate generation network. Like the ranking models widely used in online advertising, it uses rich features such as video ID and last watch time. The output layer of the ranking network is a weighted logistic regression, which scores all candidate videos.

### Hybrid Model

In this section, we introduce our movie recommendation system. In particular, we feed movie titles into a text convolutional network to obtain a fixed-length representative feature vector. We first introduce the convolutional neural network for texts, and then the hybrid recommendation model.

#### Convolutional Neural Networks for Texts (CNN)

**Convolutional Neural Networks** are frequently applied to data with a grid-like topology, such as two-dimensional images and one-dimensional texts. A CNN can extract multiple local features, combine them, and produce high-level abstractions that correspond to semantic understanding. Empirically, CNNs have proven effective for both image and text modeling.

A CNN mainly consists of convolution and pooling operations, which can be combined in many ways for different applications. Here, we briefly describe the CNN shown in Figure 3.


<p align="center">
<img src="image/text_cnn_en.png" width = "80%" align="center"/><br/>
Figure 3. CNN for text modeling.
</p>

Let $n$ be the length of the sentence to process, and let the $i$-th word have the embedding $x_i\in\mathbb{R}^k$, where $k$ is the embedding dimension.

First, we group every $h$ consecutive words into a window of length $h$ by concatenating their embeddings. The window is denoted as $x_{i:i+h-1}$, consisting of $x_{i},x_{i+1},\ldots,x_{i+h-1}$, where $x_i$ is the first word in the window and $i$ ranges from $1$ to $n-h+1$; thus $x_{i:i+h-1}\in\mathbb{R}^{hk}$.

Next, we apply the convolution operation. The kernel $w\in\mathbb{R}^{hk}$ is applied to each window, extracting the features $c_i=f(w\cdot x_{i:i+h-1}+b)$, where $b\in\mathbb{R}$ is the bias and $f$ is a non-linear activation function such as the sigmoid. Applying the kernel to every window in $\{x_{1:h},x_{2:h+1},\ldots,x_{n-h+1:n}\}$ produces a feature map of the following form:

$$c=[c_1,c_2,\ldots,c_{n-h+1}], c \in \mathbb{R}^{n-h+1}$$

Then we apply *max pooling* over time and represent the whole sentence by $\hat c$, the maximum element of the feature map:

$$\hat c=\max(c)$$
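
The windowing, convolution, and max-pooling steps above can be sketched in NumPy for a single kernel $w$. The sentence length, embedding size, window size, and random weights are illustrative assumptions. A real text CNN learns many kernels and stacks their pooled outputs into one vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def text_cnn_feature(X, w, b, h, f=sigmoid):
    """One kernel over a sentence, followed by max pooling over time.

    X: (n, k) word embeddings x_1..x_n
    w: (h*k,) kernel applied to every window x_{i:i+h-1}
    b: scalar bias
    Returns (c_hat, c), where c_i = f(w . x_{i:i+h-1} + b) and c_hat = max(c).
    """
    n, k = X.shape
    if n < h:
        raise ValueError("the sentence needs at least h words")
    # Concatenate each window of h word vectors into one vector in R^{hk}.
    windows = np.stack([X[i:i + h].reshape(-1) for i in range(n - h + 1)])
    c = f(windows @ w + b)        # feature map c in R^{n-h+1}
    return c.max(), c

rng = np.random.default_rng(0)
n, k, h = 7, 4, 3                 # 7 words, 4-dim embeddings, windows of 3 words
X = rng.normal(size=(n, k))
w = rng.normal(size=h * k)
c_hat, c = text_cnn_feature(X, w, b=0.1, h=h)
print(c.shape, c_hat)             # c has n - h + 1 entries
```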

#### Model Structure Of The Hybrid Model

In our network, the input includes features of users and movies. The user features consist of four properties: user ID, gender, occupation, and age. The movie features consist of the movie ID, genres, and title.

We map each user feature to a representative feature vector (an embedding layer followed by a fully connected layer) and concatenate these vectors. Movie features are processed similarly, except for movie titles: we feed the titles into the text convolutional network described in the previous section to obtain a fixed-length representative feature vector.

Given the feature vectors of users and movies, we compute their relevance using cosine similarity. At training time, we minimize the squared error between the predicted and the actual ratings.
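
The scoring and the loss can be sketched in NumPy as below. The random vectors stand in for the 200-dimensional user and movie features, and the helper names are hypothetical. The factor 5 mirrors the `layers.scale(x=inference, scale=5.0)` call in the configuration later in this tutorial. It stretches the cosine range $[-1, 1]$ to $[-5, 5]$, so the output can reach the top of the 1 to 5 rating scale.

```python
import numpy as np

def predicted_rating(user_vec, movie_vec, scale=5.0):
    """Cosine similarity of the two feature vectors, scaled toward the rating range."""
    cos = user_vec @ movie_vec / (np.linalg.norm(user_vec) * np.linalg.norm(movie_vec))
    return scale * cos

def mean_squared_error(preds, labels):
    """Average squared error over a batch: the quantity minimized during training."""
    preds, labels = np.asarray(preds, dtype=float), np.asarray(labels, dtype=float)
    return float(np.mean((preds - labels) ** 2))

rng = np.random.default_rng(0)
users = rng.normal(size=(4, 200))     # hypothetical batch of user features
movies = rng.normal(size=(4, 200))    # hypothetical batch of movie features
labels = np.array([5.0, 3.0, 4.0, 1.0])

preds = np.array([predicted_rating(u, m) for u, m in zip(users, movies)])
print(preds, mean_squared_error(preds, labels))
```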

<p align="center">
<img src="image/rec_regression_network_en.png" width="90%" ><br/>
Figure 4. A hybrid recommendation model.
</p>

## Dataset

We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) dataset to train our model. It contains 1,000,209 ratings of about 3,900 movies from 6,040 users. Each rating is an integer from 1 to 5. Thanks to GroupLens Research for collecting, processing, and publishing the dataset.

The `paddle.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens`, and `wmt14`, so there is no need to manually download and preprocess the `MovieLens` dataset.

The raw `MovieLens` data contains movie ratings along with features of both movies and users.
For instance, the features of one movie can be printed as follows:

```python
import paddle
movie_info = paddle.dataset.movielens.movie_info()  # maps movie ID -> MovieInfo
print(list(movie_info.values())[0])
```

```text
<MovieInfo id(1), title(Toy Story), categories(['Animation', "Children's", 'Comedy'])>
```

Similarly, one user's features can be printed:

```python
user_info = paddle.dataset.movielens.user_info()  # maps user ID -> UserInfo
print(list(user_info.values())[0])
```

```text
<UserInfo id(1), gender(F), age(1), job(10)>
```

In this dataset, ages are grouped into the following categories:

```text
1: "Under 18"
18: "18-24"
25: "25-34"
35: "35-44"
45: "45-49"
50: "50-55"
56: "56+"
137
```
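
The values above are category codes rather than literal ages, so they are not contiguous. The sketch below assumes that each code is mapped to its position in this list before it reaches an embedding layer. That assumption is consistent with the inference example later in this tutorial, which passes `age_id = 0` for user 1, whose age code is 1. The name `AGE_CODES` and the helper are hypothetical. The model configuration itself only uses `paddle.dataset.movielens.age_table` to size the age embedding.

```python
# Age codes from the table above, in order (hypothetical constant name).
AGE_CODES = [1, 18, 25, 35, 45, 50, 56]

def age_code_to_index(code):
    """Map a MovieLens age code to a dense index in [0, len(AGE_CODES))."""
    return AGE_CODES.index(code)

print(age_code_to_index(1), age_code_to_index(25), age_code_to_index(56))
```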

Each user's occupation is one of the following:

```text
0: "other" or not specified
1: "academic/educator"
2: "artist"
3: "clerical/admin"
4: "college/grad student"
5: "customer service"
6: "doctor/health care"
7: "executive/managerial"
8: "farmer"
9: "homemaker"
10: "K-12 student"
11: "lawyer"
12: "programmer"
13: "retired"
14: "sales/marketing"
15: "scientist"
16: "self-employed"
17: "technician/engineer"
18: "tradesman/craftsman"
19: "unemployed"
20: "writer"
```

Each record consists of three main components: user features, movie features, and the movie rating.
For example, the following code reads the first training record:

```python
train_set_creator = paddle.dataset.movielens.train()
train_sample = next(train_set_creator())
uid = train_sample[0]
# The user fields come first; the movie ID is the field right after them.
mov_id = train_sample[len(user_info[uid].value())]
print("User %s rates Movie %s with Score %s" % (user_info[uid], movie_info[mov_id], train_sample[-1]))
```

```text
User <UserInfo id(1), gender(F), age(1), job(10)> rates Movie <MovieInfo id(1193), title(One Flew Over the Cuckoo's Nest), categories(['Drama'])> with Score [5.0]
```

The output shows that user 1 gave movie `1193` a rating of 5.

Running `python train.py` starts training immediately. The following sections walk through the details of how it works.

## Model Configuration

Our program starts by importing the necessary packages and initializing some global variables:
```python
import math
import sys
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as layers
import paddle.fluid.nets as nets

IS_SPARSE = True
USE_GPU = False
BATCH_SIZE = 256
```

Then we define the model configuration for the combined user features:

```python
def get_usr_combined_features():

    USR_DICT_SIZE = paddle.dataset.movielens.max_user_id() + 1

    uid = layers.data(name='user_id', shape=[1], dtype='int64')

    usr_emb = layers.embedding(
        input=uid,
        dtype='float32',
        size=[USR_DICT_SIZE, 32],
        param_attr='user_table',
        is_sparse=IS_SPARSE)

    usr_fc = layers.fc(input=usr_emb, size=32)

    USR_GENDER_DICT_SIZE = 2

    usr_gender_id = layers.data(name='gender_id', shape=[1], dtype='int64')

    usr_gender_emb = layers.embedding(
        input=usr_gender_id,
        size=[USR_GENDER_DICT_SIZE, 16],
        param_attr='gender_table',
        is_sparse=IS_SPARSE)

    usr_gender_fc = layers.fc(input=usr_gender_emb, size=16)

    USR_AGE_DICT_SIZE = len(paddle.dataset.movielens.age_table)
    usr_age_id = layers.data(name='age_id', shape=[1], dtype="int64")

    usr_age_emb = layers.embedding(
        input=usr_age_id,
        size=[USR_AGE_DICT_SIZE, 16],
        is_sparse=IS_SPARSE,
        param_attr='age_table')

    usr_age_fc = layers.fc(input=usr_age_emb, size=16)

    USR_JOB_DICT_SIZE = paddle.dataset.movielens.max_job_id() + 1
    usr_job_id = layers.data(name='job_id', shape=[1], dtype="int64")

    usr_job_emb = layers.embedding(
        input=usr_job_id,
        size=[USR_JOB_DICT_SIZE, 16],
        param_attr='job_table',
        is_sparse=IS_SPARSE)

    usr_job_fc = layers.fc(input=usr_job_emb, size=16)

    concat_embed = layers.concat(
        input=[usr_fc, usr_gender_fc, usr_age_fc, usr_job_fc], axis=1)

    usr_combined_features = layers.fc(input=concat_embed, size=200, act="tanh")

    return usr_combined_features
```

As shown in the code above, each user is described by four integer features: `user_id`, `gender_id`, `age_id`, and `job_id`. To handle these discrete values conveniently, we borrow the word-embedding technique from NLP and map them to the embedding vectors `usr_emb`, `usr_gender_emb`, `usr_age_emb`, and `usr_job_emb`.

Each embedding then passes through its own fully connected layer, and the outputs are concatenated. A final fully connected layer with `tanh` activation maps the result to a 200-dimensional user feature vector.
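
To make the shapes concrete, here is a NumPy sketch of the same data flow for a batch of two users. It performs the embedding lookup as row indexing and applies one fully connected layer per feature. It then concatenates the results into 32 + 16 + 16 + 16 = 80 columns and applies a final 200-unit `tanh` layer. The table sizes, the random weights, and the helper functions are assumptions for illustration. The per-feature layers have no activation here because `layers.fc` is called without `act` in the configuration above.

```python
import numpy as np

rng = np.random.default_rng(0)

def embedding(table, ids):
    """Embedding lookup: one row of `table` per integer id."""
    return table[ids]

def fc(x, out_dim, act=None):
    """Fully connected layer with freshly drawn random weights (illustration only)."""
    W = rng.normal(scale=0.1, size=(x.shape[1], out_dim))
    y = x @ W + np.zeros(out_dim)     # zero bias
    return act(y) if act is not None else y

# Hypothetical embedding tables: users, genders, age buckets, occupations.
user_emb_table = rng.normal(size=(6041, 32))
gender_emb_table = rng.normal(size=(2, 16))
age_emb_table = rng.normal(size=(7, 16))
job_emb_table = rng.normal(size=(21, 16))

# Two users; columns are user_id, gender_id, age_id, job_id.
batch = np.array([[1, 1, 0, 10], [2, 0, 6, 16]])

concat = np.concatenate([
    fc(embedding(user_emb_table, batch[:, 0]), 32),
    fc(embedding(gender_emb_table, batch[:, 1]), 16),
    fc(embedding(age_emb_table, batch[:, 2]), 16),
    fc(embedding(job_emb_table, batch[:, 3]), 16),
], axis=1)
usr_combined = fc(concat, 200, act=np.tanh)
print(concat.shape, usr_combined.shape)
```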

Next, we apply a similar transformation to each movie feature. The model configuration is:

```python
def get_mov_combined_features():

    MOV_DICT_SIZE = paddle.dataset.movielens.max_movie_id() + 1

    mov_id = layers.data(name='movie_id', shape=[1], dtype='int64')

    mov_emb = layers.embedding(
        input=mov_id,
        dtype='float32',
        size=[MOV_DICT_SIZE, 32],
        param_attr='movie_table',
        is_sparse=IS_SPARSE)

    mov_fc = layers.fc(input=mov_emb, size=32)

    CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())

    category_id = layers.data(
        name='category_id', shape=[1], dtype='int64', lod_level=1)

    mov_categories_emb = layers.embedding(
        input=category_id, size=[CATEGORY_DICT_SIZE, 32], is_sparse=IS_SPARSE)

    mov_categories_hidden = layers.sequence_pool(
        input=mov_categories_emb, pool_type="sum")

    MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())

    mov_title_id = layers.data(
        name='movie_title', shape=[1], dtype='int64', lod_level=1)

    mov_title_emb = layers.embedding(
        input=mov_title_id, size=[MOV_TITLE_DICT_SIZE, 32], is_sparse=IS_SPARSE)

    mov_title_conv = nets.sequence_conv_pool(
        input=mov_title_emb,
        num_filters=32,
        filter_size=3,
        act="tanh",
        pool_type="sum")

    concat_embed = layers.concat(
        input=[mov_fc, mov_categories_hidden, mov_title_conv], axis=1)

    mov_combined_features = layers.fc(input=concat_embed, size=200, act="tanh")

    return mov_combined_features
```

The movie title is a sequence of words, represented as a sequence of integer word indices. It is fed into a `sequence_conv_pool` layer, which applies convolution and pooling along the time dimension. Because pooling is done over the time dimension, the output is a fixed-length vector regardless of the length of the input sequence.
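
The fixed-length property can be checked with a small NumPy sketch. It convolves over windows of `filter_size` consecutive word embeddings and then applies sum pooling over time, matching `pool_type="sum"` above. How `sequence_conv_pool` handles sequence boundaries is not covered here. This sketch assumes zero-padding so that every position gets a full window, which also lets titles shorter than the window, such as one-word titles, produce an output. The sizes, the weights, and the helper name are illustrative.

```python
import numpy as np

def conv_sum_pool(emb, W, b, filter_size=3, act=np.tanh):
    """Windowed convolution over time (zero-padded), then sum pooling.

    emb: (T, d) word embeddings of one title; T may differ between titles
    W:   (filter_size * d, num_filters) filter weights
    Returns a vector of length num_filters for any T >= 1.
    """
    T, d = emb.shape
    left = filter_size // 2
    right = filter_size - 1 - left
    padded = np.vstack([np.zeros((left, d)), emb, np.zeros((right, d))])
    windows = np.stack([padded[t:t + filter_size].reshape(-1) for t in range(T)])
    conv = act(windows @ W + b)   # (T, num_filters): one row per position
    return conv.sum(axis=0)       # pooling over time removes the T axis

rng = np.random.default_rng(0)
d, num_filters = 32, 32           # mirrors the 32-dim embeddings and num_filters=32 above
W = rng.normal(scale=0.1, size=(3 * d, num_filters))
b = np.zeros(num_filters)

one_word = conv_sum_pool(rng.normal(size=(1, d)), W, b)
five_words = conv_sum_pool(rng.normal(size=(5, d)), W, b)
print(one_word.shape, five_words.shape)
```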

Finally, we define an `inference_program` that uses cosine similarity to measure how similar the user features are to the movie features, and scales the result by 5.

```python
def inference_program():
    usr_combined_features = get_usr_combined_features()
    mov_combined_features = get_mov_combined_features()

    inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features)
    scale_infer = layers.scale(x=inference, scale=5.0)

    return scale_infer
```

Then we define a `train_program` that uses the result of `inference_program` to compute the squared-error cost against the label data.
We also define `optimizer_func` to specify the optimizer.

```python
def train_program():

    scale_infer = inference_program()

    label = layers.data(name='score', shape=[1], dtype='float32')
    square_cost = layers.square_error_cost(input=scale_infer, label=label)
    avg_cost = layers.mean(square_cost)

    return [avg_cost, scale_infer]


def optimizer_func():
    return fluid.optimizer.SGD(learning_rate=0.2)
```

## Model Training

### Specify training environment

Specify the training environment, that is, whether training runs on the CPU or on a GPU.

```python
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
```

### Datafeeder Configuration

Next, we define the data readers for training and testing. `paddle.dataset.movielens.train` yields one record at a time in each pass. `paddle.reader.shuffle` shuffles the records within a buffer of `buf_size` records, and `paddle.batch` groups them into batches of `BATCH_SIZE` for training.

```python
train_reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.movielens.train(), buf_size=8192),
    batch_size=BATCH_SIZE)

test_reader = paddle.batch(
    paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)
```
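
Conceptually, a reader here is a function that returns a fresh iterator over records, and `paddle.reader.shuffle` and `paddle.batch` each wrap one reader into another. The pure-Python sketch below mimics that composition with the hypothetical helpers `buffered_shuffle` and `batched`. It illustrates the idea: fill a buffer of `buf_size` records, shuffle it, emit it, and then group the records into batches. It is not Paddle's actual implementation.

```python
import random

def buffered_shuffle(reader, buf_size, seed=None):
    """Return a reader whose records are shuffled within buffers of `buf_size`."""
    def shuffled_reader():
        rnd = random.Random(seed)
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                rnd.shuffle(buf)
                yield from buf
                buf = []
        rnd.shuffle(buf)          # flush the last, possibly partial, buffer
        yield from buf
    return shuffled_reader

def batched(reader, batch_size):
    """Return a reader that yields lists of up to `batch_size` records."""
    def batch_reader():
        batch = []
        for record in reader():
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:                 # this sketch keeps the final, shorter batch
            yield batch
    return batch_reader

def toy_reader():
    """Hypothetical stand-in for the reader returned by paddle.dataset.movielens.train()."""
    return iter(range(10))

toy_train_reader = batched(buffered_shuffle(toy_reader, buf_size=4, seed=0), batch_size=3)
for batch in toy_train_reader():
    print(batch)
```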

### Create Trainer

Create a trainer that takes `train_program` as input, and specify the optimizer function.

```python
trainer = fluid.Trainer(
    train_func=train_program, place=place, optimizer_func=optimizer_func)
```

### Feeding Data

`feed_order` specifies the correspondence between the fields of each yielded record and the data layers defined with `layers.data`. For instance, the first field of each record generated by `movielens.train` corresponds to the `user_id` data layer.

```python
feed_order = [
    'user_id', 'gender_id', 'age_id', 'job_id', 'movie_id', 'category_id',
    'movie_title', 'score'
]
```
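
The mapping is positional: the i-th field of every record is fed to the data layer named `feed_order[i]`. The plain-Python sketch below shows that pairing for a single record. The record follows the layout described in the Dataset section: four user fields, three movie fields, then the rating. The category and title word IDs are made-up placeholders, and `record_to_feed` is a hypothetical helper, not part of the Paddle API.

```python
feed_order = [
    'user_id', 'gender_id', 'age_id', 'job_id', 'movie_id', 'category_id',
    'movie_title', 'score'
]

# user_id, gender_id, age_id, job_id, movie_id, [category ids], [title word ids], [rating]
record = [1, 1, 0, 10, 1193, [7], [101, 102, 103], [5.0]]

def record_to_feed(record, feed_order):
    """Pair each field of a record with the data layer at the same position."""
    if len(record) != len(feed_order):
        raise ValueError("expected %d fields, got %d" % (len(feed_order), len(record)))
    return dict(zip(feed_order, record))

feed = record_to_feed(record, feed_order)
print(feed['user_id'], feed['movie_id'], feed['score'])
```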

### Event Handler

The callback function `event_handler` is called during training whenever a predefined event occurs.
For example, we can evaluate the test cost with `trainer.test` whenever an `EndStepEvent` occurs, and save the parameters and stop training once the cost is low enough.

```python
# Specify the directory path to save the parameters
params_dirname = "recommender_system.inference.model"

def event_handler(event):
    if isinstance(event, fluid.EndStepEvent):
        # Evaluate on the whole test set after every training step.
        avg_cost_set = trainer.test(
            reader=test_reader, feed_order=feed_order)

        # get avg cost
        avg_cost = np.array(avg_cost_set).mean()

        print("avg_cost: %s" % avg_cost)
        print('Epoch {0}, BatchID {1}, Test Loss {2:0.2}'.format(
            event.epoch + 1, event.step, float(avg_cost)))

        # Save the parameters and stop once the test loss is low enough.
        if float(avg_cost) < 4:
            trainer.save_params(params_dirname)
            trainer.stop()

```
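
The control flow of this callback can be illustrated without Paddle. In the toy loop below, `ToyEndStepEvent`, `ToyTrainer`, and the fake loss values are all hypothetical. The sketch only reproduces the pattern used above: the trainer calls the handler after every step, and the handler saves and stops once the test loss drops below 4.

```python
class ToyEndStepEvent:
    """Hypothetical stand-in for an end-of-step event."""
    def __init__(self, epoch, step):
        self.epoch = epoch
        self.step = step

class ToyTrainer:
    """Hypothetical trainer that calls `event_handler` after each step."""
    def __init__(self, test_losses):
        self.test_losses = test_losses   # fake test loss after each step
        self.stopped = False
        self.saved_at_step = None

    def test(self, step):
        return self.test_losses[step]

    def save_params(self, step):
        self.saved_at_step = step

    def stop(self):
        self.stopped = True

    def train(self, num_epochs, event_handler):
        for epoch in range(num_epochs):
            for step in range(len(self.test_losses)):
                event_handler(ToyEndStepEvent(epoch, step))
                if self.stopped:
                    return

toy_trainer = ToyTrainer(test_losses=[9.1, 6.3, 4.8, 3.7, 3.2])

def toy_event_handler(event):
    avg_cost = toy_trainer.test(event.step)
    print('Epoch {0}, Step {1}, Test Loss {2:0.2f}'.format(event.epoch + 1, event.step, avg_cost))
    if avg_cost < 4:
        toy_trainer.save_params(event.step)
        toy_trainer.stop()

toy_trainer.train(num_epochs=1, event_handler=toy_event_handler)
```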

### Training

Finally, we invoke `trainer.train` to start training with `num_epochs` and other parameters.

```python
trainer.train(
    num_epochs=1,
    event_handler=event_handler,
    reader=train_reader,
    feed_order=feed_order)
```

## Inference

### Create Inferencer

Initialize an `Inferencer` with `inference_program` and `params_dirname`, the directory where the parameters were saved during training.

```python
inferencer = fluid.Inferencer(
    inference_program, param_path=params_dirname, place=place)
```

### Generate input data for testing

Use the `fluid.create_lod_tensor(data, lod, place)` API to generate a LoD tensor, where `data` is a list of sequences of index numbers and `lod` is the level-of-detail (LoD) information associated with `data`.
For example, `data = [[10, 2, 3], [2, 3]]` contains two sequences of indices, of lengths 3 and 2, respectively.
Correspondingly, `lod = [[3, 2]]` contains one level of detail information, indicating that `data` consists of two sequences of lengths 3 and 2.
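
Sequence lengths such as `[3, 2]` can equivalently be written as offsets into one flat array, where sequence i occupies the half-open slice `[offsets[i], offsets[i + 1])`. The pure-Python sketch below converts between the two views for the example above. The helper names are hypothetical, and the sketch only illustrates the bookkeeping, not how fluid stores LoD tensors internally.

```python
def lengths_to_offsets(lengths):
    """[3, 2] -> [0, 3, 5]: sequence i spans flat[offsets[i]:offsets[i + 1]]."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets

def split_flat(flat, lengths):
    """Recover the individual sequences from a flat list and their lengths."""
    offsets = lengths_to_offsets(lengths)
    return [flat[offsets[i]:offsets[i + 1]] for i in range(len(lengths))]

data = [[10, 2, 3], [2, 3]]
flat = [x for seq in data for x in seq]   # all indices back to back
lengths = [len(seq) for seq in data]      # the single LoD level, [3, 2]
print(lengths_to_offsets(lengths), split_flat(flat, lengths))
```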

In this inference example, we predict the rating that user 1 would give to the movie 'Hunchback of Notre Dame', based on that user's information.
```python
infer_movie_id = 783
infer_movie_name = paddle.dataset.movielens.movie_info()[infer_movie_id].title

# User 1 (see the UserInfo printed earlier): gender F, age code 1 -> age_id 0, job 10.
user_id = fluid.create_lod_tensor([[1]], [[1]], place)
gender_id = fluid.create_lod_tensor([[1]], [[1]], place)
age_id = fluid.create_lod_tensor([[0]], [[1]], place)
job_id = fluid.create_lod_tensor([[10]], [[1]], place)
movie_id = fluid.create_lod_tensor([[infer_movie_id]], [[1]], place) # Hunchback of Notre Dame
category_id = fluid.create_lod_tensor([[10, 8, 9]], [[3]], place) # Animation, Children's, Musical
movie_title = fluid.create_lod_tensor([[1069, 4140, 2923, 710, 988]], [[5]],
                                      place) # 'hunchback','of','notre','dame','the'
```

### Infer

Now we can run inference. Each input tensor is fed to the data layer with the same name, using the names listed in `feed_order` during training, except for the label `score`.

```python
results = inferencer.infer(
    {
        'user_id': user_id,
        'gender_id': gender_id,
        'age_id': age_id,
        'job_id': job_id,
        'movie_id': movie_id,
        'category_id': category_id,
        'movie_title': movie_title
    },
    return_numpy=False)

predict_rating = np.array(results[0])
print("Predict Rating of user id 1 on movie \"" + infer_movie_name + "\" is " + str(predict_rating[0][0]))
print("Actual Rating of user id 1 on movie \"" + infer_movie_name + "\" is 4.")
```

## Conclusion

This tutorial went over traditional approaches to recommender systems and a deep learning-based approach. We also showed how to train and use the model with PaddlePaddle. Deep learning has been widely applied in computer vision and NLP, and we look forward to its new successes in recommender systems.

## References

1. [Peter Brusilovsky](https://en.wikipedia.org/wiki/Peter_Brusilovsky) (2007). *The Adaptive Web*. p. 325.
2. Robin Burke. [Hybrid Web Recommender Systems](http://www.dcs.warwick.ac.uk/~acristea/courses/CS411/2010/Book%20-%20The%20Adaptive%20Web/HybridWebRecommenderSystems.pdf). In *The Adaptive Web*, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Eds.), pp. 377-408. Lecture Notes in Computer Science, Vol. 4321. Springer-Verlag, Berlin, Germany, May 2007. ISBN 978-3-540-72078-2.
3. P. Resnick, N. Iacovou, et al. "[GroupLens: An Open Architecture for Collaborative Filtering of Netnews](http://ccs.mit.edu/papers/CCSWP165.html)." *Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW)*, 1994, pp. 175-186.
4. Sarwar, Badrul, et al. "[Item-based collaborative filtering recommendation algorithms.](http://files.grouplens.org/papers/www10_sarwar.pdf)" *Proceedings of the 10th International Conference on World Wide Web*. ACM, 2001.
5. Kautz, Henry, Bart Selman, and Mehul Shah. "[Referral Web: Combining Social networks and collaborative filtering.](http://www.cs.cornell.edu/selman/papers/pdf/97.cacm.refweb.pdf)" *Communications of the ACM* 40.3 (1997): 63-65.
6. Yuan, Jianbo, et al. "[Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach.](https://arxiv.org/pdf/1611.05480v1.pdf)" *arXiv preprint arXiv:1611.05480* (2016).
7. Covington, Paul, Jay Adams, and Emre Sargin. "[Deep neural networks for YouTube recommendations.](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf)" *Proceedings of the 10th ACM Conference on Recommender Systems*. ACM, 2016: 191-198.

<br/>
This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.