# Recommender System

The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system). New users should first read [Running This Book](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).

## Background Introduction

With the continuous development of network technology and the ever-expanding scale of e-commerce, the number and variety of goods grow rapidly, and users must spend a lot of time finding the goods they want to buy. This is the problem of information overload. Recommendation systems emerged to solve it.

A recommendation system is a kind of information filtering system that can be used in a range of areas such as movies, music, e-commerce, and feed stream recommendations. It discovers a user's personalized needs and interests by analyzing and mining user behavior, and recommends information or products that may interest the user. Unlike search engines, recommendation systems do not require users to describe their needs precisely. Instead, they model users' historical behavior to proactively provide information that matches their interests and needs.

The GroupLens system \[[1](#references)\], introduced by the University of Minnesota in 1994, is generally considered to mark the emergence of recommendation systems as a relatively independent research direction. The system first proposed the idea of completing recommendation tasks based on collaborative filtering. After that, model-based collaborative filtering led the development of recommendation systems for more than ten years.

Traditional personalized recommendation methods mainly include:

- Collaborative Filtering Recommendation: One of the most widely used techniques. It collects and analyzes users' historical behaviors, activities, and preferences, and is usually divided into two sub-categories: User-Based Recommendation \[[1](#references)\] and Item-Based Recommendation \[[2](#references)\]. A key advantage is that it does not rely on machine analysis of item content, so it can accurately recommend complex items such as movies without understanding the items themselves. The disadvantages are the cold start problem for new users with no behavior history and the sparsity problem caused by insufficient interaction data between users and items. Context information such as social networks \[[3](#references)\] or geographic location can also be integrated into collaborative filtering.
- Content-Based Filtering Recommendation \[[4](#references)\]: This method extracts meaningful features from the content description of a product and makes recommendations by computing the similarity between the user's interests and the product description. Its advantage is that it is simple and direct: it does not rely on other users' ratings, and instead compares products by their attributes to recommend products similar to those the user is interested in. Its disadvantage is that it also suffers from the cold start problem for new users with no behavior history.
- Hybrid Recommendation \[[5](#references)\]: This method combines different inputs and techniques so that they compensate for the weaknesses of each individual recommendation technique.

In recent years, deep learning has achieved great success in many fields, and both academia and industry are trying to apply it to recommendation systems. Deep learning is good at extracting features automatically: it can learn multi-level abstract feature representations and can learn from heterogeneous or cross-domain content, which helps to some extent with the cold start problem \[[6](#references)\] of recommendation systems. This tutorial focuses on deep learning models for recommendation systems and how to implement them with PaddlePaddle.

## Result Demo

We use a dataset containing user information, movie information, and movie ratings to build the recommendation system. Once the model is trained, we only need to input a user ID and a movie ID to get a matching score (in the range [0, 5]; a higher score indicates greater interest). We can then sort all movies by their scores and recommend those most likely to interest the user.

```
Input movie_id: 1962
Input user_id: 1
Prediction Score is 4.25
```

## Model Overview

In this chapter, we first introduce YouTube's personalized video recommendation system \[[7](#references)\], and then introduce the fusion recommendation model we implemented.

### YouTube's Deep Neural Network Personalized Recommendation System

YouTube is the world's largest site for uploading, sharing, and discovering videos, and its personalized recommendation system recommends content from a growing library to more than 1 billion users. The entire system consists of two neural networks: a candidate generation network and a ranking network. The candidate generation network selects hundreds of candidates from a library of millions of videos, and the ranking network sorts the candidates and outputs the few dozen highest-ranked results. The system structure is shown in Figure 1:

<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/05.recommender_system/image/YouTube_Overview.png?raw=true" width="70%" ><br/>
Figure 1. YouTube personalized recommendation system structure
</p>

#### Candidate Generation Network

The candidate generation network models recommendation as a multi-class classification problem with a very large number of classes. For a YouTube user, it uses the watch history (video IDs), search tokens, demographic information (such as geographic location and login device), binary features (such as gender and whether the user is logged in), and continuous features (such as user age) to classify over all videos in the library. The result for each class is the recommendation probability of the corresponding video, and the network finally outputs the few hundred videos with the highest probability.

First, historical information such as the watch history and search tokens is mapped to vectors and averaged to obtain a fixed-length representation. At the same time, demographic features are included to improve recommendations for new users, and the binary and continuous features are normalized to the range [0, 1]. Next, all feature representations are concatenated into one vector and fed into a non-linear multilayer perceptron (MLP, see the [Recognize Digits](https://github.com/PaddlePaddle/book/blob/develop/02.recognize_digits/README.md) tutorial). Finally, during training, the output of the MLP is classified with softmax. During prediction, the similarity between the user's combined features (the MLP output) and the features of all videos is computed, and the $k$ videos with the highest scores are returned as the result of the candidate generation network. Figure 2 shows the candidate generation network structure.

<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/05.recommender_system/image/Deep_candidate_generation_model_architecture.png?raw=true" width="70%" ><br/>
Figure 2. Candidate generation network structure
</p>

For a user $U$, the probability that the video $\omega$ the user wants to watch at the current moment is video $i$ is given by:

$$P(\omega=i|u)=\frac{e^{v_{i}u}}{\sum_{j \in V}e^{v_{j}u}}$$

Here $u$ is the feature representation of user $U$, $V$ is the set of videos in the library, and $v_i$ is the feature representation of the $i$-th video in the library. $u$ and $v_i$ are vectors of equal length, and their dot product can be implemented by a fully connected layer.

Because the number of classes in the softmax classification is very large, two measures keep the computation efficient: 1) in the training phase, negative classes are sampled so that the number of classes actually computed is reduced to a few thousand; 2) in the recommendation (prediction) phase, the softmax normalization is skipped, which does not affect the ranking. The class scoring problem then reduces to a nearest neighbor search in the dot product space, and the $k$ videos nearest to $u$ are returned as the generated candidates.
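
The claim that skipping the softmax normalization does not change the result can be checked on a small example. The following is a minimal sketch in plain Python, not YouTube's or PaddlePaddle's implementation; the vectors and the helper names `dot`, `softmax`, and `top_k` are made up for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(values, k):
    """Indices of the k largest values, best first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


# Hypothetical toy data: one user vector u and four video vectors v_i.
u = [0.5, -1.0, 2.0]
videos = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 1.0],
    [2.0, -1.0, 0.0],
    [-1.0, 0.5, 0.2],
]

scores = [dot(v, u) for v in videos]  # v_i . u for every video
probs = softmax(scores)               # P(w = i | u) from the formula above

by_score = top_k(scores, 2)  # ranking on raw dot products
by_prob = top_k(probs, 2)    # ranking on normalized probabilities
print(by_score, by_prob)
```

Because the exponential is monotonic and the denominator is shared by all videos, both rankings select the same candidates, so prediction only needs the dot products.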

#### Ranking Network

The structure of the ranking network is similar to that of the candidate generation network, but its goal is a finer ranking of the candidates. As in the feature extraction used for traditional advertisement ranking, a large number of features relevant to video ranking (such as video ID and last watch time) are constructed here. These features are processed in much the same way as in the candidate generation network, except that the top of the ranking network is a weighted logistic regression that scores all candidate videos. The candidates are sorted from high to low by score and then returned to the user.

### Fusion recommendation model
This section uses a convolutional neural network to learn the representation of movie titles. The convolutional neural network for text and the fusion recommendation model are introduced in turn.

#### Convolutional Neural Network (CNN) for text

Convolutional neural networks are often used to process data with a grid-like topology. For example, an image can be viewed as a two-dimensional grid of pixels, and natural-language text can be viewed as a one-dimensional sequence of words. Convolutional neural networks can extract a variety of local features and combine them to obtain higher-level feature representations. Experiments show that convolutional neural networks can model image and text problems efficiently.

A convolutional neural network is mainly composed of convolution and pooling operations, which can be applied and combined in flexible ways. In this section we explain the network shown in Figure 3:

<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/05.recommender_system/image/text_cnn.png?raw=true" width = "80%" align="center"/><br />
Figure 3. Convolutional neural network text classification model
</p>

Suppose the length of the sentence to be processed is $n$, where the word vector of the $i$-th word is $x_i\in\mathbb{R}^k$, and $k$ is the dimension of the word vectors.

First, concatenate the word vectors: every $h$ consecutive words are concatenated to form a word window of size $h$, denoted $x_{i:i+h-1}$, which is the concatenation of $x_{i}, x_{i+1}, \ldots, x_{i+h-1}$. Here $i$ is the position in the sentence of the first word in the window, ranging from $1$ to $n-h+1$, and $x_{i:i+h-1}\in\mathbb{R}^{hk}$.

Second, perform the convolution: apply the convolution kernel $w\in\mathbb{R}^{hk}$ to the window $x_{i:i+h-1}$ containing $h$ words to obtain the feature $c_i=f(w\cdot x_{i:i+h-1}+b)$, where $b\in\mathbb{R}$ is the bias and $f$ is a non-linear activation function such as $\mathrm{sigmoid}$. Applying the convolution kernel to all word windows $\lbrace x_{1:h}, x_{2:h+1},\ldots,x_{n-h+1:n}\rbrace$ in the sentence produces a feature map:

$$c=[c_1,c_2,\ldots,c_{n-h+1}], c \in \mathbb{R}^{n-h+1}$$

Next, apply max pooling over time to the feature map to obtain the feature $\hat c$ of the whole sentence for this convolution kernel, which is the maximum of all elements in the feature map:

$$\hat c=\max(c)$$
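
The three steps above (windowing, convolution, max pooling over time) can be sketched in plain Python for a single kernel. This is only an illustration under assumed toy values: the sentence, the kernel `w`, the bias `b`, and the helper `conv_feature_map` are hypothetical and are not part of the tutorial's model code.

```python
import math


def sigmoid(z):
    """Logistic activation, used here as the non-linearity f."""
    return 1.0 / (1.0 + math.exp(-z))


def conv_feature_map(x, w, b, h):
    """Apply one kernel w (length h*k) to every window of h word vectors."""
    n = len(x)
    feature_map = []
    for i in range(n - h + 1):
        # Concatenate x_i ... x_{i+h-1} into one vector of length h*k.
        window = [value for word in x[i:i + h] for value in word]
        z = sum(wj * xj for wj, xj in zip(w, window)) + b
        feature_map.append(sigmoid(z))
    return feature_map


# Hypothetical sentence: n = 4 words, word vectors of dimension k = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
h = 2                      # window size
w = [1.0, -1.0, 0.5, 0.5]  # kernel in R^{h*k}
b = 0.0                    # bias

c = conv_feature_map(x, w, b, h)  # feature map of length n - h + 1
c_hat = max(c)                    # max pooling over time
print(len(c), c_hat)
```

Whatever the sentence length $n$, each kernel contributes exactly one value $\hat c$, which is why a bank of kernels yields a fixed-length sentence representation.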

#### Fusion recommendation model overview

The fusion recommendation model used in the personalized movie recommendation system works as follows:

1. First, take user features and movie features as input to the neural network, where:

   - The user features combine four attributes: user ID, gender, occupation, and age.

   - The movie features combine three attributes: movie ID, movie category ID, and movie title.

2. For the user features, map the user ID to a vector representation of dimension 256 and feed it into a fully connected layer; process the other three attributes in a similar way. The feature representations of the four attributes are then each passed through a fully connected layer and added together.

3. For the movie features, the movie ID is processed in the same way as the user ID. The movie category ID is fed directly into a fully connected layer as a vector, and the movie title is turned into a fixed-length vector by a text convolutional neural network. The feature representations of the three attributes are then each passed through a fully connected layer and added together.

4. After obtaining the vector representations of the user and the movie, compute their cosine similarity as the score of the personalized recommendation system. Finally, the square of the difference between this similarity score and the user's true rating is used as the loss function of the regression model.

<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/05.recommender_system/image/rec_regression_network.png?raw=true" width="90%" ><br/>
Figure 4. Fusion recommendation model
</p>
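
The scoring and loss in step 4 can be sketched with plain Python. The vectors below are made-up two-dimensional stand-ins for the 200-dimensional user and movie features, the helper names are hypothetical, and the scaling by 5 mirrors the `scale=5.0` used in the model code of this tutorial.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_score(user_vec, movie_vec, scale=5.0):
    """Scale the cosine similarity to the rating range."""
    return scale * cos_sim(user_vec, movie_vec)


def squared_error(predicted, actual):
    """Squared difference between predicted and true rating."""
    return (predicted - actual) ** 2


# Hypothetical feature vectors for one user and one movie.
user_vec = [3.0, 4.0]
movie_vec = [4.0, 3.0]

score = predict_score(user_vec, movie_vec)  # cosine is 24/25
loss = squared_error(score, 5.0)            # true rating assumed to be 5
print(score, loss)
```

During training, this squared error is averaged over a mini-batch and minimized with respect to the embedding and fully connected layer parameters.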

## Data Preparation

### Data Introduction and Download

We take the [MovieLens Million Dataset (ml-1m)](http://files.grouplens.org/datasets/movielens/ml-1m.zip) as an example. The ml-1m dataset, collected by the GroupLens Research lab, contains about 1,000,000 ratings of about 4,000 movies by about 6,000 users (integer scores from 1 to 5).

PaddlePaddle provides a module in its API that loads the data automatically: `paddle.dataset.movielens`.


```python
from __future__ import print_function
import paddle
movie_info = paddle.dataset.movielens.movie_info()
print(list(movie_info.values())[0])
```


```python
# Run this block to show dataset's documentation
# help(paddle.dataset.movielens)
```

The raw data includes movie features, user features, and users' ratings of movies.

For example, the features of one movie are:


```python
movie_info = paddle.dataset.movielens.movie_info()
print(list(movie_info.values())[0])
```

    <MovieInfo id(1), title(Toy Story ), categories(['Animation', "Children's", 'Comedy'])>


This means that the movie ID is 1, the title is "Toy Story", and the movie belongs to three categories: Animation, Children's, and Comedy.


```python
user_info = paddle.dataset.movielens.user_info()
print(list(user_info.values())[0])
```

    <UserInfo id(1), gender(F), age(1), job(10)>


This means that the user ID is 1, the user is female and under 18 years old, and the occupation ID is 10.


Age is encoded using the following groups:

*  1:  "Under 18"
* 18:  "18-24"
* 25:  "25-34"
* 35:  "35-44"
* 45:  "45-49"
* 50:  "50-55"
* 56:  "56+"

The occupation is selected from the following options:

*  0:  "other" or not specified
*  1:  "academic/educator"
*  2:  "artist"
*  3:  "clerical/admin"
*  4:  "college/grad student"
*  5:  "customer service"
*  6:  "doctor/health care"
*  7:  "executive/managerial"
*  8:  "farmer"
*  9:  "homemaker"
* 10:  "K-12 student"
* 11:  "lawyer"
* 12:  "programmer"
* 13:  "retired"
* 14:  "sales/marketing"
* 15:  "scientist"
* 16:  "self-employed"
* 17:  "technician/engineer"
* 18:  "tradesman/craftsman"
* 19:  "unemployed"
* 20:  "writer"

Each training or test sample has the form `<user features> + <movie features> + rating`.

For example, the first training sample is:


```python
train_set_creator = paddle.dataset.movielens.train()
train_sample = next(train_set_creator())
# The user ID is the first field of a sample
uid = train_sample[0]
# The movie ID is the first field after the user features
mov_id = train_sample[len(user_info[uid].value())]
print("User %s rates Movie %s with Score %s"%(user_info[uid], movie_info[mov_id], train_sample[-1]))
```

    User <UserInfo id(1), gender(F), age(1), job(10)> rates Movie <MovieInfo id(1193), title(One Flew Over the Cuckoo's Nest ), categories(['Drama'])> with Score [5.0]


That is, user 1 gave movie 1193 a rating of 5.

## Configuration Instruction

Next we configure the model according to the form of the input data. First, import the required libraries and define the global variables.

- IS_SPARSE: whether to use sparse updates for the embeddings
- BATCH_SIZE: number of samples in each mini-batch
- PASS_NUM: number of epochs


```python
import math
import sys
import numpy as np
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as layers
import paddle.fluid.nets as nets

IS_SPARSE = True
BATCH_SIZE = 256
PASS_NUM = 20
```

Then define the network configuration that combines the user features:

```python
def get_usr_combined_features():
    """network definition for user part"""

    USR_DICT_SIZE = paddle.dataset.movielens.max_user_id() + 1

    uid = fluid.data(name='user_id', shape=[None], dtype='int64')

    # user ID -> embedding -> fully connected layer
    usr_emb = fluid.embedding(
        input=uid,
        dtype='float32',
        size=[USR_DICT_SIZE, 32],
        param_attr='user_table',
        is_sparse=IS_SPARSE)

    usr_fc = layers.fc(input=usr_emb, size=32)

    USR_GENDER_DICT_SIZE = 2

    usr_gender_id = fluid.data(name='gender_id', shape=[None], dtype='int64')

    # gender -> embedding -> fully connected layer
    usr_gender_emb = fluid.embedding(
        input=usr_gender_id,
        size=[USR_GENDER_DICT_SIZE, 16],
        param_attr='gender_table',
        is_sparse=IS_SPARSE)

    usr_gender_fc = layers.fc(input=usr_gender_emb, size=16)

    USR_AGE_DICT_SIZE = len(paddle.dataset.movielens.age_table)
    usr_age_id = fluid.data(name='age_id', shape=[None], dtype="int64")

    # age group -> embedding -> fully connected layer
    usr_age_emb = fluid.embedding(
        input=usr_age_id,
        size=[USR_AGE_DICT_SIZE, 16],
        is_sparse=IS_SPARSE,
        param_attr='age_table')

    usr_age_fc = layers.fc(input=usr_age_emb, size=16)

    USR_JOB_DICT_SIZE = paddle.dataset.movielens.max_job_id() + 1
    usr_job_id = fluid.data(name='job_id', shape=[None], dtype="int64")

    # occupation -> embedding -> fully connected layer
    usr_job_emb = fluid.embedding(
        input=usr_job_id,
        size=[USR_JOB_DICT_SIZE, 16],
        param_attr='job_table',
        is_sparse=IS_SPARSE)

    usr_job_fc = layers.fc(input=usr_job_emb, size=16)

    # concatenate the four feature vectors and project them to 200 dimensions
    concat_embed = layers.concat(
        input=[usr_fc, usr_gender_fc, usr_age_fc, usr_job_fc], axis=1)

    usr_combined_features = layers.fc(input=concat_embed, size=200, act="tanh")

    return usr_combined_features
```

As shown in the code above, the input for each user consists of four features: user_id, gender_id, age_id, and job_id. These features are simple integer values. To make them easier for the neural network to process, we borrow the idea of word embeddings from language models in NLP and transform these discrete integer values into embeddings: usr_emb, usr_gender_emb, usr_age_emb, and usr_job_emb, respectively.

Then the processed user features are concatenated and fed into a fully connected layer (fc), which combines them into a single 200-dimensional feature vector.
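
Conceptually, an embedding is a table lookup: each integer ID selects one row of a learned matrix. The sketch below illustrates only that lookup and the concatenation step in plain Python; the table contents are random stand-ins for learned parameters, the dimension of 4 is chosen for readability, and `make_table` and `lookup` are hypothetical helpers rather than PaddlePaddle APIs.

```python
import random


def make_table(num_rows, dim, seed):
    """Create a stand-in embedding table with small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def lookup(table, index):
    """Embedding lookup: return the row that belongs to a discrete ID."""
    return table[index]


# Table sizes follow the value ranges listed earlier:
# 2 genders, 7 age groups, 21 occupations.
gender_table = make_table(2, 4, seed=0)
age_table = make_table(7, 4, seed=1)
job_table = make_table(21, 4, seed=2)

gender_id, age_id, job_id = 1, 0, 10

# Concatenate the looked-up rows into one feature vector.
features = (lookup(gender_table, gender_id)
            + lookup(age_table, age_id)
            + lookup(job_table, job_id))
print(len(features))
```

In the real model the table rows are trained together with the rest of the network, and each looked-up vector passes through its own fully connected layer before concatenation.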

We apply a similar transformation to each movie feature. The network configuration is:


```python
def get_mov_combined_features():
    """network definition for item(movie) part"""

    MOV_DICT_SIZE = paddle.dataset.movielens.max_movie_id() + 1

    mov_id = fluid.data(name='movie_id', shape=[None], dtype='int64')

    # movie ID -> embedding -> fully connected layer
    mov_emb = fluid.embedding(
        input=mov_id,
        dtype='float32',
        size=[MOV_DICT_SIZE, 32],
        param_attr='movie_table',
        is_sparse=IS_SPARSE)

    mov_fc = layers.fc(input=mov_emb, size=32)

    CATEGORY_DICT_SIZE = len(paddle.dataset.movielens.movie_categories())

    category_id = fluid.data(
        name='category_id', shape=[None], dtype='int64', lod_level=1)

    # a movie can have several categories: embed each one and sum them
    mov_categories_emb = fluid.embedding(
        input=category_id, size=[CATEGORY_DICT_SIZE, 32], is_sparse=IS_SPARSE)

    mov_categories_hidden = layers.sequence_pool(
        input=mov_categories_emb, pool_type="sum")

    MOV_TITLE_DICT_SIZE = len(paddle.dataset.movielens.get_movie_title_dict())

    mov_title_id = fluid.data(
        name='movie_title', shape=[None], dtype='int64', lod_level=1)

    # title words -> embeddings -> sequence convolution and pooling
    mov_title_emb = fluid.embedding(
        input=mov_title_id, size=[MOV_TITLE_DICT_SIZE, 32], is_sparse=IS_SPARSE)

    mov_title_conv = nets.sequence_conv_pool(
        input=mov_title_emb,
        num_filters=32,
        filter_size=3,
        act="tanh",
        pool_type="sum")

    # concatenate the three feature vectors and project them to 200 dimensions
    concat_embed = layers.concat(
        input=[mov_fc, mov_categories_hidden, mov_title_conv], axis=1)

    mov_combined_features = layers.fc(input=concat_embed, size=200, act="tanh")

    return mov_combined_features
```


A movie title is represented as a sequence of integers, where each integer is the index of a word in the title vocabulary. This sequence is fed to the `sequence_conv_pool` layer, which applies convolution and pooling along the time dimension. As a result, the output has a fixed length even though the length of the input sequence varies.

Finally, we define an `inference_program` that uses cosine similarity to measure the similarity between the user features and the movie features.

```python
def inference_program():
    """the combined network"""

    usr_combined_features = get_usr_combined_features()
    mov_combined_features = get_mov_combined_features()

    # cosine similarity between user and movie features, scaled by 5
    inference = layers.cos_sim(X=usr_combined_features, Y=mov_combined_features)
    scale_infer = layers.scale(x=inference, scale=5.0)

    return scale_infer
```

Next, we define a `train_program` that takes the result computed by `inference_program` and calculates the error against the label data. We also define an `optimizer_func` that specifies the optimizer.

```python
def train_program():
    """define the cost function"""

    scale_infer = inference_program()

    label = fluid.data(name='score', shape=[None, 1], dtype='float32')
    square_cost = layers.square_error_cost(input=scale_infer, label=label)
    avg_cost = layers.mean(square_cost)

    return [avg_cost, scale_infer]


def optimizer_func():
    return fluid.optimizer.SGD(learning_rate=0.2)
```


## Training Model

### Defining the training environment
Define the training environment and specify whether training runs on the CPU or the GPU.

```python
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
```

### Defining the data provider
The next step is to define the data providers for training and testing. Each provider reads data in batches of size `BATCH_SIZE`. The training data from `paddle.dataset.movielens.train` is shuffled before being grouped into batches of `BATCH_SIZE` samples, and `buf_size` is the size of the buffer used for shuffling.

```python
train_reader = fluid.io.batch(
    fluid.io.shuffle(
        paddle.dataset.movielens.train(), buf_size=8192),
    batch_size=BATCH_SIZE)

test_reader = fluid.io.batch(
    paddle.dataset.movielens.test(), batch_size=BATCH_SIZE)
```
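
To make the roles of `buf_size` and `batch_size` concrete, here is a minimal plain-Python sketch of buffered shuffling followed by batching. It illustrates the idea only and is not PaddlePaddle's implementation; `shuffle`, `batch`, `toy_reader`, and the `seed` parameter are hypothetical names introduced for this example.

```python
import random


def shuffle(reader, buf_size, seed=None):
    """Wrap a reader so samples are shuffled within buffers of buf_size."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Flush whatever is left in the last, partially filled buffer.
        if buf:
            rng.shuffle(buf)
            for item in buf:
                yield item

    return shuffled_reader


def batch(reader, batch_size):
    """Wrap a reader so it yields lists of up to batch_size samples."""

    def batch_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:
            yield current

    return batch_reader


def toy_reader():
    """Stand-in dataset: the integers 0..9."""
    for i in range(10):
        yield i


toy_train_reader = batch(shuffle(toy_reader, buf_size=4, seed=0), batch_size=3)
batches = list(toy_train_reader())
print(batches)
```

A larger buffer mixes samples more thoroughly at the cost of memory, while the batch size only controls how many samples are handed to the network at once.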

### Constructing a training process (trainer)
Here we construct the training process, including the optimizer.

### Provide data

`feed_order` defines the mapping between each column of the generated data and the input layers defined with `fluid.data`. For example, the first column of the data generated by `movielens.train` corresponds to the feature `user_id`.

```python
feed_order = [
    'user_id', 'gender_id', 'age_id', 'job_id', 'movie_id', 'category_id',
    'movie_title', 'score'
]
```

### Building training programs and testing programs
The training program and the test program are built separately, and the optimizer is attached to the training program.

```python
main_program = fluid.default_main_program()
star_program = fluid.default_startup_program()
[avg_cost, scale_infer] = train_program()

# clone the program for testing before the optimizer ops are added
test_program = main_program.clone(for_test=True)
sgd_optimizer = optimizer_func()
sgd_optimizer.minimize(avg_cost)
exe = fluid.Executor(place)

def train_test(program, reader):
    """return the average cost of `program` over all batches of `reader`"""
    count = 0
    feed_var_list = [
        program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder_test = fluid.DataFeeder(feed_list=feed_var_list, place=place)
    test_exe = fluid.Executor(place)
    accumulated = 0
    for test_data in reader():
        avg_cost_np = test_exe.run(program=program,
                                   feed=feeder_test.feed(test_data),
                                   fetch_list=[avg_cost])
        accumulated += avg_cost_np[0]
        count += 1
    return accumulated / count
```

### Build a training main loop and start training
We run the training loop for up to `PASS_NUM` epochs, as defined above, and evaluate on the test set after every mini-batch. In this tutorial, training stops early once `batch_id` reaches 20, at which point the inference model is saved.

```python
# Specify the directory path to save the parameters
params_dirname = "recommender_system.inference.model"

from paddle.utils.plot import Ploter
train_prompt = "Train cost"
test_prompt = "Test cost"

plot_cost = Ploter(train_prompt, test_prompt)

def train_loop():
    feed_list = [
        main_program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder = fluid.DataFeeder(feed_list, place)
    exe.run(star_program)

    for pass_id in range(PASS_NUM):
        for batch_id, data in enumerate(train_reader()):
            # train a mini-batch
            outs = exe.run(program=main_program,
                           feed=feeder.feed(data),
                           fetch_list=[avg_cost])
            out = np.array(outs[0])

            # get test avg_cost
            test_avg_cost = train_test(test_program, test_reader)

            plot_cost.append(train_prompt, batch_id, outs[0])
            plot_cost.append(test_prompt, batch_id, test_avg_cost)
            plot_cost.plot()

            # stop early after a fixed number of batches and save the model
            if batch_id == 20:
                if params_dirname is not None:
                    fluid.io.save_inference_model(params_dirname, [
                        "user_id", "gender_id", "age_id", "job_id",
                        "movie_id", "category_id", "movie_title"
                    ], [scale_infer], exe)
                return
            print('EpochID {0}, BatchID {1}, Test Loss {2:0.2}'.format(
                pass_id + 1, batch_id + 1, float(test_avg_cost)))

            if math.isnan(float(out[0])):
                sys.exit("got NaN loss, training failed.")
```

Start training:

```python
train_loop()
```

## Model Application

### Generate test data
Use the `create_lod_tensor(data, lod, place)` API to create a LoD (level-of-detail) tensor. `data` is a list of sequences, each of which is a list of index numbers. `lod` holds the level-of-detail information for `data`. For example, `data = [[10, 2, 3], [2, 3]]` contains two sequences of lengths 3 and 2. The corresponding `lod = [[3, 2]]` has a single level of detail that records those two lengths.
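
The relationship between nested sequences and length-based level-of-detail information can be sketched without PaddlePaddle. The helpers `to_flat_with_lod` and `from_flat_with_lod` below are hypothetical and only illustrate the bookkeeping, not how `create_lod_tensor` is implemented.

```python
def to_flat_with_lod(sequences):
    """Flatten variable-length sequences and record their lengths."""
    flat = [item for seq in sequences for item in seq]
    lengths = [len(seq) for seq in sequences]
    return flat, [lengths]  # a single level of detail


def from_flat_with_lod(flat, lod):
    """Rebuild the original sequences from flat data and its lengths."""
    sequences, start = [], 0
    for length in lod[0]:
        sequences.append(flat[start:start + length])
        start += length
    return sequences


data = [[10, 2, 3], [2, 3]]
flat, lod = to_flat_with_lod(data)
print(flat, lod)
```

Storing the data flat with separate length information lets sequences of different lengths share one tensor without padding.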

In this example, we predict the rating that the user with ID 1 would give to the movie 'Hunchback of Notre Dame'.

```python
infer_movie_id = 783
infer_movie_name = paddle.dataset.movielens.movie_info()[infer_movie_id].title
# Each scalar feature is a one-element int64 array.
user_id = np.array([1]).astype("int64").reshape(-1)
gender_id = np.array([1]).astype("int64").reshape(-1)
age_id = np.array([0]).astype("int64").reshape(-1)
job_id = np.array([10]).astype("int64").reshape(-1)
movie_id = np.array([783]).astype("int64").reshape(-1) # Hunchback of Notre Dame
# Variable-length features are LoD tensors that each hold a single sequence.
category_id = fluid.create_lod_tensor(np.array([10, 8, 9], dtype='int64'), [[3]], place) # Animation, Children's, Musical
movie_title = fluid.create_lod_tensor(np.array([1069, 4140, 2923, 710, 988], dtype='int64'), [[5]],
                                      place) # 'hunchback','of','notre','dame','the'
```
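
The movie title is passed as a list of integer word indices together with its length information. As a minimal sketch of that encoding step, assume a toy vocabulary `toy_title_dict` that holds only the five word IDs quoted in the comment above (a real vocabulary would be much larger). The helper `encode_title` is a hypothetical name, not a PaddlePaddle function; it produces the index list and the `[[length]]` value that `create_lod_tensor` takes.

```python
# Toy vocabulary: only the five word IDs quoted in the tutorial's comment.
toy_title_dict = {
    "hunchback": 1069,
    "of": 4140,
    "notre": 2923,
    "dame": 710,
    "the": 988,
}


def encode_title(words, vocab):
    """Map title words to IDs and return them with single-sequence lengths."""
    ids = [vocab[word.lower()] for word in words]
    return ids, [[len(ids)]]


title_ids, title_lod = encode_title(
    ["hunchback", "of", "notre", "dame", "the"], toy_title_dict)
print(title_ids)  # [1069, 4140, 2923, 710, 988]
print(title_lod)  # [[5]]
```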

### Building the prediction process and testing
Similar to the training process, we need to build a prediction process. `params_dirname` is the directory where the parameters were saved during training.

```python
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)

inference_scope = fluid.core.Scope()
```

### Testing
Now we can make predictions. The `feed_order` we provide must match the one used during training.


```python
with fluid.scope_guard(inference_scope):
    [inferencer, feed_target_names,
     fetch_targets] = fluid.io.load_inference_model(params_dirname, exe)

    results = exe.run(inferencer,
                      feed={
                          'user_id': user_id,
                          'gender_id': gender_id,
                          'age_id': age_id,
                          'job_id': job_id,
                          'movie_id': movie_id,
                          'category_id': category_id,
                          'movie_title': movie_title
                      },
                      fetch_list=fetch_targets,
                      return_numpy=False)
    predict_rating = np.array(results[0])
    print("Predict Rating of user id 1 on movie \"" + infer_movie_name +
          "\" is " + str(predict_rating[0][0]))
    print("Actual Rating of user id 1 on movie \"" + infer_movie_name +
          "\" is 4.")
```
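
The requirement that the fed variables match those used in training can be checked before calling `exe.run`. The following is a minimal sketch, not part of the tutorial's code: `check_feed` is a hypothetical helper, and the placeholder values stand in for the arrays and LoD tensors built earlier. The expected names are the ones passed to `save_inference_model` in the training loop.

```python
def check_feed(feed, feed_target_names):
    """Return `feed` ordered like `feed_target_names`, or raise ValueError."""
    missing = [name for name in feed_target_names if name not in feed]
    unexpected = [name for name in feed if name not in feed_target_names]
    if missing or unexpected:
        raise ValueError(
            "feed mismatch: missing={0}, unexpected={1}".format(
                missing, unexpected))
    return {name: feed[name] for name in feed_target_names}


# Names taken from the save_inference_model call in the training loop.
expected_names = ["user_id", "gender_id", "age_id", "job_id",
                  "movie_id", "category_id", "movie_title"]

# Placeholder values; the real feed holds NumPy arrays and LoD tensors.
example_feed = {name: None for name in reversed(expected_names)}
ordered_feed = check_feed(example_feed, expected_names)
print(list(ordered_feed))
```

In the prediction code, the `feed_target_names` list returned by `fluid.io.load_inference_model` would play the role of `expected_names`.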

## Summary

This chapter introduced traditional personalized recommendation methods and YouTube's deep neural network recommendation system. It then took movie recommendation as an example and used PaddlePaddle to train a personalized recommendation neural network model. Personalized recommendation is used in almost every area of e-commerce, social networks, advertising, and search engines. Deep learning has played an important role in image processing and natural language processing, and it is likely to become just as important in personalized recommendation.

<a name="references"></a>
## References

1. P. Resnick, N. Iacovou, et al. "[GroupLens: An Open Architecture for Collaborative Filtering of Netnews](http://ccs.mit.edu/papers/CCSWP165.html)", Proceedings of ACM Conference on Computer Supported Cooperative Work, CSCW 1994. pp. 175-186.
2. Sarwar, Badrul, et al. "[Item-based collaborative filtering recommendation algorithms.](http://files.grouplens.org/papers/www10_sarwar.pdf)" *Proceedings of the 10th international conference on World Wide Web*. ACM, 2001.
3. Kautz, Henry, Bart Selman, and Mehul Shah. "[Referral Web: combining social networks and collaborative filtering.](http://www.cs.cornell.edu/selman/papers/pdf/97.cacm.refweb.pdf)" Communications of the ACM 40.3 (1997): 63-65.
4. [Peter Brusilovsky](https://en.wikipedia.org/wiki/Peter_Brusilovsky) (2007). *The Adaptive Web*. p. 325.
5. Robin Burke, [Hybrid Web recommendation systems](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.435.7538&rep=rep1&type=pdf), pp. 377-408, The Adaptive Web, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Ed.), Lecture Notes in Computer Science, Vol. 4321, Springer-Verlag, Berlin, Germany, May 2007, 978-3-540-72078-2.
6. Yuan, Jianbo, et al. ["Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach."](https://arxiv.org/pdf/1611.05480v1.pdf) *arXiv preprint arXiv:1611.05480* (2016).
7. Covington P, Adams J, Sargin E. [Deep neural networks for youtube recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf). Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016: 191-198.


<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://paddlepaddleimage.cdn.bcebos.com/bookimage/camo.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.