# Personalized Recommendation

The source code of this tutorial is in [book/recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system).

For instructions on getting started with PaddlePaddle, see the [PaddlePaddle installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.en.md#running-the-book).


## Background

With the fast growth of e-commerce, online video, and online reading, users have to rely on recommender systems to avoid manually browsing a tremendous volume of choices. Recommender systems infer users' interests by mining user behavior and other properties of users and products.

Some well-known approaches include:

- User behavior-based approach. A well-known method is collaborative filtering. The underlying assumption is that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than the opinion of a randomly chosen person.

- Content-based recommendation[[1](#reference)]. This approach infers feature vectors that represent products from their descriptions. It also infers feature vectors that represent users' interests. It then measures the relevance of users and products by a distance between these feature vectors.

- Hybrid approach[[2](#reference)]. This approach uses content-based information to help address the cold start problem[[6](#reference)] in the behavior-based approach.
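
The collaborative filtering assumption from the first bullet can be illustrated with a minimal user-based sketch. The toy ratings, the user names, and the helper functions `cosine` and `predict` are all hypothetical and only serve the illustration; this is not the model built later in this tutorial.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 1.0},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None


# alice agrees with bob on A, B and C, so the prediction for D
# is pulled toward bob's rating rather than carol's.
print(predict("alice", "D"))
```

Because no other user has rated an unseen item, `predict` returns `None` for it, which is the cold start problem that the hybrid approach tries to address.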

Among these options, collaborative filtering might be the most studied one. Some of its variants include user-based[[3](#reference)], item-based[[4](#reference)], social-network-based[[5](#reference)], and model-based approaches.

This tutorial explains a deep-learning-based approach and how to implement it using PaddlePaddle. We will train a model using a dataset that includes user information, movie information, and ratings. Once we train the model, we will be able to get a predicted rating given a pair of user and movie IDs.


## Model Overview

To learn more about deep-learning-based recommendation, let us start by going over the YouTube recommender system[[7](#reference)] before introducing our hybrid model.


### YouTube's Deep Learning Recommendation Model

YouTube is a video-sharing website with one of the largest user bases in the world. Its recommender system serves more than a billion users. This system is composed of two major parts: candidate generation and ranking. The former selects a few hundred candidates from millions of videos, and the latter ranks them and outputs the top 10.

<p align="center">
<img src="image/YouTube_Overview.en.png" width="70%" ><br/>
Figure 1. YouTube recommender system overview.
</p>

#### Candidate Generation Network

YouTube models candidate generation as a multiclass classification problem in which the number of classes equals the number of videos. The architecture of the model is as follows:

<p align="center">
<img src="image/Deep_candidate_generation_model_architecture.en.png" width="70%" ><br/>
Figure 2. Deep candidate generation model.
</p>

The first stage of this model maps watching history and search queries into fixed-length representative features. Then, an MLP (multi-layer perceptron, as described in the [Recognize Digits](https://github.com/PaddlePaddle/book/blob/develop/recognize_digits/README.md) tutorial) takes the concatenation of all representative vectors. The output of the MLP represents the user's *intrinsic interests*. At training time, it is used together with a softmax output layer to minimize the classification error. At serving time, it is used to compute the relevance of the user to all videos.

For a user $U$, the predicted watching probability of video $i$ is

$$P(\omega=i|u)=\frac{e^{v_{i}u}}{\sum_{j \in V}e^{v_{j}u}}$$

where $u$ is the representative vector of user $U$, $V$ is the corpus of all videos, and $v_i$ is the representative vector of the $i$-th video. $u$ and $v_i$ are vectors of the same length, so we can compute their dot product using a fully connected layer.

This model could have a performance issue because the softmax output covers millions of classification labels. To optimize performance at training time, the authors down-sample negative samples, so the actual number of classes is reduced to thousands. At serving time, the authors skip the normalization of the softmax outputs, because the results are only used for ranking.
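
The point that normalization can be skipped at serving time follows from the softmax being monotonic in the scores $v_j u$. A minimal sketch in plain Python illustrates this; the user vector, the video vectors, and the function names are hypothetical, and the sketch is not YouTube's implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def watch_probabilities(u, videos):
    """Softmax over the dot products v_j . u for every video j."""
    scores = [dot(v, u) for v in videos]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-dimensional user vector and four video vectors.
u = [0.2, -0.1, 0.7]
videos = [[0.1, 0.3, 0.5], [0.9, 0.0, -0.2], [0.0, 0.0, 1.0], [-0.4, 0.2, 0.1]]

probs = watch_probabilities(u, videos)

# Ranking by normalized probability and by raw dot product gives the same order.
by_prob = sorted(range(len(videos)), key=lambda j: probs[j], reverse=True)
by_score = sorted(range(len(videos)), key=lambda j: dot(videos[j], u), reverse=True)
print(by_prob == by_score)
```

Since only the order matters for ranking, the raw dot products are sufficient at serving time and the sum over all videos never has to be computed.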

#### Ranking Network

The architecture of the ranking network is similar to that of the candidate generation network.  Similar to ranking models widely used in online advertising, it uses rich features like video ID, last watching time, etc.  The output layer of the ranking network is a weighted logistic regression, which rates all candidate videos.

### Hybrid Model

In this section, we introduce our movie recommendation system. In particular, we feed movie titles into a text convolution network to get a fixed-length representative feature vector. Accordingly, we introduce the convolutional neural network for texts first and then the hybrid recommendation model.

#### Convolutional Neural Networks for Texts (CNN)

**Convolutional Neural Networks** are frequently applied to data with grid-like topology such as two-dimensional images and one-dimensional texts. A CNN can extract multiple local features, combine them, and produce high-level abstractions, which correspond to semantic understanding. Empirically, CNNs have been shown to be effective for image and text modeling.

A CNN mainly consists of convolution and pooling operations, which can be combined in various ways for different applications. Here, we briefly describe the CNN shown in Figure 3.


<p align="center">
<img src="image/text_cnn_en.png" width = "80%" align="center"/><br/>
Figure 3. CNN for text modeling.
</p>

Let $n$ be the length of the sentence to process, and let the $i$-th word have the embedding $x_i\in\mathbb{R}^k$, where $k$ is the embedding dimensionality.

First, we concatenate every $h$ consecutive words into a window of length $h$. This window is denoted as $x_{i:i+h-1}\in\mathbb{R}^{hk}$ and consists of $x_{i},x_{i+1},\ldots,x_{i+h-1}$, where $x_i$ is the first word in the window and $i$ ranges from $1$ to $n-h+1$.

Next, we apply the convolution operation: we apply the kernel $w\in\mathbb{R}^{hk}$ to each window, extracting features $c_i=f(w\cdot x_{i:i+h-1}+b)$, where $b\in\mathbb{R}$ is the bias and $f$ is a non-linear activation function such as the sigmoid. Convolving the kernel with every window $\lbrace x_{1:h},x_{2:h+1},\ldots,x_{n-h+1:n}\rbrace$ produces a feature map of the following form:

$$c=[c_1,c_2,\ldots,c_{n-h+1}], c \in \mathbb{R}^{n-h+1}$$

Next, we apply *max pooling* over time to obtain $\hat c$, a representation of the whole sentence, which is the maximum element across the feature map:

$$\hat c=\max(c)$$
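
The windowing, convolution, and max-pooling steps above can be sketched in plain Python for a single kernel. The function name `conv_max_pool` and the toy embeddings and kernel weights are hypothetical; in a trained network they would be learned parameters. The sketch assumes $n \ge h$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv_max_pool(x, w, b, h):
    """x: n word embeddings of length k; w: kernel of length h*k; b: bias.

    Returns the feature map c and its max-over-time pooled value.
    """
    n = len(x)
    feature_map = []
    for i in range(n - h + 1):
        # Concatenate h consecutive embeddings into one window x_{i:i+h-1}.
        window = [value for word in x[i:i + h] for value in word]
        c_i = sigmoid(sum(wi * xi for wi, xi in zip(w, window)) + b)
        feature_map.append(c_i)
    return feature_map, max(feature_map)


# Hypothetical sentence with n = 4 words, embedding size k = 2, window h = 2.
x = [[0.1, 0.2], [0.0, -0.3], [0.5, 0.4], [-0.2, 0.1]]
w = [1.0, -1.0, 0.5, 0.5]
c, c_hat = conv_max_pool(x, w, b=0.0, h=2)
print(len(c), c_hat)
```

The feature map has $n-h+1$ entries, but the pooled value is a single number whatever the sentence length, which is why stacking several kernels yields a fixed-length vector.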

#### Model Structure Of The Hybrid Model

In our network, the input includes features of users and movies. The user features include four properties: user ID, gender, occupation, and age. The movie features include IDs, genres, and titles.

We use fully-connected layers to map user features into representative feature vectors and concatenate them. Movie features are processed in a similar way, except for movie titles: we feed titles into a text convolution network, as described in the section above, to get a fixed-length representative feature vector.

Given the feature vectors of users and movies, we compute the relevance using cosine similarity.  We minimize the squared error at training time.

<p align="center">
<img src="image/rec_regression_network_en.png" width="90%" ><br/>
Figure 4. A hybrid recommendation model.
</p>

## Dataset

We use the [MovieLens ml-1m](http://files.grouplens.org/datasets/movielens/ml-1m.zip) dataset to train our model. This dataset includes about 1,000,000 ratings of 4,000 movies from 6,000 users. Each rating is in the range of 1 to 5. Thanks to GroupLens Research for collecting, processing and publishing the dataset.

The `paddle.v2.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens` and `wmt14`. There is no need for us to manually download and preprocess the `MovieLens` dataset.

The raw `MovieLens` dataset contains movie ratings as well as relevant features of both movies and users.
For instance, one movie's feature could be:

```python
import paddle.v2 as paddle
movie_info = paddle.dataset.movielens.movie_info()
print movie_info.values()[0]
```

```text
<MovieInfo id(1), title(Toy Story), categories(['Animation', "Children's", 'Comedy'])>
```

One user's feature could be:

```python
user_info = paddle.dataset.movielens.user_info()
print user_info.values()[0]
```

```text
<UserInfo id(1), gender(F), age(1), job(10)>
```

In this dataset, age is encoded as one of the following groups:

```text
1: "Under 18"
18: "18-24"
25: "25-34"
35: "35-44"
45: "45-49"
50: "50-55"
56: "56+"
```

A user's occupation is one of the following options:

```text
0: "other" or not specified
1: "academic/educator"
2: "artist"
3: "clerical/admin"
4: "college/grad student"
5: "customer service"
6: "doctor/health care"
7: "executive/managerial"
8: "farmer"
9: "homemaker"
10: "K-12 student"
11: "lawyer"
12: "programmer"
13: "retired"
14: "sales/marketing"
15: "scientist"
16: "self-employed"
17: "technician/engineer"
18: "tradesman/craftsman"
19: "unemployed"
20: "writer"
```

Each record consists of three main components: user features, movie features, and the movie rating.
As a simple example, consider the following:

```python
train_set_creator = paddle.dataset.movielens.train()
train_sample = next(train_set_creator())
# A record is laid out as user features, movie features, then the rating.
uid = train_sample[0]
# The movie ID is the first column after the user features.
mov_id = train_sample[len(user_info[uid].value())]
print "User %s rates Movie %s with Score %s"%(user_info[uid], movie_info[mov_id], train_sample[-1])
```

```text
User <UserInfo id(1), gender(F), age(1), job(10)> rates Movie <MovieInfo id(1193), title(One Flew Over the Cuckoo's Nest), categories(['Drama'])> with Score [5.0]
```

The output shows that user 1 gave movie `1193` a rating of 5.

After issuing the command `python train.py`, training will start immediately. The following sections unpack the details to show how it works.

## Model Architecture

### Initialize PaddlePaddle

First, we must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc.).

```python
import paddle.v2 as paddle
paddle.init(use_gpu=False)
```

### Model Configuration

```python
# User ID: integer input -> embedding -> fully-connected layer.
uid = paddle.layer.data(
    name='user_id',
    type=paddle.data_type.integer_value(
        paddle.dataset.movielens.max_user_id() + 1))
usr_emb = paddle.layer.embedding(input=uid, size=32)
usr_fc = paddle.layer.fc(input=usr_emb, size=32)

# Gender: two possible values.
usr_gender_id = paddle.layer.data(
    name='gender_id', type=paddle.data_type.integer_value(2))
usr_gender_emb = paddle.layer.embedding(input=usr_gender_id, size=16)
usr_gender_fc = paddle.layer.fc(input=usr_gender_emb, size=16)

# Age: one value per age group in the age table.
usr_age_id = paddle.layer.data(
    name='age_id',
    type=paddle.data_type.integer_value(
        len(paddle.dataset.movielens.age_table)))
usr_age_emb = paddle.layer.embedding(input=usr_age_id, size=16)
usr_age_fc = paddle.layer.fc(input=usr_age_emb, size=16)

# Occupation.
usr_job_id = paddle.layer.data(
    name='job_id',
    type=paddle.data_type.integer_value(
        paddle.dataset.movielens.max_job_id() + 1))
usr_job_emb = paddle.layer.embedding(input=usr_job_id, size=16)
usr_job_fc = paddle.layer.fc(input=usr_job_emb, size=16)
```

As shown in the above code, the input for each user consists of four integers: `user_id`, `gender_id`, `age_id` and `job_id`. To handle these discrete features conveniently, we borrow the embedding technique used by language models in NLP and transform the discrete values into the embedding vectors `usr_emb`, `usr_gender_emb`, `usr_age_emb` and `usr_job_emb`.
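
Conceptually, an embedding layer is a table lookup: the integer ID selects one row of a weight matrix, which gives the same result as multiplying a one-hot vector by that matrix. A minimal sketch in plain Python follows; the sizes, the random table, and the function names are hypothetical and do not reflect PaddlePaddle internals.

```python
import random

random.seed(0)

# Hypothetical sizes: 7 discrete values (e.g. age groups), 4-dimensional embedding.
vocab_size, emb_size = 7, 4
table = [[random.uniform(-0.1, 0.1) for _ in range(emb_size)]
         for _ in range(vocab_size)]


def embed(index):
    """Embedding as a direct row lookup."""
    return table[index]


def embed_via_one_hot(index):
    """The same result expressed as one-hot vector times weight matrix."""
    one_hot = [1.0 if i == index else 0.0 for i in range(vocab_size)]
    return [sum(one_hot[i] * table[i][j] for i in range(vocab_size))
            for j in range(emb_size)]


print(embed(3))
```

The lookup form avoids materializing the one-hot vector, which matters when the number of distinct IDs is large, as it is for `user_id`.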

```python
usr_combined_features = paddle.layer.fc(
    input=[usr_fc, usr_gender_fc, usr_age_fc, usr_job_fc],
    size=200,
    act=paddle.activation.Tanh())
```

Then we feed all the user features into a fully-connected layer, which maps them to a 200-dimensional vector.

Furthermore, we do a similar transformation for each movie feature. The model configuration is:

```python
# Movie ID: integer input -> embedding -> fully-connected layer.
mov_id = paddle.layer.data(
    name='movie_id',
    type=paddle.data_type.integer_value(
        paddle.dataset.movielens.max_movie_id() + 1))
mov_emb = paddle.layer.embedding(input=mov_id, size=32)
mov_fc = paddle.layer.fc(input=mov_emb, size=32)

# Categories: a movie can belong to several, so use a sparse binary vector.
mov_categories = paddle.layer.data(
    name='category_id',
    type=paddle.data_type.sparse_binary_vector(
        len(paddle.dataset.movielens.movie_categories())))
mov_categories_hidden = paddle.layer.fc(input=mov_categories, size=32)

# Title: a sequence of word IDs -> embedding -> text convolution and pooling.
movie_title_dict = paddle.dataset.movielens.get_movie_title_dict()
mov_title_id = paddle.layer.data(
    name='movie_title',
    type=paddle.data_type.integer_value_sequence(len(movie_title_dict)))
mov_title_emb = paddle.layer.embedding(input=mov_title_id, size=32)
mov_title_conv = paddle.networks.sequence_conv_pool(
    input=mov_title_emb, hidden_size=32, context_len=3)

mov_combined_features = paddle.layer.fc(
    input=[mov_fc, mov_categories_hidden, mov_title_conv],
    size=200,
    act=paddle.activation.Tanh())
```

The movie title, a sequence of words represented by a sequence of integer word indices, is fed into a `sequence_conv_pool` layer, which applies convolution and pooling along the time dimension. Because pooling is done along the time dimension, the output is a fixed-length vector regardless of the length of the input sequence.

Finally, we use cosine similarity to measure the similarity between the user features and the movie features.

```python
inference = paddle.layer.cos_sim(a=usr_combined_features, b=mov_combined_features, size=1, scale=5)
cost = paddle.layer.mse_cost(
    input=inference,
    label=paddle.layer.data(
        name='score', type=paddle.data_type.dense_vector(1)))
```
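
To make the arithmetic concrete, here is a minimal sketch in plain Python of a cosine similarity scaled by 5 followed by a squared error against the true rating. The vectors and function names are hypothetical stand-ins, and the exact reduction that `mse_cost` applies over a batch (averaging, constant factors) is not modeled here.

```python
import math


def scaled_cos_sim(a, b, scale=5.0):
    """Cosine similarity of a and b, multiplied by `scale`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return scale * dot / norm


def squared_error(prediction, label):
    return (prediction - label) ** 2


usr = [0.6, 0.8, 0.0]  # hypothetical stand-in for usr_combined_features
mov = [0.6, 0.0, 0.8]  # hypothetical stand-in for mov_combined_features

pred = scaled_cos_sim(usr, mov)   # cosine is 0.36, so the prediction is about 1.8
loss = squared_error(pred, 5.0)   # distance to a true rating of 5
print(pred, loss)
```

With a scale of 5, the prediction lies between -5 and 5, so the upper end of the range lines up with the highest possible rating.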

## Model Training

### Define Parameters

First, we create the model parameters from the `cost` layer defined in the model configuration above.

```python
# Create parameters
parameters = paddle.parameters.create(cost)
```

### Create Trainer

Before creating the trainer, we also need to choose an optimization algorithm. Here we specify the Adam optimization algorithm via `paddle.optimizer`.

```python
trainer = paddle.trainer.SGD(cost=cost, parameters=parameters,
                             update_equation=paddle.optimizer.Adam(learning_rate=1e-4))
```

```text
[INFO 2017-03-06 17:12:13,378 networks.py:1472] The input order is [user_id, gender_id, age_id, job_id, movie_id, category_id, movie_title, score]
[INFO 2017-03-06 17:12:13,379 networks.py:1478] The output order is [__mse_cost_0__]
```

### Training

`paddle.dataset.movielens.train` yields records during each pass. After shuffling, the records are grouped into batches for training.

```python
reader = paddle.batch(
    paddle.reader.shuffle(
        paddle.dataset.movielens.train(), buf_size=8192),
    batch_size=256)
```
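
The shuffle-then-batch pipeline can be illustrated with plain Python generators. This is a sketch of the reader-decorator idea only, not PaddlePaddle's implementation: the function names mirror `paddle.reader.shuffle` and `paddle.batch`, while `toy_reader` is hypothetical and details such as emitting the final partial batch are assumptions.

```python
import random


def shuffle(reader, buf_size):
    """Buffered shuffle: fill a buffer of buf_size records, shuffle it, emit it."""
    def shuffled_reader():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        # Emit whatever is left in the last, partially filled buffer.
        random.shuffle(buf)
        for item in buf:
            yield item
    return shuffled_reader


def batch(reader, batch_size):
    """Group consecutive records into lists of batch_size records."""
    def batch_reader():
        current = []
        for record in reader():
            current.append(record)
            if len(current) == batch_size:
                yield current
                current = []
        if current:  # assumption: the final partial batch is kept
            yield current
    return batch_reader


def toy_reader():
    """Hypothetical reader yielding ten one-column records."""
    for i in range(10):
        yield (i,)


batches = list(batch(shuffle(toy_reader, buf_size=4), batch_size=3)())
print([len(b) for b in batches])
```

A buffered shuffle only mixes records within a window of `buf_size`, so a larger buffer gives a better shuffle at the cost of more memory.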

`feeding` specifies the correspondence between the columns of each yielded record and the `paddle.layer.data` layers. For instance, the first column of the data generated by `movielens.train` corresponds to the `user_id` feature.

```python
feeding = {
    'user_id': 0,
    'gender_id': 1,
    'age_id': 2,
    'job_id': 3,
    'movie_id': 4,
    'category_id': 5,
    'movie_title': 6,
    'score': 7
}
```
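
As an illustration of what this mapping expresses, the sketch below uses the same dictionary to turn one record into named inputs. The helper `to_named_inputs` and the values in `record` are hypothetical; they only mimic the column order described above and are not taken from the dataset.

```python
feeding = {
    'user_id': 0, 'gender_id': 1, 'age_id': 2, 'job_id': 3,
    'movie_id': 4, 'category_id': 5, 'movie_title': 6, 'score': 7,
}


def to_named_inputs(record, feeding):
    """Pick each named input from the column index given by `feeding`."""
    return {name: record[column] for name, column in feeding.items()}


# Hypothetical record laid out in the same column order:
# four user columns, three movie columns, then the rating.
record = (1, 1, 0, 10, 1193, [7], [100, 200, 300], [5.0])

named = to_named_inputs(record, feeding)
print(named['movie_id'], named['score'])
```

Keeping the mapping explicit means the data layers can be declared in any order, as long as each name points at the right column.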

The callback functions `event_handler` and `event_handler_plot` are called during training when a pre-defined event happens.

```python
def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        if event.batch_id % 100 == 0:
            print "Pass %d Batch %d Cost %.2f" % (
                event.pass_id, event.batch_id, event.cost)
```

```python
from paddle.v2.plot import Ploter

train_title = "Train cost"
test_title = "Test cost"
cost_ploter = Ploter(train_title, test_title)

step = 0

def event_handler_plot(event):
    global step
    if isinstance(event, paddle.event.EndIteration):
        if step % 10 == 0:  # every 10 batches, record a train cost
            cost_ploter.append(train_title, step, event.cost)

        if step % 1000 == 0: # every 1000 batches, record a test cost
            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.movielens.test(), batch_size=256),
                feeding=feeding)
            cost_ploter.append(test_title, step, result.cost)

        if step % 100 == 0: # every 100 batches, update cost plot
            cost_ploter.plot()

        step += 1
```

Finally, we can invoke `trainer.train` to start training:

```python
trainer.train(
    reader=reader,
    event_handler=event_handler_plot,
    feeding=feeding,
    num_passes=2)
```

## Conclusion

This tutorial goes over traditional approaches to recommender systems and a deep-learning-based approach. We also show how to train and use the model with PaddlePaddle. Deep learning has been widely used in computer vision and NLP, and we look forward to its new successes in recommender systems.

## Reference

1. [Peter Brusilovsky](https://en.wikipedia.org/wiki/Peter_Brusilovsky) (2007). *The Adaptive Web*. p. 325.
2. Robin Burke. [Hybrid Web Recommender Systems](http://www.dcs.warwick.ac.uk/~acristea/courses/CS411/2010/Book%20-%20The%20Adaptive%20Web/HybridWebRecommenderSystems.pdf). In *The Adaptive Web*, Peter Brusilovsky, Alfred Kobsa, Wolfgang Nejdl (Eds.), Lecture Notes in Computer Science, Vol. 4321, pp. 377-408. Springer-Verlag, Berlin, Germany, May 2007. ISBN 978-3-540-72078-2.
3. P. Resnick, N. Iacovou, et al. "[GroupLens: An Open Architecture for Collaborative Filtering of Netnews](http://ccs.mit.edu/papers/CCSWP165.html)." *Proceedings of the ACM Conference on Computer Supported Cooperative Work*, CSCW 1994, pp. 175-186.
4. Sarwar, Badrul, et al. "[Item-based collaborative filtering recommendation algorithms.](http://files.grouplens.org/papers/www10_sarwar.pdf)" *Proceedings of the 10th International Conference on World Wide Web*. ACM, 2001.
5. Kautz, Henry, Bart Selman, and Mehul Shah. "[Referral Web: Combining Social networks and collaborative filtering.](http://www.cs.cornell.edu/selman/papers/pdf/97.cacm.refweb.pdf)" *Communications of the ACM* 40.3 (1997): 63-65.
6. Yuan, Jianbo, et al. ["Solving Cold-Start Problem in Large-scale Recommendation Engines: A Deep Learning Approach."](https://arxiv.org/pdf/1611.05480v1.pdf) *arXiv preprint arXiv:1611.05480* (2016).
7. Covington P, Adams J, Sargin E. [Deep neural networks for youtube recommendations](https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/45530.pdf). *Proceedings of the 10th ACM Conference on Recommender Systems*. ACM, 2016: 191-198.

<br/>
This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.