<html>
<head>
  <script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
    jax: ["input/TeX", "output/HTML-CSS"],
    tex2jax: {
      inlineMath: [ ['$','$'] ],
      displayMath: [ ['$$','$$'] ],
      processEscapes: true
    },
    "HTML-CSS": { availableFonts: ["TeX"] }
  });
  </script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
  <script type="text/javascript" src="../.tools/theme/marked.js">
  </script>
  <link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
  <script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
  <link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
  <link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
  <link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
    box-sizing: border-box;
    min-width: 200px;
    max-width: 980px;
    margin: 0 auto;
    padding: 45px;
}
</style>


<body>

<div id="context" class="container-fluid markdown-body">
</div>

<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
# Sentiment Analysis

The source code for this section is located at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment). First-time users may refer to the PaddlePaddle [installation guide](https://github.com/PaddlePaddle/book/blob/develop/README.en.md#running-the-book).

## Background

In natural language processing, sentiment analysis refers to determining the emotion expressed in a piece of text. The text can be a sentence, a paragraph, or a document. Emotion categorization can be binary (positive/negative or happy/sad) or ternary (positive/neutral/negative). Sentiment analysis is applicable to a wide range of services, such as e-commerce sites like Amazon and Taobao, hospitality services like Airbnb and hotels.com, and movie rating sites like Rotten Tomatoes and IMDB. It can be used to gauge from reviews how customers feel about a product. Table 1 illustrates an example of sentiment analysis in movie reviews:

| Movie Review       | Category  |
| --------     | -----  |
| Best movie of Xiaogang Feng in recent years!| Positive |
| Pretty bad. Feels like a tv-series from a local TV-channel     | Negative |
| Politically correct version of Taken ... and boring as Heck| Negative|
|delightful, mesmerizing, and completely unexpected. The plot is nicely designed.|Positive|

<p align="center">Table 1 Sentiment Analysis in Movie Reviews</p>

In natural language processing, sentiment analysis can be categorized as a **text classification problem**, i.e., assigning a piece of text to a specific class. It involves two related tasks: text representation and classification. Before the emergence of deep learning techniques, the mainstream methods for text representation included BOW (*bag of words*) and topic modeling, while the mainstream methods for classification included SVM (*support vector machine*) and LR (*logistic regression*).

The BOW model does not capture all the information in a piece of text, as it ignores syntax and grammar and treats the text as an unordered collection of words. For example, “this movie is extremely bad” and “boring, dull, and empty work” have very similar meanings, yet their BOW representations have little similarity. Conversely, “the movie is bad” and “the movie is not bad” have highly similar BOW features, but they express opposite meanings.
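
The contrast above can be reproduced with a few lines of plain Python. This is a minimal sketch rather than part of the tutorial's code: `bow_vector` and `cosine_similarity` are hypothetical helper names, and whitespace tokenization and cosine similarity are assumptions made for illustration.

```python
import math
from collections import Counter


def bow_vector(text):
    """Count how often each lowercase token occurs, ignoring word order."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b)


# Opposite meanings, yet four of the five tokens are shared.
opposite = cosine_similarity(bow_vector("the movie is bad"),
                             bow_vector("the movie is not bad"))

# Similar meanings, yet no token is shared.
similar = cosine_similarity(bow_vector("this movie is extremely bad"),
                            bow_vector("boring, dull, and empty work"))
```

Under these assumptions the two opposite reviews score about 0.89, while the two reviews with similar meaning share no tokens and score 0.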

This chapter introduces a deep learning approach that addresses these shortcomings of BOW. It embeds text into a low-dimensional space and takes word order into consideration. It is an end-to-end framework that substantially outperforms traditional methods \[[1](#Reference)\].

## Model Overview

The models used in this chapter are **Convolutional Neural Networks** (**CNNs**) and **Recurrent Neural Networks** (**RNNs**) with some specific extensions.


### Revisiting the Convolutional Neural Network for Texts (CNN)

The convolutional neural network for texts was introduced in the [recommender_system](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) chapter; here we give a brief overview.

A CNN mainly consists of convolution and pooling operations, which can be combined in various ways for different applications. We first apply the convolution operation: the kernel is applied to each window of consecutive words to extract a feature, and convolving the kernel over every window produces a feature map. Next, we apply *max pooling* over time, taking the maximum element of the feature map to represent the whole sentence. In real applications, we apply multiple CNN kernels to each sentence, which can be implemented efficiently by concatenating the kernels into a matrix. We can also use kernels of different sizes. Finally, concatenating the resulting features produces a fixed-length representation, which can be combined with a softmax layer to form the model for the sentiment analysis problem.
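
The convolution and pooling steps described above can be sketched in plain Python. This is an illustrative sketch, not the PaddlePaddle implementation: the function names and the toy embeddings and kernels are hypothetical, the ReLU nonlinearity is an assumption, and each sentence is assumed to be at least as long as the largest kernel.

```python
def conv_feature_map(embeddings, kernel, bias=0.0):
    """Slide one kernel over every window of consecutive word vectors."""
    window = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            word_vec = embeddings[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vec))
        feature_map.append(max(0.0, total))  # ReLU, an assumed nonlinearity
    return feature_map


def max_over_time(feature_map):
    """Keep only the strongest response of a kernel across the sentence."""
    return max(feature_map)


def sentence_features(embeddings, kernels):
    """One pooled feature per kernel gives a fixed-length representation."""
    return [max_over_time(conv_feature_map(embeddings, k)) for k in kernels]


# Toy sentence: four words with two-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernel_2 = [[1.0, 0.0], [0.0, 1.0]]              # window of two words
kernel_3 = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # window of three words
features = sentence_features(embeddings, [kernel_2, kernel_3])
```

The length of `features` depends only on the number of kernels, not on the sentence length, which is what allows a fixed-size softmax layer to follow.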

For short texts, the aforementioned CNN model can achieve very high accuracy \[[1](#Reference)\]. If we want to extract more abstract representations, we may apply a deeper CNN model \[[2](#Reference),[3](#Reference)\].

### Recurrent Neural Network (RNN)

The RNN is an effective model for sequential data. In terms of computability, the RNN is Turing-complete \[[4](#Reference)\]. Since NLP is a classical problem on sequential data, the RNN, especially its variant LSTM \[[5](#Reference)\], achieves state-of-the-art performance on various NLP tasks, such as language modeling, syntax parsing, POS tagging, image captioning, dialog, machine translation, and so forth.

<p align="center">
<img src="image/rnn.png" width = "60%" align="center"/><br/>
Figure 1. An illustration of an unfolded RNN in time.
</p>

As shown in Figure 1, we unfold an RNN over time: at the $t$-th time step, the network takes two inputs, the $t$-th input vector $\vec{x_t}$ and the latent state from the previous time step $\vec{h_{t-1}}$. From those, it computes the latent state of the current step $\vec{h_t}$. This process is repeated until all inputs are consumed. Denoting the RNN as a function $f$, it can be formulated as follows:

$$\vec{h_t}=f(\vec{x_t},\vec{h_{t-1}})=\sigma(W_{xh}\vec{x_t}+W_{hh}\vec{h_{t-1}}+\vec{b_h})$$

where $W_{xh}$ is the input-to-latent weight matrix, $W_{hh}$ is the latent-to-latent weight matrix, $\vec{b_h}$ is the latent bias, and $\sigma$ refers to the sigmoid function.

In NLP, words are often represented as one-hot vectors and then mapped to embeddings. The embedded feature is fed to the RNN as the input $\vec{x_t}$ at every time step. Moreover, we can stack further recurrent layers on top of the RNN to form a deep or stacked RNN. Finally, the last latent state may be used as a feature for sentence classification.
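
As a rough illustration of the recurrence, the sketch below applies the update with a sigmoid activation to a short sequence and returns the last latent state. The function names, the zero initial state, and all numeric values are assumptions chosen for readability; the word vectors stand in for embedded words.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """h_t = sigmoid(W_xh x_t + W_hh h_prev + b_h)."""
    from_input = matvec(W_xh, x_t)
    from_state = matvec(W_hh, h_prev)
    return [sigmoid(a + b + c)
            for a, b, c in zip(from_input, from_state, b_h)]


def rnn_encode(inputs, W_xh, W_hh, b_h):
    """Consume the whole sequence and return the last latent state."""
    h = [0.0] * len(b_h)  # assumed initial state: all zeros
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h


# Made-up parameters: two-dimensional inputs and a two-dimensional state.
W_xh = [[0.5, -0.5], [1.0, 0.0]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three embedded words
feature = rnn_encode(sentence, W_xh, W_hh, b_h)
```

The returned `feature` has the size of the latent state regardless of how many words the sentence contains.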

### Long Short-Term Memory (LSTM)

Training an RNN on long sequential data sometimes leads to vanishing or exploding gradients \[[6](#Reference)\]. To solve this problem, Hochreiter and Schmidhuber (1997) proposed **Long Short-Term Memory** (LSTM) \[[5](#Reference)\].

Compared to the structure of a simple RNN, an LSTM adds a memory cell $c$, an input gate $i$, a forget gate $f$, and an output gate $o$. These gates and the memory cell dramatically improve the ability of the network to handle long sequences. We can formulate the **LSTM-RNN**, denoted as a function $F$, as follows:

$$ h_t=F(x_t,h_{t-1})$$

$F$ consists of the following formulations \[[7](#Reference)\]:
\begin{align}
i_t & = \sigma(W_{xi}x_t+W_{hi}h_{t-1}+W_{ci}c_{t-1}+b_i)\\\\
f_t & = \sigma(W_{xf}x_t+W_{hf}h_{t-1}+W_{cf}c_{t-1}+b_f)\\\\
c_t & = f_t\odot c_{t-1}+i_t\odot \tanh(W_{xc}x_t+W_{hc}h_{t-1}+b_c)\\\\
o_t & = \sigma(W_{xo}x_t+W_{ho}h_{t-1}+W_{co}c_{t}+b_o)\\\\
h_t & = o_t\odot \tanh(c_t)\\\\
\end{align}

In these equations, $i_t, f_t, c_t, o_t$ stand for the input gate, forget gate, memory cell, and output gate, respectively. $W$ and $b$ are model parameters, $\tanh$ is the hyperbolic tangent, and $\odot$ denotes the element-wise product. The input gate controls the magnitude of the new input written into the memory cell $c$; the forget gate controls how much memory is propagated from the previous time step; the output gate controls the magnitude of the output. The three gates are computed similarly with different parameters, and they influence the memory cell $c$ separately, as shown in Figure 2:

<p align="center">
<img src="image/lstm_en.png" width = "65%" align="center"/><br/>
Figure 2. LSTM at time step $t$ [7].
</p>
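
The LSTM equations can be traced step by step in plain Python. The sketch below is illustrative only: the parameters live in a dictionary `p` keyed by the subscripts used in the formulas, and `gate`, `lstm_step`, and `zero_params` are hypothetical helper names. With all parameters set to zero, every gate equals 0.5 and the candidate input is 0, so the new memory cell is simply half of the old one.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gate(p, name, x_t, h_prev, c):
    """sigmoid(W_x. x_t + W_h. h_prev + W_c. c + b_.) for gate i, f, or o."""
    terms = zip(matvec(p["W_x" + name], x_t), matvec(p["W_h" + name], h_prev),
                matvec(p["W_c" + name], c), p["b_" + name])
    return [sigmoid(sum(t)) for t in terms]


def lstm_step(x_t, h_prev, c_prev, p):
    i_t = gate(p, "i", x_t, h_prev, c_prev)
    f_t = gate(p, "f", x_t, h_prev, c_prev)
    candidate = [math.tanh(a + b + c) for a, b, c in
                 zip(matvec(p["W_xc"], x_t), matvec(p["W_hc"], h_prev),
                     p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, candidate)]
    o_t = gate(p, "o", x_t, h_prev, c_t)  # the output gate sees the new cell
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(input_dim, hidden_dim):
    """All-zero parameters, used only to make the arithmetic easy to follow."""
    p = {}
    for name in "ifoc":
        p["W_x" + name] = [[0.0] * input_dim for _ in range(hidden_dim)]
        p["W_h" + name] = [[0.0] * hidden_dim for _ in range(hidden_dim)]
        p["b_" + name] = [0.0] * hidden_dim
    for name in "ifo":
        p["W_c" + name] = [[0.0] * hidden_dim for _ in range(hidden_dim)]
    return p


params = zero_params(input_dim=3, hidden_dim=2)
h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [2.0, -4.0], params)
```

In a trained network the gates take input-dependent values between 0 and 1, which is what lets the cell keep or discard information over long spans.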

LSTM improves the ability to capture long-term dependencies with the help of its memory cell and gates. A similar structure with a simpler design is the Gated Recurrent Unit (GRU) \[[8](#Reference)\]. **These structures are still similar to the simple RNN, though with some modifications (as shown in Figure 2): the latent state depends on the current input as well as the latent state of the previous time step, and the process goes on recurrently until all inputs are consumed:**

$$ h_t=Recurrent(x_t,h_{t-1})$$
where $Recurrent$ is a simple RNN, GRU, or LSTM.

### Stacked Bidirectional LSTM

For a vanilla LSTM, $h_t$ contains information from the current input and the preceding context at time steps $1..t-1$. We can also apply an RNN in the reverse direction to take the succeeding context $t+1..n$ into consideration. Combining this with deep RNNs (deeper RNNs can capture more abstract, higher-level semantics), we can design deep stacked bidirectional LSTMs to model sequential data \[[9](#Reference)\].

As shown in Figure 3 (a 3-layer RNN), odd layers are forward LSTMs and even layers are reverse LSTMs. Each higher LSTM layer takes the output of the layer below as input, and the top-layer LSTM produces a fixed-length vector by max pooling over time (this representation considers context from both preceding and succeeding words for higher-level abstractions). Finally, we feed this vector to a softmax layer for classification.

<p align="center">
<img src="image/stacked_lstm_en.png" width=450><br/>
Figure 3. Stacked Bidirectional LSTM for NLP modeling.
</p>
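
To see why alternating directions lets every position use both preceding and succeeding context, consider the simplified sketch below. It is not an LSTM: `toy_step` is a hypothetical scalar recurrence standing in for the recurrent unit, and each layer reads only the output of the layer below, which is a simplification of the structure in Figure 3.

```python
def toy_step(x, h):
    """Hypothetical stand-in for a recurrent unit: input plus decayed state."""
    return x + 0.5 * h


def run_layer(inputs, step, reverse=False):
    """Apply a recurrent step over the sequence, optionally right to left."""
    positions = range(len(inputs))
    order = reversed(positions) if reverse else positions
    outputs = [None] * len(inputs)
    state = 0.0
    for t in order:
        state = step(inputs[t], state)
        outputs[t] = state
    return outputs


def stacked_alternating(inputs, step, num_layers=3):
    """Odd layers run forward and even layers run in reverse."""
    seq = inputs
    for layer in range(1, num_layers + 1):
        seq = run_layer(seq, step, reverse=(layer % 2 == 0))
    return seq


def max_pool_over_time(seq):
    return max(seq)


# Only the last position carries a signal in this toy input.
inputs = [0.0, 0.0, 1.0]
one_layer = stacked_alternating(inputs, toy_step, num_layers=1)
two_layers = stacked_alternating(inputs, toy_step, num_layers=2)
three_layers = stacked_alternating(inputs, toy_step, num_layers=3)
pooled = max_pool_over_time(three_layers)
```

After one forward layer the first position is still 0 because it cannot see the last word; after the reverse layer it becomes 0.25, showing that information from later words has reached it.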

## Dataset

We use the [IMDB](http://ai.stanford.edu/%7Eamaas/data/sentiment/) dataset for sentiment analysis in this tutorial. It consists of 50,000 movie reviews split evenly into 25k training and 25k test examples. In the labeled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10.
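
The scoring rule can be written as a small function. This is only a sketch of the thresholds quoted above: `label_from_score` is a hypothetical name, the string labels are chosen for readability, and returning `None` for scores of 5 and 6 is an assumption about how to represent reviews that fall outside both classes.

```python
def label_from_score(score):
    """Map a rating out of 10 to a sentiment label using the thresholds above."""
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None  # scores of 5 and 6 belong to neither class


labels = [label_from_score(s) for s in (1, 4, 5, 6, 7, 10)]
```

The packaged dataset already ships with labels, so no such mapping is needed when using it through PaddlePaddle.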

The `paddle.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens`, and `wmt14`. There is no need to download and preprocess IMDB manually.

After issuing the command `python train.py`, training starts immediately. The following sections walk through the details to show how it works.


## Model Structure

### Initialize PaddlePaddle

We must first import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc.).

```python
import sys
import paddle.v2 as paddle

# PaddlePaddle init
paddle.init(use_gpu=False, trainer_count=1)
```

As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both Text CNN and Stacked-bidirectional LSTM models.

### Text Convolutional Neural Network (Text CNN)

We create a neural network `convolution_net` as shown in the following snippet.

Note: `paddle.networks.sequence_conv_pool` includes both convolution and pooling layer operations.

```python
def convolution_net(input_dim, class_dim=2, emb_dim=128, hid_dim=128):
    data = paddle.layer.data("word",
                             paddle.data_type.integer_value_sequence(input_dim))
    emb = paddle.layer.embedding(input=data, size=emb_dim)
    conv_3 = paddle.networks.sequence_conv_pool(
        input=emb, context_len=3, hidden_size=hid_dim)
    conv_4 = paddle.networks.sequence_conv_pool(
        input=emb, context_len=4, hidden_size=hid_dim)
    output = paddle.layer.fc(input=[conv_3, conv_4],
                             size=class_dim,
                             act=paddle.activation.Softmax())
    lbl = paddle.layer.data("label", paddle.data_type.integer_value(class_dim))
    cost = paddle.layer.classification_cost(input=output, label=lbl)
    return cost
```

1. Define input data and its dimension

    Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `convolution_net`, the input to the network is defined in `paddle.layer.data`.

1. Define Classifier

    The Text CNN network above extracts high-level features and maps them to a vector whose size equals the number of categories. The `paddle.activation.Softmax` function is then used to calculate the probability of the sentence belonging to each category.

1. Define Loss Function

    In the context of supervised learning, the labels of the training set are also defined with `paddle.layer.data`. During training, cross-entropy is used as the loss function in `paddle.layer.classification_cost` and serves as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
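
The classifier and loss steps can be illustrated numerically. The following is a generic sketch of softmax and cross-entropy in plain Python, not PaddlePaddle's implementation; the function names and the toy scores are assumptions.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


# With equal scores for the two classes the model is maximally uncertain.
probs = softmax([0.0, 0.0])
loss = cross_entropy(probs, label=1)

# A confident and correct prediction yields a much smaller loss.
confident_loss = cross_entropy(softmax([0.0, 4.0]), label=1)
```

For two classes, the uncertain prediction gives a loss of log 2, which is a useful baseline when reading the training cost.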

### Stacked Bidirectional LSTM

We create a neural network `stacked_lstm_net` as below.

```python
def stacked_lstm_net(input_dim,
                     class_dim=2,
                     emb_dim=128,
                     hid_dim=512,
                     stacked_num=3):
    """
    A wrapper for the sentiment classification task.
    This network is a bidirectional recurrent network consisting of
    three LSTM layers by default. The configuration follows the paper
    at the URL below, but uses fewer layers.
        http://www.aclweb.org/anthology/P15-1109
    input_dim: size of the word dictionary.
    class_dim: number of categories.
    emb_dim: dimension of the word embedding.
    hid_dim: dimension of the hidden layer.
    stacked_num: number of stacked LSTM hidden layers (must be odd).
    """
    assert stacked_num % 2 == 1

    layer_attr = paddle.attr.Extra(drop_rate=0.5)
    fc_para_attr = paddle.attr.Param(learning_rate=1e-3)
    lstm_para_attr = paddle.attr.Param(initial_std=0., learning_rate=1.)
    para_attr = [fc_para_attr, lstm_para_attr]
    bias_attr = paddle.attr.Param(initial_std=0., l2_rate=0.)
    relu = paddle.activation.Relu()
    linear = paddle.activation.Linear()

    data = paddle.layer.data("word",
                             paddle.data_type.integer_value_sequence(input_dim))
    emb = paddle.layer.embedding(input=data, size=emb_dim)

    fc1 = paddle.layer.fc(input=emb,
                          size=hid_dim,
                          act=linear,
                          bias_attr=bias_attr)
    lstm1 = paddle.layer.lstmemory(
        input=fc1, act=relu, bias_attr=bias_attr, layer_attr=layer_attr)

    inputs = [fc1, lstm1]
    for i in range(2, stacked_num + 1):
        fc = paddle.layer.fc(input=inputs,
                             size=hid_dim,
                             act=linear,
                             param_attr=para_attr,
                             bias_attr=bias_attr)
        lstm = paddle.layer.lstmemory(
            input=fc,
            reverse=(i % 2) == 0,
            act=relu,
            bias_attr=bias_attr,
            layer_attr=layer_attr)
        inputs = [fc, lstm]

    fc_last = paddle.layer.pooling(
        input=inputs[0], pooling_type=paddle.pooling.Max())
    lstm_last = paddle.layer.pooling(
        input=inputs[1], pooling_type=paddle.pooling.Max())
    output = paddle.layer.fc(input=[fc_last, lstm_last],
                             size=class_dim,
                             act=paddle.activation.Softmax(),
                             bias_attr=bias_attr,
                             param_attr=para_attr)

    lbl = paddle.layer.data("label", paddle.data_type.integer_value(class_dim))
    cost = paddle.layer.classification_cost(input=output, label=lbl)
    return cost
```

1. Define input data and its dimension

    Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `stacked_lstm_net`, the input to the network is defined in `paddle.layer.data`.

1. Define Classifier

    The stacked bidirectional LSTM network above extracts high-level features and maps them to a vector whose size equals the number of categories. The `paddle.activation.Softmax` function is then used to calculate the probability of the sentence belonging to each category.

1. Define Loss Function

    In the context of supervised learning, the labels of the training set are also defined with `paddle.layer.data`. During training, cross-entropy is used as the loss function in `paddle.layer.classification_cost` and serves as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.


To reiterate, we can invoke either `convolution_net` or `stacked_lstm_net`.

```python
word_dict = paddle.dataset.imdb.word_dict()
dict_dim = len(word_dict)
class_dim = 2

# option 1
cost = convolution_net(dict_dim, class_dim=class_dim)
# option 2
# cost = stacked_lstm_net(dict_dim, class_dim=class_dim, stacked_num=3)
```

## Model Training

### Define Parameters

First, we create the model parameters according to the previous model configuration `cost`.

```python
# create parameters
parameters = paddle.parameters.create(cost)
```

### Create Trainer

Before creating the trainer, we need to configure the optimization algorithm.
Here we specify the `Adam` optimization algorithm via `paddle.optimizer`.

```python
# create optimizer
adam_optimizer = paddle.optimizer.Adam(
    learning_rate=2e-3,
    regularization=paddle.optimizer.L2Regularization(rate=8e-4),
    model_average=paddle.optimizer.ModelAverage(average_window=0.5))

# create trainer
trainer = paddle.trainer.SGD(cost=cost,
                             parameters=parameters,
                             update_equation=adam_optimizer)
```
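
For intuition about what the optimizer does with each gradient, here is a minimal sketch of one Adam update for a single scalar parameter. It is a generic illustration, not PaddlePaddle's implementation: `adam_step` is a hypothetical name, the values of `beta1`, `beta2`, and `eps` are commonly used defaults assumed here, and the L2 regularization and model averaging configured above are left out.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=2e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero-initialized moments with a gradient of 1.0.
param, m, v = adam_step(param=0.0, grad=1.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to 1, so the parameter moves by roughly the learning rate regardless of the gradient's scale.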

### Training

`paddle.dataset.imdb.train()` yields records during each pass. After shuffling, the records are grouped into batches for training.

```python
train_reader = paddle.batch(
    paddle.reader.shuffle(
        lambda: paddle.dataset.imdb.train(word_dict), buf_size=1000),
    batch_size=100)

test_reader = paddle.batch(
    lambda: paddle.dataset.imdb.test(word_dict), batch_size=100)
```
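
The idea behind composing a shuffling reader with a batching reader can be sketched with plain generators. This is an assumption-laden illustration of the pattern, not the code of `paddle.reader.shuffle` or `paddle.batch`: `buffered_shuffle`, `make_batches`, and `toy_reader` are hypothetical names, and the `seed` argument exists only to make the sketch reproducible.

```python
import random


def buffered_shuffle(reader, buf_size, seed=None):
    """Wrap a reader so records are shuffled within buffers of buf_size."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for record in reader():
            buf.append(record)
            if len(buf) == buf_size:
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        rng.shuffle(buf)  # flush whatever is left at the end of the pass
        for item in buf:
            yield item

    return shuffled_reader


def make_batches(reader, batch_size):
    """Wrap a reader so it yields lists of up to batch_size records."""
    def batch_reader():
        current = []
        for record in reader():
            current.append(record)
            if len(current) == batch_size:
                yield current
                current = []
        if current:  # the final batch may be smaller
            yield current

    return batch_reader


def toy_reader():
    """Stand-in for a dataset reader: yields ten integer records."""
    return iter(range(10))


train_batches = make_batches(
    buffered_shuffle(toy_reader, buf_size=4, seed=0), batch_size=3)
batches = list(train_batches())
```

Because each wrapper takes a reader and returns a reader, a new pass over the data is started simply by calling the outermost function again.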

`feeding` specifies the correspondence between the fields of each yielded record and the `paddle.layer.data` layers. For instance, the first column of the data generated by `paddle.dataset.imdb.train()` corresponds to the `word` feature.

```python
feeding = {'word': 0, 'label': 1}
```
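
Conceptually, the mapping selects which column of a record goes to which named data layer. The sketch below mimics that lookup in plain Python; `apply_feeding` is a hypothetical helper and the record contents are made-up word indices, so this only illustrates the idea rather than what the trainer does internally.

```python
def apply_feeding(record, feeding):
    """Pick each named input out of a positional record."""
    return {name: record[index] for name, index in feeding.items()}


# A made-up record: a sequence of word indices followed by its label.
record = ([12, 7, 301], 1)
named = apply_feeding(record, {'word': 0, 'label': 1})
```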

The callback function `event_handler` is invoked to track training progress whenever a pre-defined event happens.

```python
def event_handler(event):
    if isinstance(event, paddle.event.EndIteration):
        # Print the cost and metrics every 100 batches; otherwise print a dot.
        if event.batch_id % 100 == 0:
            print("\nPass %d, Batch %d, Cost %f, %s" % (
                event.pass_id, event.batch_id, event.cost, event.metrics))
        else:
            sys.stdout.write('.')
            sys.stdout.flush()
    if isinstance(event, paddle.event.EndPass):
        # Evaluate on the test set at the end of every pass.
        result = trainer.test(reader=test_reader, feeding=feeding)
        print("\nTest with Pass %d, %s" % (event.pass_id, result.metrics))
```

Finally, we can invoke `trainer.train` to start training:

```python
trainer.train(
    reader=train_reader,
    event_handler=event_handler,
    feeding=feeding,
    num_passes=10)
```


## Conclusion

In this chapter, we used sentiment analysis as an example to introduce how deep learning models can be applied to end-to-end short text classification, and how to implement them with PaddlePaddle. Along the way, we briefly introduced two models for text processing: CNN and RNN. In the following chapters, we will see how these models can be applied to other tasks.

## Reference

1. Kim Y. [Convolutional neural networks for sentence classification](http://arxiv.org/pdf/1408.5882)[J]. arXiv preprint arXiv:1408.5882, 2014.
2. Kalchbrenner N, Grefenstette E, Blunsom P. [A convolutional neural network for modelling sentences](http://arxiv.org/pdf/1404.2188.pdf?utm_medium=App.net&utm_source=PourOver)[J]. arXiv preprint arXiv:1404.2188, 2014.
3. Dauphin Y N, et al. [Language Modeling with Gated Convolutional Networks](https://arxiv.org/pdf/1612.08083v1.pdf)[J]. arXiv preprint arXiv:1612.08083, 2016.
4. Siegelmann H T, Sontag E D. [On the computational power of neural nets](http://research.cs.queensu.ca/home/akl/cisc879/papers/SELECTED_PAPERS_FROM_VARIOUS_SOURCES/05070215382317071.pdf)[C]//Proceedings of the fifth annual workshop on Computational learning theory. ACM, 1992: 440-449.
5. Hochreiter S, Schmidhuber J. [Long short-term memory](http://web.eecs.utk.edu/~itamar/courses/ECE-692/Bobby_paper1.pdf)[J]. Neural computation, 1997, 9(8): 1735-1780.
6. Bengio Y, Simard P, Frasconi P. [Learning long-term dependencies with gradient descent is difficult](http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf)[J]. IEEE transactions on neural networks, 1994, 5(2): 157-166.
7. Graves A. [Generating sequences with recurrent neural networks](http://arxiv.org/pdf/1308.0850)[J]. arXiv preprint arXiv:1308.0850, 2013.
8. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://arxiv.org/pdf/1406.1078)[J]. arXiv preprint arXiv:1406.1078, 2014.
9. Zhou J, Xu W. [End-to-end learning of semantic role labeling using recurrent neural networks](http://www.aclweb.org/anthology/P/P15/P15-1109.pdf)[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.

<br/>
This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

</div>
<!-- You can change the lines below now. -->

<script type="text/javascript">
marked.setOptions({
  renderer: new marked.Renderer(),
  gfm: true,
  breaks: false,
  smartypants: true,
  highlight: function(code, lang) {
    code = code.replace(/&amp;/g, "&")
    code = code.replace(/&gt;/g, ">")
    code = code.replace(/&lt;/g, "<")
    code = code.replace(/&nbsp;/g, " ")
    return hljs.highlightAuto(code, [lang]).value;
  }
});
document.getElementById("context").innerHTML = marked(
        document.getElementById("markdown").innerHTML)
</script>
</body>