
<html>
<head>
  <script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
    jax: ["input/TeX", "output/HTML-CSS"],
    tex2jax: {
      inlineMath: [ ['$','$'] ],
      displayMath: [ ['$$','$$'] ],
      processEscapes: true
    },
    "HTML-CSS": { availableFonts: ["TeX"] }
  });
  </script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
  <script type="text/javascript" src="../.tools/theme/marked.js">
  </script>
  <link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
  <script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
  <link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
  <link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
  <link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
    box-sizing: border-box;
    min-width: 200px;
    max-width: 980px;
    margin: 0 auto;
    padding: 45px;
}
</style>


<body>

<div id="context" class="container-fluid markdown-body">
</div>

<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
# Sentiment Analysis

The source code of this tutorial is in [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/06.understand_sentiment). For new users, please refer to [Running This Book](https://github.com/PaddlePaddle/book/blob/develop/README.md#running-the-book).

## Background Introduction

In natural language processing, sentiment analysis generally refers to judging the emotion expressed by a piece of text, where the text can be a sentence, a paragraph, or a document. The emotional state can be two categories, such as (positive, negative) or (happy, sad); three categories, such as (positive, negative, neutral); and so on. Sentiment analysis has very broad application scenarios, such as classifying the comments that users post on shopping websites (Amazon, Tmall, Taobao, etc.), travel websites, and movie review websites as positive or negative, or crawling user reviews of a product and analyzing their sentiment in order to gauge the users' overall experience with the product. Table 1 shows an example of sentiment analysis of movie reviews:

| Movie Comments | Category |
| -------- | ----- |
| Of Feng Xiaogang's movies in recent years, this is the best one | Positive |
| Very bad production, it feels like a local TV series | Negative |
| The cinematography is brilliant and the color palette is beautiful, but the plot drags and the dubbing is poor; even with an effort it is hard to stay focused on the show | Negative |
| The plot deserves 4 stars. In addition, the camera angles plus the Wuyuan scenery give the feeling of a Chinese landscape painting. It satisfied me. | Positive |

<p align="center">Table 1. Sentiment analysis of movie comments</p>

In natural language processing, sentiment analysis is a typical problem of **text categorization**: the text to be analyzed is assigned to one of several categories. Text categorization involves two issues: text representation and classification method. Before the emergence of deep learning, the mainstream text representation methods were BOW (bag of words), topic models, and the like; the classification methods were SVM (support vector machine), LR (logistic regression), and so on.

For a piece of text, the BOW representation ignores its word order, grammar, and syntax, and treats the text as nothing more than a collection of words, so the BOW method does not adequately capture the semantic information of the text. For example, the sentences "This movie is awful" and "a boring, empty work without connotations" have a high semantic similarity in sentiment analysis, but the similarity of their BOW representations is zero. As another example, the BOW representations of "an empty work without connotations" and "a work that is not empty and has connotations" are very similar, but their meanings are opposite.

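To make the zero-similarity example concrete, here is a minimal sketch in plain Python (a toy illustration, not part of the tutorial's model code) that computes the cosine similarity of two bag-of-words vectors:

```python
# Toy illustration: two reviews with no words in common have BOW cosine similarity 0.
from collections import Counter
import math

def bow_cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

s1 = Counter("this movie is awful".split())
s2 = Counter("a boring empty work without connotations".split())
print(bow_cosine(s1, s2))  # 0.0 -- BOW misses that both reviews are negative
```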
The deep learning approach introduced in this chapter overcomes the above shortcomings of the BOW representation. It maps the text into a low-dimensional semantic space while taking word order into account, and performs text representation and classification in an end-to-end fashion, with performance significantly better than that of the traditional methods \[[1](#References)\].
## Model Overview

The text representation models used in this chapter are convolutional neural networks (CNN) and recurrent neural networks (RNN) together with their extensions. These models are described below.

### Text Convolutional Neural Networks (CNN)

We introduced the calculation process of a CNN model applied to text data in the [Recommender System](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) section. Here is a brief review.

For a CNN, the input word vector sequence is first convolved to produce a feature map, and max pooling over time is then applied to the feature map to obtain the sentence-level feature corresponding to that convolution kernel. Finally, the concatenation of all these features forms the fixed-length vector representation of the text. For text classification, this representation is connected to a softmax layer to build the complete model. In practice, we use multiple convolution kernels to process a sentence, and kernels with the same window size are stacked into a matrix so that the operation can be carried out more efficiently. We can also use kernels with different window sizes to process the sentence. Figure 1 below (Figure 3 in the [Recommender System](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) section) shows four convolution kernels, with different colors representing convolution operations with different kernel sizes.

<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/05.recommender_system/image/text_cnn.png?raw=true" width = "80%" align="center"/><br/>
Figure 1. CNN text classification model
</p>

For general short-text classification problems, the simple text convolutional network described above can already achieve a high accuracy \[[1](#References)\]. If you want a more abstract, higher-level text feature representation, you can construct a deeper text convolutional neural network \[[2](#References), [3](#References)\].

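The convolution plus max-pooling-over-time idea can be illustrated with a minimal NumPy sketch (toy dimensions chosen only for illustration; the actual model later in this chapter is built with PaddlePaddle's `sequence_conv_pool`):

```python
import numpy as np

emb_dim, hid_dim, window = 4, 3, 3               # toy sizes, illustration only
sentence = np.random.randn(7, emb_dim)           # 7 word vectors
W = np.random.randn(hid_dim, window * emb_dim)   # hid_dim filters over a 3-word window
b = np.zeros(hid_dim)

# slide the window over the word sequence to build the feature map
feature_map = np.array([
    np.tanh(W @ sentence[i:i + window].ravel() + b)
    for i in range(len(sentence) - window + 1)
])                                               # shape: (5, hid_dim)

sentence_vector = feature_map.max(axis=0)        # max pooling over time
print(sentence_vector.shape)                     # (3,) -- a fixed-length representation
```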
### Recurrent Neural Network (RNN)

RNN is a powerful tool for accurately modeling sequence data. In fact, RNNs have been shown to be Turing-complete in theory \[[4](#References)\]. Natural language is a typical kind of sequence data (a word sequence). In recent years, RNNs and their variants (such as the long short-term memory network \[[5](#References)\]) have been applied to many natural language processing tasks, such as language modeling, syntactic parsing, semantic role labeling (and sequence labeling in general), semantic representation, caption generation, dialogue, and machine translation, where they perform well and in many cases achieve the best results available today.

<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/image/rnn.png?raw=true" width = "60%" align="center"/><br />
Figure 2. Schematic diagram of an RNN unrolled over time
</p>

The RNN unrolled over time is shown in Figure 2: at time $t$, the network reads the $t$-th input $x_t$ (vector representation) and the hidden-layer state of the previous time step $h_{t-1}$ (vector representation; $h_0$ is normally initialized to the zero vector), and computes the hidden-layer state $h_t$ of the current time step. This step is repeated until all inputs have been read. If the function is denoted as $f$, its formula can be expressed as:

$$h_t=f(x_t,h_{t-1})=\sigma(W_{xh}x_t+W_{hh}h_{t-1}+b_h)$$

Where $W_{xh}$ is the input-to-hidden weight matrix, $W_{hh}$ is the hidden-to-hidden weight matrix, $b_h$ is the bias vector of the hidden layer, and $\sigma$ is the sigmoid function.

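As a concrete illustration of this recurrence, here is a minimal NumPy sketch of a single RNN step applied to a short sequence (toy dimensions and random weights, for illustration only; it is not part of the PaddlePaddle model defined later):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

emb_dim, hid_dim = 4, 3                    # toy sizes, illustration only
W_xh = np.random.randn(hid_dim, emb_dim)   # input-to-hidden weights
W_hh = np.random.randn(hid_dim, hid_dim)   # hidden-to-hidden weights
b_h = np.zeros(hid_dim)                    # hidden-layer bias

def rnn_step(x_t, h_prev):
    """h_t = sigmoid(W_xh x_t + W_hh h_prev + b_h)"""
    return sigmoid(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hid_dim)                      # h_0 initialized to the zero vector
for x_t in np.random.randn(5, emb_dim):    # a sequence of 5 word vectors
    h = rnn_step(x_t, h)
print(h)                                   # the final hidden state summarizes the sequence
```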
When dealing with natural language, a word (in one-hot representation) is usually first mapped to its word vector representation, which is then used as the input $x_t$ at each time step of the recurrent neural network. In addition, other layers can be connected to the hidden layer of the RNN depending on actual needs. For example, the hidden-layer output of one RNN can be fed as input to the next RNN to build a deep or stacked RNN, or the hidden-layer state at the last time step can be extracted as the sentence representation on top of which a classification model is built, and so on.
### Long Short-Term Memory Network (LSTM)

For longer sequence data, vanishing or exploding gradients are likely to occur when training an RNN \[[6](#References)\]. To solve this problem, Hochreiter and Schmidhuber (1997) proposed the LSTM (long short-term memory network \[[5](#References)\]).

Compared to a simple RNN, the LSTM adds a memory cell $c$, an input gate $i$, a forget gate $f$, and an output gate $o$. The combination of these gates and the memory cell greatly enhances the ability of the recurrent neural network to process long sequence data. If the function is denoted as $F$, the formula is:

$$ h_t=F(x_t,h_{t-1})$$

$F$ is a combination of the following formulas \[[7](#References)\]:

$$ i_t = \sigma(W_{xi}x_t+W_{hi}h_{t-1}+W_{ci}c_{t-1}+b_i) $$
$$ f_t = \sigma(W_{xf}x_t+W_{hf}h_{t-1}+W_{cf}c_{t-1}+b_f) $$
$$ c_t = f_t\odot c_{t-1}+i_t\odot \tanh(W_{xc}x_t+W_{hc}h_{t-1}+b_c) $$
$$ o_t = \sigma(W_{xo}x_t+W_{ho}h_{t-1}+W_{co}c_{t}+b_o) $$
$$ h_t = o_t\odot \tanh(c_t) $$

Where $i_t, f_t, c_t, o_t$ denote the input gate, the forget gate, the memory cell, and the output gate respectively; the $W$ and $b$ with subscripts are the model parameters; $\tanh$ is the hyperbolic tangent function; and $\odot$ denotes element-wise multiplication. The input gate controls how strongly the new input enters the memory cell $c$, the forget gate controls how strongly the memory cell keeps its value from the previous time step, and the output gate controls how strongly the memory cell is output. The three gates are computed in a similar way but with completely different parameters, and they control the memory cell $c$ in different ways, as shown in Figure 3:
<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/image/lstm.png?raw=true" width = "65%" align="center"/><br />
Figure 3. LSTM at time $t$ [7]
</p>

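The gate equations above can be made concrete with a minimal NumPy sketch of a single LSTM step (toy dimensions and random weights, purely illustrative; the tutorial's actual implementation uses `fluid.layers.dynamic_lstm` below):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

emb_dim, hid_dim = 4, 3                              # toy sizes, illustration only
rand = lambda *shape: 0.1 * np.random.randn(*shape)

# one set of parameters per gate / cell, matching the formulas above
W_xi, W_hi, W_ci, b_i = rand(hid_dim, emb_dim), rand(hid_dim, hid_dim), rand(hid_dim, hid_dim), np.zeros(hid_dim)
W_xf, W_hf, W_cf, b_f = rand(hid_dim, emb_dim), rand(hid_dim, hid_dim), rand(hid_dim, hid_dim), np.zeros(hid_dim)
W_xc, W_hc, b_c = rand(hid_dim, emb_dim), rand(hid_dim, hid_dim), np.zeros(hid_dim)
W_xo, W_ho, W_co, b_o = rand(hid_dim, emb_dim), rand(hid_dim, hid_dim), rand(hid_dim, hid_dim), np.zeros(hid_dim)

def lstm_step(x_t, h_prev, c_prev):
    i_t = sigmoid(W_xi @ x_t + W_hi @ h_prev + W_ci @ c_prev + b_i)        # input gate
    f_t = sigmoid(W_xf @ x_t + W_hf @ h_prev + W_cf @ c_prev + b_f)        # forget gate
    c_t = f_t * c_prev + i_t * np.tanh(W_xc @ x_t + W_hc @ h_prev + b_c)   # memory cell
    o_t = sigmoid(W_xo @ x_t + W_ho @ h_prev + W_co @ c_t + b_o)           # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

h = c = np.zeros(hid_dim)
for x_t in np.random.randn(5, emb_dim):   # a sequence of 5 word vectors
    h, c = lstm_step(x_t, h, c)
```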
LSTM enhances the ability of the RNN to handle long-range dependencies by adding a memory cell and control gates. A similar improvement is the Gated Recurrent Unit (GRU) \[[8](#References)\], which has a more concise design. **These improvements differ in detail, but their macroscopic description is the same as that of the simple recurrent neural network (as shown in Figure 2): the hidden state changes according to the current input and the hidden state of the previous time step, and this process continues until all of the input has been processed:**
$$ h_t=Recurrent(x_t,h_{t-1})$$

Among them, $Recurrent$ can represent an RNN, a GRU, or an LSTM.



<a name="Stacked Bidirectional LSTM"></a>
### Stacked Bidirectional LSTM

For a unidirectional RNN, $h_t$ contains only the input information before time $t$, that is, the preceding context. To capture the following context as well, we can use an RNN running in the opposite direction (processing the sequence in reverse order). Combined with the way deep recurrent neural networks are constructed (deeper networks often yield more abstract, higher-level feature representations), we can build a powerful LSTM-based stacked bidirectional recurrent neural network \[[9](#References)\] to model sequence data.

As shown in Figure 4 (taking three layers as an example), the odd-numbered LSTM layers run forward and the even-numbered LSTM layers run in reverse. Each higher-level LSTM takes the output of the LSTM below it together with all preceding layers as its input. Applying max pooling over the time dimension to the sequence produced by the highest LSTM layer yields a fixed-length vector representation of the text (this representation fully fuses the contextual information and deeply abstracts the text). Finally, we connect this text representation to a softmax layer to build the classification model.

<p align="center">
<img src="https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/image/stacked_lstm.jpg?raw=true" width=450><br/>
Figure 4. Stacked bidirectional LSTM for text categorization
</p>

## Dataset Introduction
We use the [IMDB sentiment analysis dataset](http://ai.stanford.edu/%7Eamaas/data/sentiment/) as an example. The IMDB training set and test set each contain 25,000 labeled movie reviews. A review with a score less than or equal to 4 (out of 10) is labeled negative, and a review with a score greater than or equal to 7 is labeled positive.
```text
aclImdb
|- test
   |-- neg
   |-- pos
|- train
   |-- neg
   |-- pos
```
Paddle implements the automatic download and reading of the IMDB dataset in `dataset/imdb.py`, and provides APIs for reading the dictionary, the training data, the test data, and so on.

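As a quick sanity check, the sketch below (assuming the standard `paddle.dataset.imdb` reader interface used in this chapter) loads the dictionary and peeks at one training sample; each sample is a sequence of word ids together with a 0/1 sentiment label:

```python
import paddle

# Loading the word dict may take a few minutes on the first run.
word_dict = paddle.dataset.imdb.word_dict()
print(len(word_dict), "words in the dictionary")

# paddle.dataset.imdb.train(word_dict) returns a reader creator; calling it yields samples.
sample_words, sample_label = next(paddle.dataset.imdb.train(word_dict)())
print(sample_words[:10], "...", "label:", sample_label)
```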
## Model Configuration

In this example, we implement two text classification algorithms, based on the text convolutional neural network described in the [Recommender System](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system) section and on the [Stacked Bidirectional LSTM](#Stacked Bidirectional LSTM) described above. We first import the packages we need and define the global variables:
```python
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import numpy as np
import sys
import math

CLASS_DIM = 2     # number of categories for sentiment analysis
EMB_DIM = 128     # dimension of the word vector
HID_DIM = 512     # dimension of the hidden layer
STACKED_NUM = 3   # number of LSTM layers in the bidirectional stack
BATCH_SIZE = 128  # batch size
```
### Text Convolutional Neural Network

We build the neural network `convolution_net`; the sample code is as follows.
Note that `fluid.nets.sequence_conv_pool` contains both the convolution and the pooling layer.
```python
# Text convolutional neural network
def convolution_net(data, input_dim, class_dim, emb_dim, hid_dim):
    # word embedding layer
    emb = fluid.embedding(
        input=data, size=[input_dim, emb_dim], is_sparse=True)
    # convolution + pooling with a window size of 3
    conv_3 = fluid.nets.sequence_conv_pool(
        input=emb,
        num_filters=hid_dim,
        filter_size=3,
        act="tanh",
        pool_type="sqrt")
    # convolution + pooling with a window size of 4
    conv_4 = fluid.nets.sequence_conv_pool(
        input=emb,
        num_filters=hid_dim,
        filter_size=4,
        act="tanh",
        pool_type="sqrt")
    # fully connected layer with softmax over the concatenated features
    prediction = fluid.layers.fc(
        input=[conv_3, conv_4], size=class_dim, act="softmax")
    return prediction
```

The network input `input_dim` indicates the size of the dictionary, and `class_dim` indicates the number of categories. Here, we implement the convolution and pooling operations using the [`sequence_conv_pool`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/nets.py) API.

<a name="Stack value bidirectional LSTM"></a>

### Stacked bidirectional LSTM

The code of the stacked bidirectional LSTM `stacked_lstm_net` is as follows:
```python
# Stacked bidirectional LSTM
def stacked_lstm_net(data, input_dim, class_dim, emb_dim, hid_dim, stacked_num):

    # word embedding layer
    emb = fluid.embedding(
        input=data, size=[input_dim, emb_dim], is_sparse=True)

    # first stack:
    # fully connected layer
    fc1 = fluid.layers.fc(input=emb, size=hid_dim)
    # lstm layer
    lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=hid_dim)

    inputs = [fc1, lstm1]

    # all remaining stacks; the direction alternates between layers
    for i in range(2, stacked_num + 1):
        fc = fluid.layers.fc(input=inputs, size=hid_dim)
        lstm, cell = fluid.layers.dynamic_lstm(
            input=fc, size=hid_dim, is_reverse=(i % 2) == 0)
        inputs = [fc, lstm]

    # pooling layer (max pooling over time)
    fc_last = fluid.layers.sequence_pool(input=inputs[0], pool_type='max')
    lstm_last = fluid.layers.sequence_pool(input=inputs[1], pool_type='max')

    # fully connected layer, softmax prediction
    prediction = fluid.layers.fc(
        input=[fc_last, lstm_last], size=class_dim, act='softmax')
    return prediction
```
The stacked bidirectional LSTM above extracts high-level features and maps them to a vector whose size equals the number of classes. The `softmax` activation of the last fully connected layer is used to compute the probability of each class.

Again, we can use either network structure, `convolution_net` or `stacked_lstm_net`, for training and learning. Here we take `convolution_net` as an example.

Next we define the prediction program (`inference_program`). We use `convolution_net` to make predictions on the input defined by `fluid.data`.
```python
def inference_program(word_dict):
    data = fluid.data(
        name="words", shape=[-1], dtype="int64", lod_level=1)

    dict_dim = len(word_dict)
    net = convolution_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM)
    # net = stacked_lstm_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM, STACKED_NUM)
    return net
```
We define `train_program` here, which uses the result returned by `inference_program` to compute the error. We also define the optimization function `optimizer_func`.

Because this is supervised learning, the labels of the training set are also defined via `fluid.data`. During training, cross-entropy computed by `fluid.layers.cross_entropy` is used as the loss function.

During testing, the classifier computes the probability of each output. The first returned value is specified as the cost.

```python
def train_program(prediction):
    label = fluid.data(name="label", shape=[-1, 1], dtype="int64")
    cost = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_cost = fluid.layers.mean(cost)
    accuracy = fluid.layers.accuracy(input=prediction, label=label)
    return [avg_cost, accuracy]  # return the average cost and the accuracy

# Optimization function
def optimizer_func():
    return fluid.optimizer.Adagrad(learning_rate=0.002)
```

## Training the Model

### Defining the training environment

Define whether the training runs on CPU or GPU:

```python
use_cuda = False  # train on CPU
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
```

### Defining the data creator

The next step is to define the data creators for training and testing. Each creator reads in batches of size `BATCH_SIZE`; `paddle.reader.shuffle` shuffles the training samples within a buffer of size `buf_size` before the batches are drawn.

Note: Reading the IMDB data may take a few minutes; please be patient.

```python
print("Loading IMDB word dict....")
word_dict = paddle.dataset.imdb.word_dict()

print("Reading training data....")
train_reader = fluid.io.batch(
    paddle.reader.shuffle(
        paddle.dataset.imdb.train(word_dict), buf_size=25000),
    batch_size=BATCH_SIZE)
print("Reading testing data....")
test_reader = fluid.io.batch(
    paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)
```

`word_dict` is a dictionary that maps each word to its id. You can inspect it by running the next code:

```python
word_dict
```

Each entry is a correspondence such as ('limited': 1726), which indicates that the id corresponding to the word "limited" is 1726.

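If dumping the whole dictionary is too verbose, a shorter sketch is to print its size and a few entries (the exact keys shown depend on how the dataset was built):

```python
print("dictionary size:", len(word_dict))       # includes the special <unk> token
for word, idx in list(word_dict.items())[:5]:   # peek at a few (word, id) pairs
    print(word, idx)
```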
### Constructing the trainer

The trainer requires a training program and an optimization function.

```python
exe = fluid.Executor(place)
prediction = inference_program(word_dict)
[avg_cost, accuracy] = train_program(prediction)  # training program
sgd_optimizer = optimizer_func()                  # training optimization function
sgd_optimizer.minimize(avg_cost)
```
This function is used to calculate the results of the model on the test dataset.

```python
def train_test(program, reader):
    count = 0
    feed_var_list = [
        program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder_test = fluid.DataFeeder(feed_list=feed_var_list, place=place)
    test_exe = fluid.Executor(place)
    accumulated = len([avg_cost, accuracy]) * [0]
    for test_data in reader():
        avg_cost_np = test_exe.run(
            program=program,
            feed=feeder_test.feed(test_data),
            fetch_list=[avg_cost, accuracy])
        accumulated = [
            x[0] + x[1][0] for x in zip(accumulated, avg_cost_np)
        ]
        count += 1
    return [x / count for x in accumulated]
```

### Providing data and building the main training loop

`feed_order` is used to define the mapping between each generated data column and the input variables defined with `fluid.data`. For example, the first column generated by `imdb.train` corresponds to the `words` feature.

```python
# Specify the directory path to save the parameters
params_dirname = "understand_sentiment_conv.inference.model"

feed_order = ['words', 'label']
pass_num = 1  # number of rounds of the training loop

# Main loop part of the program
def train_loop(main_program):
    # Start the trainer built above
    exe.run(fluid.default_startup_program())

    feed_var_list_loop = [
        main_program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder = fluid.DataFeeder(
        feed_list=feed_var_list_loop, place=place)

    test_program = fluid.default_main_program().clone(for_test=True)

    # Training loop
    for epoch_id in range(pass_num):
        for step_id, data in enumerate(train_reader()):
            # Run the trainer
            metrics = exe.run(main_program,
                              feed=feeder.feed(data),
                              fetch_list=[avg_cost, accuracy])

            # Test results
            avg_cost_test, acc_test = train_test(test_program, test_reader)
            print('Step {0}, Test Loss {1:0.2}, Acc {2:0.2}'.format(
                step_id, avg_cost_test, acc_test))

            print("Step {0}, Epoch {1} Metrics {2}".format(
                step_id, epoch_id, list(map(np.array,
                                            metrics))))

            if step_id == 30:
                if params_dirname is not None:
                    fluid.io.save_inference_model(params_dirname, ["words"],
                                                  prediction, exe)  # save the model
                return
```

### Training process

We print the output of each step in the main training loop, so we can observe how training is going.

### Start training

Finally, we start the main training loop. Training takes a while; if you want results sooner, you can reduce the number of training steps (the loop above saves the model and returns after step 30) at the cost of some accuracy.

```python
train_loop(fluid.default_main_program())
```

## Applying the Model

### Building a predictor

Just as with the training process, we need to create a prediction process that uses the trained model and parameters to make predictions. `params_dirname` is the directory where the model and parameters were saved during training.

```python
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
```

### Generating test input data

In order to make predictions, we select three example comments and map each word in a comment to its id in `word_dict`; if a word is not in the dictionary, it is set to `unknown`.
Then we use `create_lod_tensor` to create a LoD (level-of-detail) tensor. For a detailed explanation of this function, please refer to the [API documentation](http://paddlepaddle.org/documentation/docs/en/1.2/user_guides/howto/basic_concept/lod_tensor.html).

```python
reviews_str = [
    'read the book forget the movie', 'this is a great movie', 'this is very bad'
]
reviews = [c.split() for c in reviews_str]

UNK = word_dict['<unk>']
lod = []
base_shape = []

for c in reviews:
    # map each word to its id, falling back to <unk> for out-of-vocabulary words
    re = np.array([np.int64(word_dict.get(words, UNK)) for words in c])
    lod = np.concatenate([lod, re], axis=0)
    base_shape.insert(-1, re.shape[0])

base_shape = [base_shape]
lod = np.array(lod).astype('int64')
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
```

### Applying the model and making predictions

Now we can predict whether each comment is positive or negative.

```python
with fluid.scope_guard(inference_scope):

    [inferencer, feed_target_names,
     fetch_targets] = fluid.io.load_inference_model(params_dirname, exe)

    assert feed_target_names[0] == "words"
    results = exe.run(inferencer,
                      feed={feed_target_names[0]: tensor_words},
                      fetch_list=fetch_targets,
                      return_numpy=False)
    np_data = np.array(results[0])
    for i, r in enumerate(np_data):
        print("Predict probability of ", r[0], " to be positive and ", r[1],
              " to be negative for review \'", reviews_str[i], "\'")
```
## Conclusion

In this chapter, we took sentiment analysis as an example to introduce end-to-end short text classification with deep learning, and completed all the related experiments using PaddlePaddle. We also briefly introduced two text processing models: convolutional neural networks and recurrent neural networks. In the following chapters, we will see these two basic deep learning models applied to other tasks.

<a name="References"></a>

## References

1. Kim Y. [Convolutional neural networks for sentence classification](http://arxiv.org/pdf/1408.5882)[J]. arXiv preprint arXiv:1408.5882, 2014.
2. Kalchbrenner N, Grefenstette E, Blunsom P. [A convolutional neural network for modelling sentences](http://arxiv.org/pdf/1404.2188.pdf?utm_medium=App.net&utm_source=PourOver)[J]. arXiv preprint arXiv:1404.2188, 2014.
3. Yann N. Dauphin, et al. [Language Modeling with Gated Convolutional Networks](https://arxiv.org/pdf/1612.08083v1.pdf)[J] arXiv preprint arXiv:1612.08083, 2016.
4. Siegelmann H T, Sontag E D. [On the computational power of neural nets](http://research.cs.queensu.ca/home/akl/cisc879/papers/SELECTED_PAPERS_FROM_VARIOUS_SOURCES/05070215382317071.pdf)[C]//Proceedings of the fifth annual workshop on Computational learning theory. ACM, 1992: 440-449.
5. Hochreiter S, Schmidhuber J. [Long short-term memory](http://web.eecs.utk.edu/~itamar/courses/ECE-692/Bobby_paper1.pdf)[J]. Neural computation, 1997, 9(8): 1735-1780.
6. Bengio Y, Simard P, Frasconi P. [Learning long-term dependencies with gradient descent is difficult](http://www-dsi.ing.unifi.it/~paolo/ps/tnn-94-gradient.pdf)[J]. IEEE transactions on neural networks, 1994, 5(2): 157-166.
7. Graves A. [Generating sequences with recurrent neural networks](http://arxiv.org/pdf/1308.0850)[J]. arXiv preprint arXiv:1308.0850, 2013.
8. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://arxiv.org/pdf/1406.1078)[J]. arXiv preprint arXiv:1406.1078, 2014.
9. Zhou J, Xu W. [End-to-end learning of semantic role labeling using recurrent neural networks](http://www.aclweb.org/anthology/P/P15/P15-1109.pdf)[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.

<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://paddlepaddleimage.cdn.bcebos.com/bookimage/camo.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

</div>
<!-- You can change the lines below now. -->

<script type="text/javascript">
marked.setOptions({
  renderer: new marked.Renderer(),
  gfm: true,
  breaks: false,
  smartypants: true,
  highlight: function(code, lang) {
    code = code.replace(/&amp;/g, "&")
    code = code.replace(/&gt;/g, ">")
    code = code.replace(/&lt;/g, "<")
    code = code.replace(/&nbsp;/g, " ")
    return hljs.highlightAuto(code, [lang]).value;
  }
});
document.getElementById("context").innerHTML = marked(
        document.getElementById("markdown").innerHTML)
</script>
</body>