auto format by precommit

58fac820 · liaogang · 9aa45361 · 58fac820

Showing 200 additions and 317 deletions

understand_sentiment/index.en.html +200 -317

--- a/understand_sentiment/index.en.html
+++ b/understand_sentiment/index.en.html
@@ -44,7 +44,8 @@

 The source code for this section is located at [book/understand_sentiment](https://github.com/PaddlePaddle/book/tree/develop/understand_sentiment). First-time users may refer to the PaddlePaddle [installation guide](http://www.paddlepaddle.org/doc_cn/build_and_install/index.html).

-## Background Introduction
+## Background
+
 In natural language processing, sentiment analysis refers to identifying the emotional state expressed in a text. The text may be a sentence, a paragraph or a document. The emotional state can be modeled as a binary classification problem (positive/negative or happy/sad) or as a three-class problem (positive/neutral/negative). Sentiment analysis is widely applicable, for example on online shopping sites (Amazon, Taobao) and on travel and movie websites, where it can be used to gauge from reviews how customers feel about a product. Table 1 shows an example of sentiment analysis on movie reviews:

 | Movie Review       | Category  |
@@ -64,10 +65,12 @@ For a piece of text, BOW model ignores its word order, grammar and syntax, and r
 In this chapter, we introduce a deep learning model that addresses these shortcomings of BOW. Our model embeds texts into a low-dimensional space and takes word order into consideration. It is an end-to-end framework and offers a large performance improvement over traditional methods \[[1](#Reference)\].

 ## Model Overview
+
 The models we use in this chapter are CNN (Convolutional Neural Networks) and RNN (Recurrent Neural Networks), with some specific extensions.


 ### Convolutional Neural Networks for Texts (CNN)
+
 Convolutional Neural Networks are typically applied to data with a grid-like topology, such as 2-d images and 1-d texts. A CNN can combine multiple extracted local features to produce higher-level abstract semantics. Experimentally, CNN is very efficient for image and text modeling.

 A CNN mainly consists of convolution and pooling operations, with various extensions. We briefly describe CNN here with an example \[[1](#Reference)\]. As shown in Figure 1:
@@ -97,7 +100,8 @@ Finally, the CNN features are concatenated together to produce a fixed-length re

 For short texts, the above CNN model can achieve high accuracy \[[1](#Reference)\]. If we want to extract more abstract representations, we may apply a deeper CNN model \[[2](#Reference),[3](#Reference)\].

-### Recurrent Neural Network（RNN）
+### Recurrent Neural Network (RNN)
+
 RNN is an effective model for sequential data. Theoretically, the computational ability of RNN is Turing-complete \[[4](#Reference)\]. Natural language is a classic example of sequential data, and RNN (especially its variant LSTM\[[5](#Reference)\]) achieves state-of-the-art performance on various NLP tasks, such as language modeling, syntax parsing, POS tagging, image captioning, dialog, machine translation and so forth.

 <p align="center">
@@ -112,8 +116,9 @@ where $W_{xh}$ is the weight matrix from input to latent; $W_{hh}$ is the latent

 In NLP, words are first represented as one-hot vectors and then mapped to embeddings. The embedded feature is fed into the RNN as input $x_t$ at every time step. Moreover, we can add other layers on top of the RNN, e.g., to form a deep or stacked RNN. The last latent state can also be used as a feature for sentence classification.

-### Long-Short Term Memory
-For data of long sequence, training RNN sometimes has gradient vanishing and explosion problem \[[6](#)\]. To solve this problem Hochreiter S, Schmidhuber J. (1997) proposed the LSTM(long short term memory\[[5](#Refernce)\]).  
+### Long-Short Term Memory (LSTM)
+
+For long sequences, training an RNN sometimes suffers from vanishing and exploding gradients \[[6](#Reference)\]. To solve this problem, Hochreiter and Schmidhuber (1997) proposed the LSTM (long short-term memory\[[5](#Reference)\]).

 Compared with a simple RNN, the structure of LSTM additionally includes a memory cell $c$, input gate $i$, forget gate $f$ and output gate $o$. These gates and the memory cell greatly improve the ability to handle long sequences. We can formulate LSTM-RNN as a function $F$:

@@ -141,6 +146,7 @@ $$ h_t=Recurrent(x_t,h_{t-1})$$
 where $Recurrent$ is a simple RNN, GRU or LSTM.

 ### Stacked Bidirectional LSTM
+
 For vanilla LSTM, $h_t$ contains input information from the preceding time steps $1, \dots, t-1$. We can also apply an RNN in the reverse direction to take the succeeding context $t+1, \dots, n$ into consideration. Combining this with deep RNNs (deeper RNNs can capture more abstract and higher-level semantics), we can design deep stacked bidirectional LSTMs to model sequential data\[[9](#Reference)\].

 As shown in Figure 4 (3-layer RNN), odd/even layers are forward/reverse LSTMs. Higher LSTM layers take the lower layers as input, and the top-layer LSTM produces a fixed-length vector by max-pooling (this representation considers context from both preceding and succeeding words for higher-level abstractions). Finally, we feed this output to a softmax layer for classification.
@@ -150,377 +156,254 @@ As shown in Figure 4 (3-layer RNN), odd/even layers are forward/reverse LSTM. Hi
 Figure 4. Stacked Bidirectional LSTM for NLP modeling.
 </p>

-## Data Preparation
-### Data introduction and Download
-We taks the [IMDB sentiment analysis dataset](http://ai.stanford.edu/%7Eamaas/data/sentiment/) as an example. IMDB dataset contains training and testing set, with 25000 movie reviews. With a 1-10 score, negative reviews are those with score<=4, while positives are those with score>=7. You may use following scripts to download the IMDB dataset and [Moses](http://www.statmt.org/moses/) toolbox:
+## Dataset

+We use the [IMDB](http://ai.stanford.edu/%7Eamaas/data/sentiment/) dataset for sentiment analysis in this tutorial. It consists of 50,000 movie reviews split evenly into 25k train and 25k test sets. In the labeled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10.

-```bash
-./data/get_imdb.sh
-```
-If successful, you should see the directory ```data``` with following files:
+The `paddle.dataset` package encapsulates multiple public datasets, including `cifar`, `imdb`, `mnist`, `movielens` and `wmt14`. There is no need for us to manually download and preprocess IMDB.

-```
-aclImdb  get_imdb.sh  imdb  mosesdecoder-master
-```
+After issuing the command `python train.py`, training starts immediately. The following sections unpack the details to show how it works.

-* aclImdb: original data downloaded from the website;
-* imdb: containing only training and testing data
-* mosesdecoder-master: Moses tool

-### Data Preprocessing
-We use the script `preprocess.py` to preprocess the data. It will call `tokenizer.perl` in the Moses toolbox to split words and punctuations, randomly shuffle training set and construct the dictionary. Notice: we only use labeled training and testing set. Executing following commands will preprocess the data:
+## Model Structure

-```
-data_dir="./data/imdb"
-python preprocess.py -i $data_dir
-```
+### Initialize PaddlePaddle

-If it runs successfully, `./data/pre-imdb` will contain:
+We must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc).

-```
-dict.txt  labels.list  test.list  test_part_000  train.list  train_part_000
-```
+```python
+import sys
+import paddle.v2 as paddle

-* test\_part\_000 and train\_part\_000: all labeled training and testing set, and the training set is shuffled.
-* train.list and test.list: training and testing file-list (containing list of file names).
-* dict.txt: dictionary generated from training set.
-* labels.list: class label, 0 stands for negative while 1 for positive.
+# PaddlePaddle init
+paddle.init(use_gpu=False, trainer_count=1)
+```

-### Data Provider for PaddlePaddle
-PaddlePaddle can read Python-style script for configuration. The following `dataprovider.py` provides a detailed example, consisting of two parts:
+As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both Text CNN and Stacked-bidirectional LSTM models.

-* hook: define text information and class Id. Texts are defined as `integer_value_sequence` while class Ids are defined as `integer_value`.
-* process: read line by line for ID and text information split by `’\t\t’`, and yield the data as a generator.
+### Text Convolutional Neural Network (Text CNN)

-```python
-from paddle.trainer.PyDataProvider2 import *
+We create a neural network `convolution_net` as in the following code snippet.

-def hook(settings, dictionary, **kwargs):
-settings.word_dict = dictionary
-settings.input_types = {
-'word':  integer_value_sequence(len(settings.word_dict)),
-'label': integer_value(2)
-}
-settings.logger.info('dict len : %d' % (len(settings.word_dict)))
-
-@provider(init_hook=hook)
-def process(settings, file_name):
-with open(file_name, 'r') as fdata:
-for line_count, line in enumerate(fdata):
-label, comment = line.strip().split('\t\t')
-label = int(label)
-words = comment.split()
-word_slot = [
-settings.word_dict[w] for w in words if w in settings.word_dict
-]
-yield {
-'word': word_slot,
-'label': label
-}
-```
+Note: `paddle.networks.sequence_conv_pool` includes both convolution and pooling layer operations.

-## Model Setup
-`trainer_config.py` is an example of a setup file.
-### Data Definition
 ```python
-from os.path import join as join_path
-from paddle.trainer_config_helpers import *
-# if it is “test” mode
-is_test = get_config_arg('is_test', bool, False)
-# if it is “predict” mode
-is_predict = get_config_arg('is_predict', bool, False)
-
-# Data path
-data_dir = "./data/pre-imdb"
-# File names
-train_list = "train.list"
-test_list = "test.list"
-dict_file = "dict.txt"
-
-# Dictionary size
-dict_dim = len(open(join_path(data_dir, "dict.txt")).readlines())
-# class number
-class_dim = len(open(join_path(data_dir, 'labels.list')).readlines())
-
-if not is_predict:
-train_list = join_path(data_dir, train_list)
-test_list = join_path(data_dir, test_list)
-dict_file = join_path(data_dir, dict_file)
-train_list = train_list if not is_test else None
-# construct the dictionary
-word_dict = dict()
-with open(dict_file, 'r') as f:
-for i, line in enumerate(open(dict_file, 'r')):
-word_dict[line.split('\t')[0]] = i
-# Call the function “define_py_data_sources2” in the file dataprovider.py to extract features
-define_py_data_sources2(
-train_list,
-test_list,
-module="dataprovider",
-obj="process",  # function to generate data
-args={'dictionary': word_dict}) # extra parameters, here refers to dictionary
+def convolution_net(input_dim, class_dim=2, emb_dim=128, hid_dim=128):
+    data = paddle.layer.data("word",
+                             paddle.data_type.integer_value_sequence(input_dim))
+    emb = paddle.layer.embedding(input=data, size=emb_dim)
+    conv_3 = paddle.networks.sequence_conv_pool(
+        input=emb, context_len=3, hidden_size=hid_dim)
+    conv_4 = paddle.networks.sequence_conv_pool(
+        input=emb, context_len=4, hidden_size=hid_dim)
+    output = paddle.layer.fc(input=[conv_3, conv_4],
+                             size=class_dim,
+                             act=paddle.activation.Softmax())
+    lbl = paddle.layer.data("label", paddle.data_type.integer_value(2))
+    cost = paddle.layer.classification_cost(input=output, label=lbl)
+    return cost
 ```
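
To make the convolution-plus-pooling idea concrete, here is a minimal plain-Python sketch of one text filter followed by max-pooling over time. It is an illustration only and not how `paddle.networks.sequence_conv_pool` is implemented: the helper `conv_max_pool`, the toy embeddings and the filter values are all hypothetical, and the sketch assumes windows are taken without padding.

```python
def conv_max_pool(embeddings, filt):
    """Slide one filter over a sentence and keep the strongest response.

    embeddings: list of word vectors (one list of floats per word).
    filt: list of k weight vectors; k plays the role of `context_len`.
    """
    k = len(filt)
    scores = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        # Dot product between the filter and the k stacked word vectors.
        score = sum(w * x
                    for filt_vec, emb_vec in zip(filt, window)
                    for w, x in zip(filt_vec, emb_vec))
        scores.append(score)
    # Max-pooling over time: one number per filter, whatever the sentence length.
    return max(scores)


# Hypothetical 4-word sentence with 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
# Hypothetical filter spanning 3 consecutive words.
filt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feature = conv_max_pool(sentence, filt)
```

Using several filters per window size (here `hid_dim` of them for sizes 3 and 4) yields a fixed-length feature vector for sentences of any length, which is what the final fully connected layer consumes.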

-### Algorithm Setup
+1. Define input data and its dimension

-```python
-settings(
-batch_size=128,
-learning_rate=2e-3,
-learning_method=AdamOptimizer(),
-regularization=L2Regularization(8e-4),
-gradient_clipping_threshold=25)
-```
+    Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `convolution_net`, the input to the network is defined in `paddle.layer.data`.

-* Batch size set as 128;
-* Set global learning rate;
-* Apply ADAM algorithm for optimization;
-* Set up L2 regularization;
-* Set up gradient clipping threshold;
+1. Define Classifier

-### Model Structure
-We use PaddlePaddle to implement two classification algorithms, based on above mentioned model [Text-CNN](#Text-CNN（CNN）) and [Stacked-bidirectional LSTM](#Stacked-bidirectional LSTM（Stacked Bidirectional LSTM）).
-#### Implementation of Text CNN
-```python
-def convolution_net(input_dim,
-class_dim=2,
-emb_dim=128,
-hid_dim=128,
-is_predict=False):
-# network input: id denotes word order, dictionary size as input_dim
-data = data_layer("word", input_dim)
-# Embed one-hot id to embedding subspace
-emb = embedding_layer(input=data, size=emb_dim)
-# Convolution and max-pooling operation, convolution kernel size set as 3
-conv_3 = sequence_conv_pool(input=emb, context_len=3, hidden_size=hid_dim)
-# Convolution and max-pooling, convolution kernel size set as 4
-conv_4 = sequence_conv_pool(input=emb, context_len=4, hidden_size=hid_dim)
-# Concatenate conv_3 and conv_4 as input for softmax classification, class number as class_dim
-output = fc_layer(
-input=[conv_3, conv_4], size=class_dim, act=SoftmaxActivation())
-
-if not is_predict:
-lbl = data_layer("label", 1)    #network input: class label
-outputs(classification_cost(input=output, label=lbl))
-else:
-outputs(output)
-```
+    The above Text CNN network extracts high-level features and maps them to a vector whose size equals the number of categories. The `paddle.activation.Softmax` function is then used to calculate the probability of the sentence belonging to each category.
+
+1. Define Loss Function

-In our implementation, we can use just a single layer [`sequence_conv_pool`](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/trainer_config_helpers/networks.py) to do convolution and pooling operation, convolution kernel size set as hidden_size parameters.
+    In supervised learning, the labels of the training set are also defined in `paddle.layer.data`. During training, cross-entropy is used as the loss function in `paddle.layer.classification_cost` and serves as the output of the network. During testing, the outputs are the probabilities calculated by the classifier.
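
As a rough illustration of the classifier and loss steps above, the following plain-Python sketch computes a softmax over two class scores and the cross-entropy against the true label. It is not PaddlePaddle code: the functions `softmax` and `cross_entropy` and the example scores are hypothetical and only mirror the standard definitions.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    # Negative log-probability assigned to the true class.
    return -math.log(probs[label])


# Hypothetical scores for the two classes (negative, positive).
probs = softmax([2.0, 0.5])
# Loss when the true label is class 0 (negative).
loss = cross_entropy(probs, 0)
```

The loss approaches zero as the probability of the true class approaches one, which is why minimizing it pushes the network toward confident, correct predictions.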

-#### Implementation of Stacked bidirectional LSTM
+### Stacked Bidirectional LSTM
+
+We create a neural network `stacked_lstm_net` as below.

 ```python
 def stacked_lstm_net(input_dim,
-class_dim=2,
-emb_dim=128,
-hid_dim=512,
-stacked_num=3,
-is_predict=False):
-
-# layer number of LSTM “stacked_num” is an odd number to confirm the top-layer LSTM is forward
-assert stacked_num % 2 == 1
-# network attributes setup
-layer_attr = ExtraLayerAttribute(drop_rate=0.5)
-# parameter attributes setup
-fc_para_attr = ParameterAttribute(learning_rate=1e-3)
-lstm_para_attr = ParameterAttribute(initial_std=0., learning_rate=1.)
-para_attr = [fc_para_attr, lstm_para_attr]
-bias_attr = ParameterAttribute(initial_std=0., l2_rate=0.)
-# Activation functions
-relu = ReluActivation()
-linear = LinearActivation()
-
-
-# Network input: id as word order, dictionary size is set as input_dim
-data = data_layer("word", input_dim)
-# Mapping id from word to the embedding subspace
-emb = embedding_layer(input=data, size=emb_dim)
-
-fc1 = fc_layer(input=emb, size=hid_dim, act=linear, bias_attr=bias_attr)
-# LSTM-based RNN
-lstm1 = lstmemory(
-input=fc1, act=relu, bias_attr=bias_attr, layer_attr=layer_attr)
-
-# Construct stacked bidirectional LSTM with fc_layer and lstmemory with layer depth as stacked_num:
-inputs = [fc1, lstm1]
-for i in range(2, stacked_num + 1):
-fc = fc_layer(
-input=inputs,
-size=hid_dim,
-act=linear,
-param_attr=para_attr,
-bias_attr=bias_attr)
-lstm = lstmemory(
-input=fc,
-# Odd number-th layer: forward, Even number-th reverse.
-reverse=(i % 2) == 0,
-act=relu,
-bias_attr=bias_attr,
-layer_attr=layer_attr)
-inputs = [fc, lstm]
-
-# Apply max-pooling along the temporal dimension on the last fc_layer to produce a fixed length vector
-fc_last = pooling_layer(input=inputs[0], pooling_type=MaxPooling())
-# Apply max-pooling along tempoeral dim of lstmemory to obtain fixed length feature vector
-lstm_last = pooling_layer(input=inputs[1], pooling_type=MaxPooling())
-# concatenate fc_last and lstm_last as input for a softmax classification layer, with class number equals class_dim
-output = fc_layer(
-input=[fc_last, lstm_last],
-size=class_dim,
-act=SoftmaxActivation(),
-bias_attr=bias_attr,
-param_attr=para_attr)
-
-if is_predict:
-outputs(output)
-else:
-outputs(classification_cost(input=output, label=data_layer('label', 1)))
+                     class_dim=2,
+                     emb_dim=128,
+                     hid_dim=512,
+                     stacked_num=3):
+    """
+    A wrapper for the sentiment classification task.
+    This network uses a bidirectional recurrent network
+    consisting of three LSTM layers. The configuration follows
+    the paper at the URL below, but uses fewer layers.
+        http://www.aclweb.org/anthology/P15-1109
+    input_dim: here is word dictionary dimension.
+    class_dim: number of categories.
+    emb_dim: dimension of word embedding.
+    hid_dim: dimension of hidden layer.
+    stacked_num: number of stacked lstm-hidden layer.
+    """
+    assert stacked_num % 2 == 1
+
+    layer_attr = paddle.attr.Extra(drop_rate=0.5)
+    fc_para_attr = paddle.attr.Param(learning_rate=1e-3)
+    lstm_para_attr = paddle.attr.Param(initial_std=0., learning_rate=1.)
+    para_attr = [fc_para_attr, lstm_para_attr]
+    bias_attr = paddle.attr.Param(initial_std=0., l2_rate=0.)
+    relu = paddle.activation.Relu()
+    linear = paddle.activation.Linear()
+
+    data = paddle.layer.data("word",
+                             paddle.data_type.integer_value_sequence(input_dim))
+    emb = paddle.layer.embedding(input=data, size=emb_dim)
+
+    fc1 = paddle.layer.fc(input=emb,
+                          size=hid_dim,
+                          act=linear,
+                          bias_attr=bias_attr)
+    lstm1 = paddle.layer.lstmemory(
+        input=fc1, act=relu, bias_attr=bias_attr, layer_attr=layer_attr)
+
+    inputs = [fc1, lstm1]
+    for i in range(2, stacked_num + 1):
+        fc = paddle.layer.fc(input=inputs,
+                             size=hid_dim,
+                             act=linear,
+                             param_attr=para_attr,
+                             bias_attr=bias_attr)
+        lstm = paddle.layer.lstmemory(
+            input=fc,
+            reverse=(i % 2) == 0,
+            act=relu,
+            bias_attr=bias_attr,
+            layer_attr=layer_attr)
+        inputs = [fc, lstm]
+
+    fc_last = paddle.layer.pooling(
+        input=inputs[0], pooling_type=paddle.pooling.Max())
+    lstm_last = paddle.layer.pooling(
+        input=inputs[1], pooling_type=paddle.pooling.Max())
+    output = paddle.layer.fc(input=[fc_last, lstm_last],
+                             size=class_dim,
+                             act=paddle.activation.Softmax(),
+                             bias_attr=bias_attr,
+                             param_attr=para_attr)
+
+    lbl = paddle.layer.data("label", paddle.data_type.integer_value(2))
+    cost = paddle.layer.classification_cost(input=output, label=lbl)
+    return cost
 ```

-Our model defined in `trainer_config.py` uses the `stacked_lstm_net` structure as default. If you want to use `convolution_net`, you can comment related lines.
+1. Define input data and its dimension

-```python
-stacked_lstm_net(
-dict_dim, class_dim=class_dim, stacked_num=3, is_predict=is_predict)
-# convolution_net(dict_dim, class_dim=class_dim, is_predict=is_predict)
-```
+    Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `stacked_lstm_net`, the input to the network is defined in `paddle.layer.data`.

-## Model Training
-Use `train.sh` script to run local training:
+1. Define Classifier

-```
-./train.sh
-```
+    The above stacked bidirectional LSTM network extracts high-level features and maps them to a vector whose size equals the number of categories. The `paddle.activation.Softmax` function is then used to calculate the probability of the sentence belonging to each category.

-train.sh is as following:
-
-```bash
-paddle train --config=trainer_config.py \
--save_dir=./model_output \
--job=train \
--use_gpu=false \
--trainer_count=4 \
--num_passes=10 \
--log_period=20 \
--dot_period=20 \
--show_parameter_stats_period=100 \
--test_all_data_in_one_period=1 \
-2>&1 | tee 'train.log'
-```
+1. Define Loss Function

-* \--config=trainer_config.py: set up model configuration.
-* \--save\_dir=./model_output: set up output folder to save model parameters.
-* \--job=train: set job mode as training.
-* \--use\_gpu=false: Use CPU for training. If you have installed GPU-version PaddlePaddle and want to try GPU training, you may set this term as true.
-* \--trainer\_count=4: setup thread number (or GPU numer）.
-* \--num\_passes=15: Setup pass. In PaddlePaddle, a pass means a training epoch over all samples.
-* \--log\_period=20: print log every 20 batches.
-* \--show\_parameter\_stats\_period=100: Print statistics to screen every 100 batch.
-* \--test\_all_data\_in\_one\_period=1: Predict all testing data every time.
+    In supervised learning, the labels of the training set are also defined in `paddle.layer.data`. During training, cross-entropy is used as the loss function in `paddle.layer.classification_cost` and serves as the output of the network. During testing, the outputs are the probabilities calculated by the classifier.

-If it is running sussefully, the output log will be saved at `train.log`, model parameters will be saved at the directory `model_output/`. Output log will be as following:

-```
-Batch=20 samples=2560 AvgCost=0.681644 CurrentCost=0.681644 Eval: classification_error_evaluator=0.36875  CurrentEval: classification_error_evaluator=0.36875
-...
-Pass=0 Batch=196 samples=25000 AvgCost=0.418964 Eval: classification_error_evaluator=0.1922
-Test samples=24999 cost=0.39297 Eval: classification_error_evaluator=0.149406
-```
+To reiterate, we can invoke either `convolution_net` or `stacked_lstm_net`.

-* Batch=xx: Already |xx| Batch trained.
-* samples=xx: xx samples have been processed during training.
-* AvgCost=xx: Average loss from 0-th batch to the current batch.
-* CurrentCost=xx: loss of the latest |log_period|-th batch;
-* Eval: classification\_error\_evaluator=xx: Average accuracy from 0-th batch to current batch;
-* CurrentEval: classification\_error\_evaluator: latest |log_period| batches of classification error;
-* Pass=0: Running over all data in the training set is called as a Pass. Pass “0” denotes the first round.
+```python
+word_dict = paddle.dataset.imdb.word_dict()
+dict_dim = len(word_dict)
+class_dim = 2
+
+# option 1
+cost = convolution_net(dict_dim, class_dim=class_dim)
+# option 2
+# cost = stacked_lstm_net(dict_dim, class_dim=class_dim, stacked_num=3)
+```

+## Model Training

-## Application models
-### Testing
+### Define Parameters

-Testing refers to use trained model to evaluate labeled dataset.
+First, we create the model parameters according to the previous model configuration `cost`.

-```
-./test.sh
+```python
+# create parameters
+parameters = paddle.parameters.create(cost)
 ```

-Scripts for testing `test.sh` is as following, where the function `get_best_pass` ranks classification accuracy to obtain the best model:
+### Create Trainer

-```bash
-function get_best_pass() {
-cat $1  | grep -Pzo 'Test .*\n.*pass-.*' | \
-sed  -r 'N;s/Test.* error=([0-9]+\.[0-9]+).*\n.*pass-([0-9]+)/\1 \2/g' | \
-sort | head -n 1
-}
+Before creating the trainer, we also need to configure the optimization algorithm.
+Here we specify the `Adam` optimization algorithm via `paddle.optimizer`.

-log=train.log
-LOG=`get_best_pass $log`
-LOG=(${LOG})
-evaluate_pass="model_output/pass-${LOG[1]}"
-
-echo 'evaluating from pass '$evaluate_pass
-
-model_list=./model.list
-touch $model_list | echo $evaluate_pass > $model_list
-net_conf=trainer_config.py
-paddle train --config=$net_conf \
--model_list=$model_list \
--job=test \
--use_gpu=false \
--trainer_count=4 \
--config_args=is_test=1 \
-2>&1 | tee 'test.log'
+```python
+# create optimizer
+adam_optimizer = paddle.optimizer.Adam(
+    learning_rate=2e-3,
+    regularization=paddle.optimizer.L2Regularization(rate=8e-4),
+    model_average=paddle.optimizer.ModelAverage(average_window=0.5))
+
+# create trainer
+trainer = paddle.trainer.SGD(cost=cost,
+                             parameters=parameters,
+                             update_equation=adam_optimizer)
 ```

-Different from training, testing requires denoting `--job = test` and model path `--model_list = $model_list`. If successful, log will be saved at `test.log`. In our test, the best model is `model_output/pass-00002`, with classification error rate as 0.115645：
+### Training

-```
-Pass=0 samples=24999 AvgCost=0.280471 Eval: classification_error_evaluator=0.115645
+`paddle.dataset.imdb.train()` yields records during each pass. After shuffling, the records are grouped into batches for training.
+
+```python
+train_reader = paddle.batch(
+    paddle.reader.shuffle(
+        lambda: paddle.dataset.imdb.train(word_dict), buf_size=1000),
+    batch_size=100)
+
+test_reader = paddle.batch(
+    lambda: paddle.dataset.imdb.test(word_dict), batch_size=100)
+
+# `trainer.train` is invoked further below, once `event_handler`
+# and `feeding` have been defined.
 ```

-### Prediction
-`predict.py` script provides an API. Predicting IMDB data without labels as following:
+`feeding` specifies the correspondence between the columns of each yielded record and the `paddle.layer.data` layers. For instance, the first column of the data generated by `paddle.dataset.imdb.train()` corresponds to the `word` feature.

-```
-./predict.sh
-```
-predict.sh is as following（default model path `model_output/pass-00002` may exist or modified to others）:
-
-```bash
-model=model_output/pass-00002/
-config=trainer_config.py
-label=data/pre-imdb/labels.list
-cat ./data/aclImdb/test/pos/10007_10.txt | python predict.py \
--tconf=$config \
--model=$model \
--label=$label \
--dict=./data/pre-imdb/dict.txt \
--batch_size=1
+```python
+feeding = {'word': 0, 'label': 1}
 ```

-* `cat ./data/aclImdb/test/pos/10007_10.txt` : Input prediction samples.
-* `predict.py` : Prediction script.
-* `--tconf=$config` : Network set up.
-* `--model=$model` : Model path set up.
-* `--label=$label` : set up the label dictionary, mapping integer IDs to string labels.
-* `--dict=data/pre-imdb/dict.txt` : set up the dictionary file.
-* `--batch_size=1` : batch size during prediction.
+The callback function `event_handler` is invoked whenever a pre-defined event happens, which lets us track the training and testing process.

+```python
+def event_handler(event):
+    if isinstance(event, paddle.event.EndIteration):
+        if event.batch_id % 100 == 0:
+            print "\nPass %d, Batch %d, Cost %f, %s" % (
+                event.pass_id, event.batch_id, event.cost, event.metrics)
+        else:
+            sys.stdout.write('.')
+            sys.stdout.flush()
+    if isinstance(event, paddle.event.EndPass):
+        result = trainer.test(reader=test_reader, feeding=feeding)
+        print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
+```

-Prediction result of our example:
+Finally, we can invoke `trainer.train` to start training:

+```python
+trainer.train(
+    reader=train_reader,
+    event_handler=event_handler,
+    feeding=feeding,
+    num_passes=10)
 ```
-Loading parameters from model_output/pass-00002/
-predicting label is pos
-```

-`10007_10.txt` in folder`./data/aclImdb/test/pos`, the predicted label is also pos，so the prediction is correct.
-## Summary
+After training is done, the model from each pass is saved in `output/pass-%05d`. For example, the model of Pass 300 is saved in `output/pass-00299`.
+
+## Conclusion
+
 In this chapter, we use sentiment analysis as an example to introduce the application of deep learning models to end-to-end short text classification, as well as how to implement such models with PaddlePaddle. Along the way, we briefly introduce two models for text processing: CNN and RNN. In the following chapters we will see how these models can be applied to other tasks.
+
 ## Reference
+
 1. Kim Y. [Convolutional neural networks for sentence classification](http://arxiv.org/pdf/1408.5882)[J]. arXiv preprint arXiv:1408.5882, 2014.
 2. Kalchbrenner N, Grefenstette E, Blunsom P. [A convolutional neural network for modelling sentences](http://arxiv.org/pdf/1404.2188.pdf?utm_medium=App.net&utm_source=PourOver)[J]. arXiv preprint arXiv:1404.2188, 2014.
 3. Yann N. Dauphin, et al. [Language Modeling with Gated Convolutional Networks](https://arxiv.org/pdf/1612.08083v1.pdf)[J] arXiv preprint arXiv:1612.08083, 2016.