@@ -140,18 +140,21 @@ We use [IMDB](http://ai.stanford.edu/%7Eamaas/data/sentiment/) dataset for senti
After issuing the command `python train.py`, training will start immediately. The following sections walk through the details of how it works.
## Model Structure
## Model Configuration
### Initialize PaddlePaddle
We must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc).
Our program starts with importing necessary packages and initializing some global variables:
```python
import sys
import paddle.v2 as paddle
# PaddlePaddle init
paddle.init(use_gpu=False, trainer_count=1)
import paddle
import paddle.fluid as fluid
from functools import partial
import numpy as np
CLASS_DIM = 2
EMB_DIM = 128
HID_DIM = 512
BATCH_SIZE = 128
USE_GPU = False
```
As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both Text CNN and Stacked-bidirectional LSTM models.
...
...
@@ -160,212 +163,229 @@ As alluded to in section [Model Overview](#model-overview), here we provide the
We create a neural network `convolution_net` as shown in the following code snippet.
Note: `paddle.networks.sequence_conv_pool` includes both convolution and pooling layer operations.
Note: `fluid.nets.sequence_conv_pool` includes both convolution and pooling layer operations.
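To make the "convolution plus pooling" idea concrete, here is a minimal pure-Python sketch of a 1-D convolution over a sequence of embedding vectors followed by max pooling over time. It is an illustration of the general technique only, not the implementation of `fluid.nets.sequence_conv_pool`; the function names `conv_over_time` and `conv_pool` and the toy data are hypothetical, and details a library layer may handle (activation, padding, batching) are omitted.

```python
# Illustrative sketch only: NOT the implementation of
# fluid.nets.sequence_conv_pool. All names below are hypothetical.

def conv_over_time(sequence, filt):
    """Slide one filter over the sequence and return its responses.

    sequence: list of T embedding vectors (each a list of D floats)
    filt:     list of K weight vectors (each a list of D floats)
    Returns T - K + 1 scalar responses (no padding is applied).
    """
    k = len(filt)
    responses = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        total = 0.0
        for vec, weights in zip(window, filt):
            total += sum(x * w for x, w in zip(vec, weights))
        responses.append(total)
    return responses


def conv_pool(sequence, filters):
    """Apply every filter, then keep each filter's maximum response over time."""
    return [max(conv_over_time(sequence, f)) for f in filters]


# Toy input: 4 time steps, embedding size 2.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Two filters, each spanning 2 time steps.
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
features = conv_pool(sequence, filters)
print(features)
```

The result has one value per filter regardless of the sequence length, which is why pooling lets variable-length sentences feed a fixed-size classifier.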
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `convolution_net`, the input to the network is defined in `paddle.layer.data`.
1. Define Classifier
The above Text CNN network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
```
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories.
1. Define Loss Function
The above Text CNN network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
In the context of supervised learning, the labels of the training set are also defined in `paddle.layer.data`. During training, cross-entropy is used as the loss function in `paddle.layer.classification_cost` and as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
#### Stacked bidirectional LSTM
### Stacked bidirectional LSTM
We create a neural network `stacked_lstm_net` as below.
```python
def stacked_lstm_net(input_dim,
                     class_dim=2,
                     emb_dim=128,
                     hid_dim=512,
                     stacked_num=3):
    """
    A wrapper for the sentiment classification task.
    This network uses a bi-directional recurrent network,
    consisting of three LSTM layers. This configuration is
    motivated by the following paper, but uses fewer layers.
The above stacked bidirectional LSTM network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
1. Define input data and its dimension
To reiterate, we can invoke either `convolution_net` or `stacked_lstm_net`. In the steps below, we use `convolution_net`.
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `stacked_lstm_net`, the input to the network is defined in `paddle.layer.data`.
Next, we define an `inference_program` that simply uses `convolution_net` to predict the output from the input declared with `fluid.layers.data`.
1. Define Classifier
The above stacked bidirectional LSTM network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
net = convolution_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM)
return net
```
In the context of supervised learning, the labels of the training set are also defined in `paddle.layer.data`. During training, cross-entropy is used as the loss function in `paddle.layer.classification_cost` and as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
Then we define a `train_program` that uses the result from `inference_program` to compute the cost against the label data.
We also define `optimizer_func` to specify the optimizer.
To reiterate, we can invoke either `convolution_net` or `stacked_lstm_net`.
In the context of supervised learning, the labels of the training set are also defined in `paddle.layer.data`. During training, cross-entropy is used as the loss function in `paddle.layer.classification_cost` and as the output of the network; during testing, the outputs are the probabilities calculated by the classifier.
The first element of the returned list must be the cost.
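As a rough illustration of how a cost is computed from the classifier output and a label, the following pure-Python sketch applies softmax to two raw class scores and then takes the cross-entropy against the true label. The helper names `softmax` and `cross_entropy` and the example scores are assumptions for illustration; they are not PaddlePaddle functions.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


# Hypothetical scores for the two sentiment classes of one review.
scores = [2.0, 0.0]
probs = softmax(scores)               # what inference would output
cost = cross_entropy(probs, label=0)  # what training would minimize
print(probs, cost)
```

The cost is small when the probability of the true class is high and grows as that probability shrinks, which is the signal the optimizer uses during training.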
Create a trainer that takes `train_program` as input and specify the optimizer function.
```python
trainer = fluid.Trainer(
    train_func=partial(train_program, word_dict),
    place=place,
    optimizer_func=optimizer_func)
```
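The call above wraps `train_program` with `functools.partial` so that `word_dict` is bound in advance. A minimal sketch of that binding, using a hypothetical stand-in for `train_program` and a toy vocabulary (neither is the tutorial's real code), looks like this:

```python
from functools import partial


def train_program(word_dict):
    # Hypothetical stand-in: the real function would build the network
    # and return a list whose first element is the cost.
    return "network built for a vocabulary of {} words".format(len(word_dict))


# Toy vocabulary, for illustration only.
word_dict = {"good": 0, "bad": 1, "<unk>": 2}

# Bind word_dict now; the resulting callable takes no arguments.
train_func = partial(train_program, word_dict)
result = train_func()
print(result)
```

Binding the argument this way lets the caller pass a zero-argument function, which is presumably what the trainer expects for `train_func`.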
`feeding` specifies the correspondence between the columns of each yielded record and the `paddle.layer.data` inputs. For instance, the first column of the data generated by `paddle.dataset.imdb.train()` corresponds to the `word` feature.
### Feeding Data
`feed_order` specifies the correspondence between the columns of each yielded record and the `paddle.layer.data` inputs. For instance, the first column of the data generated by `imdb.train` corresponds to `words`.
```python
feeding = {'word': 0, 'label': 1}
feed_order = ['words', 'label']
```
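The positional mapping that `feed_order` describes can be sketched in plain Python. The reader `toy_reader` and the helper `to_feed_dict` below are hypothetical and only mimic the assumed `(word_ids, label)` layout of each record; they are not part of PaddlePaddle.

```python
def toy_reader():
    """Hypothetical reader yielding (word_ids, label) records."""
    yield [12, 7, 431], 0
    yield [5, 88], 1


feed_order = ['words', 'label']


def to_feed_dict(record, order):
    """Pair the i-th column of a record with the i-th name in the order."""
    return dict(zip(order, record))


feeds = [to_feed_dict(record, feed_order) for record in toy_reader()]
print(feeds[0])
```

Swapping the names in `feed_order` would attach the label column to `words`, so the list must follow the column order of the reader.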
Callback function `event_handler` will be invoked to track training progress when a pre-defined event happens.
### Event Handler
Callback function `event_handler` will be called during training when a pre-defined event happens.
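The callback mechanism itself can be sketched without PaddlePaddle. In the toy loop below, the classes `EndStepEvent` and `EndEpochEvent` are local stand-ins with assumed fields, and `toy_train` is a hypothetical training loop whose "cost" simply halves at every step; only the dispatch pattern is the point.

```python
class EndStepEvent:
    """Stand-in event emitted after every training step (assumed fields)."""
    def __init__(self, epoch, step, cost):
        self.epoch = epoch
        self.step = step
        self.cost = cost


class EndEpochEvent:
    """Stand-in event emitted after every epoch (assumed fields)."""
    def __init__(self, epoch):
        self.epoch = epoch


log = []


def event_handler(event):
    # Dispatch on the event type, as a training callback typically does.
    if isinstance(event, EndStepEvent):
        log.append("epoch {} step {} cost {:.2f}".format(
            event.epoch, event.step, event.cost))
    elif isinstance(event, EndEpochEvent):
        log.append("epoch {} finished".format(event.epoch))


def toy_train(num_epochs, steps_per_epoch, handler):
    """Hypothetical loop that halves a fake cost and fires events."""
    cost = 1.0
    for epoch in range(num_epochs):
        for step in range(steps_per_epoch):
            cost *= 0.5
            handler(EndStepEvent(epoch, step, cost))
        handler(EndEpochEvent(epoch))


toy_train(num_epochs=1, steps_per_epoch=2, handler=event_handler)
print("\n".join(log))
```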
For example, we can check the cost with `trainer.test` when an `EndStepEvent` occurs:
```python
# Specify the directory path to save the parameters