@@ -98,18 +98,21 @@ We use [IMDB](http://ai.stanford.edu/%7Eamaas/data/sentiment/) dataset for senti
...
@@ -98,18 +98,21 @@ We use [IMDB](http://ai.stanford.edu/%7Eamaas/data/sentiment/) dataset for senti
After issuing a command `python train.py`, training will start immediately. The details will be unpacked by the following sessions to see how it works.
After issuing a command `python train.py`, training will start immediately. The details will be unpacked by the following sessions to see how it works.
## Model Structure
## Model Configuration
### Initialize PaddlePaddle
Our program starts with importing necessary packages and initializing some global variables:
We must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc).
```python
```python
importsys
importpaddle
importpaddle.v2aspaddle
importpaddle.fluidasfluid
fromfunctoolsimportpartial
# PaddlePaddle init
importnumpyasnp
paddle.init(use_gpu=False,trainer_count=1)
CLASS_DIM=2
EMB_DIM=128
HID_DIM=512
BATCH_SIZE=128
USE_GPU=False
```
```
As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both Text CNN and Stacked-bidirectional LSTM models.
As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both Text CNN and Stacked-bidirectional LSTM models.
...
@@ -118,212 +121,229 @@ As alluded to in section [Model Overview](#model-overview), here we provide the
...
@@ -118,212 +121,229 @@ As alluded to in section [Model Overview](#model-overview), here we provide the
We create a neural network `convolution_net` as the following snippet code.
We create a neural network `convolution_net` as the following snippet code.
Note: `paddle.networks.sequence_conv_pool` includes both convolution and pooling layer operations.
Note: `fluid.nets.sequence_conv_pool` includes both convolution and pooling layer operations.
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `convolution_net`, the input to the network is defined in `paddle.layer.data`.
```
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories.
1. Define Classifier
The above Text CNN network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
1. Define Loss Function
The above Text CNN network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
In the context of supervised learning, labels of the training set are defined in `paddle.layer.data`, too. During training, cross-entropy is used as loss function in `paddle.layer.classification_cost` and as the output of the network; During testing, the outputs are the probabilities calculated in the classifier.
#### Stacked bidirectional LSTM
### Stacked bidirectional LSTM
We create a neural network `stacked_lstm_net` as below.
We create a neural network `stacked_lstm_net` as below.
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `stacked_lstm_net`, the input to the network is defined in `paddle.layer.data`.
```
The above stacked bidirectional LSTM network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
1. Define Classifier
To reiterate, we can either invoke `convolution_net` or `stacked_lstm_net`. In below steps, we will go with `convolution_net`.
The above stacked bidirectional LSTM network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
Next we define an `inference_program` that simply uses `convolution_net` to predict output with the input from `fluid.layer.data`.
In the context of supervised learning, labels of the training set are defined in `paddle.layer.data`, too. During training, cross-entropy is used as loss function in `paddle.layer.classification_cost` and as the output of the network; During testing, the outputs are the probabilities calculated in the classifier.
Then we define a `training_program` that uses the result from `inference_program` to compute the cost with label data.
Also define `optimizer_func` to specify the optimizer.
To reiterate, we can either invoke `convolution_net` or `stacked_lstm_net`.
In the context of supervised learning, labels of the training set are defined in `paddle.layer.data` too. During training, cross-entropy is used as loss function in `paddle.layer.classification_cost` and as the output of the network; During testing, the outputs are the probabilities calculated in the classifier.
First result that returns from the list must be cost.
Create a trainer that takes `train_program` as input and specify optimizer function.
```python
trainer=fluid.Trainer(
train_func=partial(train_program,word_dict),
place=place,
optimizer_func=optimizer_func)
```
```
`feeding` is devoted to specifying the correspondence between each yield record and `paddle.layer.data`. For instance, the first column of data generated by `paddle.dataset.imdb.train()` corresponds to `word` feature.
### Feeding Data
`feed_order` is devoted to specifying the correspondence between each yield record and `paddle.layer.data`. For instance, the first column of data generated by `imdb.train` corresponds to `words`.
```python
```python
feeding={'word':0,'label':1}
feed_order=['words','label']
```
```
Callback function `event_handler` will be invoked to track training progress when a pre-defined event happens.
### Event Handler
Callback function `event_handler` will be called during training when a pre-defined event happens.
For example, we can check the cost by `trainer.test` when `EndStepEvent` occurs
```python
```python
# Specify the directory path to save the parameters
@@ -140,18 +140,21 @@ We use [IMDB](http://ai.stanford.edu/%7Eamaas/data/sentiment/) dataset for senti
...
@@ -140,18 +140,21 @@ We use [IMDB](http://ai.stanford.edu/%7Eamaas/data/sentiment/) dataset for senti
After issuing a command `python train.py`, training will start immediately. The details will be unpacked by the following sessions to see how it works.
After issuing a command `python train.py`, training will start immediately. The details will be unpacked by the following sessions to see how it works.
## Model Structure
## Model Configuration
### Initialize PaddlePaddle
Our program starts with importing necessary packages and initializing some global variables:
We must import and initialize PaddlePaddle (enable/disable GPU, set the number of trainers, etc).
```python
```python
import sys
import paddle
import paddle.v2 as paddle
import paddle.fluid as fluid
from functools import partial
# PaddlePaddle init
import numpy as np
paddle.init(use_gpu=False, trainer_count=1)
CLASS_DIM = 2
EMB_DIM = 128
HID_DIM = 512
BATCH_SIZE = 128
USE_GPU = False
```
```
As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both Text CNN and Stacked-bidirectional LSTM models.
As alluded to in section [Model Overview](#model-overview), here we provide the implementations of both Text CNN and Stacked-bidirectional LSTM models.
...
@@ -160,212 +163,229 @@ As alluded to in section [Model Overview](#model-overview), here we provide the
...
@@ -160,212 +163,229 @@ As alluded to in section [Model Overview](#model-overview), here we provide the
We create a neural network `convolution_net` as the following snippet code.
We create a neural network `convolution_net` as the following snippet code.
Note: `paddle.networks.sequence_conv_pool` includes both convolution and pooling layer operations.
Note: `fluid.nets.sequence_conv_pool` includes both convolution and pooling layer operations.
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `convolution_net`, the input to the network is defined in `paddle.layer.data`.
```
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories.
1. Define Classifier
The above Text CNN network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
1. Define Loss Function
The above Text CNN network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
In the context of supervised learning, labels of the training set are defined in `paddle.layer.data`, too. During training, cross-entropy is used as loss function in `paddle.layer.classification_cost` and as the output of the network; During testing, the outputs are the probabilities calculated in the classifier.
#### Stacked bidirectional LSTM
### Stacked bidirectional LSTM
We create a neural network `stacked_lstm_net` as below.
We create a neural network `stacked_lstm_net` as below.
Parameter `input_dim` denotes the dictionary size, and `class_dim` is the number of categories. In `stacked_lstm_net`, the input to the network is defined in `paddle.layer.data`.
```
The above stacked bidirectional LSTM network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
1. Define Classifier
To reiterate, we can either invoke `convolution_net` or `stacked_lstm_net`. In below steps, we will go with `convolution_net`.
The above stacked bidirectional LSTM network extracts high-level features and maps them to a vector of the same size as the categories. `paddle.activation.Softmax` function or classifier is then used for calculating the probability of the sentence belonging to each category.
Next we define an `inference_program` that simply uses `convolution_net` to predict output with the input from `fluid.layer.data`.
net = convolution_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM)
return net
```
In the context of supervised learning, labels of the training set are defined in `paddle.layer.data`, too. During training, cross-entropy is used as loss function in `paddle.layer.classification_cost` and as the output of the network; During testing, the outputs are the probabilities calculated in the classifier.
Then we define a `training_program` that uses the result from `inference_program` to compute the cost with label data.
Also define `optimizer_func` to specify the optimizer.
To reiterate, we can either invoke `convolution_net` or `stacked_lstm_net`.
In the context of supervised learning, labels of the training set are defined in `paddle.layer.data` too. During training, cross-entropy is used as loss function in `paddle.layer.classification_cost` and as the output of the network; During testing, the outputs are the probabilities calculated in the classifier.
First result that returns from the list must be cost.
Create a trainer that takes `train_program` as input and specify optimizer function.
```python
trainer = fluid.Trainer(
train_func=partial(train_program, word_dict),
place=place,
optimizer_func=optimizer_func)
```
```
`feeding` is devoted to specifying the correspondence between each yield record and `paddle.layer.data`. For instance, the first column of data generated by `paddle.dataset.imdb.train()` corresponds to `word` feature.
### Feeding Data
`feed_order` is devoted to specifying the correspondence between each yield record and `paddle.layer.data`. For instance, the first column of data generated by `imdb.train` corresponds to `words`.
```python
```python
feeding = {'word': 0, 'label': 1}
feed_order = ['words', 'label']
```
```
Callback function `event_handler` will be invoked to track training progress when a pre-defined event happens.
### Event Handler
Callback function `event_handler` will be called during training when a pre-defined event happens.
For example, we can check the cost by `trainer.test` when `EndStepEvent` occurs
```python
```python
# Specify the directory path to save the parameters