Commit 40125f45, authored by Yu Yang

Update doc

Parent c9fd4963
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 5bb206a2182263ffcb7c4270c50bc7c9
tags: 645f666f9bcd5a90fca523b33c5a78b7
Build and Install
=================
## Requirements
### Dependencies
- **CMake**: version 2.8 or later
- **g++**: a recent C++ compiler supporting C++11, >= 4.6, < 5
- **BLAS library**: such as OpenBLAS, MKL, ATLAS
- **protobuf**: version 2.4 or later; 3.x is not supported
- **python**: currently only version 2.7 is supported
### Optional
PaddlePaddle also supports several build options. Enabling an option requires the related libraries to be installed.
- **WITH_GPU**: Compile with GPU support
  - The GPU version works best with CUDA Toolkit 7.5 and cuDNN v5
  - CUDA Toolkit 6.5 and 7.0 and cuDNN v2, v3 and v4 are also supported
  - Note: cuDNN v5 requires CUDA Toolkit 7.5, and vice versa
- **WITH_DOUBLE**: Compile with double precision; otherwise single precision is used
- **WITH_GLOG**: Compile with glog; otherwise an internal logging implementation is used
- **WITH_GFLAGS**: Compile with gflags; otherwise an internal flag implementation is used
- **WITH_TESTING**: Compile with gtest and run the PaddlePaddle unit tests
- **WITH_DOC**: Compile with documentation
- **WITH_SWIG_PY**: Compile with the Python prediction API
- **WITH_STYLE_CHECK**: Run the style check on the source code
## Building on Ubuntu 14.04
### Install Dependencies
- **CPU Dependencies**
```bash
# necessary
sudo apt-get update
sudo apt-get install -y g++ make cmake build-essential libatlas-base-dev python python-pip libpython-dev m4 libprotobuf-dev protobuf-compiler python-protobuf python-numpy git
# optional
sudo apt-get install libgoogle-glog-dev
sudo apt-get install libgflags-dev
sudo apt-get install libgtest-dev
pushd /usr/src/gtest
cmake .
make
sudo cp *.a /usr/lib
popd
```
- **GPU Dependencies (optional)**
To build the GPU version, you first need a machine with a GPU and CUDA installed.
You also need to install cuDNN.
You can download the CUDA Toolkit and cuDNN from the NVIDIA website:
```bash
https://developer.nvidia.com/cuda-downloads
https://developer.nvidia.com/cudnn
```
Then extract the cuDNN files into the CUDA Toolkit directory, for example:
```bash
sudo tar -xzf cudnn-7.5-linux-x64-v5.1.tgz -C /usr/local
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```
Then set the LD\_LIBRARY\_PATH, CUDA\_HOME and PATH environment variables in ~/.bashrc.
```bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
```
- **Python Dependencies (optional)**
If you want to compile PaddlePaddle with the Python prediction API, add -DWITH_SWIG_PY=ON to the cmake command and install the following first:
```bash
sudo apt-get install swig
```
- **Doc Dependencies (optional)**
If you want to compile PaddlePaddle with documentation, add -DWITH_DOC=ON to the cmake command and install the following first:
```bash
pip install sphinx
pip install sphinx_rtd_theme breathe recommonmark
sudo apt-get install python-sphinx doxygen
```
### Build and Install
CMake searches the system default paths for dependent libraries first. Once an optional library such as glog, gtest or gflags is installed, the corresponding build option is turned on automatically. If a library is not found, set the matching variable manually in the cmake command (CUDNN_ROOT, ATLAS_ROOT, MKL_ROOT, OPENBLAS_ROOT).
Here are some examples of cmake commands with different options:
**only cpu**
```bash
cmake -DWITH_GPU=OFF -DWITH_DOC=OFF
```
**gpu**
```bash
cmake -DWITH_GPU=ON -DWITH_DOC=OFF
```
**gpu with doc and swig**
```bash
cmake -DWITH_GPU=ON -DWITH_DOC=ON -DWITH_SWIG_PY=ON
```
Finally, download the source code and build:
```bash
git clone https://github.com/baidu/Paddle paddle
cd paddle
mkdir build
cd build
# you can add build option here, such as:
cmake -DWITH_GPU=ON -DWITH_DOC=OFF -DCMAKE_INSTALL_PREFIX=<path to install> ..
make -j `nproc` && make install
# PaddlePaddle installation path
export PATH=<path to install>/bin:$PATH
```
**Note**
If you set WITH_SWIG_PY=ON, you also have to install the Python prediction API package:
```bash
pip install <path to install>/opt/paddle/share/wheels/*.whl
```
# Contribute to PaddlePaddle
We sincerely appreciate your contributions. You can use the fork and pull request
workflow to merge your code.
## Code Requirements
- Your code must be fully documented in
[doxygen](http://www.stack.nl/~dimitri/doxygen/) style.
- Make sure the compiler option WITH\_STYLE\_CHECK is on and the code
passes the style check.
- All code must have unit tests.
- All unit tests must pass.
The following tutorial guides you through submitting your contribution.
## [Creating a Fork](https://help.github.com/articles/fork-a-repo/)
Just head over to the GitHub page and click the "Fork" button.
It's just that simple.
## Clone
Once you've created a fork, you can use your favorite git client to clone your
repo or just head straight to the command line:
```shell
# Clone your fork to your local machine
git clone git@github.com:USERNAME/paddle.git
```
Then you can start to develop.
## Commit
Commit your changes with the following commands:
```shell
# show the working tree status
git status
# add modified files
git add xx
git commit -m "commit info"
```
The first line of the commit information is the title. The second and later lines
are the details, if any.
## Keeping Fork Up to Date
Before opening your pull request, you should sync your code with the latest PaddlePaddle.
To do this, first add a remote:
```shell
# see the current configured remote repository
git remote -v
# add upstream repository
git remote add upstream https://github.com/paddle/paddle.git
# verify the new upstream
git remote -v
```
Update your fork with the latest upstream changes:
```shell
git fetch upstream
git pull upstream master
```
If there are no unique commits locally, git will simply perform a fast-forward.
However, if you have been making changes (in the vast majority of cases you
probably shouldn't be), you may have to deal with conflicts.
Now, your local master branch is up-to-date with everything modified upstream.
## Push to GitHub
```shell
# push to your repository in Github
git push origin master
```
## Pull Request
Go to the page for your fork on GitHub, select your development branch,
and click the **pull request button**.
Build And Install PaddlePaddle
================================
Install PaddlePaddle
----------------------
.. toctree::
:glob:
install_*
Build from Source
-----------------
If you want to hack on and contribute to the PaddlePaddle source code, the following guides can help you\:
.. toctree::
:glob:
build_from_source.md
contribute_to_paddle.md
Build Docker Images
-------------------
Note: The installation packages are still in pre-release
state and your installation experience may not be smooth.
If you want to build a Docker image, the following guide can help you\:
.. toctree::
:glob:
docker/*
Cluster Train
====================
.. toctree::
:glob:
opensource/cluster_train.md
# Cluster Training
We provide some simple scripts to help you launch a cluster training job that harnesses PaddlePaddle's distributed training. For MPI and other cluster schedulers, you can use these naive scripts as a reference for implementing a more robust cluster training platform yourself.
The following cluster demo is based on the RECOMMENDATION local training demo in the PaddlePaddle ```demo/recommendation``` directory. We assume you are in the cluster_scripts/ directory.
## Pre-requirements
First, install fabric:
```bash
pip install fabric
```
Second, follow the installation scripts to install PaddlePaddle on all nodes, and make sure the demo runs in local mode on each of them.
Then prepare the same ROOT_DIR directory on all nodes. ROOT_DIR is defined in cluster_scripts/conf.py. Assuming ROOT_DIR = /home/paddle, you can also create a ```paddle``` user account, so that ```paddle.py``` can open ssh connections to all nodes as the ```paddle``` user automatically.
Finally, you can set up ssh mutual trust between all nodes for easy ssh login; otherwise the ```password``` has to be provided to ```paddle.py``` at runtime.
## Prepare Job Workspace
A ```job workspace``` is a package directory that contains the dependency libraries, train data, test data, model config file and all other related file dependencies.
The ```train/test``` data should be prepared before launching the cluster job. Because the train/test data may be placed in a directory different from the workspace, PaddlePaddle locates the data through index files named ```train.list/test.list```, which are referenced in the model config file. The train/test data therefore also includes the two list files train.list and test.list. All local training demos already provide scripts to help you create these two files, and under normal conditions all nodes in a cluster job handle the files with the same logic.
Generally, you can use the same model file for cluster training as for local training. Keep in mind that the ```batch_size``` set in the ```settings``` function of the model file is the batch size on ```each``` node of the cluster job, not the total batch size, when synchronous SGD is used.
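The per-node meaning of the batch size can be made concrete with a little arithmetic. The sketch below is plain Python, not a PaddlePaddle API; the helper name `effective_batch_size` and the example node count are assumptions for illustration, and it assumes one trainer per node.

```python
def effective_batch_size(per_node_batch_size, num_nodes):
    """Samples consumed per synchronous SGD step across the whole cluster.

    Assumes one trainer per node (an illustrative simplification).
    """
    return per_node_batch_size * num_nodes


# Hypothetical job: batch_size = 128 in the model file, two hosts in conf.py.
print(effective_batch_size(128, 2))
```

If you want to keep the total batch size of a local run, one option is to divide the local batch size by the number of nodes before setting it in the model file.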
The following steps are based on the demo/recommendation demo in the demo directory.
Go through the demo/recommendation tutorial up to the ```Train``` section; you will then have the train/test data and the model configuration file. You can also place the paddle binaries and related dependency files in the demo/recommendation directory. Finally, use demo/recommendation as the workspace for cluster training.
In the end, your workspace should look as follows:
```
.
|-- conf
|   `-- trainer_config.conf
|-- test
|   |-- dnn_instance_000000
|-- test.list
|-- train
|   |-- dnn_instance_000000
|   |-- dnn_instance_000001
`-- train.list
```
```conf/trainer_config.conf```
Indicates the model config file.
```test``` and ```train```
Train/test data. Different nodes should own different parts of the training data. This simple script does not split the data for you, so you have to prepare it yourself. All test data should be placed on node 0 only.
```train.list``` and ```test.list```
File index. It stores the relative or absolute paths of all train/test data files on the current node.
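The local training demos already ship scripts that create these index files. Purely as an illustration of the idea, the following sketch writes such a list; the function name `write_file_list` is hypothetical, and the one-path-per-line layout is an assumption rather than something stated above.

```python
import os


def write_file_list(data_dir, list_path):
    """Write the path of every regular file in data_dir to list_path.

    Assumes one path per line (an assumption about the index file layout).
    """
    names = sorted(os.listdir(data_dir))
    paths = [os.path.join(data_dir, name) for name in names
             if os.path.isfile(os.path.join(data_dir, name))]
    with open(list_path, "w") as f:
        for path in paths:
            f.write(path + "\n")
    return paths
```

On a node whose workspace matches the layout shown earlier, `write_file_list("train", "train.list")` would list only the training files stored on that node.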
## Prepare Cluster Job Configuration
Several options must be set carefully in cluster_scripts/conf.py:
```HOSTS``` hostnames or IP addresses of all nodes that will run the cluster job. You can also add a user and ssh port to the hostname, such as root@192.168.100.17:9090.
```ROOT_DIR``` workspace root directory in which the job workspace directory is placed
```PADDLE_NIC``` the NIC (Network Interface Card) interface name for the cluster communication channel, such as eth0 for Ethernet or ib0 for InfiniBand.
```PADDLE_PORT``` port number for the cluster communication channel
```PADDLE_PORTS_NUM``` the number of ports used for the cluster communication channel. If the number of cluster nodes is small (fewer than 5 to 6 nodes), we recommend a larger value, such as 2 to 8, for better network performance.
```PADDLE_PORTS_NUM_FOR_SPARSE``` the number of ports used for the sparse updater cluster communication channel. If sparse remote update is used, set it like ```PADDLE_PORTS_NUM```.
The default configuration is as follows:
```python
HOSTS = [
"root@192.168.100.17",
"root@192.168.100.18",
]
'''
workspace configuration
'''
#root dir for workspace
ROOT_DIR = "/home/paddle"
'''
network configuration
'''
#pserver nics
PADDLE_NIC = "eth0"
#pserver port
PADDLE_PORT = 7164
#pserver ports num
PADDLE_PORTS_NUM = 2
#pserver sparse ports num
PADDLE_PORTS_NUM_FOR_SPARSE = 2
```
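To illustrate the `user@host:port` form accepted in `HOSTS`, here is a minimal parsing sketch. It is not the code used by `paddle.py` or fabric; the function name `parse_host` and the default ssh port of 22 are assumptions made for this example.

```python
def parse_host(entry, default_user=None, default_port=22):
    """Split 'user@host:port' into (user, host, port).

    The user and port parts are optional. The defaults are illustrative.
    """
    user, host = default_user, entry
    if "@" in entry:
        user, host = entry.split("@", 1)
    port = default_port
    if ":" in host:
        host, port_text = host.rsplit(":", 1)
        port = int(port_text)
    return user, host, port


print(parse_host("root@192.168.100.17:9090"))
print(parse_host("192.168.100.18"))
```

Checking each `HOSTS` entry this way before launching a job can catch malformed entries early.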
### Launching Cluster Job
```paddle.py``` provides automated scripts to start all PaddlePaddle cluster processes on the different nodes. By default, all command line options can be set as ```paddle.py``` command options, and ```paddle.py``` passes them transparently to the lower-level PaddlePaddle processes.
```paddle.py``` provides two dedicated command options for easy job launching.
```job_dispatch_package``` set it to a local ```workspace``` directory, which will be dispatched to all nodes listed in conf.py. This is helpful when you frequently modify workspace files; otherwise repeated multi-node workspace deployment becomes tedious.
```job_workspace``` set it to an already deployed workspace directory; ```paddle.py``` then skips the dispatch stage and launches the cluster job on all nodes directly. This avoids heavy
dispatch latency.
```cluster_scripts/run.sh``` provides a sample command line for running the ```demo/recommendation``` cluster job. Just set ```job_dispatch_package``` and ```job_workspace``` to your own directories, then run:
```
sh run.sh
```
The cluster job will start within several seconds.
### Kill Cluster Job
```paddle.py``` captures the ```Ctrl + C``` SIGINT signal and automatically kills all processes it launched. So just stop ```paddle.py``` to kill the cluster job.
### Check Cluster Training Result
Check the logs in $workspace/log for details. Each node has the same log structure.
```paddle_trainer.INFO```
It provides almost all internal output logs of training, the same as in local training. Check runtime model convergence here.
```paddle_pserver2.INFO```
It provides the pserver running log, which helps to diagnose distributed errors.
```server.log```
It provides stderr and stdout of the pserver process. Check the error log if training crashes.
```train.log```
It provides stderr and stdout of the trainer process. Check the error log if training crashes.
### Check Model Output
After one pass finishes, the model files are written to the ```output``` directory on node 0.
```nodefile``` in the workspace indicates the node id of the current cluster job.
# Chinese Word Embedding Model Tutorial #
----------
This tutorial guides you through using a pretrained Chinese word embedding model in the PaddlePaddle standard format.
We thank @lipeng for the pull request that defined the model schemas and pretrained the models.
## Introduction ##
### Chinese Word Dictionary ###
Our Chinese word dictionary is built from Baidu ZhiDao and Baidu Baike using an in-house word segmenter. For example, the segmentation of "《红楼梦》" yields "《", "红楼梦", "》" and "《红楼梦》". Our dictionary (in UTF-8 format) has two columns: the word and its frequency. The total word count is 3206325, including 3 special tokens:
- `<s>`: the start of a sequence
- `<e>`: the end of a sequence
- `<unk>`: a word not included in dictionary
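As a rough illustration of the two-column layout, the sketch below loads such a dictionary into a Python mapping. The function name `load_dictionary` is hypothetical, and the column separator is assumed to be whitespace (for example a tab), which the text above does not specify.

```python
import io


def load_dictionary(path):
    """Return {word: frequency} from a two-column UTF-8 text file.

    Assumes the columns are separated by whitespace.
    """
    freq = {}
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            word, count = parts
            freq[word] = int(count)
    return freq
```

The special tokens such as `<s>` are ordinary entries in this mapping, so a lookup falls back to `<unk>` only if you add that step yourself.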
### Pretrained Chinese Word Embedding Model ###
Inspired by the paper [A Neural Probabilistic Language Model](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf), our model architecture (**Embedding joint of six words->FullyConnect->SoftMax**) is shown in the following figure. For our dictionary, we pretrained four models with different word vector dimensions, i.e. 32, 64, 128 and 256.
<center>![](./neural-n-gram-model.png)</center>
<center>Figure 1. neural-n-gram-model</center>
### Download and Extract ###
To download and extract our dictionary and pretrained model, run the following commands.

    cd $PADDLE_ROOT/demo/model_zoo/embedding
    ./pre_DictAndModel.sh

## Chinese Paraphrasing Example ##
We provide a paraphrasing task to show the usage of pretrained Chinese Word Dictionary and Embedding Model.
### Data Preparation and Preprocess ###
First, run the following commands to download and extract the in-house dataset. The dataset (in UTF-8 format) has 20 training samples, 5 testing samples and 2 generating samples.

    cd $PADDLE_ROOT/demo/seqToseq/data
    ./paraphrase_data.sh

Second, preprocess the data and build the dictionary from the training data by running the following commands. The preprocessed dataset is stored in `$PADDLE_SOURCE_ROOT/demo/seqToseq/data/pre-paraphrase`:

    cd $PADDLE_ROOT/demo/seqToseq/
    python preprocess.py -i data/paraphrase [--mergeDict]

- `--mergeDict`: if this option is used, the source and target dictionaries are merged, i.e. the two dictionaries have the same content. Here, as the source and target data are both Chinese words, this option can be used.
### User Specified Embedding Model ###
The general command for extracting the desired parameters from the pretrained embedding model based on a user dictionary is:

    cd $PADDLE_ROOT/demo/model_zoo/embedding
    python extract_para.py --preModel PREMODEL --preDict PREDICT --usrModel USRMODEL --usrDict USRDICT -d DIM

- `--preModel PREMODEL`: the name of the pretrained embedding model
- `--preDict PREDICT`: the name of the pretrained dictionary
- `--usrModel USRMODEL`: the name of the extracted embedding model
- `--usrDict USRDICT`: the name of the user specified dictionary
- `-d DIM`: the dimension of the parameters

Here, you can simply run the command:

    cd $PADDLE_ROOT/demo/seqToseq/data/
    ./paraphase_model.sh

And you will see the following embedding model structure:

    paraphase_model
    |--- _source_language_embedding
    |--- _target_language_embedding

### Training Model in PaddlePaddle ###
First, create a model config file, see the example `demo/seqToseq/paraphrase/train.conf`:

    from seqToseq_net import *
    is_generating = False

    ################## Data Definition #####################
    train_conf = seq_to_seq_data(data_dir = "./data/pre-paraphrase",
                                 job_mode = job_mode)

    ############## Algorithm Configuration ##################
    settings(
          learning_method = AdamOptimizer(),
          batch_size = 50,
          learning_rate = 5e-4)

    ################# Network configure #####################
    gru_encoder_decoder(train_conf, is_generating, word_vector_dim = 32)

This config is almost the same as `demo/seqToseq/translation/train.conf`.
Then, train the model by running the command:

    cd $PADDLE_SOURCE_ROOT/demo/seqToseq/paraphrase
    ./train.sh

where `train.sh` is almost the same as `demo/seqToseq/translation/train.sh`; the only difference is the following two command arguments:
- `--init_model_path`: path of the initialization model, here `data/paraphase_model`
- `--load_missing_parameter_strategy`: what to do when a model file is missing; here a normal distribution is used to initialize all parameters other than the embedding layer

For users who want to understand the dataset format, model architecture and training procedure in detail, please refer to [Text generation Tutorial](text_generation.md).
## Optional Function ##
### Embedding Parameters Observation
For users who want to observe the embedding parameters, this function converts a PaddlePaddle binary embedding model to a text model by running the command:

    cd $PADDLE_ROOT/demo/model_zoo/embedding
    python paraconvert.py --b2t -i INPUT -o OUTPUT -d DIM

- `-i INPUT`: the name of the input binary embedding model
- `-o OUTPUT`: the name of the output text embedding model
- `-d DIM`: the dimension of the parameters

You will see parameters like this in the output text model:

    0,4,32156096
    -0.7845433,1.1937413,-0.1704215,0.4154715,0.9566584,-0.5558153,-0.2503305, ......
    0.0000909,0.0009465,-0.0008813,-0.0008428,0.0007879,0.0000183,0.0001984, ......
    ......

- The 1st line is the **PaddlePaddle format file header**; it has 3 attributes:
  - version of PaddlePaddle, here 0
  - sizeof(float), here 4
  - total number of parameters, here 32156096
- The other lines list the parameters (assume `<dim>` = 32)
  - each line lists 32 parameters separated by ','
  - there are 32156096/32 = 1004878 lines, meaning there are 1004878 embedded words
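The text layout described above can be read back with a few lines of Python. This is a minimal sketch, not the code of `paraconvert.py`; the names `parse_text_model` and `expected_rows` are hypothetical, and the small sample is made up for illustration.

```python
def parse_text_model(lines):
    """Parse a text embedding model: one header line, then one row per word."""
    it = iter(lines)
    header = next(it).strip()
    # Header attributes: version, sizeof(float), total number of parameters.
    version, float_size, total = (int(x) for x in header.split(","))
    rows = [[float(x) for x in line.strip().split(",")]
            for line in it if line.strip()]
    return version, float_size, total, rows


def expected_rows(total_params, dim):
    """Number of rows implied by the header for a given dimension."""
    return total_params // dim


# Made-up sample with dimension 3 and two words.
sample = ["0,4,6\n", "0.5,1.5,-2.0\n", "0.25,0.0,1.0\n"]
version, float_size, total, rows = parse_text_model(sample)
print(len(rows) == expected_rows(total, 3))
```

Comparing the row count with `expected_rows` is a quick sanity check that the `-d DIM` value matches the model.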
### Embedding Parameters Revision
For users who want to revise the embedding parameters, this function converts a revised text embedding model to a PaddlePaddle binary model by running the command:

    cd $PADDLE_ROOT/demo/model_zoo/embedding
    python paraconvert.py --t2b -i INPUT -o OUTPUT

- `-i INPUT`: the name of the input text embedding model
- `-o OUTPUT`: the name of the output binary embedding model

Note that the format of the input text model is as follows:

    -0.7845433,1.1937413,-0.1704215,0.4154715,0.9566584,-0.5558153,-0.2503305, ......
    0.0000909,0.0009465,-0.0008813,-0.0008428,0.0007879,0.0000183,0.0001984, ......
    ......

- there is no file header in the 1st line
- each line stores the parameters for one word, separated by commas ','
# Image Classification Tutorial
This tutorial will guide you through training a convolutional neural network to classify objects using the CIFAR-10 image classification dataset.
As shown in the following figure, the convolutional neural network can recognize the main object in images, and output the classification result.
<center>![Image Classification](./image_classification.png)</center>
## Data Preparation
First, download the CIFAR-10 dataset. It is available from its official website:
<https://www.cs.toronto.edu/~kriz/cifar.html>
We have prepared a script to download and process the CIFAR-10 dataset. The script downloads the dataset from the official website.
It then converts it to jpeg images and organizes them into a directory with the structure required by this tutorial. Make sure that you have installed the Python dependency (PIL). If not, you can install it with `pip install PIL`, provided that `pip` is installed.
```bash
cd demo/image_classification/data/
sh download_cifar.sh
```
The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
Here are the classes in the dataset, as well as 10 random images from each:
<center>![Image Classification](./cifar.png)</center>
After downloading and converting, we should find a directory (cifar-out) containing the dataset in the following format:
```
train
---airplane
---automobile
---bird
---cat
---deer
---dog
---frog
---horse
---ship
---truck
test
---airplane
---automobile
---bird
---cat
---deer
---dog
---frog
---horse
---ship
---truck
```
It has two directories: `train` and `test`. They contain the training data and testing data of CIFAR-10, respectively. Each of the two folders contains 10 sub-folders, ranging from `airplane` to `truck`. Each sub-folder contains images with the corresponding label. After the images are organized into this structure, we are ready to train an image classification model.
## Preprocess
After the data has been downloaded, it needs to be pre-processed into the Paddle format. We can run the following command for preprocessing.
```
cd demo/image_classification/
sh preprocess.sh
```
`preprocess.sh` calls `./demo/image_classification/preprocess.py` to preprocess image data.
```sh
export PYTHONPATH=$PYTHONPATH:../../
data_dir=./data/cifar-out
python preprocess.py -i $data_dir -s 32 -c 1
```
`./demo/image_classification/preprocess.py` has the following arguments:
- `-i` or `--input` specifies the input data directory.
- `-s` or `--size` specifies the processed size of images.
- `-c` or `--color` specifies whether images are color images or gray images.
## Model Training
We need to create a model config file before training the model. An example of the config file (vgg_16_cifar.py) is listed below. **Note** that it is slightly different from the `vgg_16_cifar.py` that also applies to prediction.
```python
from paddle.trainer_config_helpers import *
data_dir='data/cifar-out/batches/'
meta_path=data_dir+'batches.meta'
args = {'meta':meta_path, 'mean_img_size': 32,
'img_size': 32, 'num_classes': 10,
'use_jpeg': 1, 'color': "color"}
define_py_data_sources2(train_list=data_dir+"train.list",
test_list=data_dir+'test.list',
module='image_provider',
obj='processData',
args=args)
settings(
batch_size = 128,
learning_rate = 0.1 / 128.0,
learning_method = MomentumOptimizer(0.9),
regularization = L2Regularization(0.0005 * 128))
img = data_layer(name='image', size=3*32*32)
lbl = data_layer(name="label", size=10)
# small_vgg is predefined in trainer_config_helpers.network
predict = small_vgg(input_image=img, num_channels=3)
outputs(classification_cost(input=predict, label=lbl))
```
The first line imports python functions for defining networks.
```python
from paddle.trainer_config_helpers import *
```
Then call `define_py_data_sources2`, which uses the Python data provider
interface. The arguments in `args` are used in `image_provider.py`, which
yields image data and passes them to Paddle.
- `meta`: the meta file containing the mean value of the training set.
- `mean_img_size`: the size of the mean feature map.
- `img_size`: the height and width of the input image.
- `num_classes`: the number of classes.
- `use_jpeg`: the data storage type used during preprocessing.
- `color`: specifies color images.
`settings` specifies the training algorithm. In the following example,
the learning rate is 0.1 divided by the batch size, and the weight decay
is 0.0005 multiplied by the batch size.
```python
settings(
batch_size = 128,
learning_rate = 0.1 / 128.0,
learning_method = MomentumOptimizer(0.9),
regularization = L2Regularization(0.0005 * 128)
)
```
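The scaling in this example works out as follows. This is plain Python arithmetic, independent of PaddlePaddle; the variable names are illustrative only.

```python
batch_size = 128

# Learning rate: 0.1 divided by the batch size.
learning_rate = 0.1 / batch_size
# Weight decay: 0.0005 multiplied by the batch size.
weight_decay = 0.0005 * batch_size

print(learning_rate)  # 0.00078125
print(weight_decay)   # 0.064
```

If you change `batch_size`, both values need to be recomputed to keep the same base rate of 0.1 and base decay of 0.0005.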
`small_vgg` specifies the network. We use a small version of the VGG convolutional network
for classification. A description of the VGG network can be found at [http://www.robots.ox.ac.uk/~vgg/research/very_deep/](http://www.robots.ox.ac.uk/~vgg/research/very_deep/).
```python
# small_vgg is predefined in trainer_config_helpers.network
predict = small_vgg(input_image=img, num_channels=3)
```
After writing the config, we can train the model by running the script train.sh. Note that the following script assumes that you run it in the `./demo/image_classification` folder. If you run the script in a different folder, you need to change the paths of the scripts and the configuration files accordingly.
```bash
config=vgg_16_cifar.py
output=./cifar_vgg_model
log=train.log
paddle train \
--config=$config \
--dot_period=10 \
--log_period=100 \
--test_all_data_in_one_period=1 \
--use_gpu=1 \
--save_dir=$output \
2>&1 | tee $log
python -m paddle.utils.plotcurve -i $log > plot.png
```
- Here we use GPU mode to train. If you have no GPU environment, just set `use_gpu=0`.
- `./demo/image_classification/vgg_16_cifar.py` is the network and data configuration file. The meaning of the other flags can be found in the documentation of the command line flags.
- The script `plotcurve.py` requires the Python module `matplotlib`; if it fails, you may need to install `matplotlib`.

After training finishes, the training and testing error curves are saved to `plot.png` by the `plotcurve.py` script. An example of the plot is shown below:
<center>![Training and testing curves.](./plot.png)</center>
## Prediction
After we train the model, the model file as well as the model parameters are stored in path `./cifar_vgg_model/pass-%05d`. For example, the model of the 300-th pass is stored at `./cifar_vgg_model/pass-00299`.
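The directory name uses a zero-based pass index, which is why the 300th pass ends up in `pass-00299`. A small sketch of the naming, where the helper `pass_dir` is a hypothetical name introduced for this example:

```python
def pass_dir(save_dir, pass_number):
    """Directory of the pass_number-th pass (1-based), stored with a 0-based index."""
    return "%s/pass-%05d" % (save_dir, pass_number - 1)


print(pass_dir("./cifar_vgg_model", 300))  # ./cifar_vgg_model/pass-00299
```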
To make a prediction for an image, run `predict.sh` as follows. The script outputs the predicted classification label.
```
sh predict.sh
```
predict.sh:
```
model=cifar_vgg_model/pass-00299/
image=data/cifar-out/test/airplane/seaplane_s_000978.png
use_gpu=1
python prediction.py $model $image $use_gpu
```
## Exercise
Train an image classification model for birds using the VGG model and the CUB-200 dataset. The birds dataset, which can be downloaded from the link below, contains photos of 200 bird species (mostly North American).
<http://www.vision.caltech.edu/visipedia/CUB-200.html>
## Delve into Details
### Convolutional Neural Network
A Convolutional Neural Network is a feedforward neural network that uses convolution layers. It is very suitable for building neural networks that process and understand images. A standard convolutional neural network is shown below:
![Convolutional Neural Network](./lenet.png)
A convolutional neural network contains the following layers:
- Convolutional layer: uses the convolution operation to extract features from an image or a feature map.
- Pooling layer: uses max-pooling to downsample feature maps.
- Fully connected layer: uses fully connected connections to transform features.

Convolutional neural networks achieve strong performance in image classification because they exploit two important characteristics of images: *local correlation* and *spatial invariance*. By iteratively applying convolution and max-pooling operations, a convolutional neural network can represent these two characteristics well.
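To make the convolution and max-pooling operations concrete, here is a minimal NumPy sketch for a single-channel image. It is not how PaddlePaddle implements these layers; the function names, the 'valid' border handling and the fixed 2x2 pooling window are assumptions chosen to keep the example short.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, the operation CNN layers apply."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for y in range(oh):
        for x in range(ow):
            # Each output value depends only on a local window of the input.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out


def max_pool2x2(feature):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    h, w = feature.shape[0] // 2 * 2, feature.shape[1] // 2 * 2
    f = feature[:h, :w]
    return f.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


image = np.arange(36, dtype=np.float32).reshape(6, 6)
kernel = np.ones((3, 3), dtype=np.float32) / 9.0  # simple averaging filter
pooled = max_pool2x2(conv2d_valid(image, kernel))
print(pooled.shape)  # (2, 2)
```

The local window in `conv2d_valid` reflects local correlation, while taking the maximum over each 2x2 block makes the result less sensitive to small shifts of the input.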
For more details of how to define layers and their connections, please refer to the documentation of layers.
Image Classification Tutorial
=============================
.. toctree::
:maxdepth: 3
:glob:
Training Locally <image_classification.md>
cluster_train/internal/cluster_train.md
cluster_train/opensource/cluster_train.md
# Model Zoo - ImageNet #
[ImageNet](http://www.image-net.org/) is a popular dataset for generic object classification. This tutorial provides convolutional neural network (CNN) models for ImageNet.
## ResNet Introduction
ResNets, from the paper [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385), won 1st place in the ILSVRC 2015 classification task. The authors present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. The residual connections are shown in the following figure. The left building block is used in the 34-layer network, and the right bottleneck building block is used in the 50-, 101- and 152-layer networks.
<center>![resnet_block](./resnet_block.jpg)</center>
<center>Figure 1. ResNet Block</center>
We present three ResNet models, converted from the models provided by the authors <https://github.com/KaimingHe/deep-residual-networks>. The following table lists the classification errors measured in PaddlePaddle on the 50,000 ILSVRC validation images, with input image channel order **BGR**, a single scale with the shorter side resized to 256, and a single crop.
<center>
<table border="2" cellspacing="0" cellpadding="6" rules="all" frame="border">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">ResNet</th>
<th scope="col" class="left">Top-1</th>
<th scope="col" class="left">Model Size</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">ResNet-50</td>
<td class="left">24.9%</td>
<td class="left">99M</td>
</tr>
<tr>
<td class="left">ResNet-101</td>
<td class="left">23.7%</td>
<td class="left">173M</td>
</tr>
<tr>
<td class="left">ResNet-152</td>
<td class="left">23.2%</td>
<td class="left">234M</td>
</tr>
</tbody>
</table></center>
<br>
## ResNet Model
See ```demo/model_zoo/resnet/resnet.py```. This config contains networks of 50, 101 and 152 layers. You can specify the number of layers by adding an argument such as ```--config_args=layer_num=50``` to the command line arguments.
### Network Visualization
You can get a diagram of the ResNet network by running the following commands. The script generates a dot file and then converts it to a PNG file using the draw_dot tool installed on our server. If you cannot access the server, install graphviz to convert the dot file.
```
cd demo/model_zoo/resnet
./net_diagram.sh
```
### Model Download
```
cd demo/model_zoo/resnet
./get_model.sh
```
You can run the above commands to download all models and the mean file; if the download succeeds, they are saved in ```demo/model_zoo/resnet/model```.
```
mean_meta_224 resnet_101 resnet_152 resnet_50
```
* resnet_50: model of 50 layers.
* resnet_101: model of 101 layers.
* resnet_152: model of 152 layers.
* mean\_meta\_224: mean file of size 3 x 224 x 224 in **BGR** order. You can also use three mean values instead: 103.939, 116.779, 123.68.
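When the three mean values are used instead of the mean file, the subtraction can be sketched as below. This is not the data provider used by the demo; the function name `subtract_mean_bgr`, the RGB height x width x channel input layout and the channel-first output are assumptions for illustration.

```python
import numpy as np

# Per-channel means in BGR order, as listed above.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def subtract_mean_bgr(rgb_image):
    """Convert an H x W x 3 RGB array to a mean-subtracted 3 x H x W BGR array."""
    bgr = rgb_image[:, :, ::-1].astype(np.float32)  # reverse channel order
    return (bgr - BGR_MEAN).transpose(2, 0, 1)      # channels first
```

If your image loader already returns BGR data, skip the channel reversal and subtract `BGR_MEAN` directly.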
### Parameter Info
* **Convolution Layer Weight**
  As a batch normalization layer follows each convolution layer, this layer has no bias parameter and only one weight.
  shape: `(Co, ky, kx, Ci)`
  * Co: channel number of the output feature map.
  * ky: filter size in the vertical direction.
  * kx: filter size in the horizontal direction.
  * Ci: channel number of the input feature map.

  2-Dim matrix: (Co * ky * kx, Ci), saved in row-major order.
* **Fully connected Layer Weight**
  2-Dim matrix: (input layer size, this layer size), saved in row-major order.
* **[Batch Normalization](<http://arxiv.org/abs/1502.03167>) Layer Weight**
  There are four parameters in this layer. In fact, only .w0 and .wbias are learned parameters. The other two are the running mean and variance, respectively. They are loaded at test time. The following table shows the parameters of a batch normalization layer.
<center>
<table border="2" cellspacing="0" cellpadding="6" rules="all" frame="border">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">Parameter Name</th>
<th scope="col" class="left">Number</th>
<th scope="col" class="left">Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">_res2_1_branch1_bn.w0</td>
<td class="left">256</td>
<td class="left">gamma, scale parameter</td>
</tr>
<tr>
<td class="left">_res2_1_branch1_bn.w1</td>
<td class="left">256</td>
<td class="left">mean value of feature map</td>
</tr>
<tr>
<td class="left">_res2_1_branch1_bn.w2</td>
<td class="left">256</td>
<td class="left">variance of feature map</td>
</tr>
<tr>
<td class="left">_res2_1_branch1_bn.wbias</td>
<td class="left">256</td>
<td class="left">beta, shift parameter</td>
</tr>
</tbody>
</table></center>
<br>
### Parameter Observation
Users who want to observe the parameters can use python to read:
```
import sys
import numpy as np


def load(file_name):
    with open(file_name, 'rb') as f:
        f.read(16)  # skip header for float type.
        return np.fromfile(f, dtype=np.float32)


if __name__ == '__main__':
    weight = load(sys.argv[1])
```
or simply use following shell command:
```
od -j 16 -f _res2_1_branch1_bn.w0
```
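To tie the 16-byte header to the `(Co, ky, kx, Ci)` layout described under Parameter Info, here is a hedged sketch that writes a small synthetic parameter file and reads it back as a 4-D array. The helper name `load_conv_weight`, the file name and the example shape are assumptions made for illustration; they are not part of the demo scripts.

```python
import os
import tempfile

import numpy as np


def load_conv_weight(file_name, co, ky, kx, ci):
    """Read a float32 parameter file and view it as (Co, ky, kx, Ci)."""
    with open(file_name, 'rb') as f:
        f.read(16)  # skip the 16-byte header, as in load() above
        flat = np.fromfile(f, dtype=np.float32)
    # The values are stored as a row-major (Co * ky * kx, Ci) matrix,
    # so a plain reshape recovers the 4-D layout.
    return flat.reshape(co, ky, kx, ci)


# Build a synthetic file: 16 header bytes followed by 2*3*3*4 floats.
values = np.arange(2 * 3 * 3 * 4, dtype=np.float32)
path = os.path.join(tempfile.mkdtemp(), 'fake_conv.w0')
with open(path, 'wb') as f:
    f.write(b'\0' * 16)
    f.write(values.tobytes())

weight = load_conv_weight(path, co=2, ky=3, kx=3, ci=4)
```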
## Feature Extraction
We provide both C++ and Python interfaces to extract features. The following examples use data in `demo/model_zoo/resnet/example` to show the extracting process in detail.
### C++ Interface
First, specify the image data list in `define_py_data_sources2` in the config; see the example `demo/model_zoo/resnet/resnet.py`.
```
train_list = 'train.list' if not is_test else None
# mean.meta is mean file of ImageNet dataset.
# mean.meta size : 3 x 224 x 224.
# If you use three mean value, set like:
# "mean_value:103.939,116.779,123.68;"
args={
    'mean_meta': "model/mean_meta_224/mean.meta",
    'image_size': 224, 'crop_size': 224,
    'color': True,'swap_channel:': [2, 1, 0]}
define_py_data_sources2(train_list,
                        'example/test.list',
                        module="example.image_list_provider",
                        obj="processData",
                        args=args)
```
Second, specify layers to extract features in `Outputs()` of `resnet.py`. For example,
```
Outputs("res5_3_branch2c_conv", "res5_3_branch2c_bn")
```
Third, specify the model path and output directory in `extract_fea_c++.sh`, and then run the following commands.
```
cd demo/model_zoo/resnet
./extract_fea_c++.sh
```
If successful, features are saved in `fea_output/rank-00000` as follows. You can use the `load_feature_c` interface in `load_feature.py` to load such a file.
```
-0.115318 -0.108358 ... -0.087884;-1.27664 ... -1.11516 -2.59123;
-0.126383 -0.116248 ... -0.00534909;-1.42593 ... -1.04501 -1.40769;
```
* Each line stores the features of one sample. Here, the first line stores the features of `example/dog.jpg` and the second line stores the features of `example/cat.jpg`.
* Features of different layers are separated by `;`, and their order is consistent with the layer order in `Outputs()`. Here, the left features belong to the `res5_3_branch2c_conv` layer and the right features belong to the `res5_3_branch2c_bn` layer.
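The line format above can be parsed with a few lines of Python. This is only a sketch of the idea, not the `load_feature_c` implementation in `load_feature.py`; the function name `parse_feature_line` and the shortened sample line are made up for illustration.

```python
import numpy as np


def parse_feature_line(line, layer_names):
    """Split one output line into a {layer name: 1-D float array} dict."""
    # Each layer's values end with ';', so drop the empty trailing chunk.
    chunks = [c for c in line.strip().split(';') if c.strip()]
    if len(chunks) != len(layer_names):
        raise ValueError('expected %d layers, got %d'
                         % (len(layer_names), len(chunks)))
    return {
        name: np.array([float(v) for v in chunk.split()], dtype=np.float32)
        for name, chunk in zip(layer_names, chunks)
    }


# Layer order must match the order used in Outputs().
layers = ['res5_3_branch2c_conv', 'res5_3_branch2c_bn']
# A shortened, made-up line in the same format as the file above.
sample = '-0.115318 -0.108358 -0.087884;-1.27664 -1.11516 -2.59123;'
features = parse_feature_line(sample, layers)
```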
### Python Interface
`demo/model_zoo/resnet/classify.py` is an example that shows how to use Python to extract features. The following example again uses the data in `./example/test.list`. The command is as follows:
```
cd demo/model_zoo/resnet
./extract_fea_py.sh
```
extract_fea_py.sh:
```
python classify.py \
    --job=extract \
    --conf=resnet.py \
    --use_gpu=1 \
    --mean=model/mean_meta_224/mean.meta \
    --model=model/resnet_50 \
    --data=./example/test.list \
    --output_layer="res5_3_branch2c_conv,res5_3_branch2c_bn" \
    --output_dir=features
```
* \--job=extract: specify job mode to extract feature.
* \--conf=resnet.py: network configure.
* \--use_gpu=1: specify GPU mode.
* \--model=model/resnet_50: model path.
* \--data=./example/test.list: data list.
* \--output_layer="xxx,xxx": specify layers to extract features.
* \--output_dir=features: output directory.
Note: the convolution layers in these ResNet models use the cudnn implementation, which only supports GPU. CPU mode is not supported because of a compatibility issue, which we will fix later.
If the run succeeds, features are saved in `features/batch_0`. This file is produced with cPickle. You can use the `load_feature_py` interface in `load_feature.py` to open the file; it returns a dictionary as follows:
```
{
'cat.jpg': {'res5_3_branch2c_conv': array([[-0.12638293, -0.116248 , -0.11883899, ..., -0.00895038, 0.01994277, -0.00534909]], dtype=float32), 'res5_3_branch2c_bn': array([[-1.42593431, -1.28918779, -1.32414699, ..., -1.45933616, -1.04501402, -1.40769434]], dtype=float32)},
'dog.jpg': {'res5_3_branch2c_conv': array([[-0.11531784, -0.10835785, -0.08809858, ...,0.0055237, 0.01505112, -0.08788397]], dtype=float32), 'res5_3_branch2c_bn': array([[-1.27663755, -1.18272924, -0.90937918, ..., -1.25178063, -1.11515927, -2.59122872]], dtype=float32)}
}
```
On close inspection, these feature values are consistent with the results extracted by the C++ interface above.
## Prediction
`classify.py` can also be used for prediction. We provide an example script `predict.sh` that predicts the data in `example/test.list` using a ResNet model with 50 layers.
```
cd demo/model_zoo/resnet
./predict.sh
```
predict.sh calls the `classify.py`:
```
python classify.py \
    --job=predict \
    --conf=resnet.py \
    --multi_crop \
    --model=model/resnet_50 \
    --use_gpu=1 \
    --data=./example/test.list
* \--job=predict: specify job mode to predict.
* \--conf=resnet.py: network configure.
* \--multi_crop: use 10 crops and average predicting probability.
* \--use_gpu=1: specify GPU mode.
* \--model=model/resnet_50: model path.
* \--data=./example/test.list: data list.
If the run succeeds, you will see the following results, where 156 and 282 are the labels of the images.
```
Label of example/dog.jpg is: 156
Label of example/cat.jpg is: 282
```
# Examples and demos
There are several examples and demos here.
## Image
* [Image Classification](image_classification/index.rst)
## NLP
* [Sentiment Analysis](sentiment_analysis/index.rst)
* [Text Generation](text_generation/index.rst)
* [Semantic Role Labeling](semantic_role_labeling/index.md)
## Recommendation
* [MovieLens Dataset](rec/ml_dataset.md)
* [MovieLens Regression](rec/ml_regression.rst)
## Model Zoo
* [ImageNet: ResNet](imagenet_model/resnet_model.md)
* [Embedding: Chinese Word](embedding_model/index.md)
## Customization
* [Writing New Layers](new_layer/index.rst)
Writing New Layers
=======================
This tutorial will guide you through writing customized layers in PaddlePaddle. We use the fully connected layer as an example and walk through the following steps for writing a new layer.
- Derive equations for the forward and backward part of the layer.
- Implement C++ class for the layer.
- Write gradient check unit test to make sure the gradients are correctly computed.
- Implement Python wrapper for the layer.
=================
Derive Equations
=================
First we need to derive equations of the *forward* and *backward* part of the layer. The forward part computes the output given an input. The backward part computes the gradients of the input and the parameters given the gradients of the output.
The illustration of a fully connected layer is shown in the following figure. In a fully connected layer, all output nodes are connected to all the input nodes.
.. image:: ./FullyConnected.jpg
The *forward part* of a layer transforms an input into the corresponding output.
A fully connected layer takes a dense input vector :math:`x` with dimension :math:`D_i`. It uses a transformation matrix :math:`W` with size :math:`D_i \times D_o` to project :math:`x` into a :math:`D_o` dimensional vector, and adds a bias vector :math:`b` with dimension :math:`D_o` to the vector.
.. math::
y = f(W^T x + b)
where :math:`f(.)` is a nonlinear *activation* function, such as sigmoid, tanh, and ReLU.
The transformation matrix :math:`W` and bias vector :math:`b` are the *parameters* of the layer. The *parameters* of a layer are learned during training in the *backward pass*. The backward pass computes the gradients of the output function with respect to all parameters and inputs. The optimizer can use chain rule to compute the gradients of the loss function with respect to each parameter. Suppose our loss function is :math:`c(y)`, then
.. math::
\frac{\partial c(y)}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial y}{\partial x}
Suppose :math:`z = W^T x + b`, so that :math:`y = f(z)`, then
.. math::
\frac{\partial y}{\partial z} = \frac{\partial f(z)}{\partial z}
This derivative can be automatically computed by our base layer class.
Then, for the fully connected layer, we need to compute :math:`\frac{\partial z}{\partial x}`, :math:`\frac{\partial z}{\partial W}`, and :math:`\frac{\partial z}{\partial b}`.
.. math::
\frac{\partial z}{\partial x} = W \\
\frac{\partial z_j}{\partial W_{ij}} = x_i \\
\frac{\partial z}{\partial b} = \mathbf 1 \\
where :math:`\mathbf 1` is an all-one vector, :math:`W_{ij}` is the number at the i-th row and j-th column of the matrix :math:`W`, :math:`z_j` is the j-th component of the vector :math:`z`, and :math:`x_i` is the i-th component of the vector :math:`x`.
Then we can use the chain rule to calculate :math:`\frac{\partial c(y)}{\partial x}` and :math:`\frac{\partial c(y)}{\partial W}`. The details of the computation will be given in the next section.
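Before turning to the C++ code, the equations above can be checked with a small NumPy sketch. It is an illustration only: it assumes a sigmoid activation and a batch of inputs stored as rows (so the forward pass is written as ``x.dot(w) + b``), and the names ``fc_forward`` and ``fc_backward`` are invented for this example rather than taken from PaddlePaddle.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fc_forward(x, w, b):
    """x: (batch, D_i), w: (D_i, D_o), b: (D_o,). Returns y = f(xW + b)."""
    return sigmoid(x.dot(w) + b)


def fc_backward(x, w, y, grad_y):
    """Return the gradients of the loss with respect to x, W and b."""
    grad_z = grad_y * y * (1.0 - y)  # dc/dz = dc/dy * f'(z) for sigmoid
    grad_w = x.T.dot(grad_z)         # dc/dW_ij = sum over batch of x_i * dc/dz_j
    grad_b = grad_z.sum(axis=0)      # dz/db is all ones, summed over the batch
    grad_x = grad_z.dot(w.T)         # dz/dx = W
    return grad_x, grad_w, grad_b


rng = np.random.RandomState(0)
x = rng.randn(5, 4)
w = rng.randn(4, 3)
b = rng.randn(3)
y = fc_forward(x, w, b)
# Use c(y) = sum(y) as a toy loss, so dc/dy is all ones.
grad_x, grad_w, grad_b = fc_backward(x, w, y, np.ones_like(y))
```

The row-major batch convention is the same one used by the matrix multiplications in the C++ implementation that follows.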
===================
Implement C++ Class
===================
The C++ class of the layer implements the initialization, forward, and backward part of the layer. The fully connected layer is at `paddle/gserver/layers/FullyConnectedLayer.h` and `paddle/gserver/layers/FullyConnectedLayer.cpp`. We list simplified version of the code below.
It needs to derive from the base class `paddle::Layer`, and it needs to override the following functions:
- constructor and destructor.
- `init` function. It is used to initialize the parameters and settings.
- `forward`. It implements the forward part of the layer.
- `backward`. It implements the backward part of the layer.
- `prefetch`. It determines which rows of the parameter matrix to prefetch from the parameter server. You do not need to override this function if your layer does not need remote sparse update (most layers do not).
The header file is listed below::

    namespace paddle {
    /**
     * A layer has full connections to all neurons in the previous layer.
     * It computes an inner product with a set of learned weights, and
     * (optionally) adds biases.
     *
     * The config file api is fc_layer.
     */
    class FullyConnectedLayer : public Layer {
    protected:
      WeightList weights_;
      std::unique_ptr<Weight> biases_;

    public:
      explicit FullyConnectedLayer(const LayerConfig& config)
          : Layer(config) {}
      ~FullyConnectedLayer() {}

      bool init(const LayerMap& layerMap, const ParameterMap& parameterMap);

      Weight& getWeight(int idx) { return *weights_[idx]; }

      void prefetch();
      void forward(PassType passType);
      void backward(const UpdateCallback& callback = nullptr);
    };
    }  // namespace paddle

It defines the parameters as class variables. We use the `Weight` class as an abstraction of parameters. It supports multi-thread update. This class is described in more detail in the implementation below.
- `weights_` is a list of weights for the transformation matrices. The current implementation can have more than one input. Thus, it has a list of weights. One weight corresponds to one input.
- `biases_` is a weight for the bias vector.
The fully connected layer does not have layer configuration hyper-parameters. If a layer has hyper-parameters, a common practice is to store them in `LayerConfig& config`, and copy them into class variables in the constructor.
The following code snippet implements the `init` function.
- First, every `init` function must call the `init` function of the base class `Layer::init(layerMap, parameterMap);`. This statement will initialize the required variables and connections for each layer.
- Then it initializes all the weight matrices :math:`W`. The current implementation can have more than one input. Thus, it has a list of weights.
- Finally, it initializes the bias.
The code is listed below::

    bool FullyConnectedLayer::init(const LayerMap& layerMap,
                                   const ParameterMap& parameterMap) {
      /* Initialize the basic parent class */
      Layer::init(layerMap, parameterMap);

      /* initialize the weightList */
      CHECK(inputLayers_.size() == parameters_.size());
      for (size_t i = 0; i < inputLayers_.size(); i++) {
        // Get the dimensions of the weight matrix
        size_t height = inputLayers_[i]->getSize();
        size_t width = getSize();

        // create a new weight
        if (parameters_[i]->isSparse()) {
          CHECK_LE(parameters_[i]->getSize(), width * height);
        } else {
          CHECK_EQ(parameters_[i]->getSize(), width * height);
        }
        Weight* w = new Weight(height, width, parameters_[i]);

        // append the new weight to the list
        weights_.emplace_back(w);
      }

      /* initialize biases_ */
      if (biasParameter_.get() != NULL) {
        biases_ = std::unique_ptr<Weight>(new Weight(1, getSize(), biasParameter_));
      }
      return true;
    }

The implementation of the forward part has the following steps.
- Every layer must call `Layer::forward(passType);` at the beginning of its `forward` function.
- Then it allocates memory for the output using `reserveOutput(batchSize, size);`. This step is necessary because different batches may have different batch sizes. `reserveOutput` changes the size of the output accordingly. For the sake of efficiency, new memory is allocated only when the matrix needs to grow; the existing memory block is reused when the matrix shrinks.
- Then it computes :math:`\sum_i W_i^T x_i + b` using Matrix operations. `getInput(i).value` retrieves the matrix of the i-th input. Each input is a :math:`batchSize \times dim` matrix, where each row represents a single input in a batch. For a complete list of supported matrix operations, please refer to `paddle/math/Matrix.h` and `paddle/math/BaseMatrix.h`.
- Finally it applies the activation function using `forwardActivation();`. It automatically applies the activation function specified in the network configuration.
The code is listed below::

    void FullyConnectedLayer::forward(PassType passType) {
      Layer::forward(passType);

      /* malloc memory for the output_ if necessary */
      int batchSize = getInput(0).getBatchSize();
      int size = getSize();

      {
        // Set up the size of the output.
        reserveOutput(batchSize, size);
      }

      MatrixPtr outV = getOutputValue();

      // Apply the transformation matrix to each input.
      for (size_t i = 0; i != inputLayers_.size(); ++i) {
        auto input = getInput(i);
        CHECK(input.value) << "The input of 'fc' layer must be matrix";
        i == 0 ? outV->mul(input.value, weights_[i]->getW(), 1, 0)
               : outV->mul(input.value, weights_[i]->getW(), 1, 1);
      }

      /* add the bias-vector */
      if (biases_.get() != NULL) {
        outV->addBias(*(biases_->getW()), 1);
      }

      /* activation */ {
        forwardActivation();
      }
    }

The implementation of the backward part has the following steps.
- `backwardActivation();` computes the gradients of the activation. The gradients are multiplied in place into the gradients of the output, which can be retrieved using `getOutputGrad()`.
- Compute the gradients of the bias. Notice that we can use `biases_->getWGrad()` to get the gradient matrix of the corresponding parameter. After the gradient of a parameter is updated, the layer *MUST* call `getParameterPtr()->incUpdate(callback);`. This is used for parameter update over multiple threads or multiple machines.
- Then it computes the gradients of the transformation matrices and inputs, and it calls `incUpdate` for the corresponding parameter. This lets the framework know whether it has gathered all the gradients of a parameter, so that it can overlap other work (e.g., network communication).
The code is listed below::

    void FullyConnectedLayer::backward(const UpdateCallback& callback) {
      /* Do derivation for activations.*/ {
        backwardActivation();
      }

      if (biases_ && biases_->getWGrad()) {
        biases_->getWGrad()->collectBias(*getOutputGrad(), 1);

        /* Increasing the number of gradient */
        biases_->getParameterPtr()->incUpdate(callback);
      }

      bool syncFlag = hl_get_sync_flag();

      for (size_t i = 0; i != inputLayers_.size(); ++i) {
        /* Calculate the W-gradient for the current layer */
        if (weights_[i]->getWGrad()) {
          MatrixPtr input_T = getInputValue(i)->getTranspose();
          MatrixPtr oGrad = getOutputGrad();
          {
            weights_[i]->getWGrad()->mul(input_T, oGrad, 1, 1);
          }
        }

        /* Calculate the input layers error */
        MatrixPtr preGrad = getInputGrad(i);
        if (NULL != preGrad) {
          MatrixPtr weights_T = weights_[i]->getW()->getTranspose();
          preGrad->mul(getOutputGrad(), weights_T, 1, 1);
        }

        {
          weights_[i]->getParameterPtr()->incUpdate(callback);
        }
      }
    }

The `prefetch` function specifies the rows that need to be fetched from parameter server during training. It is only useful for remote sparse training. In remote sparse training, the full parameter matrix is stored distributedly at the parameter server. When the layer uses a batch for training, only a subset of locations of the input is non-zero in this batch. Thus, this layer only needs the rows of the transformation matrix corresponding to the locations of these non-zero entries. The `prefetch` function specifies the ids of these rows.
Most of the layers do not need remote sparse training function. You do not need to override this function in this case::

    void FullyConnectedLayer::prefetch() {
      for (size_t i = 0; i != inputLayers_.size(); ++i) {
        auto* sparseParam =
            dynamic_cast<SparsePrefetchRowCpuMatrix*>(weights_[i]->getW().get());
        if (sparseParam) {
          MatrixPtr input = getInputValue(i);
          sparseParam->addRows(input);
        }
      }
    }

Finally, you can use `REGISTER_LAYER(fc, FullyConnectedLayer);` to register the layer. `fc` is the identifier of the layer, and `FullyConnectedLayer` is the class name of the layer::

    namespace paddle {
    REGISTER_LAYER(fc, FullyConnectedLayer);
    }

If the `cpp` file is put into `paddle/gserver/layers`, it will be automatically added to the compilation list.
==============================
Write Gradient Check Unit Test
==============================
An easy way to verify the correctness of a new layer's implementation is to write a gradient check unit test. A gradient check unit test uses the finite difference method to verify the gradient of a layer. It modifies the input with a small perturbation :math:`\Delta x` and observes the change of the output :math:`\Delta y`; the gradient can then be estimated as :math:`\frac{\Delta y}{\Delta x }`. This estimate is compared with the gradient computed by the `backward` function of the layer to ensure the correctness of the gradient computation. Notice that the gradient check only tests the correctness of the gradient computation; it does not necessarily guarantee the correctness of the implementation of the `forward` and `backward` functions. You need to write more sophisticated unit tests to make sure your layer is implemented correctly.
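The finite difference idea can be sketched independently of PaddlePaddle in a few lines of NumPy. The snippet is a hedged illustration: it uses a central difference rather than the one-sided :math:`\frac{\Delta y}{\Delta x}` described above, a toy tanh-activated projection as the "layer", and a helper name ``numeric_gradient`` that is invented for this example. It is not how `testLayerGrad` is implemented.

```python
import numpy as np


def numeric_gradient(func, x, eps=1e-6):
    """Central-difference estimate of d func(x) / dx for a scalar func."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        plus = func(x)
        x[idx] = orig - eps
        minus = func(x)
        x[idx] = orig  # restore the perturbed entry
        grad[idx] = (plus - minus) / (2.0 * eps)
    return grad


rng = np.random.RandomState(0)
w = rng.randn(4, 3)
x = rng.randn(2, 4)


def loss(inp):
    # Scalar loss: sum of a tanh-activated linear projection.
    return np.tanh(inp.dot(w)).sum()


# Analytic gradient of the same loss, as a backward function would return it.
analytic = (1.0 - np.tanh(x.dot(w)) ** 2).dot(w.T)
numeric = numeric_gradient(loss, x)
max_diff = np.abs(analytic - numeric).max()
```

If the analytic gradient were wrong, for instance missing the transpose on ``w``, ``max_diff`` would be far larger than the numerical noise of the estimate.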
All the gradient check unit tests are located in `paddle/gserver/tests/test_LayerGrad.cpp`. You are recommended to put your test into a new test file if you are planning to write a new layer. The gradient check unit test of the fully connected layer is listed below. It has the following steps.
+ Create layer configuration. A layer configuration can include the following attributes:

    - size of the bias parameter. (4096 in our example)
    - type of the layer. (fc in our example)
    - size of the layer. (4096 in our example)
    - activation type. (sigmoid in our example)
    - dropout rate. (0.1 in our example)

+ Configure the input of the layer. In our example, we have only one input.

    - type of the input (`INPUT_DATA` in our example). It can be one of the following types:

        - `INPUT_DATA`: dense vector.
        - `INPUT_LABEL`: integer.
        - `INPUT_DATA_TARGET`: dense vector, but it is not used to compute gradient.
        - `INPUT_SEQUENCE_DATA`: dense vector with sequence information.
        - `INPUT_HASSUB_SEQUENCE_DATA`: dense vector with both sequence and sub-sequence information.
        - `INPUT_SEQUENCE_LABEL`: integer with sequence information.
        - `INPUT_SPARSE_NON_VALUE_DATA`: 0-1 sparse data.
        - `INPUT_SPARSE_FLOAT_VALUE_DATA`: float sparse data.

    - name of the input. (`layer_0` in our example)
    - size of the input. (8192 in our example)
    - number of non-zeros, only useful for sparse inputs.
    - format of sparse data, only useful for sparse inputs.

+ Each input needs to call `config.layerConfig.add_inputs();` once.
+ Call `testLayerGrad` to perform gradient checks. It has the following arguments.

    - layer and input configurations. (`config` in our example)
    - type of the layer. (`fc` in our example)
    - batch size of the gradient check. (100 in our example)
    - whether the input is transposed. Most layers need to set it to `false`. (`false` in our example)
    - whether to use weights. Some layers or activations perform normalization so that the sum of their output is a constant. For example, the sum of the output of a softmax activation is one. In this case, we cannot correctly compute the gradients using regular gradient check techniques. A weighted sum of the output, which is not a constant, is used to compute the gradients. (`true` in our example, because the activation of a fully connected layer can be softmax)

The code is listed below::

    void testFcLayer(string format, size_t nnz) {
      // Create layer configuration.
      TestConfig config;
      config.biasSize = 4096;
      config.layerConfig.set_type("fc");
      config.layerConfig.set_size(4096);
      config.layerConfig.set_active_type("sigmoid");
      config.layerConfig.set_drop_rate(0.1);
      // Setup inputs.
      config.inputDefs.push_back(
          {INPUT_DATA, "layer_0", 8192, nnz, ParaSparse(format)});
      config.layerConfig.add_inputs();
      LOG(INFO) << config.inputDefs[0].sparse.sparse << " "
                << config.inputDefs[0].sparse.format;
      for (auto useGpu : {false, true}) {
        testLayerGrad(config, "fc", 100, /* trans */ false, useGpu,
                      /* weight */ true);
      }
    }

If you are creating a new file for the test, such as `paddle/gserver/tests/test_FCGrad.cpp`, you need to add the file to `paddle/gserver/tests/CMakeLists.txt`. An example is given below. All the unit tests will run when you execute the command `make tests`. Notice that some layers might need high accuracy for the gradient check unit tests to work well. In that case, set `WITH_DOUBLE` to `ON` when configuring cmake.
The code is listed below::

    add_unittest_without_exec(test_FCGrad
        test_FCGrad.cpp
        LayerGradUtil.cpp
        TestUtil.cpp)

    add_test(NAME test_FCGrad
        COMMAND test_FCGrad)

========================
Implement Python Wrapper
========================
Implementing a Python wrapper allows us to use the added layer in configuration files. All the Python wrappers are in the file `python/paddle/trainer/config_parser.py`. An example of the Python wrapper for the fully connected layer is listed below. It has the following steps:

- Use `@config_layer('fc')` as the decorator of the Python wrapper class. `fc` is the identifier of the layer.
- Implement the `__init__` constructor function.

    - It first calls the base constructor `super(FCLayer, self).__init__(name, 'fc', size, inputs=inputs, **xargs)`. `FCLayer` is the Python wrapper class name, and `fc` is the layer identifier name. They must be correct in order for the wrapper to work.
    - Then it computes the size and format (whether sparse) of each transformation matrix, and creates the bias parameter.

The code is listed below::

    @config_layer('fc')
    class FCLayer(LayerBase):
        def __init__(
                self,
                name,
                size,
                inputs,
                bias=True,
                **xargs):
            super(FCLayer, self).__init__(name, 'fc', size, inputs=inputs, **xargs)
            for input_index in xrange(len(self.inputs)):
                input_layer = self.get_input_layer(input_index)
                psize = self.config.size * input_layer.size
                dims = [input_layer.size, self.config.size]
                format = self.inputs[input_index].format
                sparse = format == "csr" or format == "csc"
                if sparse:
                    psize = self.inputs[input_index].nnz
                self.create_input_parameter(input_index, psize, dims, sparse, format)
            self.create_bias_parameter(bias, self.config.size)

In the network configuration, the layer can be specified using the following code snippet. The arguments of this class are:

- `name` is the name identifier of the layer instance.
- `type` is the type of the layer, specified using the layer identifier.
- `size` is the output size of the layer.
- `bias` specifies whether this layer instance has bias.
- `inputs` specifies a list of layer instance names as inputs.

The code is listed below::

    Layer(
        name = "fc1",
        type = "fc",
        size = 64,
        bias = True,
        inputs = [Input("pool3")]
    )

You are also recommended to implement a helper for the Python wrapper, which makes it easier to write models. You can refer to `python/paddle/trainer_config_helpers/layers.py` for examples.
# MovieLens Dataset
The [MovieLens Dataset](http://grouplens.org/datasets/movielens/) was collected by GroupLens Research.
The data set contains some user information, movie information, and many movie ratings from \[1-5\].
The dataset comes in several versions, which differ in size.
We use [MovieLens 1M Dataset](http://files.grouplens.org/datasets/movielens/ml-1m.zip) as a demo dataset, which contains
1 million ratings from 6000 users on 4000 movies. Released 2/2003.
## Dataset Features
The [ml-1m Dataset](http://files.grouplens.org/datasets/movielens/ml-1m.zip) contains many features.
The data files (which have the ".dat" extension) in the [ml-1m Dataset](http://files.grouplens.org/datasets/movielens/ml-1m.zip)
are basically CSV files whose delimiter is "::". We quote the description from the README here.
### RATINGS FILE DESCRIPTION(ratings.dat)
All ratings are contained in the file "ratings.dat" and are in the
following format:
UserID::MovieID::Rating::Timestamp
- UserIDs range between 1 and 6040
- MovieIDs range between 1 and 3952
- Ratings are made on a 5-star scale (whole-star ratings only)
- Timestamp is represented in seconds since the epoch as returned by time(2)
- Each user has at least 20 ratings
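As a small illustration of the `::`-delimited format, the following sketch parses one rating line into named fields. The `Rating` tuple and the `parse_rating` function are names chosen for this example and do not come from the dataset or the demo code.

```python
from collections import namedtuple

Rating = namedtuple('Rating', ['user_id', 'movie_id', 'rating', 'timestamp'])


def parse_rating(line):
    """Parse one 'UserID::MovieID::Rating::Timestamp' line."""
    fields = line.strip().split('::')
    if len(fields) != 4:
        raise ValueError('malformed rating line: %r' % line)
    # All four fields are integers in this file.
    return Rating(*(int(v) for v in fields))


# An illustrative line in the documented format.
sample = parse_rating('1::1193::5::978300760\n')
```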
### USERS FILE DESCRIPTION(users.dat)
User information is in the file "users.dat" and is in the following
format:
UserID::Gender::Age::Occupation::Zip-code
All demographic information is provided voluntarily by the users and is
not checked for accuracy. Only users who have provided some demographic
information are included in this data set.
- Gender is denoted by a "M" for male and "F" for female
- Age is chosen from the following ranges:
* 1: "Under 18"
* 18: "18-24"
* 25: "25-34"
* 35: "35-44"
* 45: "45-49"
* 50: "50-55"
* 56: "56+"
- Occupation is chosen from the following choices:
* 0: "other" or not specified
* 1: "academic/educator"
* 2: "artist"
* 3: "clerical/admin"
* 4: "college/grad student"
* 5: "customer service"
* 6: "doctor/health care"
* 7: "executive/managerial"
* 8: "farmer"
* 9: "homemaker"
* 10: "K-12 student"
* 11: "lawyer"
* 12: "programmer"
* 13: "retired"
* 14: "sales/marketing"
* 15: "scientist"
* 16: "self-employed"
* 17: "technician/engineer"
* 18: "tradesman/craftsman"
* 19: "unemployed"
* 20: "writer"
### MOVIES FILE DESCRIPTION(movies.dat)
Movie information is in the file "movies.dat" and is in the following
format:
MovieID::Title::Genres
- Titles are identical to titles provided by the IMDB (including
year of release)
- Genres are pipe-separated and are selected from the following genres:
* Action
* Adventure
* Animation
* Children's
* Comedy
* Crime
* Documentary
* Drama
* Fantasy
* Film-Noir
* Horror
* Musical
* Mystery
* Romance
* Sci-Fi
* Thriller
* War
* Western
- Some MovieIDs do not correspond to a movie due to accidental duplicate
entries and/or test entries
- Movies are mostly entered by hand, so errors and inconsistencies may exist
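A movie line can be handled in a similar way. The sketch below is hedged: it assumes the title ends with the release year in parentheses, as the README note about titles suggests, and falls back to the raw title when that pattern does not match. `parse_movie` and `TITLE_RE` are names made up for this example.

```python
import re

# Assumed title layout: "<name> (<4-digit year>)".
TITLE_RE = re.compile(r'^(?P<name>.*)\s+\((?P<year>\d{4})\)$')


def parse_movie(line):
    """Parse one 'MovieID::Title::Genres' line into a dict."""
    movie_id, title, genres = line.strip().split('::')
    match = TITLE_RE.match(title)
    return {
        'id': int(movie_id),
        'title': match.group('name') if match else title,
        'year': int(match.group('year')) if match else None,
        'genres': genres.split('|'),  # genres are pipe-separated
    }


movie = parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy\n")
```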
Regression MovieLens Rating
============================
Here we demonstrate a **Cosine Similarity Regression** job on the MovieLens dataset.
This demo shows how paddle performs (word) embedding,
handles similarity regression,
applies character-level convolutional networks to text, and handles
multiple types of inputs.
Note that the model structure is not fine-tuned; it is just a demo to show how paddle works.
YOU ARE WELCOME TO BUILD A BETTER DEMO
BY USING PADDLEPADDLE, AND LET US KNOW TO MAKE THIS DEMO BETTER.
Data Preparation
````````````````
Download and extract dataset
''''''''''''''''''''''''''''
We use `movielens 1m dataset <ml_dataset.html>`_ here.
To download and unzip the dataset, simply run the following commands.
.. code-block:: bash
cd demo/recommendation/data
./ml_data.sh
And the directory structure of :code:`demo/recommendation/data/ml-1m` is:
.. code-block:: text
+--ml-1m
+--- movies.dat # movie features
+--- ratings.dat # ratings
+--- users.dat # user features
+--- README # dataset description
Field config file
'''''''''''''''''
**Field config file** is used to specify the fields of the dataset and the file format,
i.e., to specify **WHAT** type each field in each feature file is.
The field config file of ml-1m is :code:`demo/recommendation/data/config.json`.
It specifies the field types and file names: 1) there are four types of field for user file\: id, gender, age and occupation;
2) the filename is "users.dat", and the delimiter of file is "::".
.. include:: ../../../demo/recommendation/data/config.json
:code: json
:literal:
Preprocess Data
```````````````
You need to install the required third-party Python libraries.
IT IS HIGHLY RECOMMENDED TO USE VIRTUALENV TO MAKE A CLEAN PYTHON ENVIRONMENT.
.. code-block:: bash
pip install -r requirements.txt
The general command for preprocessing the dataset is:
.. code-block:: bash
cd demo/recommendation
./preprocess.sh
The detailed steps are introduced as follows.
Extract Movie/User features to python object
'''''''''''''''''''''''''''''''''''''''''''''
Each movie and each user in the movielens 1m dataset has many features.
Each line of the rating file only provides a Movie/User id that refers to a movie or a user.
We process the movie/user feature files first, and pickle the feature (**Meta**) object as a file.
Meta config file
................
**Meta config file** is used to specify **HOW** to parse each field in the dataset.
It can be translated from the field config file, or written by hand.
Its file format can be either json or yaml. The parser automatically chooses the file format by the extension name.
To convert the field config file to a meta config file, just run:
.. code-block:: bash
cd demo/recommendation/data
python config_generator.py config.json > meta_config.json
The meta config file shows below:
.. include:: ../../../demo/recommendation/data/meta_config.json
:code: json
:literal:
There are two kinds of features in meta\: movie and user.

* in movie file, whose name is movies.dat

    * we just split each line by "::"
    * pos 0 is id.
    * pos 1 feature:

        * name is title.
        * it uses regex to parse this feature.
        * it is a char based word embedding feature.
        * it is a sequence.

    * pos 2 feature:

        * name is genres.
        * type is one hot dense vector.
        * dictionary is auto generated by parsing, each key is split by '|'

* in user file, whose name is users.dat

    * we just split each line by "::"
    * pos 0 is id.
    * pos 1 feature:

        * name is gender
        * just simple char based embedding.

    * pos 2 feature:

        * name is age
        * just whole word embedding.
        * embedding ids are sorted by word.

    * pos 3 feature:

        * name is occupation.
        * just simple whole word embedding.

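To make the genres feature more concrete, here is a hedged sketch of how a dictionary could be generated while parsing and then used to build a dense 0/1 vector per movie. It only illustrates the idea; the function names are invented for this example, and the demo's own parser may order keys or encode values differently.

```python
def build_genre_dict(genre_lists):
    """Collect every genre seen while parsing and give it a stable index."""
    names = sorted({g for genres in genre_lists for g in genres})
    return {name: idx for idx, name in enumerate(names)}


def to_dense_vector(genres, genre_dict):
    """Return a 0/1 list with a 1 at the index of each genre present."""
    vec = [0] * len(genre_dict)
    for g in genres:
        vec[genre_dict[g]] = 1
    return vec


# Made-up genre fields, each key split by '|'.
raw = ['Animation|Comedy', 'Action|Comedy', 'Drama']
parsed = [field.split('|') for field in raw]
genre_dict = build_genre_dict(parsed)
vectors = [to_dense_vector(g, genre_dict) for g in parsed]
```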
Meta file
'''''''''
After having meta config file, we can generate **Meta file**, a python pickle object which stores movie/user information.
The following commands could be run to generate it.
.. code-block:: bash
python meta_generator.py ml-1m meta.bin --config=meta_config.json
And the structure of the meta file :code:`meta.bin` is:
.. code-block:: text
+--+ movie
| +--+ __meta__
| | +--+ raw_meta # each feature meta config. list
| | | +
| | | | # ID Field, we use id as key
| | | +--+ {'count': 3883, 'max': 3952, 'is_key': True, 'type': 'id', 'min': 1}
| | | |
| | | | # Title field, the dictionary list of embedding.
| | | +--+ {'dict': [ ... ], 'type': 'embedding', 'name': 'title', 'seq': 'sequence'}
| | | |
| | | | # Genres field, the genres dictionary
| | | +--+ {'dict': [ ... ], 'type': 'one_hot_dense', 'name': 'genres'}
| | |
| | +--+ feature_map [1, 2] # a list for raw_meta index for feature field.
| | # it means there are 2 features for each key.
| | # * 0 offset of feature is raw_meta[1], Title.
| | # * 1 offset of feature is raw_meta[2], Genres.
| |
| +--+ 1 # movie 1 features
| | +
| | +---+ [[...], [...]] # title ids, genres dense vector
| |
| +--+ 2
| |
| +--+ ...
|
+--- user
+--+ __meta__
| +
| +--+ raw_meta
| | +
| | +--+ id field as user
| | |
| | +--+ {'dict': ['F', 'M'], 'type': 'embedding', 'name': 'gender', 'seq': 'no_sequence'}
| | |
| | +--+ {'dict': ['1', '18', '25', '35', '45', '50', '56'], 'type': 'embedding', 'name': 'age', 'seq': 'no_sequence'}
| | |
| | +--+ {'dict': [...], 'type': 'embedding', 'name': 'occupation', 'seq': 'no_sequence'}
| |
| +--+ feature_map [1, 2, 3]
|
+--+ 1 # user 1 features
|
+--+ 2
+--+ ...
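To make the role of ``feature_map`` concrete, here is a minimal sketch that uses a small hand-made dictionary shaped like the tree above. The feature values and the helper ``describe_features`` are invented for illustration; a real ``meta.bin`` would be loaded from the pickle file and contains the full dictionaries.

```python
# A tiny, hand-made stand-in for the object stored in meta.bin.
# All feature values here are illustrative, not taken from the data set.
meta = {
    "movie": {
        "__meta__": {
            "raw_meta": [
                {"type": "id", "is_key": True},
                {"type": "embedding", "name": "title", "seq": "sequence"},
                {"type": "one_hot_dense", "name": "genres"},
            ],
            "feature_map": [1, 2],  # indexes into raw_meta
        },
        1: [[3, 7, 2], [1.0, 0.0, 1.0]],  # title ids, genres dense vector
    }
}


def describe_features(meta, kind, key):
    """Pair each stored feature of `key` with the name of its raw_meta entry."""
    info = meta[kind]["__meta__"]
    names = [info["raw_meta"][i]["name"] for i in info["feature_map"]]
    return dict(zip(names, meta[kind][key]))


features = describe_features(meta, "movie", 1)
print(features)
```

The lookup goes through ``feature_map`` rather than indexing ``raw_meta`` directly because the id field (``raw_meta[0]``) is the key and is not stored as a feature.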
Split Training/Testing files
''''''''''''''''''''''''''''
We split :code:`ml-1m/ratings.dat` into a training file and a testing file. The split is done per user: each user's
ratings are divided into two parts, so every user in the testing file also has some ratings in the training file.
Use :code:`split.py` to separate the training and testing files.
.. code-block:: bash
python split.py ml-1m/ratings.dat --delimiter="::" --test_ratio=0.1
Two files will then be generated\: :code:`ml-1m/ratings.dat.train` and :code:`ml-1m/ratings.dat.test`.
Move them to the workspace :code:`data`, shuffle the training file, and prepare the file lists for paddle train.
.. code-block:: bash
shuf ml-1m/ratings.dat.train > ratings.dat.train
cp ml-1m/ratings.dat.test .
echo "./data/ratings.dat.train" > train.list
echo "./data/ratings.dat.test" > test.list
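The per-user split described above can be sketched as follows. This is only an illustration of the idea, not the demo's script: the function name, the fixed random seed, and the rule that at least one rating per user stays in the training part are assumptions.

```python
import random
from collections import defaultdict


def split_by_user(lines, delimiter="::", test_ratio=0.1, seed=0):
    """Split rating lines so each user's ratings are divided between train and test."""
    by_user = defaultdict(list)
    for line in lines:
        by_user[line.split(delimiter)[0]].append(line)

    rng = random.Random(seed)
    train, test = [], []
    for user_lines in by_user.values():
        rng.shuffle(user_lines)
        # keep at least one rating per user in the training part
        n_test = min(int(len(user_lines) * test_ratio), len(user_lines) - 1)
        test.extend(user_lines[:n_test])
        train.extend(user_lines[n_test:])
    return train, test


# Synthetic "user::movie::rating::timestamp" lines: 2 users, 20 ratings each.
ratings = ["%d::%d::%d::0" % (u, m, 1 + (u + m) % 5)
           for u in (1, 2) for m in range(20)]
train, test = split_by_user(ratings, test_ratio=0.1)
print(len(train), len(test))
```

With 20 ratings per user and a ratio of 0.1, two ratings per user go to the test part, so every test user is guaranteed to appear in the training part as well.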
Neural Network Configuration
````````````````````````````
Trainer Config File
'''''''''''''''''''
The network structure is shown below.
.. image:: rec_regression_network.png
:align: center
:alt: rec_regression_network
The demo's neural network config file "trainer_config.py" is shown below.
.. include:: ../../../demo/recommendation/trainer_config.py
:code: python
:literal:
In this :code:`trainer_config.py`, we just map each feature type to
a feature vector. The following shows how each feature is mapped to a vector.
* :code:`id`\: a simple embedding, fed into a fully connected layer.
* :code:`embedding`\:
- if it is a sequence, get the embedding, apply a text convolution operation,
and take the average pooling result.
- if it is not a sequence, get the embedding and feed it into a fully connected layer.
* :code:`one_hot_dense`\:
- just two fully connected layers.
Then we combine the features of a movie into one movie feature with a
:code:`fc_layer` that takes multiple inputs, and do the same for the user features
to get one user feature. Finally, we calculate the cosine similarity of these two
features.
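For reference, the cosine similarity of two feature vectors can be computed as in the plain-Python sketch below. It only illustrates the formula; the vectors are made up, and the network itself uses the ``cos_sim`` layer rather than this function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


movie_feature = [1.0, 2.0, 2.0]  # made-up values
user_feature = [2.0, 4.0, 4.0]   # same direction, twice the length
similarity = cosine_similarity(movie_feature, user_feature)
print(similarity)
```

Because the measure depends only on direction, two vectors that point the same way score 1 regardless of their lengths, and orthogonal vectors score 0.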
In this network, we use several APIs from `trainer_config_helpers
<../../ui/api/trainer_config_helpers/index.html>`_. They are
* Data Layer, `data_layer
<../../ui/api/trainer_config_helpers/layers.html#id1>`_
* Fully Connected Layer, `fc_layer
<../../ui/api/trainer_config_helpers/layers.html#fc-layer>`_
* Embedding Layer, `embedding_layer
<../../ui/api/trainer_config_helpers/layers.html#embedding-layer>`_
* Context Projection Layer, `context_projection
<../../ui/api/trainer_config_helpers/layers.html#context-projection>`_
* Pooling Layer, `pooling_layer
<../../ui/api/trainer_config_helpers/layers.html#pooling-layer>`_
* Cosine Similarity Layer, `cos_sim
<../../ui/api/trainer_config_helpers/layers.html#cos-sim>`_
* Text Convolution Pooling Layer, `text_conv_pool
<../../ui/api/trainer_config_helpers/networks.html
#trainer_config_helpers.networks.text_conv_pool>`_
* Declare Python Data Sources, `define_py_data_sources
<../../ui/api/trainer_config_helpers/data_sources.html>`_
Data Provider
'''''''''''''
.. include:: ../../../demo/recommendation/dataprovider.py
:code: python
:literal:
The data provider reads :code:`meta.bin` and the rating file, and yields each sample for training.
In this :code:`dataprovider.py`, we should set\:
* obj.slots\: The feature types and dimensions.
* use_seq\: Whether this :code:`dataprovider.py` runs in sequence mode or not.
* process\: Return each sample of data to :code:`paddle`.
For details on data providers, see the `DataProvider documentation <../../ui/DataProvider.html>`_.
Train
`````
After preparing the data, configuring the network, and writing the data provider, we can now run paddle training.
The run.sh is shown as follows:
.. include:: ../../../demo/recommendation/run.sh
:code: bash
:literal:
It starts a paddle training process, writes the log to `log.txt`,
and then prints it on the screen.
For each command line argument in :code:`run.sh`, please refer to the `command line
arguments <TBD>`_ page. A short description of these arguments follows.
* config\: Tells paddle which file is the neural network configuration.
* save_dir\: Tells paddle to save the model into './output'.
* use_gpu\: Whether to use the GPU. Default is false.
* trainer_count\: The number of compute threads on one machine.
* test_all_data_in_one_period\: Test all data during one test period. Otherwise,
only :code:`batch_size` samples are tested in one test period.
* log_period\: Print a log line after training :code:`log_period` batches.
* dot_period\: Print a :code:`.` after training :code:`dot_period` batches.
* num_passes\: Train for at most :code:`num_passes` passes.
If the training process starts successfully, the output looks like the following:
.. code-block:: text
I0601 08:07:22.832059 10549 TrainerInternal.cpp:157] Batch=100 samples=160000 AvgCost=4.13494 CurrentCost=4.13494 Eval: CurrentEval:
I0601 08:07:50.672627 10549 TrainerInternal.cpp:157] Batch=200 samples=320000 AvgCost=3.80957 CurrentCost=3.48421 Eval: CurrentEval:
I0601 08:08:18.877369 10549 TrainerInternal.cpp:157] Batch=300 samples=480000 AvgCost=3.68145 CurrentCost=3.42519 Eval: CurrentEval:
I0601 08:08:46.863963 10549 TrainerInternal.cpp:157] Batch=400 samples=640000 AvgCost=3.6007 CurrentCost=3.35847 Eval: CurrentEval:
I0601 08:09:15.413025 10549 TrainerInternal.cpp:157] Batch=500 samples=800000 AvgCost=3.54811 CurrentCost=3.33773 Eval: CurrentEval:
I0601 08:09:36.058670 10549 TrainerInternal.cpp:181] Pass=0 Batch=565 samples=902826 AvgCost=3.52368 Eval:
I0601 08:09:46.215489 10549 Tester.cpp:101] Test samples=97383 cost=3.32155 Eval:
I0601 08:09:46.215966 10549 GradientMachine.cpp:132] Saving parameters to ./output/model/pass-00000
I0601 08:09:46.233397 10549 ParamUtil.cpp:99] save dir ./output/model/pass-00000
I0601 08:09:46.233438 10549 Util.cpp:209] copy trainer_config.py to ./output/model/pass-00000
I0601 08:09:46.233541 10549 ParamUtil.cpp:147] fileName trainer_config.py
The model is saved in :code:`output/` directory. You can use :code:`Ctrl-C` to stop training whenever you want.
Evaluate and Predict
````````````````````
After training several passes, you can evaluate them and pick the best pass. Just run
.. code-block:: bash
./evalute.sh
You will see messages like this:
.. code-block:: text
Best pass is 00009, error is 3.06949, which means predict get error as 0.875998002281
evaluating from pass output/pass-00009
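The two numbers in the sample message above are numerically consistent with taking the square root of the reported error and halving it. This relationship is inferred only from the printed values, not from the evaluation script, so treat the check below as an observation rather than a description of how the script works.

```python
import math

reported_cost = 3.06949                     # "error is ..." in the sample output
reported_prediction_error = 0.875998002281  # "predict get error as ..."

# Assumed relationship, inferred from the two printed numbers only.
derived = math.sqrt(reported_cost) / 2
print(derived)
```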
Then, you can predict how any user would rate a movie. Just run
.. code-block:: bash
python prediction.py 'output/pass-00009/'
The predictor reads user input and predicts scores. It has a command-line user interface as follows:
.. code-block:: text
Input movie_id: 9
Input user_id: 4
Prediction Score is 2.56
Input movie_id: 8
Input user_id: 2
Prediction Score is 3.13
# Semantic Role Labelling Tutorial
Semantic role labeling (SRL) is a form of shallow semantic parsing whose goal is to discover the predicate-argument structure of each predicate in a given input sentence. SRL is useful as an intermediate step in a wide range of natural language processing tasks, such as information extraction, automatic document categorization and question answering. An example is as follows [1]:
[ <sub>A0</sub> He ] [ <sub>AM-MOD</sub> would ][ <sub>AM-NEG</sub> n’t ] [ <sub>V</sub> accept] [ <sub>A1</sub> anything of value ] from [<sub>A2</sub> those he was writing about ].
- V: verb
- A0: acceptor
- A1: thing accepted
- A2: accepted-from
- A3: Attribute
- AM-MOD: modal
- AM-NEG: negation
Given the verb "accept", the chunks in the sentence play certain semantic roles. Here, the label scheme is from the Penn Proposition Bank.
To date, most successful SRL systems are built on top of some form of parsing results, where pre-defined feature templates over the syntactic structure are used. This tutorial presents an end-to-end system using a deep bidirectional long short-term memory network (DB-LSTM)[2] to solve the SRL task, which largely outperforms the previous state-of-the-art systems. The system treats the SRL task as a sequence labelling problem.
## Data Description
The relevant paper[2] uses the data sets of the CoNLL-2005 and CoNLL-2012 Shared Tasks for training and testing. Because of the data license, the demo uses only the test data set of CoNLL-2005, which is publicly available online.
To download and process the original data, just execute the following commands:
```bash
cd data
./get_data.sh
```
Several new files appear in the `data` directory as follows.
```bash
conll05st-release: the test data set of the CoNLL-2005 shared task
test.wsj.words: the Wall Street Journal data sentences
test.wsj.props: the propositional arguments
src.dict: the dictionary of words in sentences
tgt.dict: the labels dictionary
feature: the extracted features from the data set
```
## Training
### DB-LSTM
Please refer to the Sentiment Analysis demo to learn more about the long short-term memory unit.
Unlike the Bidirectional-LSTM used in the Sentiment Analysis demo, the DB-LSTM adopts another way to stack LSTM layers. First, a standard LSTM processes the sequence in the forward direction. The input and output of this LSTM layer are taken by the next LSTM layer as input and processed in the reverse direction. These two standard LSTM layers compose a pair of LSTMs. Then we stack LSTM layers pair after pair to obtain the deep LSTM model.
The following figure shows a temporal expanded 2-layer DB-LSTM network.
<center>
![pic](./network_arch.png)
</center>
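The stacking pattern can be sketched without any deep learning library. In the toy code below a simple running average stands in for a real LSTM layer, and element-wise addition stands in for the way a layer's input and output are merged before the next layer; both are assumptions chosen to keep the example short, so only the alternation of directions reflects the description above.

```python
def toy_recurrent_layer(inputs, reverse=False):
    """Stand-in for an LSTM layer: a running average over the sequence."""
    seq = inputs[::-1] if reverse else list(inputs)
    state, outputs = 0.0, []
    for x in seq:
        state = 0.5 * state + 0.5 * x
        outputs.append(state)
    # restore the original time order for reversed layers
    return outputs[::-1] if reverse else outputs


def db_stack(inputs, depth):
    """Stack `depth` layers, alternating forward and reversed processing."""
    layer_input = list(inputs)
    hidden = toy_recurrent_layer(layer_input, reverse=False)
    for level in range(1, depth):
        # the next layer sees both the input and the output of the previous one
        layer_input = [x + h for x, h in zip(layer_input, hidden)]
        hidden = toy_recurrent_layer(layer_input, reverse=(level % 2 == 1))
    return hidden


impulse = [1.0, 0.0, 0.0]
forward_only = db_stack(impulse, depth=1)
two_layers = db_stack(impulse, depth=2)
print(forward_only, two_layers)
```

With a single forward layer, information only flows from earlier to later positions; adding the reversed layer lets later positions influence earlier ones, which is the point of pairing the two directions.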
### Features
Two input features play an essential role in this pipeline: predicate (pred) and argument (argu). Two other features are also adopted: predicate context (ctx-p) and region mark (m<sub>r</sub>). A single predicate word cannot describe the predicate information exactly, especially when the same word appears more than once in a sentence. With the predicate context, this ambiguity can be largely eliminated. Similarly, we use region mark m<sub>r</sub> = 1 to denote that the argument position is located in the predicate context region, and m<sub>r</sub> = 0 if it is not. These four simple features are all we need for our SRL system. The features of one sample with context size set to 1 are shown below[2]:
<center>
![pic](./feature.jpg)
</center>
In this sample, the corresponding labelled sentence is:
[ <sub>A1</sub> A record date ] has [ <sub>AM-NEG</sub> n't ] been [ <sub>V</sub> set ] .
In the demo, we adopt the feature template above, which consists of `argument`, `predicate`, `ctx-p (p=-1,0,1)` and `mark`, and use the `B/I/O` scheme to label each argument. These features and labels are stored in the `feature` file, separated by `\t`.
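The sketch below shows one way these features and labels could be derived for the sample sentence. The helper names, the `<pad>` placeholder used when the predicate sits at a sentence boundary, and the span format are assumptions for illustration; the demo's own extraction script may differ.

```python
def predicate_context(words, pred_idx, pad="<pad>"):
    """Return (ctx_n1, ctx_0, ctx_p1): the predicate and its two neighbours."""
    def word_at(i):
        return words[i] if 0 <= i < len(words) else pad
    return word_at(pred_idx - 1), word_at(pred_idx), word_at(pred_idx + 1)


def region_marks(words, pred_idx, context_size=1):
    """1 for words inside the predicate context region, 0 elsewhere."""
    return [1 if abs(i - pred_idx) <= context_size else 0
            for i in range(len(words))]


def spans_to_bio(length, spans):
    """Turn (start, end_exclusive, label) spans into B-/I-/O tags."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


words = "A record date has n't been set .".split()
pred_idx = words.index("set")
ctx = predicate_context(words, pred_idx)
marks = region_marks(words, pred_idx)
tags = spans_to_bio(len(words), [(0, 3, "A1"), (4, 5, "AM-NEG"), (6, 7, "V")])
print(ctx, marks, tags)
```

Each of the three outputs lines up with the sentence word by word, which is what allows them to be fed to the network as parallel sequences.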
### Data Provider
`dataprovider.py` is the Python file that wraps the data. The `hook()` function defines the data slots for the network. The six features and the label are all IndexSlots.
```
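# integer_value_sequence and provider come from PaddlePaddle's PyDataProvider2 module
from paddle.trainer.PyDataProvider2 import *

def hook(settings, word_dict, label_dict, **kwargs):
    settings.word_dict = word_dict
    settings.label_dict = label_dict
    # all inputs are integral and sequential type
    settings.slots = [
        integer_value_sequence(len(word_dict)),   # words of the sentence
        integer_value_sequence(len(word_dict)),   # predicate
        integer_value_sequence(len(word_dict)),   # ctx_n1
        integer_value_sequence(len(word_dict)),   # ctx_0
        integer_value_sequence(len(word_dict)),   # ctx_p1
        integer_value_sequence(2),                # region mark, 0 or 1
        integer_value_sequence(len(label_dict))]  # label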
```
The corresponding data iterator is as follows:
```
# UNK_IDX, the id used for out-of-dictionary words, is defined elsewhere in dataprovider.py.
@provider(use_seq=True, init_hook=hook)
def process(obj, file_name):
    with open(file_name, 'r') as fdata:
        for line in fdata:
            sentence, predicate, ctx_n1, ctx_0, ctx_p1, mark, label = line.strip().split('\t')
            words = sentence.split()
            sen_len = len(words)
            word_slot = [obj.word_dict.get(w, UNK_IDX) for w in words]
            # the predicate and its context words are repeated for every position
            predicate_slot = [obj.word_dict.get(predicate, UNK_IDX)] * sen_len
            ctx_n1_slot = [obj.word_dict.get(ctx_n1, UNK_IDX)] * sen_len
            ctx_0_slot = [obj.word_dict.get(ctx_0, UNK_IDX)] * sen_len
            ctx_p1_slot = [obj.word_dict.get(ctx_p1, UNK_IDX)] * sen_len
            marks = mark.split()
            mark_slot = [int(w) for w in marks]
            label_list = label.split()
            label_slot = [obj.label_dict.get(w) for w in label_list]
            yield word_slot, predicate_slot, ctx_n1_slot, ctx_0_slot, ctx_p1_slot, mark_slot, label_slot
```
The `process` function yields 7 lists: the six features and the labels.
### Neural Network Config
`db_lstm.py` is the neural network config file that loads the dictionaries and defines the data provider module and the network architecture used during training.
Seven `data_layer`s load instances from the data provider. The six features are each transformed into embeddings and mixed by a `mixed_layer`. Deep bidirectional LSTM layers extract features for the softmax layer. The objective function is the cross entropy of the labels.
### Run Training
The script for training is `train.sh`; just execute:
```bash
./train.sh
```
The content in `train.sh`:
```
paddle train \
--config=./db_lstm.py \
--save_dir=./output \
--trainer_count=4 \
--log_period=10 \
--num_passes=500 \
--use_gpu=false \
--show_parameter_stats_period=10 \
--test_all_data_in_one_period=1 \
2>&1 | tee 'train.log'
```
- \--config=./db_lstm.py : network config file.
- \--save_dir=./output: output path to save models.
- \--trainer_count=4 : set thread number (or GPU count).
- \--log_period=10 : print log every 10 batches.
- \--num_passes=500: set pass number, one pass in PaddlePaddle means training all samples in dataset one time.
- \--use_gpu=false: use CPU to train; set it to true if you have installed the GPU version of PaddlePaddle and want to train on GPU.
- \--show_parameter_stats_period=10: show parameter statistics every 10 batches.
- \--test_all_data_in_one_period=1: test all data in every testing period.
After training, the models will be saved in directory `output`.
### Run testing
The script for testing is `test.sh`; just execute:
```bash
./test.sh
```
The main part of `test.sh`:
```
paddle train \
--config=./db_lstm.py \
--model_list=$model_list \
--job=test \
--config_args=is_test=1 \
```
- \--config=./db_lstm.py: network config file
- \--model_list=$model_list: model list file
- \--job=test: indicate the test job
- \--config_args=is_test=1: flag to indicate test
### Run prediction
The script for prediction is `predict.sh`; just execute:
```bash
./predict.sh
```
In `predict.sh`, the user should provide the network config file, model path, label file, word dictionary file and feature file:
```
python predict.py \
    -c $config_file \
    -w $model_path \
    -l $label_file \
    -d $dict_file \
    -i $input_file
```
`predict.py` is the main executable Python script, which includes functions to load the model, load the data and run prediction. The network model outputs the probability distribution of labels. In the demo, we take the label with the maximum probability as the result. Users can also implement beam search or Viterbi decoding on top of the probability distribution matrix.
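Taking the label with the maximum probability at each position can be sketched as below. The label set and the probability matrix are invented for the example, and `decode_argmax` is a hypothetical helper, not a function from `predict.py`.

```python
def decode_argmax(prob_matrix, labels):
    """Pick, for every position, the label whose probability is largest."""
    result = []
    for row in prob_matrix:
        best = max(range(len(row)), key=row.__getitem__)
        result.append(labels[best])
    return result


labels = ["B-A1", "I-A1", "O"]   # made-up label dictionary
probs = [
    [0.7, 0.2, 0.1],             # one row per word, one column per label
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
predicted = decode_argmax(probs, labels)
print(predicted)
```

This decoding treats every position independently, so it can produce sequences such as an `I-` tag without a preceding `B-` tag; that is the kind of inconsistency Viterbi decoding or beam search can avoid.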
After prediction, the result is saved in `predict.res`.
## Reference
[1] Martha Palmer, Dan Gildea, and Paul Kingsbury. The Proposition Bank: An Annotated Corpus of Semantic Roles, Computational Linguistics, 31(1), 2005.
[2] Zhou, Jie, and Wei Xu. "End-to-end learning of semantic role labeling using recurrent neural networks." Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.
Sentiment Analysis Tutorial
===========================
.. toctree::
:maxdepth: 3
:glob:
Training Locally <sentiment_analysis.md>
internal/cluster_train.md
# Sentiment Analysis Tutorial
Sentiment analysis has many applications. A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence or feature/aspect level. One simple example is classifying customer reviews on shopping, tourism and group-buying websites such as Amazon, TaoBao and Tmall.
Sentiment analysis is also used to monitor social media based on large amounts of reviews or blogs. For example, researchers analyzed several surveys on consumer confidence and political opinion and found that they correlate with sentiment word frequencies in contemporaneous Twitter messages [1]. Another example is forecasting stock movements by analyzing the text content of a daily Twitter blog [2].
In addition, collecting user comments on products and analyzing their sentiment helps to understand user preferences for companies, products, and even competing products.
This tutorial will guide you through the process of training a Long Short Term Memory (LSTM) Network to classify the sentiment of sentences from [Large Movie Review Dataset](http://ai.stanford.edu/~amaas/data/sentiment/), sometimes known as the [Internet Movie Database (IMDB)](http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf). This dataset contains movie reviews along with their associated binary sentiment polarity labels, namely positive and negative. So randomly guessing yields 50% accuracy.
## Data Preparation
### IMDB Data Introduction
Before training models, we need to preprocess the data and build a dictionary. First, you can use the following script to download the IMDB dataset and the [Moses](http://www.statmt.org/moses/) tool, which is a statistical machine translation system. We provide a data preprocessing script that can handle not only IMDB data but also other user-defined data. To use the pre-written script, the labeled train and test samples need to be moved to another path, which is done in `get_imdb.sh`.
```
cd demo/sentiment/data
./get_imdb.sh
```
If the data is obtained successfully, you will see the following files in `./demo/sentiment/data`:
```
aclImdb get_imdb.sh imdb mosesdecoder-master
```
* aclImdb: raw dataset downloaded from website.
* imdb: only contains train and test data.
* mosesdecoder-master: Moses tool.
IMDB dataset contains 25,000 highly polar movie reviews for training, and 25,000 for testing. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. After running `./get_imdb.sh`, we can find the dataset has the following structure in `aclImdb`.
```
imdbEr.txt imdb.vocab README test train
```
* train: train sets.
* test : test sets.
* imdb.vocab: dictionary.
* imdbEr.txt: expected rating for each token in imdb.vocab.
* README: data documentation.
Both the train and test set directories contain:
```
labeledBow.feat neg pos unsup unsupBow.feat urls_neg.txt urls_pos.txt urls_unsup.txt
```
* pos: positive samples, contains 12,500 txt files, each file is one movie review.
* neg: negative samples, contains 12,500 txt files, each file is one movie review.
* unsup: unlabeled samples, contains 50,000 txt files.
* urls_xx.txt: urls of each review.
* xxBow.feat: already-tokenized bag of words (BoW) features.
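The score thresholds mentioned above translate into a very small labeling rule. The sketch below also reads the score from a file name, assuming names follow an `<id>_<score>.txt` pattern like the example paths used later in this tutorial; that pattern and both helper names are assumptions for illustration.

```python
import os


def polarity_from_score(score):
    """Map a 1-10 score to 'neg', 'pos', or None for the excluded middle."""
    if score <= 4:
        return "neg"
    if score >= 7:
        return "pos"
    return None


def score_from_filename(path):
    """Read the score from a name such as '10007_10.txt' (assumed pattern)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem.split("_")[1])


label = polarity_from_score(score_from_filename("test/pos/10007_10.txt"))
print(label)
```

Scores of 5 and 6 map to neither class, which is what makes the labeled reviews "highly polar".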
### IMDB Data Preparation
In this demo, we only use the labeled train and test sets, and do not use imdb.vocab as the dictionary. By default, the dictionary is built on the train set. The train set is shuffled and the test set is not. `tokenizer.perl` from the Moses tool is used to tokenize the words and punctuation. Simply execute the following commands to preprocess the data.
```
cd demo/sentiment/
./preprocess.sh
```
preprocess.sh:
```
data_dir="./data/imdb"
python preprocess.py -i $data_dir
```
* data_dir: input data directory.
* preprocess.py: preprocess script.
If it runs successfully, you will see the `demo/sentiment/data/pre-imdb` directory as follows:
```
dict.txt labels.list test.list test_part_000 train.list train_part_000
```
* test\_part\_000 and train\_part\_000: all labeled test and train sets. The train sets have been shuffled.
* train.list and test.list: train and test file lists.
* dict.txt: dictionary generated on train sets by default.
* labels.list: neg 0, pos 1, meaning label 0 is a negative review and label 1 is a positive review.
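Building the dictionary on the train set only means that words seen only at test time have no id of their own. A minimal sketch of that idea follows; the frequency-based id order, the `<unk>` token, and the function name are assumptions, since the actual format of `dict.txt` is not described here.

```python
from collections import Counter


def build_dictionary(tokenized_samples, unk="<unk>"):
    """Assign ids by descending frequency; id 0 is reserved for unknown words."""
    counts = Counter(word for sample in tokenized_samples for word in sample)
    # sort by count, then alphabetically, so the result is deterministic
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate([unk] + ordered)}


train_samples = [["a", "great", "movie"], ["a", "dull", "movie"]]
word_dict = build_dictionary(train_samples)

# A test-time sentence containing a word that never appeared in training.
test_ids = [word_dict.get(w, word_dict["<unk>"]) for w in ["a", "boring", "movie"]]
print(word_dict, test_ids)
```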
### User-defined Data Preparation
If you work on another sentiment classification task, you can prepare the data as follows. We provide scripts to build the dictionary and preprocess the data, so you only need to organize the data as shown below.
```
dataset
|----train
| |----class1
| | |----text_files
| |----class2
| | |----text_files
| | ...
|----test
| |----class1
| | |----text_files
| |----class2
| | |----text_files
| | ...
```
* dataset: 1st directory.
* train, test: 2nd directory.
* class1,class2,...: 3rd directory.
* text_files: samples with text file format.
All text files under the same folder belong to the same category. Each text file contains one or more samples, and each line is one sample. To shuffle fully, the preprocessing is slightly different for data with multiple lines in one text file, which requires setting `-m True` in `preprocess.sh`. `tokenizer.perl` is used by default. If you don't need it, set `-t False` in `preprocess.sh`.
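To show how this layout maps to samples and labels, here is a small sketch that walks one `train` or `test` folder. It is not the provided preprocessing script: the function name, the alphabetical class-to-label assignment, and the throwaway directory built for the demonstration are all assumptions.

```python
import os
import tempfile


def load_split(split_dir):
    """Return (samples, labels, class_names) for one train/ or test/ folder."""
    class_names = sorted(
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d)))
    samples, labels = [], []
    for label, name in enumerate(class_names):
        class_dir = os.path.join(split_dir, name)
        for file_name in sorted(os.listdir(class_dir)):
            with open(os.path.join(class_dir, file_name)) as f:
                # each non-empty line is one sample
                for line in f:
                    if line.strip():
                        samples.append(line.strip())
                        labels.append(label)
    return samples, labels, class_names


# Build a throwaway dataset/train tree to exercise the function.
root = tempfile.mkdtemp()
for name, text in [("class1", "good\nfine\n"), ("class2", "bad\n")]:
    os.makedirs(os.path.join(root, "train", name))
    with open(os.path.join(root, "train", name, "a.txt"), "w") as f:
        f.write(text)

samples, labels, class_names = load_split(os.path.join(root, "train"))
print(samples, labels, class_names)
```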
## Training
In this task, we use a Recurrent Neural Network (RNN) with the LSTM architecture to train the sentiment analysis model. The LSTM model was introduced primarily to overcome the problem of vanishing gradients. An LSTM network resembles a standard recurrent neural network with a hidden layer, but each ordinary node in the hidden layer is replaced by a memory cell. Each memory cell contains four main elements: an input gate, a neuron with a self-recurrent connection, a forget gate and an output gate. More details can be found in the literature [4]. The biggest advantage of the LSTM architecture is that it learns to memorize information over long time intervals without losing short-term memory. At each time step, as a new word arrives, the historical information stored in the memory block is updated to iteratively learn the sequence representation.
<center>![LSTM](./lstm.png)</center>
<center>Figure 1. LSTM [3]</center>
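The four elements of the memory cell can be written out as a single update step. The sketch below uses the common formulation without peephole connections and scalar values for readability; the weight layout and names are assumptions, and the cell in Figure 1 or in PaddlePaddle's implementation may differ in detail.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; `w` maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))    # input gate
    f = sigmoid(pre("forget"))   # forget gate
    o = sigmoid(pre("output"))   # output gate
    g = math.tanh(pre("cell"))   # candidate value for the self-recurrent neuron
    c = f * c_prev + i * g       # memory cell update
    h = o * math.tanh(c)         # hidden output
    return h, c


# Made-up weights: all zero, so every gate is 0.5 and the candidate is 0.
zero_weights = {k: (0.0, 0.0, 0.0) for k in ("input", "forget", "output", "cell")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero_weights)
print(h, c)
```

The forget gate scales the previous cell state and the input gate scales the new candidate, which is how the cell can keep information over long intervals while still reacting to the current word.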
Sentiment analysis is among the most typical problems in natural language understanding. It aims at predicting the attitude expressed in a sequence. Usually, only some key words, such as adjectives and adverbs, play a major role in predicting the sentiment of sequences or paragraphs. However, some reviews or comments are very long, as in the IMDB dataset. We use LSTM for this task because of its improved design with the gate mechanism. First, it can summarize the representation from the word level to the context level with a variable context length that is adapted by the gate values. Second, it can utilize the expanded context at the sentence level, while most methods are good at utilizing n-gram level knowledge. Third, it learns the paragraph representation directly rather than combining context level information. This results in an end-to-end framework.
In this demo we provide two networks, namely a bidirectional LSTM and a three-layer stacked LSTM.
#### Bidirectional-LSTM
One is a bidirectional LSTM network, connected by fully connected layer and softmax, as shown in Figure 2.
<center>![BiLSTM](./bi_lstm.jpg)</center>
<center>Figure 2. Bidirectional-LSTM </center>
#### Stacked-LSTM
The other is the three-layer LSTM structure in Figure 3. The bottom of the figure is the word embedding. Next, three LSTM-Hidden layers are connected, and the second LSTM is reversed. Then the maximum hidden vectors over all time steps of the hidden and LSTM layers are extracted as the representation of the entire sequence. Finally, a fully connected feed-forward layer with softmax activation performs the classification task. This network follows paper [5].
<center>![StackedLSTM](./stacked_lstm.jpg)</center>
<center>Figure 3. Stacked-LSTM for sentiment analysis </center>
**Config**
Switch to the `demo/sentiment` directory. The `trainer_config.py` file is an example config containing the algorithm and network configuration. The first line imports predefined networks from `sentiment_net.py`.
trainer_config.py:
```python
from sentiment_net import *
data_dir = "./data/pre-imdb"
# whether this config is used for test
is_test = get_config_arg('is_test', bool, False)
# whether this config is used for prediction
is_predict = get_config_arg('is_predict', bool, False)
dict_dim, class_dim = sentiment_data(data_dir, is_test, is_predict)
################## Algorithm Config #####################
settings(
    batch_size=128,
    learning_rate=2e-3,
    learning_method=AdamOptimizer(),
    regularization=L2Regularization(8e-4),
    gradient_clipping_threshold=25
)
#################### Network Config ######################
stacked_lstm_net(dict_dim, class_dim=class_dim,
                 stacked_num=3, is_predict=is_predict)
#bidirectional_lstm_net(dict_dim, class_dim=class_dim, is_predict=is_predict)
```
* **Data Definition**:
* get\_config\_arg(): gets arguments set by `--config_args=xx` on the command line.
* Define the TrainData and TestData providers; here we use the Python interface (PyDataProviderWrapper) of PaddlePaddle to load data. For details, refer to the PyDataProvider documentation.
* **Algorithm Configuration**:
* use sgd algorithm.
* use adam optimization.
* set batch size of 128.
* set average sgd window.
* set global learning rate.
* **Network Configuration**:
* dict_dim: get dictionary dimension.
* class_dim: set the number of categories; IMDB has two labels, namely positive and negative.
* `stacked_lstm_net`: predefined network as shown in Figure 3, use this network by default.
* `bidirectional_lstm_net`: predefined network as shown in Figure 2.
**Training**
Install PaddlePaddle first if necessary. Then you can use script `train.sh` as follows to launch local training.
```
cd demo/sentiment/
./train.sh
```
train.sh:
```
config=trainer_config.py
output=./model_output
paddle train --config=$config \
--save_dir=$output \
--job=train \
--use_gpu=false \
--trainer_count=4 \
--num_passes=10 \
--log_period=20 \
--dot_period=20 \
--show_parameter_stats_period=100 \
--test_all_data_in_one_period=1 \
2>&1 | tee 'train.log'
```
* \--config=$config: set network config.
* \--save\_dir=$output: set output path to save models.
* \--job=train: set job mode to train.
* \--use\_gpu=false: use CPU to train; set it to true if you have installed the GPU version of PaddlePaddle and want to train on GPU.
* \--trainer\_count=4: set thread number (or GPU count).
* \--num\_passes=10: set pass number, one pass in PaddlePaddle means training all samples in dataset one time.
* \--log\_period=20: print log every 20 batches.
* \--show\_parameter\_stats\_period=100: show parameter statistics every 100 batches.
* \--test\_all_data\_in\_one\_period=1: test all data in every testing period.
If the run succeeds, the output log is saved in path of `demo/sentiment/train.log` and model is saved in path of `demo/sentiment/model_output/`. The output log is explained as follows.
```
Batch=20 samples=2560 AvgCost=0.681644 CurrentCost=0.681644 Eval: classification_error_evaluator=0.36875 CurrentEval: classification_error_evaluator=0.36875
...
Pass=0 Batch=196 samples=25000 AvgCost=0.418964 Eval: classification_error_evaluator=0.1922
Test samples=24999 cost=0.39297 Eval: classification_error_evaluator=0.149406
```
- Batch=xx: xx batches have been processed.
- samples=xx: xx samples have been processed.
- AvgCost=xx: averaged cost from the 0-th batch to the current batch.
- CurrentCost=xx: cost of the latest log_period batches.
- Eval: classification\_error\_evaluator=xx: classification error from the 0-th batch to the current batch.
- CurrentEval: classification\_error\_evaluator: classification error of the latest log_period batches.
- Pass=0: going through the whole training set once is called one pass. 0 means the first pass over the training set.
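If you want to plot or compare these values, the fields can be pulled out of a log line with a regular expression. The sketch below is one possible approach; the `current_` prefix for values that follow `CurrentEval:` is a naming choice made here, not part of the log format.

```python
import re

FIELD = re.compile(r"(\w+)=([0-9.]+)")


def parse_log_line(line):
    """Extract name=number pairs; values after 'CurrentEval:' get a 'current_' prefix."""
    cumulative, _, current = line.partition("CurrentEval:")
    fields = {k: float(v) for k, v in FIELD.findall(cumulative)}
    fields.update(("current_" + k, float(v)) for k, v in FIELD.findall(current))
    return fields


line = ("Batch=20 samples=2560 AvgCost=0.681644 CurrentCost=0.681644 "
        "Eval: classification_error_evaluator=0.36875 "
        "CurrentEval: classification_error_evaluator=0.36875")
fields = parse_log_line(line)
print(fields)
```

Splitting at `CurrentEval:` matters because the cumulative and the current evaluator share the same name and would otherwise overwrite each other.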
By default, we use the `stacked_lstm_net` network, which converges faster than `bidirectional_lstm_net` for the same number of samples. If you want to use the bidirectional LSTM, uncomment the last line and comment out `stacked_lstm_net`.
## Testing
Testing means evaluating the labeled validation set using the trained model.
```
cd demo/sentiment
./test.sh
```
test.sh:
```bash
function get_best_pass() {
cat $1 | grep -Pzo 'Test .*\n.*pass-.*' | \
sed -r 'N;s/Test.* error=([0-9]+\.[0-9]+).*\n.*pass-([0-9]+)/\1 \2/g' | \
sort | head -n 1
}
log=train.log
LOG=`get_best_pass $log`
LOG=(${LOG})
evaluate_pass="model_output/pass-${LOG[1]}"
echo 'evaluating from pass '$evaluate_pass
model_list=./model.list
touch $model_list | echo $evaluate_pass > $model_list
net_conf=trainer_config.py
paddle train --config=$net_conf \
--model_list=$model_list \
--job=test \
--use_gpu=false \
--trainer_count=4 \
--config_args=is_test=1 \
2>&1 | tee 'test.log'
```
The function `get_best_pass` selects the best model by classification error rate for testing. In this example, we use the IMDB test dataset as the validation set by default. Unlike training, testing requires `--job=test` and the model path, namely `--model_list=$model_list` here. If it runs successfully, the log is saved in `demo/sentiment/test.log`. For example, in our test, the best model is `model_output/pass-00002`, and the classification error is 0.115645, as follows.
```
Pass=0 samples=24999 AvgCost=0.280471 Eval: classification_error_evaluator=0.115645
```
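The idea behind `get_best_pass` can also be expressed in Python: pair each test result with the pass saved right after it and keep the pair with the lowest error. The sketch below assumes that a `Test ...` line is followed by a line containing `pass-NNNNN`, as in the training logs shown in these tutorials; the second pair of log lines is made up for the example.

```python
import re

TEST_ERROR = re.compile(r"Test .*classification_error_evaluator=([0-9.]+)")
PASS_ID = re.compile(r"pass-([0-9]+)")


def best_pass(log_lines):
    """Return (error, pass_id) for the pass with the lowest test error."""
    results, pending_error = [], None
    for line in log_lines:
        match = TEST_ERROR.search(line)
        if match:
            pending_error = float(match.group(1))
            continue
        match = PASS_ID.search(line)
        if match and pending_error is not None:
            results.append((pending_error, match.group(1)))
            pending_error = None
    return min(results)


log = [
    "Test samples=24999 cost=0.39297 Eval: classification_error_evaluator=0.149406",
    "Saving parameters to ./model_output/pass-00000",
    # the next two lines are invented to give the function a second pass to compare
    "Test samples=24999 cost=0.30000 Eval: classification_error_evaluator=0.120000",
    "Saving parameters to ./model_output/pass-00001",
]
error, pass_id = best_pass(log)
print(error, pass_id)
```

Unlike the shell pipeline, which sorts the extracted text, this version compares the errors as numbers.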
## Prediction
`predict.py` provides a prediction interface. You should install the Python API of PaddlePaddle before using it. One example that predicts an unlabeled IMDB review is as follows. Simply run:
```
cd demo/sentiment
./predict.sh
```
predict.sh:
```
#Note the default model is pass-00002, you should make sure the model path
#exists or change the model path.
model=model_output/pass-00002/
config=trainer_config.py
label=data/pre-imdb/labels.list
python predict.py \
-n $config \
-w $model \
-b $label \
-d data/pre-imdb/dict.txt \
-i data/aclImdb/test/pos/10007_10.txt
```
* `predict.py`: predicting interface.
* -n $config : set network configure.
* -w $model: set model path.
* -b $label: set the dictionary that maps integer labels to string labels.
* -d data/pre-imdb/dict.txt: set dictionary.
* -i data/aclImdb/test/pos/10014_7.txt: set one example file to predict.
Note you should make sure the default model path `model_output/pass-00002`
exists or change the model path.
Predicting result of this example:
```
Loading parameters from model_output/pass-00002/
./data/aclImdb/test/pos/10014_7.txt: predicting label is pos
```
We sincerely appreciate your interest and welcome your contributions.
## Reference
[1] Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. [From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series](http://homes.cs.washington.edu/~nasmith/papers/oconnor+balasubramanyan+routledge+smith.icwsm10.pdf). In ICWSM-2010. <br>
[2] Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. [Twitter mood predicts the stock market](http://arxiv.org/abs/1010.3003), Journal of Computational Science.<br>
[3] Alex Graves, Marcus Liwicki, Santiago Fernandez, Roman Bertolami, Horst Bunke, and Jürgen Schmidhuber. 2009. [A novel connectionist system for unconstrained handwriting recognition](http://www.cs.toronto.edu/~graves/tpami_2009.pdf). IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5):855–868.<br>
[4] Zachary C. Lipton, [A Critical Review of Recurrent Neural Networks for Sequence Learning](http://arxiv.org/abs/1506.00019v1), arXiv:1506.00019. <br>
[5] Jie Zhou and Wei Xu; [End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks](http://www.aclweb.org/anthology/P/P15/P15-1109.pdf); ACL-IJCNLP 2015. <br>
Text Generation Tutorial
========================
.. toctree::
:maxdepth: 3
:glob:
Training Locally <text_generation.md>
internal/cluster_train.md
# Text generation Tutorial #
Sequence to sequence has been proven to be a powerful model for language generation. It can be used for machine translation, query rewriting, image captioning, etc.
This tutorial guides you through training a sequence-to-sequence model for neural machine translation (NMT) that translates French to English.
We follow the paper [Neural Machine Translation by Jointly Learning to Align and Translate](http://arxiv.org/abs/1409.0473) , which details the model architecture and training procedure for good performance on WMT-14 dataset. This tutorial reproduces this result in PaddlePaddle.
We thank @caoying for the pull request that defines the model architecture and solver configurations.
## Data Preparation ##
### Download and Extract ###
Download the WMT-14 dataset from [http://www-lium.univ-lemans.fr/~schwenk/cslm\_joint\_paper/](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/), extract it, and put the development and test data into separate folders.
- **Train data**: [bitexts (after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz)
- **Develop and Test data**: [dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz)
On Linux, you can do all of this by running the following commands. Otherwise, you need to download and extract the data, divide it into folders, and rename the file suffixes yourself.
```bash
cd demo/seqToseq/data
./wmt14_data.sh
```
We should find that the dataset `wmt14` has three folders as shown in the following table.
<table border="2" cellspacing="0" cellpadding="6" rules="all" frame="border">
<colgroup>
<col class="left" />
<col class="left" />
<col class="left" />
<col class="left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="left">folder name</th>
<th scope="col" class="left">French-English parallel corpora file</th>
<th scope="col" class="left">total number of files</th>
<th scope="col" class="left">size</th>
</tr>
</thead>
<tbody>
<tr>
<td class="left">train_data</td>
<td class="left">ccb2_pc30.src, ccb2_pc30.trg, etc</td>
<td class="left">twelve</td>
<td class="left">3.55G</td>
</tr>
<tr>
<td class="left">test_data</td>
<td class="left">ntst1213.src, ntst1213.trg</td>
<td class="left">two</td>
<td class="left">1636k</td>
</tr>
<tr>
<td class="left">gen_data</td>
<td class="left">ntst14.src, ntst14.trg</td>
<td class="left">two</td>
<td class="left">864k</td>
</tr>
</tbody>
</table>
<br/>
- Each folder has French-English parallel corpora
- **XXX.src** are source French files; **XXX.trg** are target English files.
- The number of lines of **XXX.src** and **XXX.trg** should be the same.
- Each line is a French/English sentence.
- There is a one-to-one correspondence between the sentence at the i-th line of **XXX.src** and **XXX.trg**.
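The line-count requirement above is easy to check before preprocessing. The following is a minimal sketch in Python 3, not part of the demo scripts: the helper names `count_lines` and `is_parallel` and the sample sentences are made up for illustration.

```python
import io
import os
import tempfile


def count_lines(path):
    """Return the number of lines in a UTF-8 text file."""
    with io.open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def is_parallel(src_path, trg_path):
    """True when the source and target files have the same number of lines."""
    return count_lines(src_path) == count_lines(trg_path)


# Self-contained demonstration with two made-up sentence pairs.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "sample.src")
trg_path = os.path.join(tmp_dir, "sample.trg")
with io.open(src_path, "w", encoding="utf-8") as f:
    f.write(u"bonjour le monde\nmerci beaucoup\n")
with io.open(trg_path, "w", encoding="utf-8") as f:
    f.write(u"hello world\nthank you very much\n")

print(is_parallel(src_path, trg_path))
```

A matching line count does not prove that the sentences are aligned, but a mismatch is a sure sign that the two files are not parallel.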
### User Defined Dataset ###
If you need to do other sequence-to-sequence tasks, such as Paraphrasing, you only need to organize the data as follows, and place them in `demo/seqToseq/data`:
    dataset
      train
        file1.src file1.trg
        file2.src file2.trg
        ......
      test
        file1.src file1.trg
        file2.src file2.trg
        ......
      gen
        file1.src file1.trg
        file2.src file2.trg
        ......
- 1st directory: dataset folder name
- 2nd directory: folder of train, test, and gen. The names of these three folders are fixed.
- 3rd file: Source-Target parallel corpora files.
- **XXX.src** are source files, **XXX.trg** are target files.
- Each line of the file must be a sequence.
- There should be a one-to-one correspondence between the i-th sequence of **XXX.src** and **XXX.trg**.
## Data Preprocess ##
### Preprocessing Workflow ###
- Concat each Source-Target parallel corpora to be one file:
- concat each **XXX.src** and **XXX.trg** to be **XXX**.
- the i-th line of **XXX** = the i-th line of **XXX.src** + '\t' + the i-th line of **XXX.trg**
- Build source and target dictionary of train data, each dictionary has DICTSIZE words:
- the most frequent (DICTSIZE-3) words
- 3 special tokens:
- `<s>`: the start of a sequence
- `<e>`: the end of a sequence
- `<unk>`: a word not included in dictionary
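The two steps of the workflow can be sketched in a few lines of Python 3. This is an illustration under stated assumptions, not the code of `preprocess.py`: the function names `merge_parallel` and `build_dict` are hypothetical, and placing the special tokens at the front of the dictionary is an assumption made here for simplicity.

```python
from collections import Counter

SPECIAL_TOKENS = ["<s>", "<e>", "<unk>"]


def merge_parallel(src_lines, trg_lines):
    """Join the i-th source line and the i-th target line with a tab."""
    if len(src_lines) != len(trg_lines):
        raise ValueError("source and target must have the same number of lines")
    return [s.strip() + "\t" + t.strip() for s, t in zip(src_lines, trg_lines)]


def build_dict(lines, dict_size):
    """Return the special tokens plus the (dict_size - 3) most frequent words."""
    counts = Counter(word for line in lines for word in line.split())
    frequent = [w for w, _ in counts.most_common(dict_size - len(SPECIAL_TOKENS))]
    return SPECIAL_TOKENS + frequent


# Made-up toy corpus.
src = ["le chat dort", "le chien dort"]
trg = ["the cat sleeps", "the dog sleeps"]

merged = merge_parallel(src, trg)
src_dict = build_dict(src, dict_size=5)
print(merged[0])
print(src_dict)
```

With `dict_size=5`, only the two most frequent source words are kept next to the three special tokens. Every other word would be mapped to `<unk>`.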
### Preprocessing Command and Result
The general command for preprocessing the dataset is:
```bash
cd demo/seqToseq/
python preprocess.py -i INPUT [-d DICTSIZE] [-m]
```
- `-i INPUT`: the path of input original dataset
- `-d DICTSIZE`: the number of words in each dictionary. If not set, the dictionary contains all the words in the input dataset
- `-m --mergeDict`: merge the source and target dictionaries, so that the two dictionaries have the same content
And you will see messages like this:
concat parallel corpora for dataset
build source dictionary for train data
build target dictionary for train data
dictionary size is XXX
Here, you can simply run the command:
```bash
python preprocess.py -i data/wmt14 -d 30000
```
This takes several minutes and stores the preprocessed dataset in `demo/seqToseq/data/pre-wmt14`. The directory has the following structure:
train test gen train.list test.list gen.list src.dict trg.dict
- **train, test, gen**: folders containing the French-English parallel corpora of the train data, test data, and gen data respectively. Each line of a file in these folders has two parts: a French sequence followed by the corresponding English sequence.
- **train.list, test.list, gen.list**: text files listing the files in the train, test, and gen folders respectively
- **src.dict, trg.dict**: the source (French) and target (English) dictionaries. Each dictionary has 30000 words: the most frequent 29997 words and 3 special tokens
## Model Training ##
### Introduction ###
Neural machine translation (NMT) aims at building a single neural network that can be jointly tuned to maximize translation performance. Recently proposed NMT models often belong to a family of encoder–decoder models. Encoder-Decoder models encode a source sentence into a fixed-length vector from which a decoder generates a target sentence.
In this task, we use an extension to the encoder–decoder model which learns to align and translate jointly. Each time the model generates a word in a translation, it searches for a set of positions in the source sentence for the most relevant information. The decoder predicts a target word based on the context vectors associated with these source positions and all the previous generated target words. For more detailed explanation, readers can refer to paper [Neural Machine Translation by Jointly Learning to Align and Translate](http://arxiv.org/abs/1409.0473).
The most distinguishing feature of this model is that it doesn't encode an input sentence into a single fixed-length vector. Instead, it encodes the input sentence into a sequence of vectors, where one vector corresponds to an input element. A subset of these vectors is chosen adaptively while decoding the translated sentence. This frees a NMT model from having to squash all the information of a source sentence, regardless of its length, into a fixed-length vector. The improvement of this model is more apparent for longer sentences, but the improvement can be observed for sentences of any length.
<center>![](./encoder-decoder-attention-model.png)</center>
<center>Figure 1. Encoder-Decoder-Attention-Model</center>
### Training Model in PaddlePaddle ###
We need to create a model config file before training. Here is an example, `demo/seqToseq/translation/train.conf`. Its first lines import the Python functions that define the network and set the job mode (`is_generating`).
```python
from seqToseq_net import *
is_generating = False
### Data Definition
train_conf = seq_to_seq_data(data_dir = "./data/pre-wmt14",
                             is_generating = is_generating)
### Algorithm Configuration
settings(
    learning_method = AdamOptimizer(),
    batch_size = 50,
    learning_rate = 5e-4)
### Network Architecture
gru_encoder_decoder(train_conf, is_generating)
```
1. **Data Definition**: We define SeqToSeq train and test data in our example. It returns train_conf as the configuration. Its input arguments are:
- data_dir: directory of train data and test data
- is\_generating: whether this config is used for generating, here it is false
2. **Algorithm Configuration**: We use the SGD training algorithm (the default) with the Adam learning method, a batch_size of 50, and a learning rate of 5e-4.
3. **Network Architecture**: We use an attention version of the GRU Encoder-Decoder network in our example. It consists of a bidirectional GRU as the encoder and a decoder that emulates searching through the source sentence while decoding a translation.
### Training Command and Result ###
After writing the model config, we can train the model by running the command:
```bash
cd demo/seqToseq/translation
./train.sh
```
The `train.sh` is shown as follows:
```bash
paddle train \
--config='translation/train.conf' \
--save_dir='translation/model' \
--use_gpu=false \
--num_passes=16 \
--show_parameter_stats_period=100 \
--trainer_count=4 \
--log_period=10 \
--dot_period=5 \
2>&1 | tee 'translation/train.log'
```
- config: set config of neural network
- save_dir: set output path to save models
- use_gpu: whether to use GPU to train, here use CPU
- num_passes: set number of passes. One pass in paddle means training all samples in dataset one time
- show_parameter_stats_period: here show parameter statistic every 100 batches
- trainer_count: set number of CPU threads or GPU devices
- log_period: here print log every 10 batches
- dot_period: here print '.' every 5 batches
The training cost is printed every 10 batches (as set by `log_period` above), and you will see messages like this:
I0719 19:16:45.952062 15563 TrainerInternal.cpp:160] Batch=10 samples=500 AvgCost=198.475 CurrentCost=198.475 Eval: classification_error_evaluator=0.737155 CurrentEval: classification_error_evaluator=0.737155
I0719 19:17:56.707319 15563 TrainerInternal.cpp:160] Batch=20 samples=1000 AvgCost=157.479 CurrentCost=116.483 Eval: classification_error_evaluator=0.698392 CurrentEval: classification_error_evaluator=0.659065
.....
- AvgCost: Average Cost from 0th batch to current batch
- CurrentCost: Cost in current batch
- classification\_error\_evaluator(Eval): False prediction rate for each word from 0th evaluation to current evaluation
- classification\_error\_evaluator(CurrentEval): False prediction rate for each word in current evaluation
When classification\_error\_evaluator drops below 0.35, the model has been trained successfully.
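The two sample log lines are consistent with AvgCost being a running mean of the cost. The check below is a small worked example, assuming that each CurrentCost value covers the 10 batches since the previous log line. The helper name `running_average` is made up for illustration.

```python
def running_average(previous_avg, previous_batches, current_cost, current_batches):
    """Combine an earlier average with the mean cost of the newest batches."""
    total = previous_avg * previous_batches + current_cost * current_batches
    return total / (previous_batches + current_batches)


# Values copied from the two log lines above (Batch=10 and Batch=20).
avg_after_20 = running_average(198.475, 10, 116.483, 10)
print(round(avg_after_20, 3))
```

The result agrees with the `AvgCost=157.479` reported at `Batch=20`.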
## Text Generation ##
### Introduction ###
Generally speaking, the NMT model is conditioned on the encodings of the source sentence and predicts the next target word given the current target word. During training, the current word is always the ground truth. By contrast, during generation, the current word is the output of the decoder at the previous time step, which is read from a memory in PaddlePaddle.
Besides, we use Beam Search to generate sequences. Beam search uses breadth-first search to build its search tree. At each level of the tree, it generates all successors of the states at the current level, sorting them in increasing order of heuristic cost. However, it only stores a predetermined number of best states at each level (called the beam size).
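To make the procedure concrete, here is a minimal beam search sketch in Python 3. It is not PaddlePaddle's implementation: `toy_model` is a hypothetical stand-in for the decoder, and the sketch ranks candidates by summed log probability (higher is better), which is equivalent to ordering them by increasing cost.

```python
import math


def beam_search(next_log_probs, start, end, beam_size, max_len):
    """Return up to beam_size finished (score, sequence) pairs, best first."""
    beams = [(0.0, [start])]
    finished = []
    for _ in range(max_len):
        # Expand every state at the current level with all of its successors.
        candidates = []
        for score, seq in beams:
            for token, log_p in next_log_probs(seq).items():
                candidates.append((score + log_p, seq + [token]))
        # Keep only the beam_size best states for the next level.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished[:beam_size]


def toy_model(seq):
    """Hypothetical decoder: a fixed next-token distribution per last token."""
    if seq[-1] == "<s>":
        return {"a": math.log(0.6), "b": math.log(0.4)}
    return {"<e>": math.log(0.9), "a": math.log(0.1)}


results = beam_search(toy_model, "<s>", "<e>", beam_size=2, max_len=3)
for score, seq in results:
    print(round(score, 4), " ".join(seq))
```

A larger beam size explores more candidates per level at a higher cost in time and memory. A beam size of 1 reduces the procedure to greedy decoding.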
### Pretrained model ###
We trained the model on a cluster with 50 nodes, each with two 6-core CPUs. Training 16 passes took 5 days, with each pass taking 7 hours. The model_dir has 16 sub-folders, each of which contains the whole set of model parameters (202MB). The pass-00012 model has the highest BLEU, 27.77 (see the paper [BLEU: a Method for Automatic Evaluation of Machine Translation](http://www.aclweb.org/anthology/P02-1040.pdf)). To download and extract this model, simply run the following commands on Linux.
```bash
cd demo/seqToseq/data
./wmt14_model.sh
```
### Generating Model in PaddlePaddle ###
We need to create a model config file before translating French sequences. Here is an example, `demo/seqToseq/translation/gen.conf`. Its first lines import the Python functions that define the network and set the job mode (`is_generating`).
```python
from seqToseq_net import *
is_generating = True
################## Data Definition #####################
gen_conf = seq_to_seq_data(data_dir = "./data/pre-wmt14",
                           is_generating = is_generating,
                           gen_result = "./translation/gen_result")
############## Algorithm Configuration ##################
settings(
    learning_method = AdamOptimizer(),
    batch_size = 1,
    learning_rate = 0)
################# Network Architecture ##################
gru_encoder_decoder(gen_conf, is_generating)
```
1. **Data Definition**: We define SeqToSeq gen data in our example. It returns gen_conf as the configuration. Its input arguments are:
- data\_dir: directory of gen data
- is\_generating: whether this config is used for generating, here it is true
- gen\_result: file to store the generation result
2. **Algorithm Configuration**: We use the SGD training algorithm in generation, with a batch_size of 1 (one sequence is generated at a time) and a learning rate of 0.
3. **Network Architecture**: Essentially the same as the training model.
### Generating Command and Result ###
After writing the model config, we can do text translation from French to English by running the command:
```bash
cd demo/seqToseq/translation
./gen.sh
```
`gen.sh` is shown below. Unlike training, generation requires a few different arguments:
```bash
paddle train \
--job=test \
--config='translation/gen.conf' \
--save_dir='data/wmt14_model' \
--use_gpu=true \
--num_passes=13 \
--test_pass=12 \
--trainer_count=1 \
2>&1 | tee 'translation/gen.log'
```
- job: set job mode to test
- num_passes and test_pass: load model parameters from pass test_pass to pass (num_passes - 1). Here only `data/wmt14_model/pass-00012` is loaded
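The range described by these two arguments can be spelled out with a short Python sketch. The helper `passes_to_test` is made up for illustration. The `pass-%05d` naming follows the directory names shown in this tutorial.

```python
def passes_to_test(test_pass, num_passes):
    """Directory names for passes test_pass through num_passes - 1."""
    return ["pass-%05d" % i for i in range(test_pass, num_passes)]


print(passes_to_test(test_pass=12, num_passes=13))
```

With the values from `gen.sh`, the list contains only `pass-00012`.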
You will see messages like this:
I0706 14:48:31.178915 31441 GradientMachine.cpp:143] Loading parameters from data/wmt14_model/pass-00012
I0706 14:48:40.012039 31441 Tester.cpp:125] Batch=100 samples=100 AvgCost=0
I0706 14:48:48.898632 31441 Tester.cpp:125] Batch=200 samples=200 AvgCost=0
...
The generation result in `demo/seqToseq/translation/gen_result` looks like this:
0
0 -11.1314 The <unk> <unk> about the width of the seats while large controls are at stake <e>
1 -11.1519 The <unk> <unk> on the width of the seats while large controls are at stake <e>
2 -11.5988 The <unk> <unk> about the width of the seats while large controls are at stake . <e>
1
0 -24.4149 The dispute is between the major aircraft manufacturers about the width of the tourist seats on the <unk> flights , paving the way for a <unk> confrontation during the month of the Dubai <unk> . <e>
1 -26.9524 The dispute is between the major aircraft manufacturers about the width of the tourist seats on the <unk> flights , paving the way for a <unk> confrontation during the month of Dubai &apos; s <unk> . <e>
2 -27.9574 The dispute is between the major aircraft manufacturers about the width of the tourist seats on the <unk> flights , paving the way for a <unk> confrontation during the month of Dubai &apos; s Dubai <unk> . <e>
...
- This is the beam search result, where beam size is 3
- The lines that contain only '0' or '1' give the sequence-id in gen data
- The other six lines list the beam search results, three per sequence
- The 2nd column is the beam search score (sorted from largest to smallest)
- The 3rd column is the generated English sequence
- There are 2 special tokens:
- `<e>`: the end of a sequence
- `<unk>`: a word not included in dictionary
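A result file in this layout can be read back with a few lines of Python 3. The sketch below is illustrative only: the function name `parse_gen_result` is hypothetical, the columns are assumed to be separated by whitespace, and the sample sentences are abbreviated from the output above.

```python
def parse_gen_result(lines):
    """Map each sequence id to a list of (rank, score, sentence) tuples."""
    results = {}
    current_id = None
    for line in lines:
        fields = line.strip().split(None, 2)
        if not fields or fields == ["..."]:
            continue  # skip blank lines and elisions
        if len(fields) == 1:
            # A line holding a single number starts a new sequence.
            current_id = int(fields[0])
            results[current_id] = []
        else:
            rank, score = int(fields[0]), float(fields[1])
            sentence = fields[2] if len(fields) > 2 else ""
            results[current_id].append((rank, score, sentence))
    return results


sample = [
    "0",
    "0\t-11.1314\tThe <unk> <unk> about the width of the seats <e>",
    "1\t-11.1519\tThe <unk> <unk> on the width of the seats <e>",
    "1",
    "0\t-24.4149\tThe dispute is between the major aircraft manufacturers <e>",
]
parsed = parse_gen_result(sample)

# The first hypothesis of each sequence has the highest score.
best = {seq_id: hyps[0][2] for seq_id, hyps in parsed.items()}
print(best[0])
```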
### BLEU Evaluation ###
Human evaluations of machine translation are extensive but expensive. The paper [BLEU: a Method for Automatic Evaluation of Machine Translation](http://www.aclweb.org/anthology/P02-1040.pdf) presents an automated understudy to skilled human judges that can substitute for them when quick or frequent evaluations are needed. [Moses](http://www.statmt.org/moses/) is a statistical machine translation system, and we use its [multi-bleu.perl](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/multi-bleu.perl) script to do BLEU evaluation. To download this script, simply run the following command:
```bash
cd demo/seqToseq/translation
./moses_bleu.sh
```
Since the reference translation is already downloaded as `data/wmt14/gen/ntst14.trg`, we can do BLEU evaluation by running the command:
```bash
cd demo/seqToseq/translation
./eval_bleu.sh FILE BEAMSIZE
```
- FILE: the generation result file
- BEAMSIZE: expand width in beam search
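For intuition about what the script measures, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on, using the well-known unigram example from the BLEU paper. It is only one ingredient of the metric and is not a replacement for `multi-bleu.perl`: the full score also combines several n-gram orders and applies a brevity penalty. The function names are chosen here for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    total = sum(cand_counts.values())
    return float(clipped) / total if total else 0.0


candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
p1 = modified_precision(candidate, reference, 1)
print(p1)
```

Without clipping, this degenerate candidate would get a unigram precision of 7/7. With clipping it gets 2/7, because "the" occurs only twice in the reference.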
PaddlePaddle Documentation
===================
User Guide
----------
* [Quick Start](demo/quick_start/index_en.md)
* [Build and Installation](build/index.rst)
* [Contribute Code](build/contribute_to_paddle.md)
* [User Interface](ui/index.md)
* [Source Code Documents](source/index.md)
* [Layer Documents](layer.md)
* [Trainer Config Helpers](ui/api/trainer_config_helpers/index.md)
* [Example and Demo](demo/index.md)
* [Cluster Train](cluster/index.md)
# Layer Documents
* [Layer Source Code Document](source/gserver/layers/index.rst)
* [Layer Python API Document](ui/api/trainer_config_helpers/layers_index.rst)
API
========
.. doxygenfile:: paddle/api/PaddleAPI.h
.. doxygenfile:: paddle/api/Internal.h
Cuda
=============
Dynamic Link Libs
--------------------------
hl_dso_loader.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_dso_loader.h
GPU Resources
----------------
hl_cuda.ph
``````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda.ph
hl_cuda.h
``````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda.h
CUDA Wrapper
--------------
hl_cuda_cublas.h
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda_cublas.h
hl_cuda_cudnn.h
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda_cudnn.h
hl_cuda_cudnn.ph
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_cuda_cudnn.ph
CUDA
====================
.. toctree::
:maxdepth: 3
cuda.rst
Matrix
====================
.. toctree::
:maxdepth: 3
matrix.rst
Matrix
=======
Base Matrix
-------------
hl_matrix.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix.h
hl_matrix_base.cuh
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_base.cuh
hl_matrix_apply.cuh
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_apply.cuh
hl_matrix_ops.cuh
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_ops.cuh
hl_matrix_type.cuh
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_matrix_type.cuh
hl_sse_matrix_kernel.cuh
``````````````````````````
.. doxygenfile:: paddle/cuda/include/hl_sse_matrix_kernel.cuh
hl_batch_transpose.h
``````````````````````````
.. doxygenfile:: paddle/cuda/include/hl_batch_transpose.h
Sparse Matrix
--------------
hl_sparse.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_sparse.h
hl_sparse.ph
``````````````````````
.. doxygenfile:: paddle/cuda/include/hl_sparse.ph
Others
---------------
hl_aggregate.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_aggregate.h
hl_table_apply.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_table_apply.h
hl_top_k.h
``````````````````
.. doxygenfile:: paddle/cuda/include/hl_top_k.h
RNN
====================
.. toctree::
:maxdepth: 3
rnn.rst
Neural Networks
==================
Base
-------
.. doxygenfile:: paddle/cuda/include/hl_gpu.h
.. doxygenfile:: paddle/cuda/include/hl_cnn.h
.. doxygenfile:: paddle/cuda/include/hl_functions.h
.. doxygenfile:: paddle/cuda/include/hl_avx_functions.h
.. doxygenfile:: paddle/cuda/include/hl_device_functions.cuh
.. doxygenfile:: paddle/cuda/include/hl_gpu_functions.cuh
Activation Functions
-----------------------
.. doxygenfile:: paddle/cuda/include/hl_activation_functions.h
RNN Related APIs
-----------------
.. doxygenfile:: paddle/cuda/include/hl_recurrent_apply.cuh
.. doxygenfile:: paddle/cuda/include/hl_sequence.h
LSTM Model
``````````````
.. doxygenfile:: paddle/cuda/include/hl_lstm.h
.. doxygenfile:: paddle/cuda/include/hl_cpu_lstm.cuh
.. doxygenfile:: paddle/cuda/include/hl_gpu_lstm.cuh
.. doxygenfile:: paddle/cuda/include/hl_lstm_ops.cuh
GRU Model
````````````````
.. doxygenfile:: paddle/cuda/include/hl_gru_ops.cuh
.. doxygenfile:: paddle/cuda/include/hl_cpu_gru.cuh
.. doxygenfile:: paddle/cuda/include/hl_gpu_gru.cuh
Utils
====================
.. toctree::
:maxdepth: 3
utils.rst
Utilities
===========
HPPL Base
------------
hl_base.h
``````````````
.. doxygenfile:: paddle/cuda/include/hl_base.h
Timer
-----------
hl_time.h
``````````````
.. doxygenfile:: paddle/cuda/include/hl_time.h
Thread Resource
---------------
hl_thread.ph
``````````````
.. doxygenfile:: paddle/cuda/include/hl_thread.ph
Activations
=============
.. doxygenfile:: paddle/gserver/activations/ActivationFunction.h
.. doxygenfile:: paddle/gserver/activations/ActivationFunction.cpp
Data Providers
================
Base DataProvider
------------------
.. doxygenclass:: paddle::DataProvider
:members:
DataProviderGroup
-------------------
.. doxygenclass:: paddle::DataProviderGroup
:members:
MultiDataProvider
-------------------
.. doxygenclass:: paddle::MultiDataProvider
:members:
PyDataProvider
===================
IFieldScanner
-------------
.. doxygenclass:: paddle::IFieldScanner
:members:
DenseScanner
-------------
.. doxygenclass:: paddle::DenseScanner
:members:
IndexScanner
-------------
.. doxygenclass:: paddle::IndexScanner
:members:
SparseNonValueScanner
---------------------
.. doxygenclass:: paddle::SparseNonValueScanner
:members:
SparseValueScanner
------------------
.. doxygenclass:: paddle::SparseValueScanner
:members:
SequenceScanner
------------------
.. doxygenclass:: paddle::SequenceScanner
:members:
IPyDataProviderCache
--------------------
.. doxygenclass:: paddle::IPyDataProviderCache
:members:
NoCacheStrategy
---------------
.. doxygenclass:: paddle::NoCacheStrategy
:members:
CacheOnePassInMemory
--------------------
.. doxygenclass:: paddle::CacheOnePassInMemory
:members:
PyDataProvider2
---------------
.. doxygenclass:: paddle::PyDataProvider2
:members:
Proto Data Provider
===================
ProtoDataProvider
-----------------
.. doxygenclass:: paddle::ProtoDataProvider
:members:
ProtoSequenceDataProvider
-------------------------
.. doxygenclass:: paddle::ProtoSequenceDataProvider
:members:
Data Providers Documents
==========================
.. toctree::
:maxdepth: 3
dataproviders.rst
Base Evaluator
==============
Evaluator
---------
.. doxygenclass:: paddle::Evaluator
:members:
Utils
=====
SumEvaluator
------------
.. doxygenclass:: paddle::SumEvaluator
:members:
ColumnSumEvaluator
------------------
.. doxygenclass:: paddle::ColumnSumEvaluator
:members:
Classification
==============
ClassificationErrorEvaluator
----------------------------
.. doxygenclass:: paddle::ClassificationErrorEvaluator
:members:
SequenceClassificationErrorEvaluator
------------------------------------
.. doxygenclass:: paddle::SequenceClassificationErrorEvaluator
:members:
AucEvaluator
-------------
.. doxygenclass:: paddle::AucEvaluator
:members:
PrecisionRecallEvaluator
------------------------
.. doxygenclass:: paddle::PrecisionRecallEvaluator
:members:
ChunkEvaluator
--------------
.. doxygenclass:: paddle::ChunkEvaluator
:members:
CTCEvaluator
------------
.. doxygenclass:: paddle::CTCErrorEvaluator
:members:
Rank
====
PnpairEvaluator
---------------
.. doxygenclass:: paddle::PnpairEvaluator
:members:
RankAucEvaluator
----------------
.. doxygenclass:: paddle::RankAucEvaluator
:members:
Printer
=======
ValuePrinter
-------------
.. doxygenclass:: paddle::ValuePrinter
:members:
GradientPrinter
---------------
.. doxygenclass:: paddle::GradientPrinter
:members:
MaxIdPrinter
------------
.. doxygenclass:: paddle::MaxIdPrinter
:members:
MaxFramePrinter
---------------
.. doxygenclass:: paddle::MaxFramePrinter
:members:
SequenceTextPrinter
-------------------
.. doxygenclass:: paddle::SequenceTextPrinter
:members:
ClassificationErrorPrinter
--------------------------
.. doxygenclass:: paddle::ClassificationErrorPrinter
:members:
Evaluators
==========
.. toctree::
:maxdepth: 3
evaluators.rst
Gradient Machines
=================
GradientMachine
---------------------
.. doxygenclass:: paddle::GradientMachine
:members:
GradientMachineMode
-------------------
.. doxygenclass:: paddle::IGradientMachineMode
:members:
MultiGradientMachine
---------------------
.. doxygenclass:: paddle::MultiGradientMachine
:members:
TrainerThread
`````````````
.. doxygenclass:: paddle::TrainerThread
:members:
Recurrent Gradient Machines
---------------------------
.. doxygenclass:: paddle::RecurrentGradientMachine
:members:
Networks
========
NeuralNetwork
-------------
.. doxygenclass:: paddle::NeuralNetwork
:members:
ParallelNeuralNetwork
---------------------
.. doxygenclass:: paddle::ParallelNeuralNetwork
:members:
Gradient Machines Documents
=============================
.. toctree::
:maxdepth: 3
gradientmachines.rst
Layers Documents
====================
.. toctree::
:maxdepth: 3
layer.rst
Base
======
Layer
-----
.. doxygenclass:: paddle::Layer
:members:
Projection
----------
.. doxygenclass:: paddle::Projection
:members:
Operator
--------
.. doxygenclass:: paddle::Operator
:members:
Data Layer
===========
.. doxygenclass:: paddle::DataLayer
:members:
Fully Connected Layers
======================
FullyConnectedLayer
-------------------
.. doxygenclass:: paddle::FullyConnectedLayer
:members:
SelectiveFullyConnectedLayer
----------------------------
.. doxygenclass:: paddle::SelectiveFullyConnectedLayer
:members:
Conv Layers
===========
ConvBaseLayer
-------------
.. doxygenclass:: paddle::ConvBaseLayer
:members:
ConvOperator
------------
.. doxygenclass:: paddle::ConvOperator
:members:
ConvShiftLayer
--------------
.. doxygenclass:: paddle::ConvShiftLayer
:members:
CudnnConvLayer
--------------
.. doxygenclass:: paddle::CudnnConvLayer
:members:
ExpandConvLayer
---------------
.. doxygenclass:: paddle::ExpandConvLayer
:members:
ContextProjection
-----------------
.. doxygenclass:: paddle::ContextProjection
:members:
Pooling Layers
==============
PoolLayer
---------
.. doxygenclass:: paddle::PoolLayer
:members:
PoolProjectionLayer
-------------------
.. doxygenclass:: paddle::PoolProjectionLayer
:members:
CudnnPoolLayer
--------------
.. doxygenclass:: paddle::CudnnPoolLayer
:members:
Norm Layers
===========
NormLayer
---------
.. doxygenclass:: paddle::NormLayer
:members:
CMRProjectionNormLayer
----------------------
.. doxygenclass:: paddle::CMRProjectionNormLayer
:members:
DataNormLayer
-------------
.. doxygenclass:: paddle::DataNormLayer
:members:
ResponseNormLayer
-----------------
.. doxygenclass:: paddle::ResponseNormLayer
:members:
BatchNormBaseLayer
------------------
.. doxygenclass:: paddle::BatchNormBaseLayer
:members:
BatchNormalizationLayer
-----------------------
.. doxygenclass:: paddle::BatchNormalizationLayer
:members:
CudnnBatchNormLayer
-----------------------
.. doxygenclass:: paddle::CudnnBatchNormLayer
:members:
SumToOneNormLayer
-----------------
.. doxygenclass:: paddle::SumToOneNormLayer
:members:
Activation Layer
================
ParameterReluLayer
------------------
.. doxygenclass:: paddle::ParameterReluLayer
:members:
Recurrent Layers
================
RecurrentLayer
--------------
.. doxygenclass:: paddle::RecurrentLayer
:members:
SequenceToBatch
---------------
.. doxygenclass:: paddle::SequenceToBatch
:members:
LSTM
----
LstmLayer
`````````
.. doxygenclass:: paddle::LstmLayer
:members:
LstmStepLayer
`````````````
.. doxygenclass:: paddle::LstmStepLayer
:members:
LstmCompute
```````````
.. doxygenclass:: paddle::LstmCompute
:members:
MDLSTM
------
MDLstmLayer
```````````
.. doxygenclass:: paddle::MDLstmLayer
:members:
CoordIterator
`````````````
.. doxygenclass:: paddle::CoordIterator
:members:
GRU
---
GatedRecurrentLayer
```````````````````
.. doxygenclass:: paddle::GatedRecurrentLayer
:members:
GruStepLayer
````````````
.. doxygenclass:: paddle::GruStepLayer
:members:
GruCompute
``````````
.. doxygenclass:: paddle::GruCompute
:members:
Recurrent Layer Group
=====================
AgentLayer
----------
.. doxygenclass:: paddle::AgentLayer
:members:
SequenceAgentLayer
------------------
.. doxygenclass:: paddle::SequenceAgentLayer
:members:
GatherAgentLayer
----------------
.. doxygenclass:: paddle::GatherAgentLayer
:members:
SequenceGatherAgentLayer
------------------------
.. doxygenclass:: paddle::SequenceGatherAgentLayer
:members:
ScatterAgentLayer
-----------------
.. doxygenclass:: paddle::ScatterAgentLayer
:members:
SequenceScatterAgentLayer
-------------------------
.. doxygenclass:: paddle::SequenceScatterAgentLayer
:members:
GetOutputLayer
--------------
.. doxygenclass:: paddle::GetOutputLayer
:members:
Mixed Layer
===========
.. doxygenclass:: paddle::MixedLayer
:members:
DotMulProjection
----------------
.. doxygenclass:: paddle::DotMulProjection
:members:
DotMulOperator
--------------
.. doxygenclass:: paddle::DotMulOperator
:members:
FullMatrixProjection
--------------------
.. doxygenclass:: paddle::FullMatrixProjection
:members:
IdentityProjection
------------------
.. doxygenclass:: paddle::IdentityProjection
:members:
IdentityOffsetProjection
------------------------
.. doxygenclass:: paddle::IdentityOffsetProjection
:members:
TableProjection
---------------
.. doxygenclass:: paddle::TableProjection
:members:
TransposedFullMatrixProjection
------------------------------
.. doxygenclass:: paddle::TransposedFullMatrixProjection
:members:
Aggregate Layers
================
Aggregate
---------
AverageLayer
````````````
.. doxygenclass:: paddle::AverageLayer
:members:
MaxLayer
````````
.. doxygenclass:: paddle::MaxLayer
:members:
SequenceLastInstanceLayer
`````````````````````````
.. doxygenclass:: paddle::SequenceLastInstanceLayer
:members:
Concat
------
ConcatenateLayer
````````````````
.. doxygenclass:: paddle::ConcatenateLayer
:members:
ConcatenateLayer2
`````````````````
.. doxygenclass:: paddle::ConcatenateLayer2
:members:
SequenceConcatLayer
```````````````````
.. doxygenclass:: paddle::SequenceConcatLayer
:members:
Subset
------
SubSequenceLayer
````````````````
.. doxygenclass:: paddle::SubSequenceLayer
:members:
Reshaping Layers
================
BlockExpandLayer
----------------
.. doxygenclass:: paddle::BlockExpandLayer
:members:
ExpandLayer
-----------
.. doxygenclass:: paddle::ExpandLayer
:members:
FeatureMapExpandLayer
---------------------
.. doxygenclass:: paddle::FeatureMapExpandLayer
:members:
ResizeLayer
-----------
.. doxygenclass:: paddle::ResizeLayer
:members:
SequenceReshapeLayer
--------------------
.. doxygenclass:: paddle::SequenceReshapeLayer
:members:
Math Layers
===========
AddtoLayer
----------
.. doxygenclass:: paddle::AddtoLayer
:members:
ConvexCombinationLayer
----------------------
.. doxygenclass:: paddle::ConvexCombinationLayer
:members:
InterpolationLayer
------------------
.. doxygenclass:: paddle::InterpolationLayer
:members:
MultiplexLayer
--------------
.. doxygenclass:: paddle::MultiplexLayer
:members:
OuterProdLayer
--------------
.. doxygenclass:: paddle::OuterProdLayer
:members:
PowerLayer
----------
.. doxygenclass:: paddle::PowerLayer
:members:
ScalingLayer
------------
.. doxygenclass:: paddle::ScalingLayer
:members:
SlopeInterceptLayer
-------------------
.. doxygenclass:: paddle::SlopeInterceptLayer
:members:
TensorLayer
------------
.. doxygenclass:: paddle::TensorLayer
:members:
TransLayer
----------
.. doxygenclass:: paddle::TransLayer
:members:
Sampling Layers
===============
MultinomialSampler
------------------
.. doxygenclass:: paddle::MultinomialSampler
:members:
MaxIdLayer
----------
.. doxygenclass:: paddle::MaxIdLayer
:members:
SamplingIdLayer
---------------
.. doxygenclass:: paddle::SamplingIdLayer
:members:
Cost Layers
===========
CostLayer
-----------
.. doxygenclass:: paddle::CostLayer
:members:
HuberTwoClass
`````````````
.. doxygenclass:: paddle::HuberTwoClass
:members:
LambdaCost
```````````
.. doxygenclass:: paddle::LambdaCost
:members:
MultiBinaryLabelCrossEntropy
````````````````````````````
.. doxygenclass:: paddle::MultiBinaryLabelCrossEntropy
:members:
MultiClassCrossEntropy
```````````````````````
.. doxygenclass:: paddle::MultiClassCrossEntropy
:members:
MultiClassCrossEntropyWithSelfNorm
``````````````````````````````````
.. doxygenclass:: paddle::MultiClassCrossEntropyWithSelfNorm
:members:
RankingCost
```````````
.. doxygenclass:: paddle::RankingCost
:members:
SoftBinaryClassCrossEntropy
```````````````````````````
.. doxygenclass:: paddle::SoftBinaryClassCrossEntropy
:members:
SumOfSquaresCostLayer
`````````````````````
.. doxygenclass:: paddle::SumOfSquaresCostLayer
:members:
CosSimLayer
-----------
.. doxygenclass:: paddle::CosSimLayer
:members:
CosSimVecMatLayer
-----------------
.. doxygenclass:: paddle::CosSimVecMatLayer
:members:
CRFDecodingLayer
----------------
.. doxygenclass:: paddle::CRFDecodingLayer
:members:
CRFLayer
--------
.. doxygenclass:: paddle::CRFLayer
:members:
CTCLayer
--------
.. doxygenclass:: paddle::CTCLayer
:members:
HierarchicalSigmoidLayer
------------------------
.. doxygenclass:: paddle::HierarchicalSigmoidLayer
:members:
LinearChainCRF
--------------
.. doxygenclass:: paddle::LinearChainCRF
:members:
LinearChainCTC
--------------
.. doxygenclass:: paddle::LinearChainCTC
:members:
NCELayer
--------
.. doxygenclass:: paddle::NCELayer
:members:
Validation Layers
-----------------
ValidationLayer
```````````````
.. doxygenclass:: paddle::ValidationLayer
:members:
AucValidation
`````````````
.. doxygenclass:: paddle::AucValidation
:members:
PnpairValidation
````````````````
.. doxygenclass:: paddle::PnpairValidation
:members:
Check Layers
============
EosIdCheckLayer
---------------
.. doxygenclass:: paddle::EosIdCheckLayer
:members:
# Source Code Documents
## cuda
- [CUDA](cuda/cuda/index.rst)
- [Matrix](cuda/matrix/index.rst)
- [RNN](cuda/rnn/index.rst)
- [Utils](cuda/utils/index.rst)
## gserver
- [Activations](gserver/activations/index.rst)
- [Data Providers](gserver/dataprovider/index.rst)
- [Evaluators](gserver/evaluators/index.rst)
- [Gradient Machines](gserver/gradientmachines/index.rst)
- [Layers](gserver/layers/index.rst)
## math
- [Matrix](math/matrix/index.rst)
- [Utils](math/utils/index.rst)
## parameter
- [Parameter](parameter/parameter/index.rst)
- [Update](parameter/update/index.rst)
- [Optimizer](parameter/optimizer/index.rst)
## pserver
- [Client](pserver/client/index.rst)
- [Network](pserver/network/index.rst)
- [Server](pserver/server/index.rst)
## trainer
- [Trainer](trainer/trainer.rst)
## api
- [API](api/api.rst)
## utils
- [CustomStackTrace](utils/customStackTrace.rst)
- [Enumeration wrapper](utils/enum.rst)
- [Lock](utils/lock.rst)
- [Queue](utils/queue.rst)
- [Thread](utils/thread.rst)
Matrix Documents
====================
.. toctree::
:maxdepth: 3
matrix.rst
Matrix
=======
Base
--------
.. doxygenfile:: paddle/math/BaseMatrix.h
Sparse Matrix
----------------
.. doxygenfile:: paddle/math/Matrix.h
.. doxygenfile:: paddle/math/Vector.h
.. doxygenfile:: paddle/math/MathUtils.h
.. doxygenfile:: paddle/math/SparseMatrix.h
.. doxygenfile:: paddle/math/SparseRowMatrix.h
.. doxygenfile:: paddle/math/CpuSparseMatrix.h
Others
----------
.. doxygenfile:: paddle/math/MathFunctions.h
.. doxygenfile:: paddle/math/SIMDFunctions.h
Utils Documents
====================
.. toctree::
:maxdepth: 3
utils.rst
Utils
=======
Bits
-------
.. doxygenfile:: paddle/math/Bits.h
Memory Handle
--------------
.. doxygenfile:: paddle/math/MemoryHandle.h
.. doxygenfile:: paddle/math/Allocator.h
.. doxygenfile:: paddle/math/PoolAllocator.h
.. doxygenfile:: paddle/math/Storage.h
Parameter Documents
====================
.. toctree::
:maxdepth: 3
optimizer.rst
Optimizer
============
.. doxygenfile:: paddle/parameter/FirstOrderOptimizer.h
.. doxygenfile:: paddle/parameter/AverageOptimizer.h
.. doxygenfile:: paddle/parameter/ParameterOptimizer.h
.. doxygenfile:: paddle/parameter/OptimizerWithRegularizer.h
Parameter Documents
====================
.. toctree::
:maxdepth: 3
parameter.rst
Parameter
=============
Weight
--------
.. doxygenfile:: paddle/parameter/Weight.h
Regularizer
------------
.. doxygenfile:: paddle/parameter/Regularizer.h
Parameter
-------------
.. doxygenfile:: paddle/parameter/Argument.h
.. doxygenfile:: paddle/parameter/Parameter.h
.. doxygenfile:: paddle/parameter/ParallelParameter.h
Parameter Documents
====================
.. toctree::
:maxdepth: 3
update.rst
Update
==========
.. doxygenfile:: paddle/parameter/ParameterUpdaterBase.h
.. doxygenfile:: paddle/parameter/ParameterUpdaterHook.h
.. doxygenfile:: paddle/parameter/ParameterUpdateFunctions.h
Client
=========
.. doxygenclass:: paddle::BaseClient
:members:
:protected-members:
:private-members:
:undoc-members:
.. doxygenclass:: paddle::ParameterClient2
:members:
:protected-members:
:private-members:
:undoc-members:
Client Documents
====================
.. toctree::
:maxdepth: 3
client.rst
Network Documents
====================
.. toctree::
:maxdepth: 3
network.rst
Network
==========
Socket Server
----------------
.. doxygenclass:: paddle::SocketServer
:members:
:protected-members:
:private-members:
:undoc-members:
Socket Worker
----------------
.. doxygenclass:: paddle::SocketWorker
:members:
:protected-members:
:private-members:
:undoc-members:
Socket Client
----------------
.. doxygenclass:: paddle::SocketClient
:members:
:protected-members:
:private-members:
:undoc-members:
Socket Channel
---------------
.. doxygenclass:: paddle::SocketChannel
:members:
:protected-members:
:private-members:
:undoc-members:
Message Reader
---------------
.. doxygenclass:: paddle::MsgReader
:members:
:protected-members:
:private-members:
:undoc-members:
Server Documents
====================
.. toctree::
:maxdepth: 3
server.rst
Server
==========
.. doxygenclass:: paddle::ProtoServer
:members:
:protected-members:
:private-members:
:undoc-members:
.. doxygenclass:: paddle::ParameterServer2
:members:
:protected-members:
:private-members:
:undoc-members:
Trainer
=======
TrainerStats
------------
.. doxygenclass:: paddle::TrainerStats
:members:
RemoteParameterUpdater
-----------------------
.. doxygenclass:: paddle::RemoteParameterUpdater
:members:
ConcurrentRemoteParameterUpdater
---------------------------------
.. doxygenclass:: paddle::ConcurrentRemoteParameterUpdater
:members:
SparseRemoteParameterUpdater
----------------------------
.. doxygenclass:: paddle::SparseRemoteParameterUpdater
:members:
SparseRemoteParameterUpdaterComposite
-------------------------------------
.. doxygenclass:: paddle::SparseRemoteParameterUpdaterComposite
:members:
CustomStackTrace
================
class CustomStackTrace
----------------------
.. doxygenclass:: paddle::CustomStackTrace
:members:
enumeration_wrapper
===================
namespace paddle::enumeration_wrapper
-------------------------------------
.. doxygennamespace:: paddle::enumeration_wrapper
Thread
======
class Thread
------------
.. doxygenclass:: paddle::Thread
:members:
class ThreadWorker
------------------
.. doxygenclass:: paddle::ThreadWorker
:members:
class SyncThreadPool
--------------------
.. doxygenclass:: paddle::SyncThreadPool
:members:
class MultiThreadWorker
-----------------------
.. doxygenclass:: paddle::MultiThreadWorker
:members:
class AsyncThreadPool
---------------------
.. doxygenclass:: paddle::AsyncThreadPool
:members:
Queue
=====
class Queue
------------
.. doxygenclass:: paddle::Queue
:members:
class BlockingQueue
-------------------
.. doxygenclass:: paddle::BlockingQueue
:members:
Lock
====
class RWLock
------------
.. doxygenclass:: paddle::RWLock
:members:
class ReadLockGuard
-------------------
.. doxygenclass:: paddle::ReadLockGuard
:members:
class SpinLock
--------------
.. doxygenclass:: paddle::SpinLock
:members:
class Semaphore
---------------
.. doxygenclass:: paddle::Semaphore
:members:
class ThreadBarrier
-------------------
.. doxygenclass:: paddle::ThreadBarrier
:members:
class LockedCondition
---------------------
.. doxygenclass:: paddle::LockedCondition
:members:
PyDataProviderWrapper API
=========================
.. automodule:: paddle.trainer.PyDataProviderWrapper
:members:
BaseActivation
==============
.. automodule:: paddle.trainer_config_helpers.activations
:members: BaseActivation
:noindex:
AbsActivation
===============
.. automodule:: paddle.trainer_config_helpers.activations
:members: AbsActivation
:noindex:
IdentityActivation
==================
.. automodule:: paddle.trainer_config_helpers.activations
:members: IdentityActivation
:noindex:
LinearActivation
==================
.. automodule:: paddle.trainer_config_helpers.activations
:members: LinearActivation
:noindex:
SquareActivation
================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SquareActivation
:noindex:
SigmoidActivation
=================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SigmoidActivation
:noindex:
SoftmaxActivation
=================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SoftmaxActivation
:noindex:
SequenceSoftmaxActivation
=========================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SequenceSoftmaxActivation
:noindex:
ReluActivation
==============
.. automodule:: paddle.trainer_config_helpers.activations
:members: ReluActivation
:noindex:
BReluActivation
===============
.. automodule:: paddle.trainer_config_helpers.activations
:members: BReluActivation
:noindex:
SoftReluActivation
==================
.. automodule:: paddle.trainer_config_helpers.activations
:members: SoftReluActivation
:noindex:
TanhActivation
==============
.. automodule:: paddle.trainer_config_helpers.activations
:members: TanhActivation
:noindex:
STanhActivation
===============
.. automodule:: paddle.trainer_config_helpers.activations
:members: STanhActivation
:noindex:
Activations
===========
.. toctree::
:maxdepth: 3
activations.rst
Parameter and Extra Layer Attribute
===================================
.. automodule:: paddle.trainer_config_helpers.attrs
:members:
DataSources
===========
.. automodule:: paddle.trainer_config_helpers.data_sources
:members:
Base
====
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: evaluator_base
:noindex:
Classification
==============
classification_error_evaluator
------------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: classification_error_evaluator
:noindex:
auc_evaluator
-------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: auc_evaluator
:noindex:
ctc_error_evaluator
-------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: ctc_error_evaluator
:noindex:
chunk_evaluator
---------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: chunk_evaluator
:noindex:
precision_recall_evaluator
--------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: precision_recall_evaluator
:noindex:
Rank
====
pnpair_evaluator
----------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: pnpair_evaluator
:noindex:
Utils
=====
sum_evaluator
-------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: sum_evaluator
:noindex:
column_sum_evaluator
--------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: column_sum_evaluator
:noindex:
Print
=====
classification_error_printer_evaluator
--------------------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: classification_error_printer_evaluator
:noindex:
gradient_printer_evaluator
--------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: gradient_printer_evaluator
:noindex:
maxid_printer_evaluator
-----------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: maxid_printer_evaluator
:noindex:
maxframe_printer_evaluator
---------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: maxframe_printer_evaluator
:noindex:
seqtext_printer_evaluator
-------------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: seqtext_printer_evaluator
:noindex:
value_printer_evaluator
-----------------------
.. automodule:: paddle.trainer_config_helpers.evaluators
:members: value_printer_evaluator
:noindex: