Commit e15c71cd authored by liuyang11

Merge branch 'develop' of https://github.com/PaddlePaddle/models into caffe2fluid

support LeNet and ResNet conversion
...@@ -24,13 +24,24 @@ unittest(){
trap 'abort' 0
set -e
-for proj in */ ; do
+for proj in * ; do
+    if [ -d $proj ]; then
+        if [ "$proj" = "fluid" ]; then
+            for proj in fluid/* ; do
                if [ -d $proj ]; then
                    unittest $proj
                    if [ $? != 0 ]; then
                        exit 1
                    fi
                fi
+            done
+        else
+            unittest $proj
+            if [ $? != 0 ]; then
+                exit 1
+            fi
+        fi
+    fi
done
trap : 0
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Convolutional Sequence to Sequence Learning
This model implements the work in the following paper:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, please follow the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update your PaddlePaddle installation.
---
# Click-Through Rate Prediction
The following files are included in this example directory, with their corresponding descriptions:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Click-Through Rate Prediction
## Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Factorization Machine for Click-Through Rate Prediction
## Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, please follow the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update your PaddlePaddle installation.
---
# Deep Structured Semantic Models (DSSM)
DSSM uses a DNN to learn low-dimensional representation vectors for text in a continuous semantic space and to model the semantic similarity between two sentences. This example demonstrates how to implement a generic DSSM model with PaddlePaddle for modeling the semantic similarity between two strings. The implementation supports a generic data format, so users can apply the model to real-world scenarios simply by replacing the data.
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Deep Structured Semantic Models (DSSM)
Deep Structured Semantic Models (DSSM) is a simple but powerful DNN-based model for matching web search queries with URL-based documents. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings.
......
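As a rough, framework-agnostic illustration of the DSSM idea described above (encode two strings with a DNN and score them by similarity), the following is a minimal numpy sketch. The layer sizes, random weights, bag-of-words inputs, and cosine-similarity scoring are illustrative assumptions, not the configuration used in the actual example.

```python
# Minimal sketch of the DSSM scoring idea (assumed shapes and random weights,
# not the real example's network): embed two bag-of-words vectors with a small
# DNN and score their semantic similarity with cosine similarity.
import numpy as np

rng = np.random.RandomState(0)
vocab_size, hidden, emb = 1000, 256, 128          # illustrative sizes
W1 = rng.randn(vocab_size, hidden) * 0.01
W2 = rng.randn(hidden, emb) * 0.01

def encode(bow):
    """Map a bag-of-words vector to a low-dimensional semantic vector."""
    h = np.tanh(bow.dot(W1))
    return np.tanh(h.dot(W2))

def similarity(bow_a, bow_b):
    """Cosine similarity between the two encoded strings."""
    a, b = encode(bow_a), encode(bow_b)
    return float(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

query = np.zeros(vocab_size); query[[3, 17, 120]] = 1.0   # toy "string" A
doc = np.zeros(vocab_size); doc[[3, 17, 999]] = 1.0       # toy "string" B
print("semantic similarity:", similarity(query, doc))
```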
Deep ASR Kickoff

The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
### TODO
This project is still under active development.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
16.2845556399 11.6891798673
17.21509949 12.3788567902
18.1143704548 14.9912618017
19.2335963752 18.5419556172
19.9266772451 21.2768220522
19.8245737202 21.2347210705
19.5432940972 20.2784036567
19.4631271754 20.2934452329
19.3929919324 20.457971868
19.2924788362 20.3626439234
18.9207244502 19.9196569759
18.7202605641 19.5920276899
18.4844279398 19.2068349019
18.2670948624 18.8716893824
18.0929628855 18.5439666541
17.8428896026 18.0255891747
17.6646850635 17.473764296
17.4955705896 16.8966859471
17.3706720293 16.4294027467
17.2530867792 16.0514717623
17.1304341172 15.7234699057
17.0038353287 15.4344471514
16.902550309 15.1603287337
16.8375590047 14.9304337826
16.816287853 14.9119310513
16.828838265 15.0930023024
16.8602209498 15.3771992423
16.9101763812 15.6897991789
16.9466065143 15.9364556489
16.9486061956 16.0699417826
16.9041374104 16.0796970272
16.8410093699 16.0111444599
16.7045718836 15.7991985601
16.51128489 15.5208920129
16.3253910608 15.2603181921
16.1297317333 14.9499965958
15.903428372 14.5958280409
15.6131718105 14.2709618
15.1395035533 13.9993939893
14.4298229999 13.3841189151
0.0034970565424 0.246184766149
0.00501284154705 0.238484972472
0.00605942680019 0.269064381708
0.00687266156243 0.319479238011
0.00734065019253 0.371947383205
0.00718807218417 0.384426479694
0.00652195540212 0.384676838281
0.00660416525951 0.395543910317
0.00680202057642 0.400803979681
0.00659144183007 0.393228973031
0.00605294530423 0.385021118038
0.00590452969394 0.361763039625
0.00612315374687 0.346777773373
0.00582354093973 0.335802403976
0.00574556002554 0.320733728218
0.00612254485891 0.310153103033
0.00626733043219 0.299854747445
0.00567398408041 0.293353685493
0.00519236700706 0.287668810947
0.00529581474367 0.281479660772
0.00479019484082 0.27451415777
0.00486381039428 0.266294391154
0.00491126372868 0.258105116126
0.00452105305011 0.252926328298
0.00531483334271 0.250910887373
0.00546572110469 0.253302256977
0.00479544857908 0.258484183394
0.00422106426297 0.264582900173
0.00401824135188 0.268467945623
0.0041705465252 0.269699480291
0.00405239564143 0.270406162975
0.0040059737566 0.270407601782
0.00406426729317 0.267951582656
0.00416613791013 0.264543833042
0.00427847607653 0.26247798891
0.00428050903034 0.259635263243
0.00454842971786 0.255829377617
0.00393747552387 0.253802307025
0.00374143688909 0.251011478787
0.00335475310258 0.236543650856
0.000373194755312 0.0419494800709
0.000230909648678 0.0394102370205
0.000150840015851 0.0414956922398
8.44401840771e-05 0.0460502231327
-6.24759314572e-06 0.0528049937739
-8.82957758148e-05 0.055711244886
1.16795791952e-05 0.0563188428833
-1.68716267856e-05 0.0575232763711
-0.000112625308645 0.057979929947
-0.000122619090002 0.0564126233493
1.73569637319e-05 0.05522573909
6.49872782342e-05 0.0507353361334
4.17746389178e-05 0.0479568131253
5.13884475653e-05 0.0461253238047
1.8860115143e-05 0.0436860476919
-5.64317701105e-05 0.042516381059
-0.000136859948115 0.0413574820205
-7.00847019726e-05 0.0409516370727
-5.39392223336e-05 0.040441504085
-9.24897162815e-05 0.0397800398173
4.7104970622e-05 0.039046286243
6.24805896165e-06 0.0380185986602
-2.35272813418e-05 0.036851063786
5.88344154127e-05 0.0361640489242
-8.39162076993e-05 0.0357639427311
-0.000108702805776 0.0358774639538
3.22013961834e-06 0.0363644530435
9.43501518394e-05 0.0370309934774
0.000134406229423 0.0374972993343
3.84007008533e-05 0.037676222515
3.05989328157e-05 0.0379111939182
9.52201629091e-05 0.0380927209106
0.000102126083729 0.0379925358499
6.98628072264e-05 0.0377276252241
4.55782256339e-05 0.0375165468654
4.76370987786e-05 0.0371482526345
-2.24128832709e-05 0.0366810742947
0.000125621306953 0.036628355271
0.000134568666093 0.0364860461759
0.000159858844464 0.0345583593149
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import unittest
import numpy as np
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
class TestTransMeanVarianceNorm(unittest.TestCase):
"""unit test for TransMeanVarianceNorm
"""
def setUp(self):
self._file_path = "./data_utils/augmentor/tests/data/" \
"global_mean_var_search26kHr"
def test(self):
feature = np.zeros((2, 120), dtype="float32")
feature.fill(1)
trans = trans_mean_variance_norm.TransMeanVarianceNorm(self._file_path)
(feature1, label1) = trans.perform_trans((feature, None))
(mean, var) = trans.get_mean_var()
feature_flat1 = feature1.flatten()
feature_flat = feature.flatten()
one = np.ones((1), dtype="float32")
for idx, val in enumerate(feature_flat1):
cur_idx = idx % 120
self.assertAlmostEqual(val, (one[0] - mean[cur_idx]) * var[cur_idx])
class TestTransAddDelta(unittest.TestCase):
"""unit test TestTransAddDelta
"""
def test_regress(self):
"""test regress
"""
feature = np.zeros((14, 120), dtype="float32")
feature[0:5, 0:40].fill(1)
feature[0 + 5, 0:40].fill(1)
feature[1 + 5, 0:40].fill(2)
feature[2 + 5, 0:40].fill(3)
feature[3 + 5, 0:40].fill(4)
feature[8:14, 0:40].fill(4)
trans = trans_add_delta.TransAddDelta()
feature = feature.reshape((14 * 120))
trans._regress(feature, 5 * 120, feature, 5 * 120 + 40, 40, 4, 120)
trans._regress(feature, 5 * 120 + 40, feature, 5 * 120 + 80, 40, 4, 120)
feature = feature.reshape((14, 120))
tmp_feature = feature[5:5 + 4, :]
self.assertAlmostEqual(1.0, tmp_feature[0][0])
self.assertAlmostEqual(0.24, tmp_feature[0][119])
self.assertAlmostEqual(2.0, tmp_feature[1][0])
self.assertAlmostEqual(0.13, tmp_feature[1][119])
self.assertAlmostEqual(3.0, tmp_feature[2][0])
self.assertAlmostEqual(-0.13, tmp_feature[2][119])
self.assertAlmostEqual(4.0, tmp_feature[3][0])
self.assertAlmostEqual(-0.24, tmp_feature[3][119])
def test_perform(self):
"""test perform
"""
feature = np.zeros((4, 40), dtype="float32")
feature[0, 0:40].fill(1)
feature[1, 0:40].fill(2)
feature[2, 0:40].fill(3)
feature[3, 0:40].fill(4)
trans = trans_add_delta.TransAddDelta()
(feature, label) = trans.perform_trans((feature, None))
self.assertAlmostEqual(feature.shape[0], 4)
self.assertAlmostEqual(feature.shape[1], 120)
self.assertAlmostEqual(1.0, feature[0][0])
self.assertAlmostEqual(0.24, feature[0][119])
self.assertAlmostEqual(2.0, feature[1][0])
self.assertAlmostEqual(0.13, feature[1][119])
self.assertAlmostEqual(3.0, feature[2][0])
self.assertAlmostEqual(-0.13, feature[2][119])
self.assertAlmostEqual(4.0, feature[3][0])
self.assertAlmostEqual(-0.24, feature[3][119])
class TestTransSplice(unittest.TestCase):
"""unit test for TransSplice
"""
def test_perform(self):
feature = np.zeros((8, 10), dtype="float32")
for i in xrange(feature.shape[0]):
feature[i, :].fill(i)
trans = trans_splice.TransSplice()
(feature, label) = trans.perform_trans((feature, None))
self.assertEqual(feature.shape[1], 110)
for i in xrange(8):
nzero_num = 5 - i
cur_val = 0.0
if nzero_num < 0:
cur_val = i - 5 - 1
for j in xrange(11):
if j <= nzero_num:
for k in xrange(10):
self.assertAlmostEqual(feature[i][j * 10 + k], cur_val)
else:
if cur_val < 7:
cur_val += 1.0
for k in xrange(10):
self.assertAlmostEqual(feature[i][j * 10 + k], cur_val)
if __name__ == '__main__':
unittest.main()
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
import copy
class TransAddDelta(object):
""" add delta of feature data
trans feature for shape(a, b) to shape(a, b * 3)
Attributes:
_norder(int):
_window(int):
"""
def __init__(self, norder=2, nwindow=2):
""" init construction
Args:
norder: default 2
nwindow: default 2
"""
self._norder = norder
self._nwindow = nwindow
def perform_trans(self, sample):
""" add delta for feature
trans feature shape from (a,b) to (a, b * 3)
Args:
sample(object,tuple): contain feature numpy and label numpy
Returns:
(feature, label)
"""
(feature, label) = sample
frame_dim = feature.shape[1]
d_frame_dim = frame_dim * 3
head_filled = 5
tail_filled = 5
mat = np.zeros(
(feature.shape[0] + head_filled + tail_filled, d_frame_dim),
dtype="float32")
#copy first frame
for i in xrange(head_filled):
np.copyto(mat[i, 0:frame_dim], feature[0, :])
np.copyto(mat[head_filled:head_filled + feature.shape[0], 0:frame_dim],
feature[:, :])
# copy last frame
for i in xrange(head_filled + feature.shape[0], mat.shape[0], 1):
np.copyto(mat[i, 0:frame_dim], feature[feature.shape[0] - 1, :])
nframe = feature.shape[0]
start = head_filled
tmp_shape = mat.shape
mat = mat.reshape((tmp_shape[0] * tmp_shape[1]))
self._regress(mat, start * d_frame_dim, mat,
start * d_frame_dim + frame_dim, frame_dim, nframe,
d_frame_dim)
self._regress(mat, start * d_frame_dim + frame_dim, mat,
start * d_frame_dim + 2 * frame_dim, frame_dim, nframe,
d_frame_dim)
mat.shape = tmp_shape
return (mat[head_filled:mat.shape[0] - tail_filled, :], label)
def _regress(self, data_in, start_in, data_out, start_out, size, n, step):
""" regress
Args:
data_in: in data
start_in: start index of data_in
data_out: out data
start_out: start index of data_out
size: frame dimension
n: frame number
step: 3 * (frame dimension)
Returns:
None
"""
sigma_t2 = 0.0
delta_window = self._nwindow
for t in xrange(1, delta_window + 1):
sigma_t2 += t * t
sigma_t2 *= 2.0
for i in xrange(n):
fp1 = start_in
fp2 = start_out
for j in xrange(size):
back = fp1
forw = fp1
sum = 0.0
for t in xrange(1, delta_window + 1):
back -= step
forw += step
sum += t * (data_in[forw] - data_in[back])
data_out[fp2] = sum / sigma_t2
fp1 += 1
fp2 += 1
start_in += step
start_out += step
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
class TransMeanVarianceNorm(object):
""" normalization of mean variance for feature data
Attributes:
_mean(numpy.array): the feature mean vector
_var(numpy.array): the feature variance
"""
def __init__(self, snorm_path):
"""init construction
Args:
snorm_path: the path of mean and variance
"""
self._mean = None
self._var = None
self._load_norm(snorm_path)
def _load_norm(self, snorm_path):
""" load mean var file
Args:
snorm_path(str):the file path
"""
lLines = open(snorm_path).readlines()
nLen = len(lLines)
self._mean = np.zeros((nLen), dtype="float32")
self._var = np.zeros((nLen), dtype="float32")
self._nLen = nLen
for nidx, l in enumerate(lLines):
s = l.split()
assert len(s) == 2
self._mean[nidx] = float(s[0])
self._var[nidx] = 1.0 / math.sqrt(float(s[1]))
if self._var[nidx] > 100000.0:
self._var[nidx] = 100000.0
def get_mean_var(self):
""" get mean and var
Args:
Returns:
(mean, var)
"""
return (self._mean, self._var)
def perform_trans(self, sample):
""" feature = (feature - mean) * var
Args:
sample(object):input sample, contain feature numpy and label numpy
Returns:
(feature, label)
"""
(feature, label) = sample
shape = feature.shape
assert len(shape) == 2
nfeature_len = shape[0] * shape[1]
assert nfeature_len % self._nLen == 0
ncur_idx = 0
feature = feature.reshape((nfeature_len))
while ncur_idx < nfeature_len:
block = feature[ncur_idx:ncur_idx + self._nLen]
block = (block - self._mean) * self._var
feature[ncur_idx:ncur_idx + self._nLen] = block
ncur_idx += self._nLen
feature = feature.reshape(shape)
return (feature, label)
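The two-column numeric listing earlier in this commit is the kind of file `_load_norm` expects: one `mean variance` pair per feature dimension, with the variance stored internally as 1/sqrt(var) and capped at 1e5. As a quick, self-contained illustration of what `perform_trans` computes, here is a small sketch with made-up three-dimensional statistics (the values only resemble the listing above and are not the real ones).

```python
# Standalone sketch of the TransMeanVarianceNorm computation, using made-up
# 3-dimensional statistics in the same "mean variance" per-line file format.
import math
import numpy as np

lines = ["16.28 11.68", "17.21 12.37", "18.11 14.99"]          # toy mean/var file
mean = np.array([float(l.split()[0]) for l in lines], dtype="float32")
inv_std = np.array([1.0 / math.sqrt(float(l.split()[1])) for l in lines],
                   dtype="float32")                            # the class caps this at 1e5

feature = np.ones((2, 3), dtype="float32")                     # (frames, dims)
normalized = (feature - mean) * inv_std                        # same as perform_trans
print(normalized)
```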
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
class TransSplice(object):
""" copy feature context to construct new feature
expand feature data from shape (frame_num, frame_dim)
to shape (frame_num, frame_dim * 11)
Attributes:
_nleft_context(int): copy left context number
_nright_context(int): copy right context number
"""
def __init__(self, nleft_context=5, nright_context=5):
""" init construction
Args:
nleft_context(int):
nright_context(int):
"""
self._nleft_context = nleft_context
self._nright_context = nright_context
def perform_trans(self, sample):
""" copy feature context
Args:
sample(object): input sample(feature, label)
Return:
(feature, label)
"""
(feature, label) = sample
nframe_num = feature.shape[0]
nframe_dim = feature.shape[1]
nnew_frame_dim = nframe_dim * (
self._nleft_context + self._nright_context + 1)
mat = np.zeros(
(nframe_num + self._nleft_context + self._nright_context,
nframe_dim),
dtype="float32")
ret = np.zeros((nframe_num, nnew_frame_dim), dtype="float32")
#copy left
for i in xrange(self._nleft_context):
mat[i, :] = feature[0, :]
#copy middle
mat[self._nleft_context:self._nleft_context +
nframe_num, :] = feature[:, :]
#copy right
for i in xrange(self._nright_context):
mat[i + self._nleft_context + nframe_num, :] = feature[-1, :]
mat = mat.reshape(mat.shape[0] * mat.shape[1])
ret = ret.reshape(ret.shape[0] * ret.shape[1])
for i in xrange(nframe_num):
np.copyto(ret[i * nnew_frame_dim:(i + 1) * nnew_frame_dim],
mat[i * nframe_dim:i * nframe_dim + nnew_frame_dim])
ret = ret.reshape((nframe_num, nnew_frame_dim))
return (ret, label)
"""This module contains data processing related logic.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import random
import struct
import Queue
import time
import numpy as np
from threading import Thread
import signal
from multiprocessing import Manager, Process
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
from data_utils.util import suppress_complaints, suppress_signal
from data_utils.util import CriticalException, ForceExitWrapper
class SampleInfo(object):
"""SampleInfo holds the necessary information to load a sample from disk.
Args:
feature_bin_path (str): File containing the feature data.
feature_start (int): Start position of the sample's feature data.
feature_size (int): Byte count of the sample's feature data.
feature_frame_num (int): Time length of the sample.
feature_dim (int): Feature dimension of one frame.
label_bin_path (str): File containing the label data.
label_size (int): Byte count of the sample's label data.
label_frame_num (int): Label number of the sample.
"""
def __init__(self, feature_bin_path, feature_start, feature_size,
feature_frame_num, feature_dim, label_bin_path, label_start,
label_size, label_frame_num):
self.feature_bin_path = feature_bin_path
self.feature_start = feature_start
self.feature_size = feature_size
self.feature_frame_num = feature_frame_num
self.feature_dim = feature_dim
self.label_bin_path = label_bin_path
self.label_start = label_start
self.label_size = label_size
self.label_frame_num = label_frame_num
class SampleInfoBucket(object):
"""SampleInfoBucket contains paths of several description files. Feature
description file contains necessary information (including path of binary
data, sample start position, sample byte number etc.) to access samples'
feature data and the same with the label description file. SampleInfoBucket
is the minimum unit to do shuffle.
Args:
feature_bin_paths (list|tuple): Files containing the binary feature
data.
feature_desc_paths (list|tuple): Files containing the description of
samples' feature data.
label_bin_paths (list|tuple): Files containing the binary label data.
label_desc_paths (list|tuple): Files containing the description of
samples' label data.
split_perturb(int): Maximum perturbation value for length of
sub-sentence when splitting long sentence.
split_sentence_threshold(int): Sentences whose length is larger than
this value will be split into sub-sentences.
split_sub_sentence_len(int): sub-sentence length is equal to
(split_sub_sentence_len + rand() % split_perturb).
"""
def __init__(self,
feature_bin_paths,
feature_desc_paths,
label_bin_paths,
label_desc_paths,
split_perturb=50,
split_sentence_threshold=512,
split_sub_sentence_len=256):
block_num = len(label_bin_paths)
assert len(label_desc_paths) == block_num
assert len(feature_bin_paths) == block_num
assert len(feature_desc_paths) == block_num
self._block_num = block_num
self._feature_bin_paths = feature_bin_paths
self._feature_desc_paths = feature_desc_paths
self._label_bin_paths = label_bin_paths
self._label_desc_paths = label_desc_paths
self._split_perturb = split_perturb
self._split_sentence_threshold = split_sentence_threshold
self._split_sub_sentence_len = split_sub_sentence_len
self._rng = random.Random(0)
def generate_sample_info_list(self):
sample_info_list = []
for block_idx in xrange(self._block_num):
label_bin_path = self._label_bin_paths[block_idx]
label_desc_path = self._label_desc_paths[block_idx]
feature_bin_path = self._feature_bin_paths[block_idx]
feature_desc_path = self._feature_desc_paths[block_idx]
label_desc_lines = open(label_desc_path).readlines()
feature_desc_lines = open(feature_desc_path).readlines()
sample_num = int(label_desc_lines[0].split()[1])
assert sample_num == int(feature_desc_lines[0].split()[1])
for i in xrange(sample_num):
feature_desc_split = feature_desc_lines[i + 1].split()
feature_start = int(feature_desc_split[2])
feature_size = int(feature_desc_split[3])
feature_frame_num = int(feature_desc_split[4])
feature_dim = int(feature_desc_split[5])
label_desc_split = label_desc_lines[i + 1].split()
label_start = int(label_desc_split[2])
label_size = int(label_desc_split[3])
label_frame_num = int(label_desc_split[4])
assert feature_frame_num == label_frame_num
if self._split_sentence_threshold == -1 or \
self._split_perturb == -1 or \
self._split_sub_sentence_len == -1 \
or self._split_sentence_threshold >= feature_frame_num:
sample_info_list.append(
SampleInfo(feature_bin_path, feature_start,
feature_size, feature_frame_num, feature_dim,
label_bin_path, label_start, label_size,
label_frame_num))
#split sentence
else:
cur_frame_pos = 0
cur_frame_len = 0
remain_frame_num = feature_frame_num
while True:
if remain_frame_num > self._split_sentence_threshold:
cur_frame_len = self._split_sub_sentence_len + \
self._rng.randint(0, self._split_perturb)
if cur_frame_len > remain_frame_num:
cur_frame_len = remain_frame_num
else:
cur_frame_len = remain_frame_num
sample_info_list.append(
SampleInfo(
feature_bin_path, feature_start + cur_frame_pos
* feature_dim * 4, cur_frame_len * feature_dim *
4, cur_frame_len, feature_dim, label_bin_path,
label_start + cur_frame_pos * 4, cur_frame_len *
4, cur_frame_len))
remain_frame_num -= cur_frame_len
cur_frame_pos += cur_frame_len
if remain_frame_num <= 0:
break
return sample_info_list
class EpochEndSignal():
pass
class DataReader(object):
"""DataReader provides basic audio sample preprocessing pipeline including
data loading and data augmentation.
Args:
feature_file_list (str): File containing paths of feature data file and
corresponding description file.
label_file_list (str): File containing paths of label data file and
corresponding description file.
drop_frame_len (int): Samples whose label length is above this value will
be dropped. (Use -1 to disable this policy.)
process_num (int): Number of processes for processing data.
sample_buffer_size (int): Buffer size to indicate the maximum samples
cached.
sample_info_buffer_size (int): Buffer size to indicate the maximum
sample information cached.
batch_buffer_size (int): Buffer size to indicate the maximum batch
cached.
shuffle_block_num (int): Block number indicating the minimum unit to do
shuffle.
random_seed (int): Random seed.
verbose (int): If set to 0, complaints including exceptions and signal
traceback from sub-process will be suppressed. If set
to 1, all complaints will be printed.
"""
def __init__(self,
feature_file_list,
label_file_list,
drop_frame_len=512,
process_num=10,
sample_buffer_size=1024,
sample_info_buffer_size=1024,
batch_buffer_size=1024,
shuffle_block_num=10,
random_seed=0,
verbose=0):
self._feature_file_list = feature_file_list
self._label_file_list = label_file_list
self._drop_frame_len = drop_frame_len
self._shuffle_block_num = shuffle_block_num
self._block_info_list = None
self._rng = random.Random(random_seed)
self._bucket_list = None
self.generate_bucket_list(True)
self._order_id = 0
self._manager = Manager()
self._sample_buffer_size = sample_buffer_size
self._sample_info_buffer_size = sample_info_buffer_size
self._batch_buffer_size = batch_buffer_size
self._process_num = process_num
self._verbose = verbose
self._force_exit = ForceExitWrapper(self._manager.Value('b', False))
def generate_bucket_list(self, is_shuffle):
if self._block_info_list is None:
block_feature_info_lines = open(self._feature_file_list).readlines()
block_label_info_lines = open(self._label_file_list).readlines()
assert len(block_feature_info_lines) == len(block_label_info_lines)
self._block_info_list = []
for i in xrange(0, len(block_feature_info_lines), 2):
block_info = (block_feature_info_lines[i],
block_feature_info_lines[i + 1],
block_label_info_lines[i],
block_label_info_lines[i + 1])
self._block_info_list.append(
map(lambda line: line.strip(), block_info))
if is_shuffle:
self._rng.shuffle(self._block_info_list)
self._bucket_list = []
for i in xrange(0, len(self._block_info_list), self._shuffle_block_num):
bucket_block_info = self._block_info_list[i:i +
self._shuffle_block_num]
self._bucket_list.append(
SampleInfoBucket(
map(lambda info: info[0], bucket_block_info),
map(lambda info: info[1], bucket_block_info),
map(lambda info: info[2], bucket_block_info),
map(lambda info: info[3], bucket_block_info)))
# @TODO make this configurable
def set_transformers(self, transformers):
self._transformers = transformers
def _sample_generator(self):
sample_info_queue = self._manager.Queue(self._sample_info_buffer_size)
sample_queue = self._manager.Queue(self._sample_buffer_size)
self._order_id = 0
@suppress_complaints(verbose=self._verbose, notify=self._force_exit)
def ordered_feeding_task(sample_info_queue):
for sample_info_bucket in self._bucket_list:
try:
sample_info_list = \
sample_info_bucket.generate_sample_info_list()
except Exception as e:
raise CriticalException(e)
else:
self._rng.shuffle(sample_info_list) # do shuffle here
for sample_info in sample_info_list:
sample_info_queue.put((sample_info, self._order_id))
self._order_id += 1
for i in xrange(self._process_num):
sample_info_queue.put(EpochEndSignal())
feeding_thread = Thread(
target=ordered_feeding_task, args=(sample_info_queue, ))
feeding_thread.daemon = True
feeding_thread.start()
@suppress_complaints(verbose=self._verbose, notify=self._force_exit)
def ordered_processing_task(sample_info_queue, sample_queue, out_order):
if self._verbose == 0:
signal.signal(signal.SIGTERM, suppress_signal)
signal.signal(signal.SIGINT, suppress_signal)
def read_bytes(fpath, start, size):
try:
f = open(fpath, 'r')
f.seek(start, 0)
binary_bytes = f.read(size)
f.close()
return binary_bytes
except Exception as e:
raise CriticalException(e)
ins = sample_info_queue.get()
while not isinstance(ins, EpochEndSignal):
sample_info, order_id = ins
feature_bytes = read_bytes(sample_info.feature_bin_path,
sample_info.feature_start,
sample_info.feature_size)
assert sample_info.feature_frame_num * sample_info.feature_dim * 4 \
== len(feature_bytes), \
(sample_info.feature_bin_path,
sample_info.feature_frame_num,
sample_info.feature_dim,
len(feature_bytes))
label_bytes = read_bytes(sample_info.label_bin_path,
sample_info.label_start,
sample_info.label_size)
assert sample_info.label_frame_num * 4 == len(label_bytes), (
sample_info.label_bin_path, sample_info.label_array,
len(label_bytes))
label_array = struct.unpack('I' * sample_info.label_frame_num,
label_bytes)
label_data = np.array(
label_array, dtype='int64').reshape(
(sample_info.label_frame_num, 1))
feature_frame_num = sample_info.feature_frame_num
feature_dim = sample_info.feature_dim
assert feature_frame_num * feature_dim * 4 == len(feature_bytes)
feature_array = struct.unpack('f' * feature_frame_num *
feature_dim, feature_bytes)
feature_data = np.array(
feature_array, dtype='float32').reshape((
sample_info.feature_frame_num, sample_info.feature_dim))
sample_data = (feature_data, label_data)
for transformer in self._transformers:
# @TODO(pkuyym) make the transformer accept only feature_data
sample_data = transformer.perform_trans(sample_data)
while order_id != out_order[0]:
time.sleep(0.001)
# drop long sentence
if self._drop_frame_len == -1 or \
self._drop_frame_len >= sample_data[0].shape[0]:
sample_queue.put(sample_data)
out_order[0] += 1
ins = sample_info_queue.get()
sample_queue.put(EpochEndSignal())
out_order = self._manager.list([0])
args = (sample_info_queue, sample_queue, out_order)
workers = [
Process(
target=ordered_processing_task, args=args)
for _ in xrange(self._process_num)
]
for w in workers:
w.daemon = True
w.start()
finished_process_num = 0
while self._force_exit == False:
try:
sample = sample_queue.get_nowait()
except Queue.Empty:
time.sleep(0.001)
else:
if isinstance(sample, EpochEndSignal):
finished_process_num += 1
if finished_process_num >= self._process_num:
break
else:
continue
yield sample
def batch_iterator(self, batch_size, minimum_batch_size):
def batch_to_ndarray(batch_samples, lod):
assert len(batch_samples)
frame_dim = batch_samples[0][0].shape[1]
batch_feature = np.zeros((lod[-1], frame_dim), dtype="float32")
batch_label = np.zeros((lod[-1], 1), dtype="int64")
start = 0
for sample in batch_samples:
frame_num = sample[0].shape[0]
batch_feature[start:start + frame_num, :] = sample[0]
batch_label[start:start + frame_num, :] = sample[1]
start += frame_num
return (batch_feature, batch_label)
@suppress_complaints(verbose=self._verbose, notify=self._force_exit)
def batch_assembling_task(sample_generator, batch_queue):
batch_samples = []
lod = [0]
for sample in sample_generator():
batch_samples.append(sample)
lod.append(lod[-1] + sample[0].shape[0])
if len(batch_samples) == batch_size:
(batch_feature, batch_label) = batch_to_ndarray(
batch_samples, lod)
batch_queue.put((batch_feature, batch_label, lod))
batch_samples = []
lod = [0]
if len(batch_samples) >= minimum_batch_size:
(batch_feature, batch_label) = batch_to_ndarray(batch_samples,
lod)
batch_queue.put((batch_feature, batch_label, lod))
batch_queue.put(EpochEndSignal())
batch_queue = Queue.Queue(self._batch_buffer_size)
assembling_thread = Thread(
target=batch_assembling_task,
args=(self._sample_generator, batch_queue))
assembling_thread.daemon = True
assembling_thread.start()
while self._force_exit == False:
try:
batch_data = batch_queue.get_nowait()
except Queue.Empty:
time.sleep(0.001)
else:
if isinstance(batch_data, EpochEndSignal):
break
yield batch_data
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
from six import reraise
from tblib import Traceback
import numpy as np
import paddle.fluid as fluid
def to_lodtensor(data, place):
"""convert tensor to lodtensor
"""
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
def lodtensor_to_ndarray(lod_tensor):
"""conver lodtensor to ndarray
"""
dims = lod_tensor.get_dims()
ret = np.zeros(shape=dims).astype('float32')
for i in xrange(np.product(dims)):
ret.ravel()[i] = lod_tensor.get_float_element(i)
return ret, lod_tensor.lod()
class CriticalException(Exception):
pass
def suppress_signal(signo, stack_frame):
pass
def suppress_complaints(verbose, notify=None):
def decorator_maker(func):
def suppress_wrapper(*args, **kwargs):
try:
func(*args, **kwargs)
except:
et, ev, tb = sys.exc_info()
if notify is not None:
notify(except_type=et, except_value=ev, traceback=tb)
if verbose == 1 or isinstance(ev, CriticalException):
reraise(et, ev, Traceback(tb).as_traceback())
return suppress_wrapper
return decorator_maker
class ForceExitWrapper(object):
def __init__(self, exit_flag):
self._exit_flag = exit_flag
@suppress_complaints(verbose=0)
def __call__(self, *args, **kwargs):
self._exit_flag.value = True
def __eq__(self, flag):
return self._exit_flag.value == flag
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import argparse
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.data_reader as reader
from data_utils.util import lodtensor_to_ndarray
def parse_args():
parser = argparse.ArgumentParser("Inference for stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
help='The sequence number of a batch data. (default: %(default)d)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--infer_feature_lst',
type=str,
default='data/infer_feature.lst',
help='The feature list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_label_lst',
type=str,
default='data/infer_label.lst',
help='The label list path for inference. (default: %(default)s)')
parser.add_argument(
'--infer_model_path',
type=str,
default='./infer_models/deep_asr.pass_0.infer.model/',
help='The directory for loading inference model. '
'(default: %(default)s)')
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def split_infer_result(infer_seq, lod):
infer_batch = []
for i in xrange(0, len(lod[0]) - 1):
infer_batch.append(infer_seq[lod[0][i]:lod[0][i + 1]])
return infer_batch
def infer(args):
""" Gets one batch of feature data and predicts labels for each sample.
"""
if not os.path.exists(args.infer_model_path):
raise IOError("Invalid inference model path!")
place = fluid.CUDAPlace(0) if args.device == 'GPU' else fluid.CPUPlace()
exe = fluid.Executor(place)
# load model
[infer_program, feed_dict,
fetch_targets] = fluid.io.load_inference_model(args.infer_model_path, exe)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
infer_data_reader = reader.DataReader(args.infer_feature_lst,
args.infer_label_lst)
infer_data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
one_batch = infer_data_reader.batch_iterator(args.batch_size, 1).next()
(features, labels, lod) = one_batch
feature_t.set(features, place)
feature_t.set_lod([lod])
results = exe.run(infer_program,
feed={feed_dict[0]: feature_t},
fetch_list=fetch_targets,
return_numpy=False)
probs, lod = lodtensor_to_ndarray(results[0])
preds = probs.argmax(axis=1)
infer_batch = split_infer_result(preds, lod)
for index, sample in enumerate(infer_batch):
print("result %d: " % index, sample, '\n')
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer(args)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.v2 as paddle
import paddle.fluid as fluid
def stacked_lstmp_model(hidden_dim,
proj_dim,
stacked_num,
class_num,
parallel=False,
is_train=True):
""" The model for DeepASR. The main structure is composed of stacked
identical LSTMP (LSTM with recurrent projection) layers.
In the training and validation phases, the feeding dictionary
is {'feature', 'label'}, fed with LoDTensors for the feature data and
label data respectively. In inference, only `feature` is needed.
Args:
hidden_dim(int): The hidden state's dimension of the LSTMP layer.
proj_dim(int): The projection size of the LSTMP layer.
stacked_num(int): The number of stacked LSTMP layers.
parallel(bool): Run in parallel or not, default `False`.
is_train(bool): Run in training phase or not, default `True`.
class_num(int): The number of output classes.
"""
# network configuration
def _net_conf(feature, label):
seq_conv1 = fluid.layers.sequence_conv(
input=feature,
num_filters=1024,
filter_size=3,
filter_stride=1,
bias_attr=True)
bn1 = fluid.layers.batch_norm(
input=seq_conv1,
act="sigmoid",
is_test=not is_train,
momentum=0.9,
epsilon=1e-05,
data_layout='NCHW')
stack_input = bn1
for i in range(stacked_num):
fc = fluid.layers.fc(input=stack_input,
size=hidden_dim * 4,
bias_attr=True)
proj, cell = fluid.layers.dynamic_lstmp(
input=fc,
size=hidden_dim * 4,
proj_size=proj_dim,
bias_attr=True,
use_peepholes=True,
is_reverse=False,
cell_activation="tanh",
proj_activation="tanh")
bn = fluid.layers.batch_norm(
input=proj,
act="sigmoid",
is_test=not is_train,
momentum=0.9,
epsilon=1e-05,
data_layout='NCHW')
stack_input = bn
prediction = fluid.layers.fc(input=stack_input,
size=class_num,
act='softmax')
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
acc = fluid.layers.accuracy(input=prediction, label=label)
return prediction, avg_cost, acc
# data feeder
feature = fluid.layers.data(
name="feature", shape=[-1, 120 * 11], dtype="float32", lod_level=1)
label = fluid.layers.data(
name="label", shape=[-1, 1], dtype="int64", lod_level=1)
if parallel:
# When the execution place is specified to CUDAPlace, the program will
# run on all $CUDA_VISIBLE_DEVICES GPUs. Otherwise the program will
# run on all CPU devices.
places = fluid.layers.get_places()
pd = fluid.layers.ParallelDo(places)
with pd.do():
feat_ = pd.read_input(feature)
label_ = pd.read_input(label)
prediction, avg_cost, acc = _net_conf(feat_, label_)
for out in [avg_cost, acc]:
pd.write_output(out)
# get the mean loss and acc across all devices.
avg_cost, acc = pd()
avg_cost = fluid.layers.mean(x=avg_cost)
acc = fluid.layers.mean(x=acc)
else:
prediction, avg_cost, acc = _net_conf(feature, label)
return prediction, avg_cost, acc
"""Add the parent directory to $PYTHONPATH"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os.path
import sys
def add_path(path):
if path not in sys.path:
sys.path.insert(0, path)
this_dir = os.path.dirname(__file__)
# Add project path to PYTHONPATH
proj_path = os.path.join(this_dir, '..')
add_path(proj_path)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import numpy as np
import argparse
import time
import paddle.fluid as fluid
import paddle.fluid.profiler as profiler
import _init_paths
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.data_reader as reader
from model_utils.model import stacked_lstmp_model
from data_utils.util import lodtensor_to_ndarray
def parse_args():
parser = argparse.ArgumentParser("Profiling for the stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
help='The sequence number of a batch data. (default: %(default)d)')
parser.add_argument(
'--minimum_batch_size',
type=int,
default=1,
help='The minimum sequence number of a batch data. '
'(default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=0.002,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--parallel', action='store_true', help='If set, run in parallel.')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help='mean var path')
parser.add_argument(
'--feature_lst',
type=str,
default='data/feature.lst',
help='feature list path.')
parser.add_argument(
'--label_lst',
type=str,
default='data/label.lst',
help='label list path.')
parser.add_argument(
'--max_batch_num',
type=int,
default=10,
help='Maximum number of batches for profiling. (default: %(default)d)')
parser.add_argument(
'--first_batches_to_skip',
type=int,
default=1,
help='Number of first batches to skip for profiling. '
'(default: %(default)d)')
parser.add_argument(
'--print_train_acc',
action='store_true',
help='If set, output training accuracy.')
parser.add_argument(
'--sorted_key',
type=str,
default='total',
choices=['None', 'total', 'calls', 'min', 'max', 'ave'],
help='Different types of time to sort the profiling report. '
'(default: %(default)s)')
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def profile(args):
"""profile the training process.
"""
if not args.first_batches_to_skip < args.max_batch_num:
raise ValueError("arg 'first_batches_to_skip' must be smaller than "
"'max_batch_num'.")
if not args.first_batches_to_skip >= 0:
raise ValueError(
"arg 'first_batches_to_skip' must not be smaller than 0.")
_, avg_cost, accuracy = stacked_lstmp_model(
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=1749,
parallel=args.parallel)
optimizer = fluid.optimizer.Momentum(
learning_rate=args.learning_rate, momentum=0.9)
optimizer.minimize(avg_cost)
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
data_reader = reader.DataReader(args.feature_lst, args.label_lst)
data_reader.set_transformers(ltrans)
feature_t = fluid.LoDTensor()
label_t = fluid.LoDTensor()
sorted_key = None if args.sorted_key == 'None' else args.sorted_key
with profiler.profiler(args.device, sorted_key) as prof:
frames_seen, start_time = 0, 0.0
for batch_id, batch_data in enumerate(
data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
if batch_id >= args.max_batch_num:
break
if args.first_batches_to_skip == batch_id:
profiler.reset_profiler()
start_time = time.time()
frames_seen = 0
# load_data
(features, labels, lod) = batch_data
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
label_t.set_lod([lod])
frames_seen += lod[-1]
outs = exe.run(fluid.default_main_program(),
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy],
return_numpy=False)
if args.print_train_acc:
print("Batch %d acc: %f" %
(batch_id, lodtensor_to_ndarray(outs[1])[0]))
else:
sys.stdout.write('.')
sys.stdout.flush()
time_consumed = time.time() - start_time
frames_per_sec = frames_seen / time_consumed
print("\nTime consumed: %f s, performance: %f frames/s." %
(time_consumed, frames_per_sec))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
profile(args)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
import os
import numpy as np
import argparse
import time
import paddle.fluid as fluid
import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm
import data_utils.augmentor.trans_add_delta as trans_add_delta
import data_utils.augmentor.trans_splice as trans_splice
import data_utils.data_reader as reader
from data_utils.util import lodtensor_to_ndarray
from model_utils.model import stacked_lstmp_model
def parse_args():
parser = argparse.ArgumentParser("Training for stacked LSTMP model.")
parser.add_argument(
'--batch_size',
type=int,
default=32,
help='The sequence number of a batch data. (default: %(default)d)')
parser.add_argument(
'--minimum_batch_size',
type=int,
default=1,
help='The minimum sequence number of a batch data. '
'(default: %(default)d)')
parser.add_argument(
'--stacked_num',
type=int,
default=5,
help='Number of lstmp layers to stack. (default: %(default)d)')
parser.add_argument(
'--proj_dim',
type=int,
default=512,
help='Project size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--hidden_dim',
type=int,
default=1024,
help='Hidden size of lstmp unit. (default: %(default)d)')
parser.add_argument(
'--pass_num',
type=int,
default=100,
help='Epoch number to train. (default: %(default)d)')
parser.add_argument(
'--print_per_batches',
type=int,
default=100,
help='Interval to print training accuracy. (default: %(default)d)')
parser.add_argument(
'--learning_rate',
type=float,
default=0.002,
help='Learning rate used to train. (default: %(default)f)')
parser.add_argument(
'--device',
type=str,
default='GPU',
choices=['CPU', 'GPU'],
help='The device type. (default: %(default)s)')
parser.add_argument(
'--parallel', action='store_true', help='If set, run in parallel.')
parser.add_argument(
'--mean_var',
type=str,
default='data/global_mean_var_search26kHr',
help="The path for feature's global mean and variance. "
"(default: %(default)s)")
parser.add_argument(
'--train_feature_lst',
type=str,
default='data/feature.lst',
help='The feature list path for training. (default: %(default)s)')
parser.add_argument(
'--train_label_lst',
type=str,
default='data/label.lst',
help='The label list path for training. (default: %(default)s)')
parser.add_argument(
'--val_feature_lst',
type=str,
default='data/val_feature.lst',
help='The feature list path for validation. (default: %(default)s)')
parser.add_argument(
'--val_label_lst',
type=str,
default='data/val_label.lst',
help='The label list path for validation. (default: %(default)s)')
parser.add_argument(
'--init_model_path',
type=str,
default=None,
help="The model (checkpoint) path which the training resumes from. "
"If None, train the model from scratch. (default: %(default)s)")
parser.add_argument(
'--checkpoints',
type=str,
default='./checkpoints',
help="The directory for saving checkpoints. Do not save checkpoints "
"if set to ''. (default: %(default)s)")
parser.add_argument(
'--infer_models',
type=str,
default='./infer_models',
help="The directory for saving inference models. Do not save inference "
"models if set to ''. (default: %(default)s)")
args = parser.parse_args()
return args
def print_arguments(args):
print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()):
print('%s: %s' % (arg, value))
print('------------------------------------------------')
def train(args):
"""train in loop.
"""
# paths check
if args.init_model_path is not None and \
not os.path.exists(args.init_model_path):
raise IOError("Invalid initial model path!")
if args.checkpoints != '' and not os.path.exists(args.checkpoints):
os.mkdir(args.checkpoints)
if args.infer_models != '' and not os.path.exists(args.infer_models):
os.mkdir(args.infer_models)
prediction, avg_cost, accuracy = stacked_lstmp_model(
hidden_dim=args.hidden_dim,
proj_dim=args.proj_dim,
stacked_num=args.stacked_num,
class_num=1749,
parallel=args.parallel)
optimizer = fluid.optimizer.Momentum(
learning_rate=args.learning_rate, momentum=0.9)
optimizer.minimize(avg_cost)
# program for test
test_program = fluid.default_main_program().clone()
with fluid.program_guard(test_program):
test_program = fluid.io.get_inference_program([avg_cost, accuracy])
place = fluid.CPUPlace() if args.device == 'CPU' else fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
# resume training if initial model provided.
if args.init_model_path is not None:
fluid.io.load_persistables(exe, args.init_model_path)
ltrans = [
trans_add_delta.TransAddDelta(2, 2),
trans_mean_variance_norm.TransMeanVarianceNorm(args.mean_var),
trans_splice.TransSplice()
]
feature_t = fluid.LoDTensor()
label_t = fluid.LoDTensor()
# validation
def test(exe):
# If test data not found, return invalid cost and accuracy
if not (os.path.exists(args.val_feature_lst) and
os.path.exists(args.val_label_lst)):
return -1.0, -1.0
# test data reader
test_data_reader = reader.DataReader(args.val_feature_lst,
args.val_label_lst)
test_data_reader.set_transformers(ltrans)
test_costs, test_accs = [], []
for batch_id, batch_data in enumerate(
test_data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
# load_data
(features, labels, lod) = batch_data
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
label_t.set_lod([lod])
cost, acc = exe.run(test_program,
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy],
return_numpy=False)
test_costs.append(lodtensor_to_ndarray(cost)[0])
test_accs.append(lodtensor_to_ndarray(acc)[0])
return np.mean(test_costs), np.mean(test_accs)
# train data reader
train_data_reader = reader.DataReader(args.train_feature_lst,
args.train_label_lst, -1)
train_data_reader.set_transformers(ltrans)
# train
for pass_id in xrange(args.pass_num):
pass_start_time = time.time()
for batch_id, batch_data in enumerate(
train_data_reader.batch_iterator(args.batch_size,
args.minimum_batch_size)):
# load_data
(features, labels, lod) = batch_data
feature_t.set(features, place)
feature_t.set_lod([lod])
label_t.set(labels, place)
label_t.set_lod([lod])
cost, acc = exe.run(fluid.default_main_program(),
feed={"feature": feature_t,
"label": label_t},
fetch_list=[avg_cost, accuracy],
return_numpy=False)
if batch_id > 0 and (batch_id % args.print_per_batches == 0):
print("\nBatch %d, train cost: %f, train acc: %f" %
(batch_id, lodtensor_to_ndarray(cost)[0],
lodtensor_to_ndarray(acc)[0]))
# save the latest checkpoint
if args.checkpoints != '':
model_path = os.path.join(args.checkpoints,
"deep_asr.latest.checkpoint")
fluid.io.save_persistables(exe, model_path)
else:
sys.stdout.write('.')
sys.stdout.flush()
# run test
val_cost, val_acc = test(exe)
# save checkpoint per pass
if args.checkpoints != '':
model_path = os.path.join(
args.checkpoints,
"deep_asr.pass_" + str(pass_id) + ".checkpoint")
fluid.io.save_persistables(exe, model_path)
# save inference model
if args.infer_models != '':
model_path = os.path.join(
args.infer_models,
"deep_asr.pass_" + str(pass_id) + ".infer.model")
fluid.io.save_inference_model(model_path, ["feature"],
[prediction], exe)
# cal pass time
pass_end_time = time.time()
time_consumed = pass_end_time - pass_start_time
# print info at pass end
print("\nPass %d, time consumed: %f s, val cost: %f, val acc: %f\n" %
(pass_id, time_consumed, val_cost, val_acc))
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
train(args)
# Paddle Fluid Models
---
The Paddle Fluid models are a collection of example models that use Paddle Fluid APIs. Currently, the example code in this directory is still under active development.
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Advbox
Advbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python and paddle.
......
""" """
A set of tools for generating adversarial example on paddle platform A set of tools for generating adversarial example on paddle platform
""" """
from . import attacks
from . import models
from .adversary import Adversary
...@@ -18,13 +18,15 @@ class Adversary(object):
        """
        assert original is not None

+        self.original_label = original_label
+        self.target_label = None
+        self.adversarial_label = None
+
        self.__original = original
-        self.__original_label = original_label
-        self.__target_label = None
        self.__target = None
        self.__is_targeted_attack = False
        self.__adversarial_example = None
-        self.__adversarial_label = None
+        self.__bad_adversarial_example = None

    def set_target(self, is_targeted_attack, target=None, target_label=None):
        """
...@@ -38,10 +40,10 @@ class Adversary(object):
        """
        assert (target_label is None) or is_targeted_attack
        self.__is_targeted_attack = is_targeted_attack
-        self.__target_label = target_label
+        self.target_label = target_label
        self.__target = target
        if not is_targeted_attack:
-            self.__target_label = None
+            self.target_label = None
            self.__target = None

    def set_original(self, original, original_label=None):
        """
...@@ -53,10 +55,11 @@ class Adversary(object):
        """
        if original != self.__original:
            self.__original = original
-            self.__original_label = original_label
+            self.original_label = original_label
            self.__adversarial_example = None
+            self.__bad_adversarial_example = None
            if original is None:
-                self.__original_label = None
+                self.original_label = None

    def _is_successful(self, adversarial_label):
        """
...@@ -65,11 +68,11 @@ class Adversary(object):
        :param adversarial_label: adversarial label.
        :return: bool
        """
-        if self.__target_label is not None:
-            return adversarial_label == self.__target_label
+        if self.target_label is not None:
+            return adversarial_label == self.target_label
        else:
            return (adversarial_label is not None) and \
-                   (adversarial_label != self.__original_label)
+                   (adversarial_label != self.original_label)

    def is_successful(self):
        """
...@@ -77,7 +80,7 @@ class Adversary(object):
        :return: bool
        """
-        return self._is_successful(self.__adversarial_label)
+        return self._is_successful(self.adversarial_label)

    def try_accept_the_example(self, adversarial_example, adversarial_label):
        """
...@@ -93,7 +96,9 @@ class Adversary(object):
        ok = self._is_successful(adversarial_label)
        if ok:
            self.__adversarial_example = adversarial_example
-            self.__adversarial_label = adversarial_label
+            self.adversarial_label = adversarial_label
+        else:
+            self.__bad_adversarial_example = adversarial_example
        return ok

    def perturbation(self, multiplying_factor=1.0):
...@@ -104,9 +109,14 @@ class Adversary(object):
        :return: The perturbation that is multiplied by multiplying_factor.
        """
        assert self.__original is not None
-        assert self.__adversarial_example is not None
-        return multiplying_factor * (
-            self.__adversarial_example - self.__original)
+        assert (self.__adversarial_example is not None) or \
+               (self.__bad_adversarial_example is not None)
+        if self.__adversarial_example is not None:
+            return multiplying_factor * (
+                self.__adversarial_example - self.__original)
+        else:
+            return multiplying_factor * (
+                self.__bad_adversarial_example - self.__original)

    @property
    def is_targeted_attack(self):
...@@ -115,20 +125,6 @@ class Adversary(object):
        """
        return self.__is_targeted_attack

-    @property
-    def target_label(self):
-        """
-        :property: target_label
-        """
-        return self.__target_label
-
-    @target_label.setter
-    def target_label(self, label):
-        """
-        :property: target_label
-        """
-        self.__target_label = label
-
    @property
    def target(self):
        """
...@@ -143,20 +139,6 @@ class Adversary(object):
        """
        return self.__original

-    @property
-    def original_label(self):
-        """
-        :property: original
-        """
-        return self.__original_label
-
-    @original_label.setter
-    def original_label(self, label):
-        """
-        original_label setter
-        """
-        self.__original_label = label
-
    @property
    def adversarial_example(self):
        """
...@@ -164,23 +146,9 @@ class Adversary(object):
        """
        return self.__adversarial_example

-    @adversarial_example.setter
-    def adversarial_example(self, example):
-        """
-        adversarial_example setter
-        """
-        self.__adversarial_example = example
-
    @property
-    def adversarial_label(self):
-        """
-        :property: adversarial_label
-        """
-        return self.__adversarial_label
-
-    @adversarial_label.setter
-    def adversarial_label(self, label):
+    def bad_adversarial_example(self):
        """
-        adversarial_label setter
+        :property: bad_adversarial_example
        """
-        self.__adversarial_label = label
+        return self.__bad_adversarial_example
""" """
Attack methods Attack methods __init__.py
""" """
from .base import Attack
from .deepfool import DeepFoolAttack
from .gradientsign import FGSM
from .gradientsign import GradientSignAttack
from .iterator_gradientsign import IFGSM
from .iterator_gradientsign import IteratorGradientSignAttack
...@@ -52,21 +52,23 @@ class Attack(object): ...@@ -52,21 +52,23 @@ class Attack(object):
:param adversary: adversary :param adversary: adversary
:return: None :return: None
""" """
assert self.model.channel_axis() == adversary.original.ndim
if adversary.original_label is None: if adversary.original_label is None:
adversary.original_label = np.argmax( adversary.original_label = np.argmax(
self.model.predict(adversary.original)) self.model.predict(adversary.original))
if adversary.is_targeted_attack and adversary.target_label is None: if adversary.is_targeted_attack and adversary.target_label is None:
if adversary.target is None: if adversary.target is None:
raise ValueError( raise ValueError(
'When adversary.is_targeted_attack is True, ' 'When adversary.is_targeted_attack is true, '
'adversary.target_label or adversary.target must be set.') 'adversary.target_label or adversary.target must be set.')
else: else:
adversary.target_label_label = np.argmax( adversary.target_label = np.argmax(
self.model.predict( self.model.predict(adversary.target))
self.model.scale_input(adversary.target)))
logging.info('adversary:\noriginal_label: {}' logging.info('adversary:'
'\n target_lable: {}' '\n original_label: {}'
'\n target_label: {}'
'\n is_targeted_attack: {}' '\n is_targeted_attack: {}'
''.format(adversary.original_label, adversary.target_label, ''.format(adversary.original_label, adversary.target_label,
adversary.is_targeted_attack)) adversary.is_targeted_attack))
...@@ -10,6 +10,8 @@ import numpy as np ...@@ -10,6 +10,8 @@ import numpy as np
from .base import Attack from .base import Attack
__all__ = ['DeepFoolAttack']
class DeepFoolAttack(Attack): class DeepFoolAttack(Attack):
""" """
...@@ -56,7 +58,7 @@ class DeepFoolAttack(Attack): ...@@ -56,7 +58,7 @@ class DeepFoolAttack(Attack):
gradient_k = self.model.gradient(x, k) gradient_k = self.model.gradient(x, k)
w_k = gradient_k - gradient w_k = gradient_k - gradient
f_k = f[k] - f[pre_label] f_k = f[k] - f[pre_label]
w_k_norm = np.linalg.norm(w_k) + 1e-8 w_k_norm = np.linalg.norm(w_k.flatten()) + 1e-8
pert_k = (np.abs(f_k) + 1e-8) / w_k_norm pert_k = (np.abs(f_k) + 1e-8) / w_k_norm
if pert_k < pert: if pert_k < pert:
pert = pert_k pert = pert_k
...@@ -70,9 +72,12 @@ class DeepFoolAttack(Attack): ...@@ -70,9 +72,12 @@ class DeepFoolAttack(Attack):
f = self.model.predict(x) f = self.model.predict(x)
gradient = self.model.gradient(x, pre_label) gradient = self.model.gradient(x, pre_label)
adv_label = np.argmax(f) adv_label = np.argmax(f)
logging.info('iteration = {}, f = {}, pre_label = {}' logging.info('iteration={}, f[pre_label]={}, f[target_label]={}'
', adv_label={}'.format(iteration, f[pre_label], ', f[adv_label]={}, pre_label={}, adv_label={}'
pre_label, adv_label)) ''.format(iteration, f[pre_label], (
f[adversary.target_label]
if adversary.is_targeted_attack else 'NaN'), f[
adv_label], pre_label, adv_label))
if adversary.try_accept_the_example(x, adv_label): if adversary.try_accept_the_example(x, adv_label):
return adversary return adversary
......
"""
This module provides the FGSM-style gradient attack methods (FGSM, BIM, ILCM, etc.).
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
__all__ = [
'GradientMethodAttack', 'FastGradientSignMethodAttack', 'FGSM',
'FastGradientSignMethodTargetedAttack', 'FGSMT',
'BasicIterativeMethodAttack', 'BIM',
'IterativeLeastLikelyClassMethodAttack', 'ILCM'
]
class GradientMethodAttack(Attack):
"""
This class implements the gradient attack method, and is the base of FGSM, BIM,
ILCM, etc.
"""
def __init__(self, model, support_targeted=True):
"""
:param model(model): The model to be attacked.
:param support_targeted(bool): Whether this attack method supports targeted attack.
"""
super(GradientMethodAttack, self).__init__(model)
self.support_targeted = support_targeted
def _apply(self, adversary, norm_ord=np.inf, epsilons=0.01, steps=100):
"""
Apply the gradient attack method.
:param adversary(Adversary):
The Adversary object.
:param norm_ord(int):
Order of the norm, such as np.inf, 1, 2, etc. It can't be 0.
:param epsilons(list|tuple|int):
Attack step size (input variation).
:param steps:
The number of iterator steps.
:return:
adversary(Adversary): The Adversary object.
"""
if norm_ord == 0:
raise ValueError("L0 norm is not supported!")
if not self.support_targeted:
if adversary.is_targeted_attack:
raise ValueError(
"This attack method doesn't support targeted attack!")
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(epsilons, epsilons + 1e-10, num=steps)
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
assert self.model.channel_axis() == adversary.original.ndim
assert (self.model.channel_axis() == 1 or
self.model.channel_axis() == adversary.original.shape[0] or
self.model.channel_axis() == adversary.original.shape[-1])
step = 1
adv_img = adversary.original
for epsilon in epsilons[:steps]:
if epsilon == 0.0:
continue
if adversary.is_targeted_attack:
gradient = -self.model.gradient(adv_img, adversary.target_label)
else:
gradient = self.model.gradient(adv_img,
adversary.original_label)
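            # Gradient-method step: move the example along the (normalized) gradient,
            # scaled by epsilon and the input range; with the infinity norm this
            # reduces to the classic FGSM sign update.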
if norm_ord == np.inf:
gradient_norm = np.sign(gradient)
else:
gradient_norm = gradient / self._norm(gradient, ord=norm_ord)
adv_img = adv_img + epsilon * gradient_norm * (max_ - min_)
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('step={}, epsilon = {:.5f}, pre_label = {}, '
'adv_label={}'.format(step, epsilon, pre_label,
adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
step += 1
return adversary
@staticmethod
def _norm(a, ord):
if a.ndim == 1:
return np.linalg.norm(a, ord=ord)
if a.ndim == a.shape[0]:
norm_shape = (a.ndim, reduce(np.dot, a.shape[1:]))
norm_axis = 1
else:
norm_shape = (reduce(np.dot, a.shape[:-1]), a.ndim)
norm_axis = 0
return np.linalg.norm(a.reshape(norm_shape), ord=ord, axis=norm_axis)
class FastGradientSignMethodTargetedAttack(GradientMethodAttack):
"""
"Fast Gradient Sign Method" is extended to support targeted attack.
"Fast Gradient Sign Method" was originally implemented by Goodfellow et
al. (2015) with the infinity norm.
Paper link: https://arxiv.org/abs/1412.6572
"""
def _apply(self, adversary, epsilons=0.03):
return GradientMethodAttack._apply(
self,
adversary=adversary,
norm_ord=np.inf,
epsilons=epsilons,
steps=1)
class FastGradientSignMethodAttack(FastGradientSignMethodTargetedAttack):
"""
This attack was originally implemented by Goodfellow et al. (2015) with the
infinity norm, and is known as the "Fast Gradient Sign Method".
Paper link: https://arxiv.org/abs/1412.6572
"""
def __init__(self, model):
super(FastGradientSignMethodAttack, self).__init__(model, False)
class IterativeLeastLikelyClassMethodAttack(GradientMethodAttack):
"""
"Iterative Least-likely Class Method (ILCM)" extends "BIM" to support
targeted attack.
"The Basic Iterative Method (BIM)" is to extend "FSGM". "BIM" iteratively
take multiple small steps while adjusting the direction after each step.
Paper link: https://arxiv.org/abs/1607.02533
"""
def _apply(self, adversary, epsilons=0.001, steps=1000):
return GradientMethodAttack._apply(
self,
adversary=adversary,
norm_ord=np.inf,
epsilons=epsilons,
steps=steps)
class BasicIterativeMethodAttack(IterativeLeastLikelyClassMethodAttack):
"""
FGSM is a one-step method. "The Basic Iterative Method (BIM)" iteratively
takes multiple small steps while adjusting the direction after each step.
Paper link: https://arxiv.org/abs/1607.02533
"""
def __init__(self, model):
super(BasicIterativeMethodAttack, self).__init__(model, False)
FGSM = FastGradientSignMethodAttack
FGSMT = FastGradientSignMethodTargetedAttack
BIM = BasicIterativeMethodAttack
ILCM = IterativeLeastLikelyClassMethodAttack
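# A minimal usage sketch: a hypothetical helper (not defined in this module)
# showing how the aliases above are typically driven, mirroring the MNIST demos
# later in this commit. `Adversary.set_target` is assumed from the JSMA demo below.
def run_gradient_attack(model, image, label, target_label=None):
    from advbox.adversary import Adversary
    adversary = Adversary(image, label)
    if target_label is not None:
        # targeted attack with the iterative least-likely class method
        adversary.set_target(True, target_label=target_label)
        attack = ILCM(model)
    else:
        # untargeted one-step fast gradient sign attack
        attack = FGSM(model)
    adversary = attack(adversary)
    return adversary.adversarial_example if adversary.is_successful() else None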
"""
This module provides the attack method for the FGSM implementation.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
class GradientSignAttack(Attack):
"""
This attack was originally implemented by Goodfellow et al. (2015) with the
infinity norm (and is known as the "Fast Gradient Sign Method").
It is therefore commonly referred to as FGSM.
Paper link: https://arxiv.org/abs/1412.6572
"""
def _apply(self, adversary, epsilons=1000):
"""
Apply the gradient sign attack.
Args:
adversary(Adversary): The Adversary object.
epsilons(list|tuple|int): The epsilon (input variation parameter).
Return:
adversary: The Adversary object.
"""
assert adversary is not None
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(0, 1, num=epsilons + 1)[1:]
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
if adversary.is_targeted_attack:
gradient = self.model.gradient(adversary.original,
adversary.target_label)
gradient_sign = -np.sign(gradient) * (max_ - min_)
else:
gradient = self.model.gradient(adversary.original,
adversary.original_label)
gradient_sign = np.sign(gradient) * (max_ - min_)
for epsilon in epsilons:
adv_img = adversary.original + epsilon * gradient_sign
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'.
format(epsilon, pre_label, adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
return adversary
FGSM = GradientSignAttack
"""
This module provides the attack method for the iterative FGSM implementation.
"""
from __future__ import division
import logging
from collections import Iterable
import numpy as np
from .base import Attack
class IteratorGradientSignAttack(Attack):
"""
This attack was originally implemented by Alexey Kurakin (Google Brain).
Paper link: https://arxiv.org/pdf/1607.02533.pdf
"""
def _apply(self, adversary, epsilons=100, steps=10):
"""
Apply the iterative gradient sign attack.
Args:
adversary(Adversary): The Adversary object.
epsilons(list|tuple|int): The epsilon (input variation parameter).
steps(int): The number of iterator steps.
Return:
adversary(Adversary): The Adversary object.
"""
if not isinstance(epsilons, Iterable):
epsilons = np.linspace(0, 1 / steps, num=epsilons + 1)[1:]
pre_label = adversary.original_label
min_, max_ = self.model.bounds()
for epsilon in epsilons:
adv_img = adversary.original
for _ in range(steps):
if adversary.is_targeted_attack:
gradient = self.model.gradient(adversary.original,
adversary.target_label)
gradient_sign = -np.sign(gradient) * (max_ - min_)
else:
gradient = self.model.gradient(adversary.original,
adversary.original_label)
gradient_sign = np.sign(gradient) * (max_ - min_)
adv_img = adv_img + gradient_sign * epsilon
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'.
format(epsilon, pre_label, adv_label))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
return adversary
IFGSM = IteratorGradientSignAttack
"""
This module provides the attack method of "LBFGS".
"""
from __future__ import division
import logging
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
from .base import Attack
__all__ = ['LBFGSAttack', 'LBFGS']
class LBFGSAttack(Attack):
"""
Uses L-BFGS-B to minimize the cross-entropy and the distance between the
original and the adversary.
Paper link: https://arxiv.org/abs/1510.05328
"""
def __init__(self, model):
super(LBFGSAttack, self).__init__(model)
self._predicts_normalized = None
self._adversary = None # type: Adversary
def _apply(self, adversary, epsilon=0.001, steps=10):
self._adversary = adversary
if not adversary.is_targeted_attack:
raise ValueError("This attack method only support targeted attack!")
# finding initial c
logging.info('finding initial c...')
c = epsilon
x0 = adversary.original.flatten()
for i in range(30):
c = 2 * c
logging.info('c={}'.format(c))
is_adversary = self._lbfgsb(x0, c, steps)
if is_adversary:
break
if not is_adversary:
logging.info('Failed!')
return adversary
# binary search c
logging.info('binary search c...')
c_low = 0
c_high = c
while c_high - c_low >= epsilon:
logging.info('c_high={}, c_low={}, diff={}, epsilon={}'
.format(c_high, c_low, c_high - c_low, epsilon))
c_half = (c_low + c_high) / 2
is_adversary = self._lbfgsb(x0, c_half, steps)
if is_adversary:
c_high = c_half
else:
c_low = c_half
return adversary
def _is_predicts_normalized(self, predicts):
"""
Determine whether the model's predictions are normalized (i.e. probabilities that sum to 1).
:param predicts(np.array): the output of the model.
:return: bool
"""
if self._predicts_normalized is None:
if self.model.predict_name().lower() in [
'softmax', 'probabilities', 'probs'
]:
self._predicts_normalized = True
else:
if np.any(predicts < 0.0):
self._predicts_normalized = False
else:
s = np.sum(predicts.flatten())
if 0.999 <= s <= 1.001:
self._predicts_normalized = True
else:
self._predicts_normalized = False
assert self._predicts_normalized is not None
return self._predicts_normalized
def _loss(self, adv_x, c):
"""
To get the loss and gradient.
:param adv_x: the candidate adversarial example
:param c: parameter 'C' in the paper
:return: (loss, gradient)
"""
x = adv_x.reshape(self._adversary.original.shape)
# cross_entropy
logits = self.model.predict(x)
if not self._is_predicts_normalized(logits): # to softmax
e = np.exp(logits)
logits = e / np.sum(e)
e = np.exp(logits)
s = np.sum(e)
ce = np.log(s) - logits[self._adversary.target_label]
# L2 distance
min_, max_ = self.model.bounds()
d = np.sum((x - self._adversary.original).flatten() ** 2) \
/ ((max_ - min_) ** 2) / len(adv_x)
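        # i.e. the mean squared perturbation, normalized by the squared input range.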
# gradient
gradient = self.model.gradient(x, self._adversary.target_label)
result = (c * ce + d).astype(float), gradient.flatten().astype(float)
return result
def _lbfgsb(self, x0, c, maxiter):
min_, max_ = self.model.bounds()
bounds = [(min_, max_)] * len(x0)
approx_grad_eps = (max_ - min_) / 100.0
x, f, d = fmin_l_bfgs_b(
self._loss,
x0,
args=(c, ),
bounds=bounds,
maxiter=maxiter,
epsilon=approx_grad_eps)
if np.amax(x) > max_ or np.amin(x) < min_:
x = np.clip(x, min_, max_)
shape = self._adversary.original.shape
adv_label = np.argmax(self.model.predict(x.reshape(shape)))
logging.info('pre_label = {}, adv_label={}'.format(
self._adversary.target_label, adv_label))
return self._adversary.try_accept_the_example(
x.reshape(shape), adv_label)
LBFGS = LBFGSAttack
"""
This module provides the attack method for the JSMA implementation.
"""
from __future__ import division
import logging
import random
import numpy as np
from .base import Attack
class SaliencyMapAttack(Attack):
"""
Implements the Saliency Map Attack.
The Jacobian-based Saliency Map Approach (Papernot et al. 2016).
Paper link: https://arxiv.org/pdf/1511.07528.pdf
"""
def _apply(self,
adversary,
max_iter=2000,
fast=True,
theta=0.1,
max_perturbations_per_pixel=7):
"""
Apply the JSMA attack.
Args:
adversary(Adversary): The Adversary object.
max_iter(int): The max iterations.
fast(bool): If True, skip evaluating the pixel influence on the sum of the residual classes.
theta(float): Perturbation per pixel relative to [min, max] range.
max_perturbations_per_pixel(int): The max count of perturbation per pixel.
Return:
adversary: The Adversary object.
"""
assert adversary is not None
if not adversary.is_targeted_attack or (adversary.target_label is None):
target_labels = self._generate_random_target(
adversary.original_label)
else:
target_labels = [adversary.target_label]
for target in target_labels:
original_image = adversary.original
# the mask defines the search domain
# each modified pixel with border value is set to zero in mask
mask = np.ones_like(original_image)
# count tracks how often each pixel was changed
counts = np.zeros_like(original_image)
labels = range(self.model.num_classes())
adv_img = original_image.copy()
min_, max_ = self.model.bounds()
for step in range(max_iter):
adv_img = np.clip(adv_img, min_, max_)
adv_label = np.argmax(self.model.predict(adv_img))
if adversary.try_accept_the_example(adv_img, adv_label):
return adversary
# stop if mask is all zero
if not any(mask.flatten()):
return adversary
logging.info('step = {}, original_label = {}, adv_label={}'.
format(step, adversary.original_label, adv_label))
# get pixel location with highest influence on class
idx, p_sign = self._saliency_map(
adv_img, target, labels, mask, fast=fast)
# apply perturbation
adv_img[idx] += -p_sign * theta * (max_ - min_)
# tracks number of updates for each pixel
counts[idx] += 1
# remove pixel from search domain if it hits the bound
if adv_img[idx] <= min_ or adv_img[idx] >= max_:
mask[idx] = 0
# remove pixel if it was changed too often
if counts[idx] >= max_perturbations_per_pixel:
mask[idx] = 0
adv_img = np.clip(adv_img, min_, max_)
def _generate_random_target(self, original_label):
"""
Draw random target labels, all of which are different from each other and from the original label.
Args:
original_label(int): Original label.
Return:
target_labels(list): random target labels
"""
num_random_target = 1
num_classes = self.model.num_classes()
assert num_random_target <= num_classes - 1
target_labels = random.sample(range(num_classes), num_random_target + 1)
target_labels = [t for t in target_labels if t != original_label]
target_labels = target_labels[:num_random_target]
return target_labels
def _saliency_map(self, image, target, labels, mask, fast=False):
"""
Get pixel location with highest influence on class.
Args:
image(numpy.ndarray): Image with shape (height, width, channels).
target(int): The target label.
labels(list): All candidate output labels (class indices).
mask(numpy.ndarray): The search-domain mask; pixels that reached a bound or the change limit are set to zero.
fast(bool): If True, skip evaluating the pixel influence on the sum of the residual classes.
Return:
idx: The index of optimal pixel.
pix_sign: The direction of perturbation
"""
# pixel influence on target class
alphas = self.model.gradient(image, target) * mask
# pixel influence on sum of residual classes(don't evaluate if fast == True)
if fast:
betas = -np.ones_like(alphas)
else:
betas = np.sum([
self.model.gradient(image, label) * mask - alphas
for label in labels
], 0)
# compute saliency map (take into account both pos. & neg. perturbations)
sal_map = np.abs(alphas) * np.abs(betas) * np.sign(alphas * betas)
# find optimal pixel & direction of perturbation
idx = np.argmin(sal_map)
idx = np.unravel_index(idx, mask.shape)
pix_sign = np.sign(alphas)[idx]
return idx, pix_sign
JSMA = SaliencyMapAttack
""" """
Paddle model for target of attack Models __init__.py
""" """
\ No newline at end of file
from .base import Model
from .paddle import PaddleModel
...@@ -24,11 +24,21 @@ class Model(object): ...@@ -24,11 +24,21 @@ class Model(object):
assert len(bounds) == 2 assert len(bounds) == 2
assert channel_axis in [0, 1, 2, 3] assert channel_axis in [0, 1, 2, 3]
if preprocess is None:
preprocess = (0, 1)
self._bounds = bounds self._bounds = bounds
self._channel_axis = channel_axis self._channel_axis = channel_axis
self._preprocess = preprocess
# Normalize self._preprocess to (0, 1) where possible, so that no
# subtraction or division is needed later.
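        # For example, preprocess=(127.5, 128.0) maps raw pixel values in [0, 255]
        # to roughly [-1, 1]; preprocess=(0, 1) leaves the input unchanged.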
if preprocess is not None:
sub, div = np.array(preprocess)
if not np.any(sub):
sub = 0
if np.all(div == 1):
div = 1
assert (div is None) or np.all(div)
self._preprocess = (sub, div)
else:
self._preprocess = (0, 1)
def bounds(self): def bounds(self):
""" """
...@@ -47,8 +57,7 @@ class Model(object): ...@@ -47,8 +57,7 @@ class Model(object):
sub, div = self._preprocess sub, div = self._preprocess
if np.any(sub != 0): if np.any(sub != 0):
res = input_ - sub res = input_ - sub
assert np.any(div != 0) if not np.all(sub == 1):
if np.any(div != 1):
if res is None: # "res = input_ - sub" is not executed! if res is None: # "res = input_ - sub" is not executed!
res = input_ / div res = input_ / div
else: else:
...@@ -97,3 +106,11 @@ class Model(object): ...@@ -97,3 +106,11 @@ class Model(object):
with the shape (height, width, channel). with the shape (height, width, channel).
""" """
raise NotImplementedError raise NotImplementedError
@abstractmethod
def predict_name(self):
"""
Get the name of the predict op, such as "softmax", etc.
:return: string
"""
raise NotImplementedError
...@@ -4,7 +4,7 @@ Paddle model ...@@ -4,7 +4,7 @@ Paddle model
from __future__ import absolute_import from __future__ import absolute_import
import numpy as np import numpy as np
import paddle.v2.fluid as fluid import paddle.fluid as fluid
from .base import Model from .base import Model
...@@ -16,7 +16,7 @@ class PaddleModel(Model): ...@@ -16,7 +16,7 @@ class PaddleModel(Model):
instance of PaddleModel. instance of PaddleModel.
Args: Args:
program(paddle.v2.fluid.framework.Program): The program of the model program(paddle.fluid.framework.Program): The program of the model
which generate the adversarial sample. which generate the adversarial sample.
input_name(string): The name of the input. input_name(string): The name of the input.
logits_name(string): The name of the logits. logits_name(string): The name of the logits.
...@@ -114,3 +114,10 @@ class PaddleModel(Model): ...@@ -114,3 +114,10 @@ class PaddleModel(Model):
feed=feeder.feed([(scaled_data, label)]), feed=feeder.feed([(scaled_data, label)]),
fetch_list=[self._gradient]) fetch_list=[self._gradient])
return grad.reshape(data.shape) return grad.reshape(data.shape)
def predict_name(self):
"""
Get the name of the predict op, such as "softmax", etc.
:return: string
"""
return self._program.block(0).var(self._predict_name).op.type
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
CNN on mnist data using fluid api of paddlepaddle CNN on mnist data using fluid api of paddlepaddle
""" """
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
def mnist_cnn_model(img): def mnist_cnn_model(img):
......
...@@ -3,10 +3,10 @@ FGSM demos on mnist using advbox tool. ...@@ -3,10 +3,10 @@ FGSM demos on mnist using advbox tool.
""" """
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
from advbox import Adversary from advbox.adversary import Adversary
from advbox.attacks.gradientsign import GradientSignAttack from advbox.attacks.gradient_method import FGSM
from advbox.models.paddle import PaddleModel from advbox.models.paddle import PaddleModel
...@@ -73,7 +73,7 @@ def main(): ...@@ -73,7 +73,7 @@ def main():
# advbox demo # advbox demo
m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME, m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
logits.name, avg_cost.name, (-1, 1)) logits.name, avg_cost.name, (-1, 1))
att = GradientSignAttack(m) att = FGSM(m)
for data in train_reader(): for data in train_reader():
# fgsm attack # fgsm attack
adversary = att(Adversary(data[0][0], data[0][1])) adversary = att(Adversary(data[0][0], data[0][1]))
......
"""
JSMA demos on mnist using advbox tool.
"""
import matplotlib.pyplot as plt
import paddle.v2 as paddle
import paddle.fluid as fluid
import numpy as np
from advbox import Adversary
from advbox.attacks.saliency import SaliencyMapAttack
from advbox.models.paddle import PaddleModel
def cnn_model(img):
"""
Mnist cnn model
Args:
img(Variable): the input image to be recognized
Returns:
Variable: the label prediction
"""
# conv1 = fluid.nets.conv2d()
conv_pool_1 = fluid.nets.simple_img_conv_pool(
input=img,
num_filters=20,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
conv_pool_2 = fluid.nets.simple_img_conv_pool(
input=conv_pool_1,
num_filters=50,
filter_size=5,
pool_size=2,
pool_stride=2,
act='relu')
logits = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
return logits
def main():
"""
Advbox demo which demonstrate how to use advbox.
"""
IMG_NAME = 'img'
LABEL_NAME = 'label'
img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
# gradient should flow
img.stop_gradient = False
label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
logits = cnn_model(img)
cost = fluid.layers.cross_entropy(input=logits, label=label)
avg_cost = fluid.layers.mean(x=cost)
place = fluid.CPUPlace()
exe = fluid.Executor(place)
BATCH_SIZE = 1
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.mnist.train(), buf_size=500),
batch_size=BATCH_SIZE)
feeder = fluid.DataFeeder(
feed_list=[IMG_NAME, LABEL_NAME],
place=place,
program=fluid.default_main_program())
fluid.io.load_params(
exe, "./mnist/", main_program=fluid.default_main_program())
# advbox demo
m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
logits.name, avg_cost.name, (-1, 1))
attack = SaliencyMapAttack(m)
total_num = 0
success_num = 0
for data in train_reader():
total_num += 1
# adversary.set_target(True, target_label=target_label)
jsma_attack = attack(Adversary(data[0][0], data[0][1]))
if jsma_attack is not None and jsma_attack.is_successful():
# plt.imshow(jsma_attack.target, cmap='Greys_r')
# plt.show()
success_num += 1
print('original_label=%d, adversary examples label =%d' %
(data[0][1], jsma_attack.adversarial_label))
# np.save('adv_img', jsma_attack.adversarial_example)
print('total num = %d, success num = %d ' % (total_num, success_num))
if total_num == 100:
break
if __name__ == '__main__':
main()
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# SE-ResNeXt for image classification # SE-ResNeXt for image classification
This model built with paddle fluid is still under active development and is not This model built with paddle fluid is still under active development and is not
......
import os
import paddle.v2 as paddle
import paddle.fluid as fluid
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
parameter_attr = ParamAttr(initializer=MSRA())
def conv_bn_layer(input,
filter_size,
num_filters,
stride,
padding,
channels=None,
num_groups=1,
act='relu',
use_cudnn=True):
conv = fluid.layers.conv2d(
input=input,
num_filters=num_filters,
filter_size=filter_size,
stride=stride,
padding=padding,
groups=num_groups,
act=None,
use_cudnn=use_cudnn,
param_attr=parameter_attr,
bias_attr=False)
return fluid.layers.batch_norm(input=conv, act=act)
def depthwise_separable(input, num_filters1, num_filters2, num_groups, stride,
scale):
"""
"""
depthwise_conv = conv_bn_layer(
input=input,
filter_size=3,
num_filters=int(num_filters1 * scale),
stride=stride,
padding=1,
num_groups=int(num_groups * scale),
use_cudnn=False)
pointwise_conv = conv_bn_layer(
input=depthwise_conv,
filter_size=1,
num_filters=int(num_filters2 * scale),
stride=1,
padding=0)
return pointwise_conv
def mobile_net(img, class_dim, scale=1.0):
# conv1: 112x112
tmp = conv_bn_layer(
img,
filter_size=3,
channels=3,
num_filters=int(32 * scale),
stride=2,
padding=1)
# 56x56
tmp = depthwise_separable(
tmp,
num_filters1=32,
num_filters2=64,
num_groups=32,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=64,
num_filters2=128,
num_groups=64,
stride=2,
scale=scale)
# 28x28
tmp = depthwise_separable(
tmp,
num_filters1=128,
num_filters2=128,
num_groups=128,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=128,
num_filters2=256,
num_groups=128,
stride=2,
scale=scale)
# 14x14
tmp = depthwise_separable(
tmp,
num_filters1=256,
num_filters2=256,
num_groups=256,
stride=1,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=256,
num_filters2=512,
num_groups=256,
stride=2,
scale=scale)
# 14x14
for i in range(5):
tmp = depthwise_separable(
tmp,
num_filters1=512,
num_filters2=512,
num_groups=512,
stride=1,
scale=scale)
# 7x7
tmp = depthwise_separable(
tmp,
num_filters1=512,
num_filters2=1024,
num_groups=512,
stride=2,
scale=scale)
tmp = depthwise_separable(
tmp,
num_filters1=1024,
num_filters2=1024,
num_groups=1024,
stride=1,
scale=scale)
tmp = fluid.layers.pool2d(
input=tmp,
pool_size=0,
pool_stride=1,
pool_type='avg',
global_pooling=True)
tmp = fluid.layers.fc(input=tmp,
size=class_dim,
act='softmax',
param_attr=parameter_attr)
return tmp
def train(learning_rate, batch_size, num_passes, model_save_dir='model'):
class_dim = 102
image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
out = mobile_net(image, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label)
avg_cost = fluid.layers.mean(x=cost)
optimizer = fluid.optimizer.Momentum(
learning_rate=learning_rate,
momentum=0.9,
regularization=fluid.regularizer.L2Decay(5 * 1e-5))
opts = optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=out, label=label)
inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program):
test_accuracy = fluid.evaluator.Accuracy(input=out, label=label)
test_target = [avg_cost] + test_accuracy.metrics + test_accuracy.states
inference_program = fluid.io.get_inference_program(test_target)
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
train_reader = paddle.batch(
paddle.dataset.flowers.train(), batch_size=batch_size)
test_reader = paddle.batch(
paddle.dataset.flowers.test(), batch_size=batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
for pass_id in range(num_passes):
accuracy.reset(exe)
for batch_id, data in enumerate(train_reader()):
loss, acc = exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[avg_cost] + accuracy.metrics)
print("Pass {0}, batch {1}, loss {2}, acc {3}".format(
pass_id, batch_id, loss[0], acc[0]))
pass_acc = accuracy.eval(exe)
test_accuracy.reset(exe)
for data in test_reader():
loss, acc = exe.run(inference_program,
feed=feeder.feed(data),
fetch_list=[avg_cost] + test_accuracy.metrics)
test_pass_acc = test_accuracy.eval(exe)
print("End pass {0}, train_acc {1}, test_acc {2}".format(
pass_id, pass_acc, test_pass_acc))
if pass_id % 10 == 0:
model_path = os.path.join(model_save_dir, str(pass_id))
print 'save models to %s' % (model_path)
fluid.io.save_inference_model(model_path, ['image'], [out], exe)
if __name__ == '__main__':
train(learning_rate=0.005, batch_size=40, num_passes=300)
import os import os
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
import reader import reader
...@@ -103,66 +103,87 @@ def train(learning_rate, ...@@ -103,66 +103,87 @@ def train(learning_rate,
batch_size, batch_size,
num_passes, num_passes,
init_model=None, init_model=None,
model_save_dir='model'): model_save_dir='model',
parallel=True):
class_dim = 1000 class_dim = 1000
image_shape = [3, 224, 224] image_shape = [3, 224, 224]
image = fluid.layers.data(name='image', shape=image_shape, dtype='float32') image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64') label = fluid.layers.data(name='label', shape=[1], dtype='int64')
out = SE_ResNeXt(input=image, class_dim=class_dim) if parallel:
places = fluid.layers.get_places()
pd = fluid.layers.ParallelDo(places)
with pd.do():
image_ = pd.read_input(image)
label_ = pd.read_input(label)
out = SE_ResNeXt(input=image_, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label_)
avg_cost = fluid.layers.mean(x=cost)
accuracy = fluid.layers.accuracy(input=out, label=label_)
pd.write_output(avg_cost)
pd.write_output(accuracy)
avg_cost, accuracy = pd()
avg_cost = fluid.layers.mean(x=avg_cost)
accuracy = fluid.layers.mean(x=accuracy)
else:
out = SE_ResNeXt(input=image, class_dim=class_dim)
cost = fluid.layers.cross_entropy(input=out, label=label) cost = fluid.layers.cross_entropy(input=out, label=label)
avg_cost = fluid.layers.mean(x=cost) avg_cost = fluid.layers.mean(x=cost)
accuracy = fluid.layers.accuracy(input=out, label=label)
optimizer = fluid.optimizer.Momentum( optimizer = fluid.optimizer.Momentum(
learning_rate=learning_rate, learning_rate=learning_rate,
momentum=0.9, momentum=0.9,
regularization=fluid.regularizer.L2Decay(1e-4)) regularization=fluid.regularizer.L2Decay(1e-4))
opts = optimizer.minimize(avg_cost) opts = optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=out, label=label)
inference_program = fluid.default_main_program().clone() inference_program = fluid.default_main_program().clone()
with fluid.program_guard(inference_program): with fluid.program_guard(inference_program):
test_accuracy = fluid.evaluator.Accuracy(input=out, label=label) inference_program = fluid.io.get_inference_program([avg_cost, accuracy])
test_target = [avg_cost] + test_accuracy.metrics + test_accuracy.states
inference_program = fluid.io.get_inference_program(test_target)
place = fluid.CUDAPlace(0) place = fluid.CUDAPlace(0)
exe = fluid.Executor(place) exe = fluid.Executor(place)
exe.run(fluid.default_startup_program()) exe.run(fluid.default_startup_program())
if init_model is not None: if init_model is not None:
fluid.io.load_persistables_if_exist(exe, init_model) fluid.io.load_persistables(exe, init_model)
train_reader = paddle.batch(reader.train(), batch_size=batch_size) train_reader = paddle.batch(reader.train(), batch_size=batch_size)
test_reader = paddle.batch(reader.test(), batch_size=batch_size) test_reader = paddle.batch(reader.test(), batch_size=batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=[image, label]) feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
for pass_id in range(num_passes): for pass_id in range(num_passes):
accuracy.reset(exe)
for batch_id, data in enumerate(train_reader()): for batch_id, data in enumerate(train_reader()):
loss, acc = exe.run(fluid.default_main_program(), loss = exe.run(fluid.default_main_program(),
feed=feeder.feed(data), feed=feeder.feed(data),
fetch_list=[avg_cost] + accuracy.metrics) fetch_list=[avg_cost])
print("Pass {0}, batch {1}, loss {2}, acc {3}".format( print("Pass {0}, batch {1}, loss {2}".format(pass_id, batch_id,
pass_id, batch_id, loss[0], acc[0])) float(loss[0])))
pass_acc = accuracy.eval(exe)
test_accuracy.reset(exe) total_loss = 0.0
total_acc = 0.0
total_batch = 0
for data in test_reader(): for data in test_reader():
loss, acc = exe.run(inference_program, loss, acc = exe.run(inference_program,
feed=feeder.feed(data), feed=feeder.feed(data),
fetch_list=[avg_cost] + test_accuracy.metrics) fetch_list=[avg_cost, accuracy])
test_pass_acc = test_accuracy.eval(exe) total_loss += float(loss)
print("End pass {0}, train_acc {1}, test_acc {2}".format( total_acc += float(acc)
pass_id, pass_acc, test_pass_acc)) total_batch += 1
print("End pass {0}, test_loss {1}, test_acc {2}".format(
pass_id, total_loss / total_batch, total_acc / total_batch))
model_path = os.path.join(model_save_dir, str(pass_id)) model_path = os.path.join(model_save_dir, str(pass_id))
if not os.path.isdir(model_path): fluid.io.save_inference_model(model_path, ['image'], [out], exe)
os.makedirs(model_path)
fluid.io.save_persistables(exe, model_path)
if __name__ == '__main__': if __name__ == '__main__':
train(learning_rate=0.1, batch_size=8, num_passes=100, init_model=None) train(
learning_rate=0.1,
batch_size=8,
num_passes=100,
init_model=None,
parallel=False)
import os
import cv2
import numpy as np
from PIL import Image
from paddle.v2.image import load_image
class DataGenerator(object):
def __init__(self):
pass
def train_reader(self, img_root_dir, img_label_list, batchsize):
'''
Reader interface for training.
:param img_root_dir: The root path of the images for training.
:type img_root_dir: str
:param img_label_list: The path of the <image_name, label> file for training.
:type img_label_list: str
'''
img_label_lines = []
if batchsize == 1:
to_file = "tmp.txt"
cmd = "cat " + img_label_list + " | awk '{print $1,$2,$3,$4;}' | shuf > " + to_file
print "cmd: " + cmd
os.system(cmd)
print "finish batch shuffle"
img_label_lines = open(to_file, 'r').readlines()
else:
to_file = "tmp.txt"
#cmd1: partial shuffle
cmd = "cat " + img_label_list + " | awk '{printf(\"%04d%.4f %s\\n\", $1, rand(), $0)}' | sort | sed 1,$((1 + RANDOM % 100))d | "
#cmd2: batch merge and shuffle
cmd += "awk '{printf $2\" \"$3\" \"$4\" \"$5\" \"; if(NR % " + str(
batchsize) + " == 0) print \"\";}' | shuf | "
#cmd3: batch split
cmd += "awk '{if(NF == " + str(
batchsize
) + " * 4) {for(i = 0; i < " + str(
batchsize
) + "; i++) print $(4*i+1)\" \"$(4*i+2)\" \"$(4*i+3)\" \"$(4*i+4);}}' > " + to_file
print "cmd: " + cmd
os.system(cmd)
print "finish batch shuffle"
img_label_lines = open(to_file, 'r').readlines()
def reader():
sizes = len(img_label_lines) / batchsize
for i in range(sizes):
result = []
sz = [0, 0]
for j in range(batchsize):
line = img_label_lines[i * batchsize + j]
# h, w, img_name, labels
items = line.split(' ')
label = [int(c) for c in items[-1].split(',')]
img = Image.open(os.path.join(img_root_dir, items[
2])).convert('L')  # convert to grayscale
if j == 0:
sz = img.size
img = img.resize((sz[0], sz[1]))
img = np.array(img) - 127.5
img = img[np.newaxis, ...]
result.append([img, label])
yield result
return reader
def test_reader(self, img_root_dir, img_label_list):
'''
Reader interface for inference.
:param img_root_dir: The root path of the images for testing.
:type img_root_dir: str
:param img_label_list: The path of the <image_name, label> file for testing.
:type img_label_list: str
'''
def reader():
for line in open(img_label_list):
# h, w, img_name, labels
items = line.split(' ')
label = [int(c) for c in items[-1].split(',')]
img = Image.open(os.path.join(img_root_dir, items[2])).convert(
'L')
img = np.array(img) - 127.5
img = img[np.newaxis, ...]
yield img, label
return reader
# Policy Gradient RL by PaddlePaddle Running the example programs in this directory requires the latest develop branch of PaddlePaddle. If your installed version of PaddlePaddle is lower than this requirement, please follow the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html) to update your PaddlePaddle installation.
---
# Policy Gradient RL by PaddlePaddle
This article describes how to use PaddlePaddle to train a player (actor model) with a policy-based reinforcement learning method; we hope this player can complete a simple stair-climbing task. This article describes how to use PaddlePaddle to train a player (actor model) with a policy-based reinforcement learning method; we hope this player can complete a simple stair-climbing task.
The content covers: The content covers:
......
import numpy as np import numpy as np
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
# reproducible # reproducible
np.random.seed(1) np.random.seed(1)
......
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Text Classification # Text Classification
## Data Preparation ## Data Preparation
......
...@@ -5,7 +5,7 @@ import argparse ...@@ -5,7 +5,7 @@ import argparse
import time import time
import paddle.v2 as paddle import paddle.v2 as paddle
import paddle.v2.fluid as fluid import paddle.fluid as fluid
from config import TrainConfig as conf from config import TrainConfig as conf
......
The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Attention is All You Need: A Paddle Fluid implementation
This is a Paddle Fluid implementation of the Transformer model in [Attention is All You Need]() (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017).
If you use the dataset/code in your research, please cite the paper:
```text
@inproceedings{vaswani2017attention,
title={Attention is all you need},
author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
booktitle={Advances in Neural Information Processing Systems},
pages={6000--6010},
year={2017}
}
```
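To get a quick sense of how the files below fit together, here is a rough sketch of building the training network. It assumes the `config.py` and `model.py` modules listed after this README, and the `+ 1` adjustments follow the `<pad>` and position-padding notes in the config comments; the real training script in this directory is the actual entry point and may differ.

```python
import paddle.fluid as fluid
from config import TrainTaskConfig, ModelHyperParams
from model import transformer

# Build the training network; the returned value is the summed, weighted
# token-level cross-entropy cost defined at the end of model.py.
cost = transformer(
    ModelHyperParams.src_vocab_size + 1, ModelHyperParams.trg_vocab_size + 1,
    ModelHyperParams.max_length + 1, ModelHyperParams.n_layer,
    ModelHyperParams.n_head, ModelHyperParams.d_key, ModelHyperParams.d_value,
    ModelHyperParams.d_model, ModelHyperParams.d_inner_hid,
    ModelHyperParams.dropout, ModelHyperParams.src_pad_idx,
    ModelHyperParams.trg_pad_idx, ModelHyperParams.pos_pad_idx)

optimizer = fluid.optimizer.Adam(
    learning_rate=TrainTaskConfig.learning_rate,
    beta1=TrainTaskConfig.beta1,
    beta2=TrainTaskConfig.beta2,
    epsilon=TrainTaskConfig.eps)
optimizer.minimize(cost)
```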
### TODO
This project is still under active development.
class TrainTaskConfig(object):
use_gpu = False
# the epoch number to train.
pass_num = 2
# number of sequences contained in a mini-batch.
batch_size = 64
# the hyper params for Adam optimizer.
learning_rate = 0.001
beta1 = 0.9
beta2 = 0.98
eps = 1e-9
# the params for learning rate scheduling
warmup_steps = 4000
class ModelHyperParams(object):
# Dictionary size for source and target language. This model directly uses
# paddle.dataset.wmt16 in which the <bos>, <eos> and <unk> tokens have
# already been added, but the <pad> token has not. Transformer requires the
# sequences in a mini-batch to be padded to the same length, so a <pad> token is
# added to the original dictionary of paddle.dataset.wmt16.
# size of source word dictionary.
src_vocab_size = 10000
# index for <pad> token in source language.
src_pad_idx = src_vocab_size
# size of target word dictionary
trg_vocab_size = 10000
# index for <pad> token in target language.
trg_pad_idx = trg_vocab_size
# position value corresponding to the <pad> token.
pos_pad_idx = 0
# max length of sequences. It should be increased by 1 to include the position
# padding token used for position encoding.
max_length = 50
# the dimension for word embeddings, which is also the last dimension of
# the input and output of multi-head attention, position-wise feed-forward
# networks, encoder and decoder.
d_model = 512
# size of the hidden layer in position-wise feed-forward networks.
d_inner_hid = 1024
# the dimension that keys are projected to for dot-product attention.
d_key = 64
# the dimension that values are projected to for dot-product attention.
d_value = 64
# number of head used in multi-head attention.
n_head = 8
# number of sub-layers to be stacked in the encoder and decoder.
n_layer = 6
# dropout rate used by all dropout layers.
dropout = 0.1
# Names of position encoding table which will be initialized externally.
pos_enc_param_names = (
"src_pos_enc_table",
"trg_pos_enc_table", )
# Names of all data layers listed in order.
input_data_names = (
"src_word",
"src_pos",
"trg_word",
"trg_pos",
"src_slf_attn_bias",
"trg_slf_attn_bias",
"trg_src_attn_bias",
"lbl_word",
"lbl_weight", )
from functools import partial
import numpy as np
import paddle.v2 as paddle
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from config import TrainTaskConfig, input_data_names, pos_enc_param_names
# FIXME(guosheng): Remove out the batch_size from the model.
batch_size = TrainTaskConfig.batch_size
def position_encoding_init(n_position, d_pos_vec):
"""
Generate the initial values for the sinusoid position encoding table.
"""
position_enc = np.array([[
pos / np.power(10000, 2 * (j // 2) / d_pos_vec)
for j in range(d_pos_vec)
] if pos != 0 else np.zeros(d_pos_vec) for pos in range(n_position)])
position_enc[1:, 0::2] = np.sin(position_enc[1:, 0::2]) # dim 2i
position_enc[1:, 1::2] = np.cos(position_enc[1:, 1::2]) # dim 2i+1
return position_enc.astype("float32")
def multi_head_attention(queries,
keys,
values,
attn_bias,
d_key,
d_value,
d_model,
num_heads=1,
dropout_rate=0.):
"""
Multi-Head Attention. Note that attn_bias is added to the logit before
computing the softmax activation, to mask certain selected positions so that
they will not be considered in the attention weights.
"""
if not (len(queries.shape) == len(keys.shape) == len(values.shape) == 3):
raise ValueError(
"Inputs: quries, keys and values should all be 3-D tensors.")
def __compute_qkv(queries, keys, values, num_heads, d_key, d_value):
"""
Add linear projection to queries, keys, and values.
"""
q = layers.fc(input=queries,
size=d_key * num_heads,
bias_attr=False,
num_flatten_dims=2)
k = layers.fc(input=keys,
size=d_key * num_heads,
bias_attr=False,
num_flatten_dims=2)
v = layers.fc(input=values,
size=d_value * num_heads,
bias_attr=False,
num_flatten_dims=2)
return q, k, v
def __split_heads(x, num_heads):
"""
Reshape the last dimension of input tensor x so that it becomes two
dimensions and then transpose. Specifically, input a tensor with shape
[bs, max_sequence_length, num_heads * hidden_dim] then output a tensor
with shape [bs, num_heads, max_sequence_length, hidden_dim].
"""
if num_heads == 1:
return x
hidden_size = x.shape[-1]
# FIXME(guosheng): Decouple the program desc with batch_size.
reshaped = layers.reshape(
x=x, shape=[batch_size, -1, num_heads, hidden_size // num_heads])
# permute the dimensions into:
# [batch_size, num_heads, max_sequence_len, hidden_size_per_head]
return layers.transpose(x=reshaped, perm=[0, 2, 1, 3])
def __combine_heads(x):
"""
Transpose and then reshape the last two dimensions of input tensor x
so that it becomes one dimension, which is the reverse of __split_heads.
"""
if len(x.shape) == 3: return x
if len(x.shape) != 4:
raise ValueError("Input(x) should be a 4-D Tensor.")
trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
# FIXME(guosheng): Decouple the program desc with batch_size.
return layers.reshape(
x=trans_x,
shape=map(int,
[batch_size, -1, trans_x.shape[2] * trans_x.shape[3]]))
def scaled_dot_product_attention(q, k, v, attn_bias, d_key, dropout_rate):
"""
Scaled Dot-Product Attention
"""
# FIXME(guosheng): Optimize the shape in reshape_op or softmax_op.
# The current implementation of softmax_op only supports 2D tensor,
# consequently it cannot be directly used here.
# Besides, if reshape_op were used here, the shape of product inferred at
# compile time is not the actual shape at run time, so it cannot be used
# to set the attribute of reshape_op.
# So a softmax is defined here as a temporary solution.
def __softmax(x, eps=1e-9):
exp_out = layers.exp(x=x)
sum_out = layers.reduce_sum(exp_out, dim=-1, keep_dim=False)
return layers.elementwise_div(x=exp_out, y=sum_out, axis=0)
scaled_q = layers.scale(x=q, scale=d_key**-0.5)
product = layers.matmul(x=scaled_q, y=k, transpose_y=True)
weights = __softmax(layers.elementwise_add(x=product, y=attn_bias))
if dropout_rate:
weights = layers.dropout(
weights, dropout_prob=dropout_rate, is_test=False)
out = layers.matmul(weights, v)
return out
q, k, v = __compute_qkv(queries, keys, values, num_heads, d_key, d_value)
q = __split_heads(q, num_heads)
k = __split_heads(k, num_heads)
v = __split_heads(v, num_heads)
ctx_multiheads = scaled_dot_product_attention(q, k, v, attn_bias, d_key,
dropout_rate)
out = __combine_heads(ctx_multiheads)
# Project back to the model size.
proj_out = layers.fc(input=out,
size=d_model,
bias_attr=False,
num_flatten_dims=2)
return proj_out
def positionwise_feed_forward(x, d_inner_hid, d_hid):
"""
Position-wise Feed-Forward Networks.
This module consists of two linear transformations with a ReLU activation
in between, which is applied to each position separately and identically.
"""
hidden = layers.fc(input=x,
size=d_inner_hid,
num_flatten_dims=2,
act="relu")
out = layers.fc(input=hidden, size=d_hid, num_flatten_dims=2)
return out
def pre_post_process_layer(prev_out, out, process_cmd, dropout=0.):
"""
Add residual connection, layer normalization and dropout to the out tensor
optionally according to the value of process_cmd.
This will be used before or after multi-head attention and position-wise
feed-forward networks.
"""
for cmd in process_cmd:
if cmd == "a": # add residual connection
out = out + prev_out if prev_out else out
elif cmd == "n": # add layer normalization
out = layers.layer_norm(out, begin_norm_axis=len(out.shape) - 1)
elif cmd == "d": # add dropout
if dropout:
out = layers.dropout(out, dropout_prob=dropout, is_test=False)
return out
pre_process_layer = partial(pre_post_process_layer, None)
post_process_layer = pre_post_process_layer
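# The command string is applied character by character: 'd' dropout, 'a' residual
# add, 'n' layer normalization. pre_process_layer binds prev_out to None, so its
# residual add is skipped and only dropout/normalization take effect.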
def prepare_encoder(src_word,
src_pos,
src_vocab_size,
src_emb_dim,
src_pad_idx,
src_max_len,
dropout=0.,
pos_pad_idx=0,
pos_enc_param_name=None):
"""Add word embeddings and position encodings.
The output tensor has a shape of:
[batch_size, max_src_length_in_batch, d_model].
This module is used at the bottom of the encoder stacks.
"""
src_word_emb = layers.embedding(
src_word, size=[src_vocab_size, src_emb_dim], padding_idx=src_pad_idx)
src_pos_enc = layers.embedding(
src_pos,
size=[src_max_len, src_emb_dim],
padding_idx=pos_pad_idx,
param_attr=fluid.ParamAttr(
name=pos_enc_param_name, trainable=False))
enc_input = src_word_emb + src_pos_enc
# FIXME(guosheng): Decouple the program desc with batch_size.
enc_input = layers.reshape(x=enc_input, shape=[batch_size, -1, src_emb_dim])
return layers.dropout(
enc_input, dropout_prob=dropout,
is_test=False) if dropout else enc_input
prepare_encoder = partial(
prepare_encoder, pos_enc_param_name=pos_enc_param_names[0])
prepare_decoder = partial(
prepare_encoder, pos_enc_param_name=pos_enc_param_names[1])
def encoder_layer(enc_input,
attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
"""The encoder layers that can be stacked to form a deep encoder.
This module consists of a multi-head (self) attention sub-layer followed by
position-wise feed-forward networks, and both components are accompanied
by post_process_layer to add residual connection, layer normalization
and dropout.
"""
attn_output = multi_head_attention(enc_input, enc_input, enc_input,
attn_bias, d_key, d_value, d_model,
n_head, dropout_rate)
attn_output = post_process_layer(enc_input, attn_output, "dan",
dropout_rate)
ffd_output = positionwise_feed_forward(attn_output, d_inner_hid, d_model)
return post_process_layer(attn_output, ffd_output, "dan", dropout_rate)
def encoder(enc_input,
attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
"""
The encoder is composed of a stack of identical layers returned by calling
encoder_layer.
"""
for i in range(n_layer):
enc_output = encoder_layer(enc_input, attn_bias, n_head, d_key, d_value,
d_model, d_inner_hid, dropout_rate)
enc_input = enc_output
return enc_output
def decoder_layer(dec_input,
enc_output,
slf_attn_bias,
dec_enc_attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
""" The layer to be stacked in decoder part.
The structure of this module is similar to that in the encoder part except
that a multi-head attention sub-layer is added to implement encoder-decoder attention.
"""
slf_attn_output = multi_head_attention(
dec_input,
dec_input,
dec_input,
slf_attn_bias,
d_key,
d_value,
d_model,
n_head,
dropout_rate, )
slf_attn_output = post_process_layer(
dec_input,
slf_attn_output,
"dan", # residual connection + dropout + layer normalization
dropout_rate, )
enc_attn_output = multi_head_attention(
slf_attn_output,
enc_output,
enc_output,
dec_enc_attn_bias,
d_key,
d_value,
d_model,
n_head,
dropout_rate, )
enc_attn_output = post_process_layer(
slf_attn_output,
enc_attn_output,
"dan", # residual connection + dropout + layer normalization
dropout_rate, )
ffd_output = positionwise_feed_forward(
enc_attn_output,
d_inner_hid,
d_model, )
dec_output = post_process_layer(
enc_attn_output,
ffd_output,
"dan", # residual connection + dropout + layer normalization
dropout_rate, )
return dec_output
def decoder(dec_input,
enc_output,
dec_slf_attn_bias,
dec_enc_attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate=0.):
"""
The decoder is composed of a stack of identical decoder_layer layers.
"""
for i in range(n_layer):
dec_output = decoder_layer(
dec_input,
enc_output,
dec_slf_attn_bias,
dec_enc_attn_bias,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate, )
dec_input = dec_output
return dec_output
def transformer(
src_vocab_size,
trg_vocab_size,
max_length,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate,
src_pad_idx,
trg_pad_idx,
pos_pad_idx, ):
# The shapes here act as placeholders.
# The shapes set here are only to pass shape inference at compile time. The actual
# shape of src_word in run time is:
# [batch_size * max_src_length_in_a_batch, 1].
src_word = layers.data(
name=input_data_names[0],
shape=[batch_size * max_length, 1],
dtype="int64",
append_batch_size=False)
# The actual shape of src_pos in runtime is:
# [batch_size * max_src_length_in_a_batch, 1].
src_pos = layers.data(
name=input_data_names[1],
shape=[batch_size * max_length, 1],
dtype="int64",
append_batch_size=False)
# The actual shape of trg_word in runtime is:
# [batch_size * max_trg_length_in_a_batch, 1].
trg_word = layers.data(
name=input_data_names[2],
shape=[batch_size * max_length, 1],
dtype="int64",
append_batch_size=False)
# The actual shape of trg_pos in runtime is:
# [batch_size * max_trg_length_in_a_batch, 1].
trg_pos = layers.data(
name=input_data_names[3],
shape=[batch_size * max_length, 1],
dtype="int64",
append_batch_size=False)
# The actual shape of src_slf_attn_bias in runtime is:
# [batch_size, n_head, max_src_length_in_a_batch, max_src_length_in_a_batch].
# This input is used to remove attention weights on paddings.
src_slf_attn_bias = layers.data(
name=input_data_names[4],
shape=[batch_size, n_head, max_length, max_length],
dtype="float32",
append_batch_size=False)
# The actual shape of trg_slf_attn_bias in runtime is:
# [batch_size, n_head, max_trg_length_in_batch, max_trg_length_in_batch].
# This is used to remove attention weights on paddings and subsequent words.
trg_slf_attn_bias = layers.data(
name=input_data_names[5],
shape=[batch_size, n_head, max_length, max_length],
dtype="float32",
append_batch_size=False)
# The actual shape of trg_src_attn_bias in runtime is:
# [batch_size, n_head, max_trg_length_in_batch, max_src_length_in_batch].
# This is used to remove attention weights on paddings.
trg_src_attn_bias = layers.data(
name=input_data_names[6],
shape=[batch_size, n_head, max_length, max_length],
dtype="float32",
append_batch_size=False)
enc_input = prepare_encoder(
src_word,
src_pos,
src_vocab_size,
d_model,
src_pad_idx,
max_length,
dropout_rate, )
enc_output = encoder(
enc_input,
src_slf_attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate, )
dec_input = prepare_decoder(
trg_word,
trg_pos,
trg_vocab_size,
d_model,
trg_pad_idx,
max_length,
dropout_rate, )
dec_output = decoder(
dec_input,
enc_output,
trg_slf_attn_bias,
trg_src_attn_bias,
n_layer,
n_head,
d_key,
d_value,
d_model,
d_inner_hid,
dropout_rate, )
# TODO(guosheng): Share the weight matrix between the embedding layers and
# the pre-softmax linear transformation.
predict = layers.reshape(
x=layers.fc(input=dec_output,
size=trg_vocab_size,
bias_attr=False,
num_flatten_dims=2),
shape=[-1, trg_vocab_size],
act="softmax")
# The actual shape of gold in runtime is:
# [batch_size * max_trg_length_in_a_batch, 1].
gold = layers.data(
name=input_data_names[7],
shape=[batch_size * max_length, 1],
dtype="int64",
append_batch_size=False)
cost = layers.cross_entropy(input=predict, label=gold)
# The actual shape of weights in runtime is:
# [batch_size * max_trg_length_in_a_batch, 1].
# Padding indices do not contribute to the total loss. This weight is used to
# cancel out the padding positions when calculating the loss.
weights = layers.data(
name=input_data_names[8],
shape=[batch_size * max_length, 1],
dtype="float32",
append_batch_size=False)
weighted_cost = cost * weights
return layers.reduce_sum(weighted_cost)
import numpy as np
import paddle.fluid as fluid
import paddle.fluid.layers as layers
class LearningRateScheduler(object):
"""
Wrapper for learning rate scheduling as described in the Transformer paper.
LearningRateScheduler adapts the learning rate externally, and the adapted
learning rate will be fed into the main_program as input data.
"""
def __init__(self,
d_model,
warmup_steps,
place,
learning_rate=0.001,
current_steps=0,
name="learning_rate"):
self.current_steps = current_steps
self.warmup_steps = warmup_steps
self.d_model = d_model
self.learning_rate = layers.create_global_var(
name=name,
shape=[1],
value=float(learning_rate),
dtype="float32",
persistable=True)
self.place = place
def update_learning_rate(self, data_input):
self.current_steps += 1
lr_value = np.power(self.d_model, -0.5) * np.min([
np.power(self.current_steps, -0.5),
np.power(self.warmup_steps, -1.5) * self.current_steps
])
lr_tensor = fluid.LoDTensor()
lr_tensor.set(np.array([lr_value], dtype="float32"), self.place)
data_input[self.learning_rate.name] = lr_tensor
import numpy as np
import paddle.v2 as paddle
import paddle.fluid as fluid
from model import transformer, position_encoding_init
from optim import LearningRateScheduler
from config import TrainTaskConfig, ModelHyperParams, \
pos_enc_param_names, input_data_names
def prepare_batch_input(insts, input_data_names, src_pad_idx, trg_pad_idx,
max_length, n_head, place):
"""
Pad the instances to the max sequence length in the batch, and generate the
corresponding position data and attention bias. Then, convert the numpy
data to tensors and return a dict mapping names to tensors.
"""
input_dict = {}
def __pad_batch_data(insts,
pad_idx,
is_target=False,
return_pos=True,
return_attn_bias=True,
return_max_len=True):
"""
Pad the instances to the max sequence length in the batch, and generate the
corresponding position data and attention bias.
"""
return_list = []
max_len = max(len(inst) for inst in insts)
inst_data = np.array(
[inst + [pad_idx] * (max_len - len(inst)) for inst in insts])
return_list += [inst_data.astype("int64").reshape([-1, 1])]
if return_pos:
inst_pos = np.array([[
pos_i + 1 if w_i != pad_idx else 0
for pos_i, w_i in enumerate(inst)
] for inst in inst_data])
return_list += [inst_pos.astype("int64").reshape([-1, 1])]
if return_attn_bias:
if is_target:
# This is used to avoid attention on paddings and subsequent
# words.
slf_attn_bias_data = np.ones((inst_data.shape[0], max_len,
max_len))
slf_attn_bias_data = np.triu(slf_attn_bias_data, 1).reshape(
[-1, 1, max_len, max_len])
slf_attn_bias_data = np.tile(slf_attn_bias_data,
[1, n_head, 1, 1]) * [-1e9]
else:
# This is used to avoid attention on paddings.
slf_attn_bias_data = np.array([[0] * len(inst) + [-1e9] *
(max_len - len(inst))
for inst in insts])
slf_attn_bias_data = np.tile(
slf_attn_bias_data.reshape([-1, 1, 1, max_len]),
[1, n_head, max_len, 1])
return_list += [slf_attn_bias_data.astype("float32")]
if return_max_len:
return_list += [max_len]
return return_list if len(return_list) > 1 else return_list[0]
def data_to_tensor(data_list, name_list, input_dict, place):
assert len(data_list) == len(name_list)
for i in range(len(name_list)):
tensor = fluid.LoDTensor()
tensor.set(data_list[i], place)
input_dict[name_list[i]] = tensor
src_word, src_pos, src_slf_attn_bias, src_max_len = __pad_batch_data(
[inst[0] for inst in insts], src_pad_idx, is_target=False)
trg_word, trg_pos, trg_slf_attn_bias, trg_max_len = __pad_batch_data(
[inst[1] for inst in insts], trg_pad_idx, is_target=True)
trg_src_attn_bias = np.tile(src_slf_attn_bias[:, :, ::src_max_len, :],
[1, 1, trg_max_len, 1]).astype("float32")
lbl_word = __pad_batch_data([inst[2] for inst in insts], trg_pad_idx, False,
False, False, False)
lbl_weight = (lbl_word != trg_pad_idx).astype("float32").reshape([-1, 1])
data_to_tensor([
src_word, src_pos, trg_word, trg_pos, src_slf_attn_bias,
trg_slf_attn_bias, trg_src_attn_bias, lbl_word, lbl_weight
], input_data_names, input_dict, place)
return input_dict
def main():
place = fluid.CUDAPlace(0) if TrainTaskConfig.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
cost = transformer(
ModelHyperParams.src_vocab_size + 1,
ModelHyperParams.trg_vocab_size + 1, ModelHyperParams.max_length + 1,
ModelHyperParams.n_layer, ModelHyperParams.n_head,
ModelHyperParams.d_key, ModelHyperParams.d_value,
ModelHyperParams.d_model, ModelHyperParams.d_inner_hid,
ModelHyperParams.dropout, ModelHyperParams.src_pad_idx,
ModelHyperParams.trg_pad_idx, ModelHyperParams.pos_pad_idx)
lr_scheduler = LearningRateScheduler(ModelHyperParams.d_model,
TrainTaskConfig.warmup_steps, place,
TrainTaskConfig.learning_rate)
optimizer = fluid.optimizer.Adam(
learning_rate=lr_scheduler.learning_rate,
beta1=TrainTaskConfig.beta1,
beta2=TrainTaskConfig.beta2,
epsilon=TrainTaskConfig.eps)
optimizer.minimize(cost)
train_data = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.wmt16.train(ModelHyperParams.src_vocab_size,
ModelHyperParams.trg_vocab_size),
buf_size=51200),
batch_size=TrainTaskConfig.batch_size)
# Initialize the parameters.
exe.run(fluid.framework.default_startup_program())
for pos_enc_param_name in pos_enc_param_names:
pos_enc_param = fluid.global_scope().find_var(
pos_enc_param_name).get_tensor()
pos_enc_param.set(
position_encoding_init(ModelHyperParams.max_length + 1,
ModelHyperParams.d_model), place)
for pass_id in xrange(TrainTaskConfig.pass_num):
for batch_id, data in enumerate(train_data()):
# The current program desc is coupled with batch_size, so for now all
# mini-batches must have exactly the same number of instances.
if len(data) != TrainTaskConfig.batch_size:
continue
data_input = prepare_batch_input(
data, input_data_names, ModelHyperParams.src_pad_idx,
ModelHyperParams.trg_pad_idx, ModelHyperParams.max_length,
ModelHyperParams.n_head, place)
lr_scheduler.update_learning_rate(data_input)
outs = exe.run(fluid.framework.default_main_program(),
feed=data_input,
fetch_list=[cost])
cost_val = np.array(outs[0])
print("pass_id = " + str(pass_id) + " batch = " + str(batch_id) +
" avg_cost = " + str(cost_val))
if __name__ == "__main__":
main()
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Chinese Classical Poetry Generation
## Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Text Generation with a Recurrent Neural Network Language Model
A language model is a probabilistic model that, put simply, computes the probability of a sentence. It can be used to decide which word sequence is more likely or, given a few words, to predict the most probable next word. Language models are an important foundational model in natural language processing.
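In symbols, this is the standard chain-rule factorization of a sentence probability (stated here for concreteness; it is not specific to this example):

$$ P(w_1, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1}) $$

An RNN language model parameterizes each conditional factor with a recurrent hidden state that summarizes the preceding words.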
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Globally Normalized Reader
This model implements the work in the following paper:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Accelerating Word Embedding Training with Hsigmoid
## Background
In natural language processing, the traditional approach represents words with one-hot vectors. For example, for the dictionary ['我', '你', '喜欢'], the vectors [1,0,0], [0,1,0], and [0,0,1] represent '我', '你', and '喜欢' respectively. This representation is simple, but when the vocabulary is large it leads to a dimensionality explosion, and since any two word vectors are orthogonal, each vector carries limited information. To avoid or mitigate these drawbacks, one-hot representations are now usually replaced by word embeddings, i.e. low-dimensional dense real-valued vectors instead of high-dimensional sparse one-hot vectors. There are many ways to train word embeddings; neural network models such as CBOW and Skip-gram are among them. These models are essentially classifiers, and when the vocabulary (i.e. the number of classes) is large, the traditional softmax becomes very time-consuming. PaddlePaddle provides the Hsigmoid Layer and the NCE Layer to speed up training. This article mainly introduces how to use the Hsigmoid Layer to accelerate training; for more on word embeddings, please refer to the [word embedding chapter](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec) of the PaddlePaddle Book.
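As a rough illustrative sketch only (the layer sizes and variable names below are assumptions for this sketch, not the configuration used in this directory), a hierarchical-sigmoid cost in the v2 API might be wired up as follows:

```python
import paddle.v2 as paddle

# A minimal sketch: predict a target word from a single context word.
paddle.init(use_gpu=False, trainer_count=1)

dict_size = 30000   # assumed vocabulary size
embed_size = 32     # assumed embedding width

word = paddle.layer.data(
    name="word", type=paddle.data_type.integer_value(dict_size))
target = paddle.layer.data(
    name="target", type=paddle.data_type.integer_value(dict_size))

embedding = paddle.layer.embedding(input=word, size=embed_size)
hidden = paddle.layer.fc(
    input=embedding, size=128, act=paddle.activation.Sigmoid())

# Hierarchical sigmoid replaces the flat softmax over dict_size classes,
# reducing the per-sample cost from O(dict_size) to O(log dict_size).
cost = paddle.layer.hsigmoid(
    input=hidden,
    label=target,
    num_classes=dict_size,
    param_attr=paddle.attr.Param(name="sigmoid_w"),
    bias_attr=paddle.attr.Param(name="sigmoid_b"))
```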
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
Image Classification
=======================
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Learning to Rank
Learning to rank \[[1](#参考文献1)\] is a machine learning approach for building ranking models and plays an important role in information retrieval, natural language processing, and data mining. Its main goal is, given a set of documents, to produce a relevance-based ranking of the documents for an arbitrary query. In this example, an annotated corpus is used to train two classic ranking models, RankNet \[[4](#参考文献4)\] and LambdaRank \[[6](#参考文献6)\]; each produces a ranking model that can rank documents by relevance for arbitrary queries.
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Neural Machine Translation with External Memory
A neural machine translation (NMT) model with an **external memory** mechanism is an important extension of NMT models. It introduces a differentiable memory network as an additional memory unit, expanding the capacity or bandwidth of the model's internal working memory, helping the model temporarily store and retrieve information in tasks such as translation, and improving model performance.
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Accelerating Language Model Training with Noise-Contrastive Estimation
## Why Noise-Contrastive Estimation Is Needed
...@@ -102,10 +106,10 @@ return paddle.layer.nce(
Some important parameters of the NCE layer are explained below:
| Parameter | Purpose | Description |
| :----------------------- | :--------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------- |
| param\_attr / bias\_attr | Sets the parameter names | Makes it convenient to load the parameters at prediction time; details are given in the Prediction section. |
| num\_neg\_samples | Number of negative samples | Controls the ratio of positive to negative samples. The valid range is [1, dictionary size - 1]; more negative samples slow down training but increase model accuracy. |
| neg\_distribution | Distribution used to sample negative labels; uniform by default | Lets you control the sampling weight of each class when drawing negative samples. For example, if the positive label is "sunny" and you want the negative label "flood" to be distinguished more strongly during training, you can increase the sampling weight of the "flood" class. |
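As a hedged sketch of how these arguments fit together (the surrounding network, layer sizes, and names such as `context`, `next_word`, and `dict_size` are placeholders, not code from this example), an NCE cost layer might look like:

```python
import paddle.v2 as paddle

# A minimal sketch: predict the next word from one context word.
paddle.init(use_gpu=False, trainer_count=1)

dict_size = 30000   # assumed vocabulary size

context = paddle.layer.data(
    name="context", type=paddle.data_type.integer_value(dict_size))
next_word = paddle.layer.data(
    name="next_word", type=paddle.data_type.integer_value(dict_size))

embedding = paddle.layer.embedding(input=context, size=32)
hidden = paddle.layer.fc(
    input=embedding, size=128, act=paddle.activation.Tanh())

cost = paddle.layer.nce(
    input=hidden,
    label=next_word,
    num_classes=dict_size,
    # Named parameters so they can be reloaded at prediction time.
    param_attr=paddle.attr.Param(name="nce_w"),
    bias_attr=paddle.attr.Param(name="nce_b"),
    act=paddle.activation.Sigmoid(),
    num_neg_samples=25,          # 25 negative samples per positive example
    neg_distribution=None)       # None means a uniform sampling distribution
```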
## Prediction
1. Run the following from the command line:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Text Classification Based on Nested Sequences
## Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering
This model implements the work in the following paper:
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Neural Machine Translation Model
## Background Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Scene Text Recognition (STR)
## Introduction to the STR Task
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Scheduled Sampling
## Overview
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Named Entity Recognition
Below is a brief description of the directory structure and files of this example:
...@@ -88,7 +92,7 @@ Baghdad NNP I-NP I-LOC
After preprocessing, each training sample contains three parts that are fed to the neural network as inputs: (1) the word sequence; (2) the capitalization-mark sequence; (3) the label sequence. The table below shows one training sample:
| Word sequence | Capitalization marks | Label sequence |
| -------- | ------------ | -------- |
| u.n. | 1 | B-ORG |
| official | 0 | O |
| ekeus | 1 | B-PER |
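As a small illustrative sketch (not code from this example; the label values are simply copied from the table above), the three input sequences for a sentence could be produced like this:

```python
def build_sample(words, labels):
    """Return (lower-cased word sequence, capitalization marks, label sequence)."""
    lowercased = [w.lower() for w in words]
    caps_mark = [1 if w[0].isupper() else 0 for w in words]
    return lowercased, caps_mark, labels

# Reproduces the example row above.
print(build_sample(["U.N.", "official", "Ekeus"], ["B-ORG", "O", "B-PER"]))
# (['u.n.', 'official', 'ekeus'], [1, 0, 1], ['B-ORG', 'O', 'B-PER'])
```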
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# SSD Object Detection
## Overview
SSD (Single Shot MultiBox Detector) is one of the newer and better-performing detection algorithms in object detection \[[1](#引用)\], offering both fast detection speed and high detection accuracy. PaddlePaddle already integrates the SSD algorithm; this example shows how to use the SSD model in PaddlePaddle for object detection. The following sections briefly introduce the principle of SSD, describe the files included in this example and how to use them, explain how to train, evaluate, and run detection on the PASCAL VOC dataset, and finally outline how to use SSD on your own dataset.
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
---
# Single Shot MultiBox Detector (SSD) Object Detection
## Introduction
......
The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
---
# Text Classification
Below are the files contained in this example directory, with descriptions:
......