diff --git a/conv_seq2seq/README.md b/conv_seq2seq/README.md index 920c664562ef160699dff7b391aa3f4ad8608387..5b22c2c17ea2ff3588e93219e86d81a831242211 100644 --- a/conv_seq2seq/README.md +++ b/conv_seq2seq/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Convolutional Sequence to Sequence Learning This model implements the work in the following paper: diff --git a/ctr/README.cn.md b/ctr/README.cn.md index a4cb6d17144a9d78a2764e8a49d3abaeb918b7b6..d717264c46529c4ca3be6500983558b0384a7d77 100644 --- a/ctr/README.cn.md +++ b/ctr/README.cn.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 点击率预估 以下是本例目录包含的文件以及对应说明: diff --git a/ctr/README.md b/ctr/README.md index 6f11ac60734b4a549a9c84d7fbba8ed283a97284..9ace483be6126b31e064ce3014cea1b08664f8cf 100644 --- a/ctr/README.md +++ b/ctr/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Click-Through Rate Prediction ## Introduction diff --git a/deep_fm/README.md b/deep_fm/README.md index aa63170c921e7a22801790834a0db86df5e70e7a..6e2c6fad38d2e9e9db8d17c4967196b4f1cc5a36 100644 --- a/deep_fm/README.md +++ b/deep_fm/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. 
If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Deep Factorization Machine for Click-Through Rate prediction ## Introduction diff --git a/dssm/README.cn.md b/dssm/README.cn.md index 4a80c87673e958c958c82a2cd5dfb7bf0dbaa075..140446ad2e071e8bc185d7788dcf33651a370d69 100644 --- a/dssm/README.cn.md +++ b/dssm/README.cn.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此版本要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 深度结构化语义模型 (Deep Structured Semantic Models, DSSM) DSSM使用DNN模型在一个连续的语义空间中学习文本低纬的表示向量,并且建模两个句子间的语义相似度。本例演示如何使用PaddlePaddle实现一个通用的DSSM 模型,用于建模两个字符串间的语义相似度,模型实现支持通用的数据格式,用户替换数据便可以在真实场景中使用该模型。 diff --git a/dssm/README.md b/dssm/README.md index 6e3d7583a28d77760d4ca727bc0215a1b5d4ea82..ad378f6cd52b0e08efbaac37848d1c167c086ac1 100644 --- a/dssm/README.md +++ b/dssm/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Deep Structured Semantic Models (DSSM) Deep Structured Semantic Models (DSSM) is simple but powerful DNN based model for matching web search queries and the URL based documents. This example demonstrates how to use PaddlePaddle to implement a generic DSSM model for modeling the semantic similarity between two strings. 
diff --git a/fluid/DeepASR/README.md b/fluid/DeepASR/README.md index ac385ea7549fd193205573564dd07594caf118fe..0c3c95a67adeb8ac8a01a320a8a10fb9902542f2 100644 --- a/fluid/DeepASR/README.md +++ b/fluid/DeepASR/README.md @@ -1 +1,6 @@ -Deep ASR Kickoff +The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- +### TODO + +This project is still under active development. diff --git a/fluid/DeepASR/infer.py b/fluid/DeepASR/infer.py index 5543e8252f130760b8e71b83bd9da0393889cce0..fa0c3382d6dbc2f8d6eb443f08ab75f9fc2d6756 100644 --- a/fluid/DeepASR/infer.py +++ b/fluid/DeepASR/infer.py @@ -4,7 +4,7 @@ from __future__ import print_function import os import argparse -import paddle.v2.fluid as fluid +import paddle.fluid as fluid import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm import data_utils.augmentor.trans_add_delta as trans_add_delta import data_utils.augmentor.trans_splice as trans_splice diff --git a/fluid/DeepASR/model_utils/model.py b/fluid/DeepASR/model_utils/model.py index 4e29394a0a2ced700c99f3ea9fa516a9a58e1420..541f869c7224e620c519c97472dbe79ca73bd84b 100644 --- a/fluid/DeepASR/model_utils/model.py +++ b/fluid/DeepASR/model_utils/model.py @@ -3,7 +3,7 @@ from __future__ import division from __future__ import print_function import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid def stacked_lstmp_model(hidden_dim, diff --git a/fluid/DeepASR/tools/profile.py b/fluid/DeepASR/tools/profile.py index 492a3aa2362de52c3b14d455da528b893a124640..cb0227c33a25b1c38977f8485237f13d0351c36f 100644 --- a/fluid/DeepASR/tools/profile.py +++ b/fluid/DeepASR/tools/profile.py @@ -7,8 +7,8 @@ import numpy as np import argparse import time -import paddle.v2.fluid as fluid 
-import paddle.v2.fluid.profiler as profiler +import paddle.fluid as fluid +import paddle.fluid.profiler as profiler import _init_paths import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm import data_utils.augmentor.trans_add_delta as trans_add_delta diff --git a/fluid/DeepASR/train.py b/fluid/DeepASR/train.py index 1c9c963381cdcc0d1c7a098a8f078688738bd4c7..9856dad7d56b47bf14c32a7d0ca0ec10b8ecf88f 100644 --- a/fluid/DeepASR/train.py +++ b/fluid/DeepASR/train.py @@ -8,7 +8,7 @@ import numpy as np import argparse import time -import paddle.v2.fluid as fluid +import paddle.fluid as fluid import data_utils.augmentor.trans_mean_variance_norm as trans_mean_variance_norm import data_utils.augmentor.trans_add_delta as trans_add_delta import data_utils.augmentor.trans_splice as trans_splice diff --git a/fluid/README.md b/fluid/README.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..88357ced18c89a1eaa7670c2ad975a42debad4bb 100644 --- a/fluid/README.md +++ b/fluid/README.md @@ -0,0 +1,5 @@ +# Paddle Fluid Models + +--- + +The Paddle Fluid models are a collection of example models that use Paddle Fluid APIs. Currently, example codes in this directory are still under active development. diff --git a/fluid/adversarial/README.md b/fluid/adversarial/README.md index 51da21918a9d6e2192a2e03eabef4fde97896bc5..e052361c2ae9fdb77babd820a92a4091e1439987 100644 --- a/fluid/adversarial/README.md +++ b/fluid/adversarial/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is the lastest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Advbox Advbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python and paddle. 
diff --git a/fluid/adversarial/advbox/__init__.py b/fluid/adversarial/advbox/__init__.py index de124bad2e988bb8b4f6906c07caf3c6737784d1..e68b585ef98d12d147da43468aa0b4be667137b2 100644 --- a/fluid/adversarial/advbox/__init__.py +++ b/fluid/adversarial/advbox/__init__.py @@ -1,7 +1,3 @@ """ A set of tools for generating adversarial example on paddle platform """ - -from . import attacks -from . import models -from .adversary import Adversary diff --git a/fluid/adversarial/advbox/adversary.py b/fluid/adversarial/advbox/adversary.py index f044dfe8c9164b05353053be5919045521510ec0..14b8517e336affc4752b53fa586f30f1ec5926be 100644 --- a/fluid/adversarial/advbox/adversary.py +++ b/fluid/adversarial/advbox/adversary.py @@ -18,13 +18,15 @@ class Adversary(object): """ assert original is not None + self.original_label = original_label + self.target_label = None + self.adversarial_label = None + self.__original = original - self.__original_label = original_label - self.__target_label = None self.__target = None self.__is_targeted_attack = False self.__adversarial_example = None - self.__adversarial_label = None + self.__bad_adversarial_example = None def set_target(self, is_targeted_attack, target=None, target_label=None): """ @@ -38,10 +40,10 @@ class Adversary(object): """ assert (target_label is None) or is_targeted_attack self.__is_targeted_attack = is_targeted_attack - self.__target_label = target_label + self.target_label = target_label self.__target = target if not is_targeted_attack: - self.__target_label = None + self.target_label = None self.__target = None def set_original(self, original, original_label=None): @@ -53,10 +55,11 @@ class Adversary(object): """ if original != self.__original: self.__original = original - self.__original_label = original_label + self.original_label = original_label self.__adversarial_example = None + self.__bad_adversarial_example = None if original is None: - self.__original_label = None + self.original_label = None def 
_is_successful(self, adversarial_label): """ @@ -65,11 +68,11 @@ class Adversary(object): :param adversarial_label: adversarial label. :return: bool """ - if self.__target_label is not None: - return adversarial_label == self.__target_label + if self.target_label is not None: + return adversarial_label == self.target_label else: return (adversarial_label is not None) and \ - (adversarial_label != self.__original_label) + (adversarial_label != self.original_label) def is_successful(self): """ @@ -77,7 +80,7 @@ class Adversary(object): :return: bool """ - return self._is_successful(self.__adversarial_label) + return self._is_successful(self.adversarial_label) def try_accept_the_example(self, adversarial_example, adversarial_label): """ @@ -93,7 +96,9 @@ class Adversary(object): ok = self._is_successful(adversarial_label) if ok: self.__adversarial_example = adversarial_example - self.__adversarial_label = adversarial_label + self.adversarial_label = adversarial_label + else: + self.__bad_adversarial_example = adversarial_example return ok def perturbation(self, multiplying_factor=1.0): @@ -104,9 +109,14 @@ class Adversary(object): :return: The perturbation that is multiplied by multiplying_factor. 
""" assert self.__original is not None - assert self.__adversarial_example is not None - return multiplying_factor * ( - self.__adversarial_example - self.__original) + assert (self.__adversarial_example is not None) or \ + (self.__bad_adversarial_example is not None) + if self.__adversarial_example is not None: + return multiplying_factor * ( + self.__adversarial_example - self.__original) + else: + return multiplying_factor * ( + self.__bad_adversarial_example - self.__original) @property def is_targeted_attack(self): @@ -115,20 +125,6 @@ class Adversary(object): """ return self.__is_targeted_attack - @property - def target_label(self): - """ - :property: target_label - """ - return self.__target_label - - @target_label.setter - def target_label(self, label): - """ - :property: target_label - """ - self.__target_label = label - @property def target(self): """ @@ -143,20 +139,6 @@ class Adversary(object): """ return self.__original - @property - def original_label(self): - """ - :property: original - """ - return self.__original_label - - @original_label.setter - def original_label(self, label): - """ - original_label setter - """ - self.__original_label = label - @property def adversarial_example(self): """ @@ -164,23 +146,9 @@ class Adversary(object): """ return self.__adversarial_example - @adversarial_example.setter - def adversarial_example(self, example): - """ - adversarial_example setter - """ - self.__adversarial_example = example - @property - def adversarial_label(self): - """ - :property: adversarial_label - """ - return self.__adversarial_label - - @adversarial_label.setter - def adversarial_label(self, label): + def bad_adversarial_example(self): """ - adversarial_label setter + :property: bad_adversarial_example """ - self.__adversarial_label = label + return self.__bad_adversarial_example diff --git a/fluid/adversarial/advbox/attacks/__init__.py b/fluid/adversarial/advbox/attacks/__init__.py index 
bafd123c674184e265288d44ac79b9d18489016c..3893b769f3ad62ada135b55d9367352532feb490 100644 --- a/fluid/adversarial/advbox/attacks/__init__.py +++ b/fluid/adversarial/advbox/attacks/__init__.py @@ -1,10 +1,3 @@ """ -Attack methods +Attack methods __init__.py """ - -from .base import Attack -from .deepfool import DeepFoolAttack -from .gradientsign import FGSM -from .gradientsign import GradientSignAttack -from .iterator_gradientsign import IFGSM -from .iterator_gradientsign import IteratorGradientSignAttack diff --git a/fluid/adversarial/advbox/attacks/base.py b/fluid/adversarial/advbox/attacks/base.py index eb9b1d480c9e5488ec622f0717efdd3e5684ea00..af2eae5e41ab2618602a2d82a5151363a35c2378 100644 --- a/fluid/adversarial/advbox/attacks/base.py +++ b/fluid/adversarial/advbox/attacks/base.py @@ -52,21 +52,23 @@ class Attack(object): :param adversary: adversary :return: None """ + assert self.model.channel_axis() == adversary.original.ndim + if adversary.original_label is None: adversary.original_label = np.argmax( self.model.predict(adversary.original)) if adversary.is_targeted_attack and adversary.target_label is None: if adversary.target is None: raise ValueError( - 'When adversary.is_targeted_attack is True, ' + 'When adversary.is_targeted_attack is true, ' 'adversary.target_label or adversary.target must be set.') else: - adversary.target_label_label = np.argmax( - self.model.predict( - self.model.scale_input(adversary.target))) + adversary.target_label = np.argmax( + self.model.predict(adversary.target)) - logging.info('adversary:\noriginal_label: {}' - '\n target_lable: {}' - '\n is_targeted_attack: {}' + logging.info('adversary:' + '\n original_label: {}' + '\n target_label: {}' + '\n is_targeted_attack: {}' ''.format(adversary.original_label, adversary.target_label, adversary.is_targeted_attack)) diff --git a/fluid/adversarial/advbox/attacks/deepfool.py b/fluid/adversarial/advbox/attacks/deepfool.py index 
2f2da63059955ee73eb445d1a8cd9917a1a51962..abf2292cf30ffedcb8b8056de7237d2e120e3485 100644 --- a/fluid/adversarial/advbox/attacks/deepfool.py +++ b/fluid/adversarial/advbox/attacks/deepfool.py @@ -10,6 +10,8 @@ import numpy as np from .base import Attack +__all__ = ['DeepFoolAttack'] + class DeepFoolAttack(Attack): """ @@ -56,7 +58,7 @@ class DeepFoolAttack(Attack): gradient_k = self.model.gradient(x, k) w_k = gradient_k - gradient f_k = f[k] - f[pre_label] - w_k_norm = np.linalg.norm(w_k) + 1e-8 + w_k_norm = np.linalg.norm(w_k.flatten()) + 1e-8 pert_k = (np.abs(f_k) + 1e-8) / w_k_norm if pert_k < pert: pert = pert_k @@ -70,9 +72,12 @@ class DeepFoolAttack(Attack): f = self.model.predict(x) gradient = self.model.gradient(x, pre_label) adv_label = np.argmax(f) - logging.info('iteration = {}, f = {}, pre_label = {}' - ', adv_label={}'.format(iteration, f[pre_label], - pre_label, adv_label)) + logging.info('iteration={}, f[pre_label]={}, f[target_label]={}' + ', f[adv_label]={}, pre_label={}, adv_label={}' + ''.format(iteration, f[pre_label], ( + f[adversary.target_label] + if adversary.is_targeted_attack else 'NaN'), f[ + adv_label], pre_label, adv_label)) if adversary.try_accept_the_example(x, adv_label): return adversary diff --git a/fluid/adversarial/advbox/attacks/gradient_method.py b/fluid/adversarial/advbox/attacks/gradient_method.py new file mode 100644 index 0000000000000000000000000000000000000000..25b828d41233dea193aef4d953073af3eafdefb3 --- /dev/null +++ b/fluid/adversarial/advbox/attacks/gradient_method.py @@ -0,0 +1,170 @@ +""" +This module provide the attack method for Iterator FGSM's implement. 
+""" +from __future__ import division + +import logging +from collections import Iterable + +import numpy as np + +from .base import Attack + +__all__ = [ + 'GradientMethodAttack', 'FastGradientSignMethodAttack', 'FGSM', + 'FastGradientSignMethodTargetedAttack', 'FGSMT', + 'BasicIterativeMethodAttack', 'BIM', + 'IterativeLeastLikelyClassMethodAttack', 'ILCM' +] + + +class GradientMethodAttack(Attack): + """ + This class implements gradient attack method, and is the base of FGSM, BIM, + ILCM, etc. + """ + + def __init__(self, model, support_targeted=True): + """ + :param model(model): The model to be attacked. + :param support_targeted(bool): Does this attack method support targeted. + """ + super(GradientMethodAttack, self).__init__(model) + self.support_targeted = support_targeted + + def _apply(self, adversary, norm_ord=np.inf, epsilons=0.01, steps=100): + """ + Apply the gradient attack method. + :param adversary(Adversary): + The Adversary object. + :param norm_ord(int): + Order of the norm, such as np.inf, 1, 2, etc. It can't be 0. + :param epsilons(list|tuple|int): + Attack step size (input variation). + :param steps: + The number of iterator steps. + :return: + adversary(Adversary): The Adversary object. 
+ """ + if norm_ord == 0: + raise ValueError("L0 norm is not supported!") + + if not self.support_targeted: + if adversary.is_targeted_attack: + raise ValueError( + "This attack method doesn't support targeted attack!") + + if not isinstance(epsilons, Iterable): + epsilons = np.linspace(epsilons, epsilons + 1e-10, num=steps) + + pre_label = adversary.original_label + min_, max_ = self.model.bounds() + + assert self.model.channel_axis() == adversary.original.ndim + assert (self.model.channel_axis() == 1 or + self.model.channel_axis() == adversary.original.shape[0] or + self.model.channel_axis() == adversary.original.shape[-1]) + + step = 1 + adv_img = adversary.original + for epsilon in epsilons[:steps]: + if epsilon == 0.0: + continue + if adversary.is_targeted_attack: + gradient = -self.model.gradient(adv_img, adversary.target_label) + else: + gradient = self.model.gradient(adv_img, + adversary.original_label) + if norm_ord == np.inf: + gradient_norm = np.sign(gradient) + else: + gradient_norm = gradient / self._norm(gradient, ord=norm_ord) + + adv_img = adv_img + epsilon * gradient_norm * (max_ - min_) + adv_img = np.clip(adv_img, min_, max_) + adv_label = np.argmax(self.model.predict(adv_img)) + logging.info('step={}, epsilon = {:.5f}, pre_label = {}, ' + 'adv_label={}'.format(step, epsilon, pre_label, + adv_label)) + if adversary.try_accept_the_example(adv_img, adv_label): + return adversary + step += 1 + return adversary + + @staticmethod + def _norm(a, ord): + if a.ndim == 1: + return np.linalg.norm(a, ord=ord) + if a.ndim == a.shape[0]: + norm_shape = (a.ndim, reduce(np.dot, a.shape[1:])) + norm_axis = 1 + else: + norm_shape = (reduce(np.dot, a.shape[:-1]), a.ndim) + norm_axis = 0 + return np.linalg.norm(a.reshape(norm_shape), ord=ord, axis=norm_axis) + + +class FastGradientSignMethodTargetedAttack(GradientMethodAttack): + """ + "Fast Gradient Sign Method" is extended to support targeted attack. 
+ "Fast Gradient Sign Method" was originally implemented by Goodfellow et + al. (2015) with the infinity norm. + + Paper link: https://arxiv.org/abs/1412.6572 + """ + + def _apply(self, adversary, epsilons=0.03): + return GradientMethodAttack._apply( + self, + adversary=adversary, + norm_ord=np.inf, + epsilons=epsilons, + steps=1) + + +class FastGradientSignMethodAttack(FastGradientSignMethodTargetedAttack): + """ + This attack was originally implemented by Goodfellow et al. (2015) with the + infinity norm, and is known as the "Fast Gradient Sign Method". + + Paper link: https://arxiv.org/abs/1412.6572 + """ + + def __init__(self, model): + super(FastGradientSignMethodAttack, self).__init__(model, False) + + +class IterativeLeastLikelyClassMethodAttack(GradientMethodAttack): + """ + "Iterative Least-likely Class Method (ILCM)" extends "BIM" to support + targeted attack. + "The Basic Iterative Method (BIM)" is to extend "FSGM". "BIM" iteratively + take multiple small steps while adjusting the direction after each step. + + Paper link: https://arxiv.org/abs/1607.02533 + """ + + def _apply(self, adversary, epsilons=0.001, steps=1000): + return GradientMethodAttack._apply( + self, + adversary=adversary, + norm_ord=np.inf, + epsilons=epsilons, + steps=steps) + + +class BasicIterativeMethodAttack(IterativeLeastLikelyClassMethodAttack): + """ + FGSM is a one-step method. "The Basic Iterative Method (BIM)" iteratively + take multiple small steps while adjusting the direction after each step. 
+ Paper link: https://arxiv.org/abs/1607.02533 + """ + + def __init__(self, model): + super(BasicIterativeMethodAttack, self).__init__(model, False) + + +FGSM = FastGradientSignMethodAttack +FGSMT = FastGradientSignMethodTargetedAttack +BIM = BasicIterativeMethodAttack +ILCM = IterativeLeastLikelyClassMethodAttack diff --git a/fluid/adversarial/advbox/attacks/gradientsign.py b/fluid/adversarial/advbox/attacks/gradientsign.py deleted file mode 100644 index 5909fef5c837e0b1a07c716349d354a6631dfca2..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/attacks/gradientsign.py +++ /dev/null @@ -1,60 +0,0 @@ -""" -This module provide the attack method for FGSM's implement. -""" -from __future__ import division - -import logging -from collections import Iterable - -import numpy as np - -from .base import Attack - - -class GradientSignAttack(Attack): - """ - This attack was originally implemented by Goodfellow et al. (2015) with the - infinity norm (and is known as the "Fast Gradient Sign Method"). - This is therefore called the Fast Gradient Method. - Paper link: https://arxiv.org/abs/1412.6572 - """ - - def _apply(self, adversary, epsilons=1000): - """ - Apply the gradient sign attack. - Args: - adversary(Adversary): The Adversary object. - epsilons(list|tuple|int): The epsilon (input variation parameter). - Return: - adversary: The Adversary object. 
- """ - assert adversary is not None - - if not isinstance(epsilons, Iterable): - epsilons = np.linspace(0, 1, num=epsilons + 1)[1:] - - pre_label = adversary.original_label - min_, max_ = self.model.bounds() - - if adversary.is_targeted_attack: - gradient = self.model.gradient(adversary.original, - adversary.target_label) - gradient_sign = -np.sign(gradient) * (max_ - min_) - else: - gradient = self.model.gradient(adversary.original, - adversary.original_label) - gradient_sign = np.sign(gradient) * (max_ - min_) - - for epsilon in epsilons: - adv_img = adversary.original + epsilon * gradient_sign - adv_img = np.clip(adv_img, min_, max_) - adv_label = np.argmax(self.model.predict(adv_img)) - logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'. - format(epsilon, pre_label, adv_label)) - if adversary.try_accept_the_example(adv_img, adv_label): - return adversary - - return adversary - - -FGSM = GradientSignAttack diff --git a/fluid/adversarial/advbox/attacks/iterator_gradientsign.py b/fluid/adversarial/advbox/attacks/iterator_gradientsign.py deleted file mode 100644 index ac2ef8142a630da5b2190fc3818b0fb7c008d826..0000000000000000000000000000000000000000 --- a/fluid/adversarial/advbox/attacks/iterator_gradientsign.py +++ /dev/null @@ -1,59 +0,0 @@ -""" -This module provide the attack method for Iterator FGSM's implement. -""" -from __future__ import division - -import logging -from collections import Iterable - -import numpy as np - -from .base import Attack - - -class IteratorGradientSignAttack(Attack): - """ - This attack was originally implemented by Alexey Kurakin(Google Brain). - Paper link: https://arxiv.org/pdf/1607.02533.pdf - """ - - def _apply(self, adversary, epsilons=100, steps=10): - """ - Apply the iterative gradient sign attack. - Args: - adversary(Adversary): The Adversary object. - epsilons(list|tuple|int): The epsilon (input variation parameter). - steps(int): The number of iterator steps. 
- Return: - adversary(Adversary): The Adversary object. - """ - - if not isinstance(epsilons, Iterable): - epsilons = np.linspace(0, 1 / steps, num=epsilons + 1)[1:] - - pre_label = adversary.original_label - min_, max_ = self.model.bounds() - - for epsilon in epsilons: - adv_img = adversary.original - for _ in range(steps): - if adversary.is_targeted_attack: - gradient = self.model.gradient(adversary.original, - adversary.target_label) - gradient_sign = -np.sign(gradient) * (max_ - min_) - else: - gradient = self.model.gradient(adversary.original, - adversary.original_label) - gradient_sign = np.sign(gradient) * (max_ - min_) - adv_img = adv_img + gradient_sign * epsilon - adv_img = np.clip(adv_img, min_, max_) - adv_label = np.argmax(self.model.predict(adv_img)) - logging.info('epsilon = {:.3f}, pre_label = {}, adv_label={}'. - format(epsilon, pre_label, adv_label)) - if adversary.try_accept_the_example(adv_img, adv_label): - return adversary - - return adversary - - -IFGSM = IteratorGradientSignAttack diff --git a/fluid/adversarial/advbox/attacks/lbfgs.py b/fluid/adversarial/advbox/attacks/lbfgs.py new file mode 100644 index 0000000000000000000000000000000000000000..b427df1d9770c25b4ad68609dffc890f8c232e36 --- /dev/null +++ b/fluid/adversarial/advbox/attacks/lbfgs.py @@ -0,0 +1,138 @@ +""" +This module provide the attack method of "LBFGS". +""" +from __future__ import division + +import logging + +import numpy as np +from scipy.optimize import fmin_l_bfgs_b + +from .base import Attack + +__all__ = ['LBFGSAttack', 'LBFGS'] + + +class LBFGSAttack(Attack): + """ + Uses L-BFGS-B to minimize the cross-entropy and the distance between the + original and the adversary. 
+ + Paper link: https://arxiv.org/abs/1510.05328 + """ + + def __init__(self, model): + super(LBFGSAttack, self).__init__(model) + self._predicts_normalized = None + self._adversary = None # type: Adversary + + def _apply(self, adversary, epsilon=0.001, steps=10): + self._adversary = adversary + + if not adversary.is_targeted_attack: + raise ValueError("This attack method only support targeted attack!") + + # finding initial c + logging.info('finding initial c...') + c = epsilon + x0 = adversary.original.flatten() + for i in range(30): + c = 2 * c + logging.info('c={}'.format(c)) + is_adversary = self._lbfgsb(x0, c, steps) + if is_adversary: + break + if not is_adversary: + logging.info('Failed!') + return adversary + + # binary search c + logging.info('binary search c...') + c_low = 0 + c_high = c + while c_high - c_low >= epsilon: + logging.info('c_high={}, c_low={}, diff={}, epsilon={}' + .format(c_high, c_low, c_high - c_low, epsilon)) + c_half = (c_low + c_high) / 2 + is_adversary = self._lbfgsb(x0, c_half, steps) + if is_adversary: + c_high = c_half + else: + c_low = c_half + + return adversary + + def _is_predicts_normalized(self, predicts): + """ + To determine the predicts is normalized. + :param predicts(np.array): the output of the model. + :return: bool + """ + if self._predicts_normalized is None: + if self.model.predict_name().lower() in [ + 'softmax', 'probabilities', 'probs' + ]: + self._predicts_normalized = True + else: + if np.any(predicts < 0.0): + self._predicts_normalized = False + else: + s = np.sum(predicts.flatten()) + if 0.999 <= s <= 1.001: + self._predicts_normalized = True + else: + self._predicts_normalized = False + assert self._predicts_normalized is not None + return self._predicts_normalized + + def _loss(self, adv_x, c): + """ + To get the loss and gradient. 
+ :param adv_x: the candidate adversarial example + :param c: parameter 'C' in the paper + :return: (loss, gradient) + """ + x = adv_x.reshape(self._adversary.original.shape) + + # cross_entropy + logits = self.model.predict(x) + if not self._is_predicts_normalized(logits): # to softmax + e = np.exp(logits) + logits = e / np.sum(e) + e = np.exp(logits) + s = np.sum(e) + ce = np.log(s) - logits[self._adversary.target_label] + + # L2 distance + min_, max_ = self.model.bounds() + d = np.sum((x - self._adversary.original).flatten() ** 2) \ + / ((max_ - min_) ** 2) / len(adv_x) + + # gradient + gradient = self.model.gradient(x, self._adversary.target_label) + + result = (c * ce + d).astype(float), gradient.flatten().astype(float) + return result + + def _lbfgsb(self, x0, c, maxiter): + min_, max_ = self.model.bounds() + bounds = [(min_, max_)] * len(x0) + approx_grad_eps = (max_ - min_) / 100.0 + x, f, d = fmin_l_bfgs_b( + self._loss, + x0, + args=(c, ), + bounds=bounds, + maxiter=maxiter, + epsilon=approx_grad_eps) + if np.amax(x) > max_ or np.amin(x) < min_: + x = np.clip(x, min_, max_) + shape = self._adversary.original.shape + adv_label = np.argmax(self.model.predict(x.reshape(shape))) + logging.info('pre_label = {}, adv_label={}'.format( + self._adversary.target_label, adv_label)) + return self._adversary.try_accept_the_example( + x.reshape(shape), adv_label) + + +LBFGS = LBFGSAttack diff --git a/fluid/adversarial/advbox/models/__init__.py b/fluid/adversarial/advbox/models/__init__.py index 46d0fea90ef1b2dcefc68121cca9301613519e4c..de6d2a9feeb4a3ffc3b8bfb11e87f600a6951487 100644 --- a/fluid/adversarial/advbox/models/__init__.py +++ b/fluid/adversarial/advbox/models/__init__.py @@ -1,5 +1,3 @@ """ -Paddle model for target of attack -""" -from .base import Model -from .paddle import PaddleModel +Models __init__.py +""" \ No newline at end of file diff --git a/fluid/adversarial/advbox/models/base.py b/fluid/adversarial/advbox/models/base.py index 
142c7f054a29048af505fe2e861d8ac11cf623e1..f25d4e305d4772b1b2876beef670823a393b7089 100644 --- a/fluid/adversarial/advbox/models/base.py +++ b/fluid/adversarial/advbox/models/base.py @@ -24,11 +24,21 @@ class Model(object): assert len(bounds) == 2 assert channel_axis in [0, 1, 2, 3] - if preprocess is None: - preprocess = (0, 1) self._bounds = bounds self._channel_axis = channel_axis - self._preprocess = preprocess + + # Make self._preprocess to be (0,1) if possible, so that don't need + # to do substract or divide. + if preprocess is not None: + sub, div = np.array(preprocess) + if not np.any(sub): + sub = 0 + if np.all(div == 1): + div = 1 + assert (div is None) or np.all(div) + self._preprocess = (sub, div) + else: + self._preprocess = (0, 1) def bounds(self): """ @@ -47,8 +57,7 @@ class Model(object): sub, div = self._preprocess if np.any(sub != 0): res = input_ - sub - assert np.any(div != 0) - if np.any(div != 1): + if not np.all(sub == 1): if res is None: # "res = input_ - sub" is not executed! res = input_ / div else: @@ -97,3 +106,11 @@ class Model(object): with the shape (height, width, channel). """ raise NotImplementedError + + @abstractmethod + def predict_name(self): + """ + Get the predict name, such as "softmax",etc. + :return: string + """ + raise NotImplementedError diff --git a/fluid/adversarial/advbox/models/paddle.py b/fluid/adversarial/advbox/models/paddle.py index 3a25dba40a0027d4e633cbeddc24a72fc4ce49c4..73439d2a4e616899dca6c1a017e1f75b4fb1971f 100644 --- a/fluid/adversarial/advbox/models/paddle.py +++ b/fluid/adversarial/advbox/models/paddle.py @@ -4,7 +4,7 @@ Paddle model from __future__ import absolute_import import numpy as np -import paddle.v2.fluid as fluid +import paddle.fluid as fluid from .base import Model @@ -16,7 +16,7 @@ class PaddleModel(Model): instance of PaddleModel. 
Args: - program(paddle.v2.fluid.framework.Program): The program of the model + program(paddle.fluid.framework.Program): The program of the model which generate the adversarial sample. input_name(string): The name of the input. logits_name(string): The name of the logits. @@ -114,3 +114,10 @@ class PaddleModel(Model): feed=feeder.feed([(scaled_data, label)]), fetch_list=[self._gradient]) return grad.reshape(data.shape) + + def predict_name(self): + """ + Get the predict name, such as "softmax",etc. + :return: string + """ + return self._program.block(0).var(self._predict_name).op.type diff --git a/fluid/adversarial/fluid_mnist.py b/fluid/adversarial/fluid_mnist.py index db4d4b51868ffa8be13d4d57a40e1def7e25d1a8..dc116d7de52bfe4529c6fc977a9753440145b73c 100644 --- a/fluid/adversarial/fluid_mnist.py +++ b/fluid/adversarial/fluid_mnist.py @@ -2,7 +2,7 @@ CNN on mnist data using fluid api of paddlepaddle """ import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid def mnist_cnn_model(img): diff --git a/fluid/adversarial/mnist_tutorial_fgsm.py b/fluid/adversarial/mnist_tutorial_fgsm.py index 5da4bbfc4313315c27e1b5b41c4452cbaafa7413..ea3231695bab8c78aceaf7ba0ba375a5c564d5a0 100644 --- a/fluid/adversarial/mnist_tutorial_fgsm.py +++ b/fluid/adversarial/mnist_tutorial_fgsm.py @@ -3,10 +3,10 @@ FGSM demos on mnist using advbox tool. 
""" import matplotlib.pyplot as plt import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid -from advbox import Adversary -from advbox.attacks.gradientsign import GradientSignAttack +from advbox.adversary import Adversary +from advbox.attacks.gradient_method import FGSM from advbox.models.paddle import PaddleModel @@ -73,7 +73,7 @@ def main(): # advbox demo m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME, logits.name, avg_cost.name, (-1, 1)) - att = GradientSignAttack(m) + att = FGSM(m) for data in train_reader(): # fgsm attack adversary = att(Adversary(data[0][0], data[0][1])) diff --git a/fluid/adversarial/mnist_tutorial_jsma.py b/fluid/adversarial/mnist_tutorial_jsma.py index 2010b5b4655b8693c5955092a7810d327667375f..d9db8b712cb5ca4fbded2119f249c586d2877b50 100644 --- a/fluid/adversarial/mnist_tutorial_jsma.py +++ b/fluid/adversarial/mnist_tutorial_jsma.py @@ -3,7 +3,7 @@ FGSM demos on mnist using advbox tool. """ import matplotlib.pyplot as plt import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid import numpy as np from advbox import Adversary diff --git a/fluid/image_classification/README.md b/fluid/image_classification/README.md index 3d9f340b3e4ffc73147d0cbe2be0706fe608c198..b950fbe1a7901893f3d8f883858e3db15966b7b3 100644 --- a/fluid/image_classification/README.md +++ b/fluid/image_classification/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
+ +--- + # SE-ResNeXt for image classification This model built with paddle fluid is still under active development and is not diff --git a/fluid/image_classification/mobilenet.py b/fluid/image_classification/mobilenet.py index 15a7273af86cd659575f43d7bfb055017c5f9b17..48d266c02b94155e07d80f7c401987a22ac7c906 100644 --- a/fluid/image_classification/mobilenet.py +++ b/fluid/image_classification/mobilenet.py @@ -1,9 +1,9 @@ import os import paddle.v2 as paddle -import paddle.v2.fluid as fluid -from paddle.v2.fluid.initializer import MSRA -from paddle.v2.fluid.param_attr import ParamAttr +import paddle.fluid as fluid +from paddle.fluid.initializer import MSRA +from paddle.fluid.param_attr import ParamAttr parameter_attr = ParamAttr(initializer=MSRA()) diff --git a/fluid/image_classification/se_resnext.py b/fluid/image_classification/se_resnext.py index 9d7b4ca2be6979ef0ceaf2a5636d8162fe3fc821..c2b2d680fc995b1ea6cc5a2f640746a8a79ac029 100644 --- a/fluid/image_classification/se_resnext.py +++ b/fluid/image_classification/se_resnext.py @@ -1,6 +1,6 @@ import os import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid import reader diff --git a/fluid/policy_gradient/README.md b/fluid/policy_gradient/README.md index 7db11fc44bc9e68dd064080d0fca0f7011c3c018..b813aa124466597adfb80261bee7c2de22b95e67 100644 --- a/fluid/policy_gradient/README.md +++ b/fluid/policy_gradient/README.md @@ -1,4 +1,8 @@ -# Policy Gradient RL by PaddlePaddle +运行本目录下的程序示例需要使用PaddlePaddle的最新develop分支。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + +# Policy Gradient RL by PaddlePaddle 本文介绍了如何使用PaddlePaddle通过policy-based的强化学习方法来训练一个player(actor model), 我们希望这个player可以完成简单的走阶梯任务。 内容分为: diff --git a/fluid/policy_gradient/brain.py b/fluid/policy_gradient/brain.py index bf247932a499572911c09592e4fd8d977d424e93..8387833065d89e0a61b90734771a8d9db5ac1eb4 100644
--- a/fluid/policy_gradient/brain.py +++ b/fluid/policy_gradient/brain.py @@ -1,6 +1,6 @@ import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid # reproducible np.random.seed(1) diff --git a/fluid/text_classification/README.md b/fluid/text_classification/README.md index 40df3211d7c914e2e0af7df3d9aeef6fd3cca842..500ee6ae6db28e9d844d206a1cc894c36f1db09f 100644 --- a/fluid/text_classification/README.md +++ b/fluid/text_classification/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Text Classification ## Data Preparation diff --git a/fluid/text_classification/train.py b/fluid/text_classification/train.py index 98f63f0867d0834c7e7750fa7822243ca920b2cb..d0c9c34f02a0435fe7b6c390189aa921a6beef02 100644 --- a/fluid/text_classification/train.py +++ b/fluid/text_classification/train.py @@ -5,7 +5,7 @@ import argparse import time import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid from config import TrainConfig as conf diff --git a/fluid/transformer/README.md b/fluid/transformer/README.md index 4988c6b1f2e5b443912508e376206b1b9d556fa3..6fea167b5e7c3e9dd759ef30d9225b451350e889 100644 --- a/fluid/transformer/README.md +++ b/fluid/transformer/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is the latest develop branch. If you are on a version of PaddlePaddle earlier than this, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html).
+ +--- + # Attention is All You Need: A Paddle Fluid implementation This is a Paddle Fluid implementation of the Transformer model in [Attention is All You Need]() (Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, arxiv, 2017). diff --git a/fluid/transformer/config.py b/fluid/transformer/config.py index b0ec296e1a2085bf390c8527af115a3c06622800..091ea175291c56d63e1d8b42a874516d9733f1cf 100644 --- a/fluid/transformer/config.py +++ b/fluid/transformer/config.py @@ -12,6 +12,9 @@ class TrainTaskConfig(object): beta2 = 0.98 eps = 1e-9 + # the params for learning rate scheduling + warmup_steps = 4000 + class ModelHyperParams(object): # Dictionary size for source and target language. This model directly uses @@ -70,4 +73,5 @@ input_data_names = ( "src_slf_attn_bias", "trg_slf_attn_bias", "trg_src_attn_bias", - "lbl_word", ) + "lbl_word", + "lbl_weight", ) diff --git a/fluid/transformer/model.py b/fluid/transformer/model.py index 1a5ae39677ca06641890ba6d14738e949079aac7..e1163899529ce56d0b295ffd5163d3a6a6be296d 100644 --- a/fluid/transformer/model.py +++ b/fluid/transformer/model.py @@ -2,8 +2,8 @@ from functools import partial import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid as fluid -import paddle.v2.fluid.layers as layers +import paddle.fluid as fluid +import paddle.fluid.layers as layers from config import TrainTaskConfig, input_data_names, pos_enc_param_names @@ -474,4 +474,14 @@ def transformer( dtype="int64", append_batch_size=False) cost = layers.cross_entropy(input=predict, label=gold) - return layers.mean(x=cost) + # The actual shape of weights in runtime is: + # [batch_size * max_trg_length_in_a_batch, 1]. + # Padding index do not contribute to the total loss. This Weight is used to + # cancel padding index in calculating the loss. 
+ weights = layers.data( + name=input_data_names[8], + shape=[batch_size * max_length, 1], + dtype="float32", + append_batch_size=False) + weighted_cost = cost * weights + return layers.reduce_sum(weighted_cost) diff --git a/fluid/transformer/optim.py b/fluid/transformer/optim.py new file mode 100644 index 0000000000000000000000000000000000000000..9905e6594a668b8e59fef1a4394714a6fcb8aeb6 --- /dev/null +++ b/fluid/transformer/optim.py @@ -0,0 +1,40 @@ +import numpy as np + +import paddle.fluid as fluid +import paddle.fluid.layers as layers + + +class LearningRateScheduler(object): + """ + Wrapper for learning rate scheduling as described in the Transformer paper. + LearningRateScheduler adapts the learning rate externally and the adapted + learning rate will be feeded into the main_program as input data. + """ + + def __init__(self, + d_model, + warmup_steps, + place, + learning_rate=0.001, + current_steps=0, + name="learning_rate"): + self.current_steps = current_steps + self.warmup_steps = warmup_steps + self.d_model = d_model + self.learning_rate = layers.create_global_var( + name=name, + shape=[1], + value=float(learning_rate), + dtype="float32", + persistable=True) + self.place = place + + def update_learning_rate(self, data_input): + self.current_steps += 1 + lr_value = np.power(self.d_model, -0.5) * np.min([ + np.power(self.current_steps, -0.5), + np.power(self.warmup_steps, -1.5) * self.current_steps + ]) + lr_tensor = fluid.LoDTensor() + lr_tensor.set(np.array([lr_value], dtype="float32"), self.place) + data_input[self.learning_rate.name] = lr_tensor diff --git a/fluid/transformer/train.py b/fluid/transformer/train.py index 76904669de804df41710ff6f7602997f4899d3c4..b841ef4621d91e07f9d93e87a795c4605e7f30bc 100644 --- a/fluid/transformer/train.py +++ b/fluid/transformer/train.py @@ -1,9 +1,10 @@ import numpy as np import paddle.v2 as paddle -import paddle.v2.fluid as fluid +import paddle.fluid as fluid from model import transformer, position_encoding_init 
+from optim import LearningRateScheduler from config import TrainTaskConfig, ModelHyperParams, \ pos_enc_param_names, input_data_names @@ -77,17 +78,21 @@ def prepare_batch_input(insts, input_data_names, src_pad_idx, trg_pad_idx, [1, 1, trg_max_len, 1]).astype("float32") lbl_word = __pad_batch_data([inst[2] for inst in insts], trg_pad_idx, False, False, False, False) + lbl_weight = (lbl_word != trg_pad_idx).astype("float32").reshape([-1, 1]) data_to_tensor([ src_word, src_pos, trg_word, trg_pos, src_slf_attn_bias, - trg_slf_attn_bias, trg_src_attn_bias, lbl_word + trg_slf_attn_bias, trg_src_attn_bias, lbl_word, lbl_weight ], input_data_names, input_dict, place) return input_dict def main(): - avg_cost = transformer( + place = fluid.CUDAPlace(0) if TrainTaskConfig.use_gpu else fluid.CPUPlace() + exe = fluid.Executor(place) + + cost = transformer( ModelHyperParams.src_vocab_size + 1, ModelHyperParams.trg_vocab_size + 1, ModelHyperParams.max_length + 1, ModelHyperParams.n_layer, ModelHyperParams.n_head, @@ -96,12 +101,15 @@ def main(): ModelHyperParams.dropout, ModelHyperParams.src_pad_idx, ModelHyperParams.trg_pad_idx, ModelHyperParams.pos_pad_idx) + lr_scheduler = LearningRateScheduler(ModelHyperParams.d_model, + TrainTaskConfig.warmup_steps, place, + TrainTaskConfig.learning_rate) optimizer = fluid.optimizer.Adam( - learning_rate=TrainTaskConfig.learning_rate, + learning_rate=lr_scheduler.learning_rate, beta1=TrainTaskConfig.beta1, beta2=TrainTaskConfig.beta2, epsilon=TrainTaskConfig.eps) - optimizer.minimize(avg_cost) + optimizer.minimize(cost) train_data = paddle.batch( paddle.reader.shuffle( @@ -110,9 +118,6 @@ def main(): buf_size=51200), batch_size=TrainTaskConfig.batch_size) - place = fluid.CUDAPlace(0) if TrainTaskConfig.use_gpu else fluid.CPUPlace() - exe = fluid.Executor(place) - # Initialize the parameters. 
exe.run(fluid.framework.default_startup_program()) for pos_enc_param_name in pos_enc_param_names: @@ -124,16 +129,21 @@ def main(): for pass_id in xrange(TrainTaskConfig.pass_num): for batch_id, data in enumerate(train_data()): + # The current program desc is coupled with batch_size, thus all + # mini-batches must have the same number of instances currently. + if len(data) != TrainTaskConfig.batch_size: + continue data_input = prepare_batch_input( data, input_data_names, ModelHyperParams.src_pad_idx, ModelHyperParams.trg_pad_idx, ModelHyperParams.max_length, ModelHyperParams.n_head, place) + lr_scheduler.update_learning_rate(data_input) outs = exe.run(fluid.framework.default_main_program(), feed=data_input, - fetch_list=[avg_cost]) - avg_cost_val = np.array(outs[0]) + fetch_list=[cost]) + cost_val = np.array(outs[0]) print("pass_id = " + str(pass_id) + " batch = " + str(batch_id) + - " avg_cost = " + str(avg_cost_val)) + " avg_cost = " + str(cost_val)) if __name__ == "__main__": diff --git a/generate_chinese_poetry/README.md b/generate_chinese_poetry/README.md index 1f6bef0da8145098f70fd02030f6cf4f7284dd3e..c1ea00109075a64f549ec56ad8433f7c4846855a 100644 --- a/generate_chinese_poetry/README.md +++ b/generate_chinese_poetry/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 中国古诗生成 ## 简介 diff --git a/generate_sequence_by_rnn_lm/README.md b/generate_sequence_by_rnn_lm/README.md index b804e528543ad1d60241024b7fd7bee48b7a9c26..afa543334f19088fbf8840483397e659408b6af0 100644 --- a/generate_sequence_by_rnn_lm/README.md +++ b/generate_sequence_by_rnn_lm/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 
使用循环神经网语言模型生成文本 语言模型(Language Model)是一个概率分布模型,简单来说,就是用来计算一个句子的概率的模型。利用它可以确定哪个词序列的可能性更大,或者给定若干个词,可以预测下一个最可能出现的词。语言模型是自然语言处理领域里一个重要的基础模型。 diff --git a/globally_normalized_reader/README.md b/globally_normalized_reader/README.md index ca223ac75bc3b7edea5cf69abd88e16ba4d193a9..9763a1c04fc5dd76da2003acfa53ba094f0582e4 100644 --- a/globally_normalized_reader/README.md +++ b/globally_normalized_reader/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.11.0. If you are on a version of PaddlePaddle earlier than v0.11.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Globally Normalized Reader This model implements the work in the following paper: diff --git a/hsigmoid/README.md b/hsigmoid/README.md index 5e891bce4eaf5c8fe4ab7f17cafd950752cf026e..619fc190acbbbfc2f792f3274e4dfec0042d8c1c 100644 --- a/hsigmoid/README.md +++ b/hsigmoid/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # Hsigmoid加速词向量训练 ## 背景介绍 在自然语言处理领域中,传统做法通常使用one-hot向量来表示词,比如词典为['我', '你', '喜欢'],可以用[1,0,0]、[0,1,0]和[0,0,1]这三个向量分别表示'我'、'你'和'喜欢'。这种表示方式比较简洁,但是当词表很大时,容易产生维度爆炸问题;而且任意两个词的向量是正交的,向量包含的信息有限。为了避免或减轻one-hot表示的缺点,目前通常使用词向量来取代one-hot表示,词向量也就是word embedding,即使用一个低维稠密的实向量取代高维稀疏的one-hot向量。训练词向量的方法有很多种,神经网络模型是其中之一,包括CBOW、Skip-gram等,这些模型本质上都是一个分类模型,当词表较大即类别较多时,传统的softmax将非常消耗时间。PaddlePaddle提供了Hsigmoid Layer、NCE Layer,来加速模型的训练过程。本文主要介绍如何使用Hsigmoid Layer来加速训练,词向量相关内容请查阅PaddlePaddle Book中的[词向量章节](https://github.com/PaddlePaddle/book/tree/develop/04.word2vec)。 diff --git a/image_classification/README.md b/image_classification/README.md index 45d8ce5742393ae705e8d16cbf6b0f4e33df5c6a..f041185acc6f972fa5b2759a7f64efc0f2000c80 100644 --- a/image_classification/README.md +++ 
b/image_classification/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.11.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + 图像分类 ======================= diff --git a/ltr/README.md b/ltr/README.md index 3cc84494f7d666e396a9f00690aaf269f36d0057..e7ce9f9215fd85ed3008627f3041a7000ecf219d 100644 --- a/ltr/README.md +++ b/ltr/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 排序学习(Learning To Rank) 排序学习技术\[[1](#参考文献1)\]是构建排序模型的机器学习方法,在信息检索、自然语言处理,数据挖掘等机器学场景中具有重要作用。排序学习的主要目的是对给定一组文档,对任意查询请求给出反映相关性的文档排序。在本例子中,利用标注过的语料库训练两种经典排序模型RankNet[[4](#参考文献4)\]和LamdaRank[[6](#参考文献6)\],分别可以生成对应的排序模型,能够对任意查询请求,给出相关性文档排序。 diff --git a/mt_with_external_memory/README.md b/mt_with_external_memory/README.md index 413526a5b52ba15fff2235eb637f95cbfc1ed209..1b478bd846ec5a5083c877f15c86057014375f8a 100644 --- a/mt_with_external_memory/README.md +++ b/mt_with_external_memory/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.11.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 带外部记忆机制的神经机器翻译 带**外部记忆**(External Memory)机制的神经机器翻译模型(Neural Machine Translation, NMT),是神经机器翻译模型的一个重要扩展。它引入可微分的记忆网络作为额外的记忆单元,拓展神经翻译模型内部工作记忆(Working Memory)的容量或带宽,辅助完成翻译等任务中信息的临时存取,改善模型表现。 diff --git a/nce_cost/README.md b/nce_cost/README.md index 1792c41b8d4ce86846466e2af65166169118de69..25864ada5c5ab9c686070743f4745f7062047205 100644 --- a/nce_cost/README.md +++ b/nce_cost/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 
+ +--- + # 使用噪声对比估计加速语言模型训练 ## 为什么需要噪声对比估计 @@ -101,11 +105,11 @@ return paddle.layer.nce( NCE 层的一些重要参数解释如下: -| 参数名 | 参数作用 | 介绍 | -|:------ |:-------| :--------| -| param\_attr / bias\_attr | 用来设置参数名字 |方便预测阶段加载参数,具体在预测一节中介绍。| -| num\_neg\_samples | 负样本采样个数|可以控制正负样本比例,这个值取值区间为 [1, 字典大小-1],负样本个数越多则整个模型的训练速度越慢,模型精度也会越高 | -| neg\_distribution | 生成负样例标签的分布,默认是一个均匀分布| 可以自行控制负样本采样时各个类别的采样权重。例如:希望正样例为“晴天”时,负样例“洪水”在训练时更被着重区分,则可以将“洪水”这个类别的采样权重增加| +| 参数名 | 参数作用 | 介绍 | +| :----------------------- | :--------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------- | +| param\_attr / bias\_attr | 用来设置参数名字 | 方便预测阶段加载参数,具体在预测一节中介绍。 | +| num\_neg\_samples | 负样本采样个数 | 可以控制正负样本比例,这个值取值区间为 [1, 字典大小-1],负样本个数越多则整个模型的训练速度越慢,模型精度也会越高 | +| neg\_distribution | 生成负样例标签的分布,默认是一个均匀分布 | 可以自行控制负样本采样时各个类别的采样权重。例如:希望正样例为“晴天”时,负样例“洪水”在训练时更被着重区分,则可以将“洪水”这个类别的采样权重增加 | ## 预测 1. 在命令行运行 : diff --git a/nested_sequence/text_classification/README.md b/nested_sequence/text_classification/README.md index db6f2bc65a38e95b0371d82862779e9fc806f0f8..0509ac342bf09c5d8b9c80981f78c0e5cf316c24 100644 --- a/nested_sequence/text_classification/README.md +++ b/nested_sequence/text_classification/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.11.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 基于双层序列的文本分类 ## 简介 diff --git a/neural_qa/README.md b/neural_qa/README.md index 7744493fab5afe32cab50038a95bf38ed5b8bd07..a19d7020679ac0dfee44e3c7a65ebef05057507a 100644 --- a/neural_qa/README.md +++ b/neural_qa/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. 
If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering This model implements the work in the following paper: diff --git a/nmt_without_attention/README.md b/nmt_without_attention/README.md index aad847211d2a3c90cd029b4653c4c2ece7fb63f9..deb7ff58ee9c4940964bea8f6a19ca1b54019b6e 100644 --- a/nmt_without_attention/README.md +++ b/nmt_without_attention/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Neural Machine Translation Model ## Background Introduction diff --git a/scene_text_recognition/README.md b/scene_text_recognition/README.md index 9974d1d74b6d3cd6c426ae95fd6969cfc09f4610..f10b4c0d5a966caa0e3deb6b6fd73bcd7538e2e9 100644 --- a/scene_text_recognition/README.md +++ b/scene_text_recognition/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 场景文字识别 (STR, Scene Text Recognition) ## STR任务简介 diff --git a/scheduled_sampling/README.md b/scheduled_sampling/README.md index 4691c1f8be868bb9c4af837307c60cf3c9443b7b..2a33f3b248e3cede611e5b4c8647286cc8fb791c 100644 --- a/scheduled_sampling/README.md +++ b/scheduled_sampling/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # Scheduled Sampling ## 概述 diff --git 
a/sequence_tagging_for_ner/README.md b/sequence_tagging_for_ner/README.md index cea72acc699fb80708284c3d8813545f650f4612..38e187554537bc5b83a5c658d639c9743047f085 100644 --- a/sequence_tagging_for_ner/README.md +++ b/sequence_tagging_for_ner/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 命名实体识别 以下是本例的简要目录结构及说明: @@ -88,14 +92,14 @@ Baghdad NNP I-NP I-LOC 预处理完成后,一条训练样本包含3个部分作为神经网络的输入信息用于训练:(1)句子序列;(2)首字母大写标记序列;(3)标注序列,下表是一条训练样本的示例: | 句子序列 | 大写标记序列 | 标注序列 | -|---|---|---| -| u.n. | 1 | B-ORG | -| official | 0 | O | -| ekeus | 1 | B-PER | -| heads | 0 | O | -| for | 0 | O | -| baghdad | 1 | B-LOC | -| . | 0 | O | +| -------- | ------------ | -------- | +| u.n. | 1 | B-ORG | +| official | 0 | O | +| ekeus | 1 | B-PER | +| heads | 0 | O | +| for | 0 | O | +| baghdad | 1 | B-LOC | +| . | 0 | O | ## 运行 ### 编写数据读取接口 diff --git a/ssd/README.cn.md b/ssd/README.cn.md index b51441820561262d9db68abf6d0aaaffce6971d5..2e510908a43c29352be87ddc061958f568495251 100644 --- a/ssd/README.cn.md +++ b/ssd/README.cn.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # SSD目标检测 ## 概述 SSD全称:Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一\[[1](#引用)\],有着检测速度快且检测精度高的有的。PaddlePaddle已集成SSD算法,本示例旨在介绍如何使用PaddlePaddle中的SSD模型进行目标检测。下文首先简要介绍SSD原理,然后介绍示例包含文件及如何使用,接着介绍如何在PASCAL VOC数据集上训练、评估及检测,最后简要介绍如何在自有数据集上使用SSD。 diff --git a/ssd/README.md b/ssd/README.md index 99856a69d2f557ec8038b3477db8f79334f9f384..22ac492f49819763bb96ebef088760e824eba380 100644 --- a/ssd/README.md +++ b/ssd/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. 
If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Single Shot MultiBox Detector (SSD) Object Detection ## Introduction diff --git a/text_classification/README.md b/text_classification/README.md index 191ab20f2e84c698df082c52b068e71960715d62..0617e19d3061c2288b2c59dfe53e2053cb8d3be2 100644 --- a/text_classification/README.md +++ b/text_classification/README.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # 文本分类 以下是本例目录包含的文件以及对应说明: @@ -129,70 +133,70 @@ negative 0.0300 0.9700 i love scifi and am willing to put up with a lot 1. 数据组织 - 假设有如下格式的训练数据:每一行为一条样本,以 `\t` 分隔,第一列是类别标签,第二列是输入文本的内容,文本内容中的词语以空格分隔。以下是两条示例数据: + 假设有如下格式的训练数据:每一行为一条样本,以 `\t` 分隔,第一列是类别标签,第二列是输入文本的内容,文本内容中的词语以空格分隔。以下是两条示例数据: - ``` - positive PaddlePaddle is good - negative What a terrible weather - ``` + ``` + positive PaddlePaddle is good + negative What a terrible weather + ``` 2. 
编写数据读取接口 - 自定义数据读取接口只需编写一个 Python 生成器实现**从原始输入文本中解析一条训练样本**的逻辑。以下代码片段实现了读取原始数据返回类型为: `paddle.data_type.integer_value_sequence`(词语在字典的序号)和 `paddle.data_type.integer_value`(类别标签)的 2 个输入给网络中定义的 2 个 `data_layer` 的功能。 - ```python - def train_reader(data_dir, word_dict, label_dict): - def reader(): - UNK_ID = word_dict[""] - word_col = 0 - lbl_col = 1 - - for file_name in os.listdir(data_dir): - with open(os.path.join(data_dir, file_name), "r") as f: - for line in f: - line_split = line.strip().split("\t") - word_ids = [ - word_dict.get(w, UNK_ID) - for w in line_split[word_col].split() - ] - yield word_ids, label_dict[line_split[lbl_col]] - - return reader - ``` - - - 关于 PaddlePaddle 中 `data_layer` 接受输入数据的类型,以及数据读取接口对应该返回数据的格式,请参考 [input-types](http://www.paddlepaddle.org/release_doc/0.9.0/doc_cn/ui/data_provider/pydataprovider2.html#input-types) 一节。 - - 以上代码片段详见本例目录下的 `reader.py` 脚本,`reader.py` 同时提供了读取测试数据的全部代码。 - - 接下来,只需要将数据读取函数 `train_reader` 作为参数传递给 `train.py` 脚本中的 `paddle.batch` 接口即可使用自定义数据接口读取数据,调用方式如下: - - ```python - train_reader = paddle.batch( - paddle.reader.shuffle( - reader.train_reader(train_data_dir, word_dict, lbl_dict), - buf_size=1000), - batch_size=batch_size) - ``` + 自定义数据读取接口只需编写一个 Python 生成器实现**从原始输入文本中解析一条训练样本**的逻辑。以下代码片段实现了读取原始数据返回类型为: `paddle.data_type.integer_value_sequence`(词语在字典的序号)和 `paddle.data_type.integer_value`(类别标签)的 2 个输入给网络中定义的 2 个 `data_layer` 的功能。 + ```python + def train_reader(data_dir, word_dict, label_dict): + def reader(): + UNK_ID = word_dict[""] + word_col = 0 + lbl_col = 1 + + for file_name in os.listdir(data_dir): + with open(os.path.join(data_dir, file_name), "r") as f: + for line in f: + line_split = line.strip().split("\t") + word_ids = [ + word_dict.get(w, UNK_ID) + for w in line_split[word_col].split() + ] + yield word_ids, label_dict[line_split[lbl_col]] + + return reader + ``` + + - 关于 PaddlePaddle 中 `data_layer` 接受输入数据的类型,以及数据读取接口对应该返回数据的格式,请参考 
[input-types](http://www.paddlepaddle.org/release_doc/0.9.0/doc_cn/ui/data_provider/pydataprovider2.html#input-types) 一节。 + - 以上代码片段详见本例目录下的 `reader.py` 脚本,`reader.py` 同时提供了读取测试数据的全部代码。 + + 接下来,只需要将数据读取函数 `train_reader` 作为参数传递给 `train.py` 脚本中的 `paddle.batch` 接口即可使用自定义数据接口读取数据,调用方式如下: + + ```python + train_reader = paddle.batch( + paddle.reader.shuffle( + reader.train_reader(train_data_dir, word_dict, lbl_dict), + buf_size=1000), + batch_size=batch_size) + ``` 3. 修改命令行参数 - - 如果将数据组织成示例数据的同样的格式,只需在 `run.sh` 脚本中修改 `train.py` 启动参数,指定 `train_data_dir` 参数,可以直接运行本例,无需修改数据读取接口 `reader.py`。 - - 执行 `python train.py --help` 可以获取`train.py` 脚本各项启动参数的详细说明,主要参数如下: - - `nn_type`:选择要使用的模型,目前支持两种:“dnn” 或者 “cnn”。 - - `train_data_dir`:指定训练数据所在的文件夹,使用自定义数据训练,必须指定此参数,否则使用`paddle.dataset.imdb`训练,同时忽略`test_data_dir`,`word_dict`,和 `label_dict` 参数。 - - `test_data_dir`:指定测试数据所在的文件夹,若不指定将不进行测试。 - - `word_dict`:字典文件所在的路径,若不指定,将从训练数据根据词频统计,自动建立字典。 - - `label_dict`:类别标签字典,用于将字符串类型的类别标签,映射为整数类型的序号。 - - `batch_size`:指定多少条样本后进行一次神经网络的前向运行及反向更新。 - - `num_passes`:指定训练多少个轮次。 + - 如果将数据组织成示例数据的同样的格式,只需在 `run.sh` 脚本中修改 `train.py` 启动参数,指定 `train_data_dir` 参数,可以直接运行本例,无需修改数据读取接口 `reader.py`。 + - 执行 `python train.py --help` 可以获取`train.py` 脚本各项启动参数的详细说明,主要参数如下: + - `nn_type`:选择要使用的模型,目前支持两种:“dnn” 或者 “cnn”。 + - `train_data_dir`:指定训练数据所在的文件夹,使用自定义数据训练,必须指定此参数,否则使用`paddle.dataset.imdb`训练,同时忽略`test_data_dir`,`word_dict`,和 `label_dict` 参数。 + - `test_data_dir`:指定测试数据所在的文件夹,若不指定将不进行测试。 + - `word_dict`:字典文件所在的路径,若不指定,将从训练数据根据词频统计,自动建立字典。 + - `label_dict`:类别标签字典,用于将字符串类型的类别标签,映射为整数类型的序号。 + - `batch_size`:指定多少条样本后进行一次神经网络的前向运行及反向更新。 + - `num_passes`:指定训练多少个轮次。 ### 如何预测 1. 
修改 `infer.py` 中以下变量,指定使用的模型、指定测试数据。 - ```python - model_path = "dnn_params_pass_00000.tar.gz" # 指定模型所在的路径 - nn_type = "dnn" # 指定测试使用的模型 - test_dir = "./data/test" # 指定测试文件所在的目录 - word_dict = "./data/dict/word_dict.txt" # 指定字典所在的路径 - label_dict = "./data/dict/label_dict.txt" # 指定类别标签字典的路径 - ``` + ```python + model_path = "dnn_params_pass_00000.tar.gz" # 指定模型所在的路径 + nn_type = "dnn" # 指定测试使用的模型 + test_dir = "./data/test" # 指定测试文件所在的目录 + word_dict = "./data/dict/word_dict.txt" # 指定字典所在的路径 + label_dict = "./data/dict/label_dict.txt" # 指定类别标签字典的路径 + ``` 2. 在终端中执行 `python infer.py`。 diff --git a/youtube_recall/README.cn.md b/youtube_recall/README.cn.md index 2f20416bb9bf0d202bd38dd7a15b5ea447b0c472..6628a6269b17eb76d2c03de297049235e9c49423 100644 --- a/youtube_recall/README.cn.md +++ b/youtube_recall/README.cn.md @@ -1,3 +1,7 @@ +运行本目录下的程序示例需要使用PaddlePaddle v0.10.0 版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 + +--- + # Youtube DNN推荐模型 以下是本例目录包含的文件以及对应说明: diff --git a/youtube_recall/README.md b/youtube_recall/README.md index b67bd33660b72a11444db1880f426102ac5c76d3..b9912abeb82107a14f4c59145d1d091289bfa7f8 100644 --- a/youtube_recall/README.md +++ b/youtube_recall/README.md @@ -1,3 +1,7 @@ +The minimum PaddlePaddle version needed for the code sample in this directory is v0.10.0. If you are on a version of PaddlePaddle earlier than v0.10.0, [please update your installation](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html). + +--- + # Deep Neural Networks for YouTube Recommendations ## Introduction\[[1](#References)\]