Merge branch 'develop' of https://github.com/PaddlePaddle/models into ctc_doc

f67e732f · wanghaoshuang · 53988dd3 · fa5587d6 · f67e732f · f67e732f
35 changed files
--- a/.gitignore
+++ b/.gitignore
 .DS_Store
 *.pyc
+.*~
--- a/.travis.yml
+++ b/.travis.yml
@@ -17,7 +17,7 @@ addons:
      - python-pip
      - python2.7-dev
      - clang-format-3.8
-  ssh_known_hosts: 52.76.173.135
+  ssh_known_hosts: 13.229.163.131
 before_install:
  - if [[ "$JOB" == "PRE_COMMIT" ]]; then sudo ln -s /usr/bin/clang-format-3.8 /usr/bin/clang-format; fi
  - sudo pip install -U virtualenv pre-commit pip

--- a/fluid/DeepASR/tools/profile.py
+++ b/fluid/DeepASR/tools/profile.py
@@ -168,7 +168,7 @@ def profile(args):
                start_time = time.time()
                frames_seen = 0
            # load_data
-            (features, labels, lod) = batch_data
+            (features, labels, lod, _) = batch_data
            feature_t.set(features, place)
            feature_t.set_lod([lod])
            label_t.set(labels, place)

--- a/fluid/DeepASR/train.py
+++ b/fluid/DeepASR/train.py
@@ -192,7 +192,7 @@ def train(args):
                test_data_reader.batch_iterator(args.batch_size,
                                                args.minimum_batch_size)):
            # load_data
-            (features, labels, lod) = batch_data
+            (features, labels, lod, _) = batch_data
            feature_t.set(features, place)
            feature_t.set_lod([lod])
            label_t.set(labels, place)
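
The two DeepASR hunks above make the same change: the batch tuple is now unpacked into four names instead of three. A minimal sketch of why this is needed, assuming the batch iterator now yields a fourth element that these call sites do not use. The `batch_data` value below is a hypothetical stand-in, not real reader output:

```python
# Hypothetical stand-in for one item yielded by the DeepASR batch iterator.
# Assumption: after this change each batch carries a fourth, unused element.
batch_data = ([[0.1, 0.2], [0.3, 0.4]], [1, 0], [0, 2], "unused")

# The old three-name unpacking no longer matches a four-element batch.
try:
    (features, labels, lod) = batch_data
    old_unpack_ok = True
except ValueError:
    old_unpack_ok = False

# The updated form discards the extra element with the conventional `_` name.
(features, labels, lod, _) = batch_data

print(old_unpack_ok)
print(len(features), labels, lod)
```

Any call site that still unpacks three names from the same iterator would fail in the same way, so both `profile.py` and `train.py` have to change together.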

--- a/fluid/adversarial/README.md
+++ b/fluid/adversarial/README.md
@@ -4,10 +4,105 @@ The minimum PaddlePaddle version needed for the code sample in this directory is
 # Advbox
-Advbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python and paddle.
-## How to use
-1. train a model and save it's parameters. (like fluid_mnist.py)
-2. load the parameters which is trained in step1, then reconstruct the model.(like mnist_tutorial_fgsm.py)
-3. use advbox to generate the adversarial sample.
+Advbox is a toolbox for generating adversarial examples that fool neural networks. It can also benchmark the robustness of machine learning models.
+Advbox is based on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) Fluid and is under continual development. Contributions of the latest adversarial attack and defense methods are always welcome.
+## Overview
+[Szegedy et al.](https://arxiv.org/abs/1312.6199) were the first to report an intriguing property of deep neural networks in the context of image classification. They showed that even state-of-the-art deep networks are surprisingly susceptible to adversarial attacks in the form of small perturbations to images that remain (almost) imperceptible to the human visual system. These perturbations are found by optimizing the input to maximize the prediction error, and the images modified by these perturbations are called `adversarial examples`. The profound implications of these results triggered wide interest among researchers in adversarial attacks and their defenses for deep learning in general.
+Advbox is similar to [Foolbox](https://github.com/bethgelab/foolbox) and [CleverHans](https://github.com/tensorflow/cleverhans). CleverHans only supports the TensorFlow framework, while Foolbox interfaces with many popular machine learning frameworks such as PyTorch, Keras, TensorFlow, Theano, Lasagne and MXNet. However, neither of these two great libraries supports PaddlePaddle, an easy-to-use, efficient, flexible and scalable deep learning platform originally developed by Baidu scientists and engineers for applying deep learning to many products at Baidu.
+## Usage
+Advbox provides many stable reference implementations of modern methods for generating adversarial examples, such as FGSM, DeepFool and JSMA. When you want to benchmark the robustness of your neural networks, you can use Advbox to generate adversarial examples and benchmark the networks against them. The basic workflow is:
+1. Train a model and save the parameters.
+2. Load the trained parameters, then reconstruct the model.
+3. Use Advbox to generate the adversarial samples.
+#### Dependencies
+* PaddlePaddle: [the latest develop branch](http://www.paddlepaddle.org/docs/develop/documentation/en/build_and_install/pip_install_en.html)
+* Python 2.x
+#### Structure
+Network models, attack method implementations and the criterion that defines adversarial examples are the three essential elements for generating adversarial examples. For brevity, Advbox adopts misclassification as the adversarial criterion.
+The structure of the Advbox module is as follows:
+    .
+    ├── advbox
+    |   ├── __init__.py
+    |   ├── attacks
+    |        ├── __init__.py
+    |        ├── base.py
+    |        ├── deepfool.py
+    |        ├── gradient_method.py
+    |        ├── lbfgs.py
+    |        └── saliency.py
+    |   ├── models
+    |        ├── __init__.py
+    |        ├── base.py
+    |        └── paddle.py
+    |   └── adversary.py
+    ├── tutorials
+    |   ├── __init__.py
+    |   ├── mnist_model.py
+    |   ├── mnist_tutorial_lbfgs.py
+    |   ├── mnist_tutorial_fgsm.py
+    |   ├── mnist_tutorial_bim.py
+    |   ├── mnist_tutorial_ilcm.py
+    |   ├── mnist_tutorial_jsma.py
+    |   └── mnist_tutorial_deepfool.py
+    └── README.md
+**advbox.attacks**
+Advbox implements several popular adversarial attacks that search for adversarial examples. Each attack method uses a distance measure (L1, L2, etc.) to quantify the size of adversarial perturbations. Crafting adversarial examples with Advbox is easy because some attack methods can perform internal hyperparameter tuning to find the minimum perturbation.
+**advbox.models**
+Advbox implements an interface to PaddlePaddle. Other deep learning frameworks such as TensorFlow can also be defined and employed. The module is used to compute predictions and gradients for given inputs in a specific framework.
+**advbox.adversary**
+Adversary contains the original object, the target and the adversarial examples. It uses misclassification as the criterion for accepting an adversarial example.
+## Tutorials
+The `./tutorials/` folder provides tutorials that generate adversarial examples on the MNIST dataset. You can modify the code slightly to apply it to other datasets. Advbox supports these attack methods:
+* [L-BFGS](https://arxiv.org/abs/1312.6199)
+* [FGSM](https://arxiv.org/abs/1412.6572)
+* [BIM](https://arxiv.org/abs/1607.02533)
+* [ILCM](https://arxiv.org/abs/1607.02533)
+* [JSMA](https://arxiv.org/pdf/1511.07528)
+* [DeepFool](https://arxiv.org/abs/1511.04599)
+## Testing
+Benchmarks on a vanilla CNN model.
+> MNIST
+|  adversarial attacks  |  fooling rate (non-targeted)  | fooling rate (targeted) | max_epsilon | iterations | Strength |
+|:-----:| :----: | :---: | :----: | :----: | :----: |
+|L-BFGS| --- | 89.2% | --- | One shot | *** |
+|FGSM| 57.8% | 26.55% | 0.3 | One shot| *** |
+|BIM| 97.4% | --- | 0.1 | 100 | **** |
+|ILCM| ---  | 100.0% | 0.1 | 100 | **** |
+|JSMA| 96.8% | 90.4%| 0.1 | 2000 | *** |
+|DeepFool| 97.7% | 51.3% | --- | 100 | **** |
+* The strength (higher for more asterisks) is based on the impression from the reviewed literature.
+---
+## References
+* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199), C. Szegedy et al., arxiv 2014
+* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572), I. Goodfellow et al., ICLR 2015
+* [Adversarial Examples In The Physical World](https://arxiv.org/pdf/1607.02533v3.pdf), A. Kurakin et al., ICLR workshop 2017
+* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528), N. Papernot et al., EuroS&P 2016
+* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/abs/1511.04599), S. Moosavi-Dezfooli et al., CVPR 2016
+* [Foolbox: A Python toolbox to benchmark the robustness of machine learning models](https://arxiv.org/abs/1707.04131), Jonas Rauber et al., arxiv 2018
+* [CleverHans: An adversarial example library for constructing attacks, building defenses, and benchmarking both](https://github.com/tensorflow/cleverhans#setting-up-cleverhans)
+* [Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey](https://arxiv.org/abs/1801.00553), Naveed Akhtar, Ajmal Mian, arxiv 2018
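
The README hunk above says that Advbox uses misclassification as the criterion for accepting an adversarial example. A minimal sketch of that criterion, for both non-targeted and targeted attacks, follows. `ToyAdversary` and its `try_accept` method are hypothetical, simplified stand-ins written for illustration. They are not the real `advbox.adversary.Adversary` class, whose internals are not shown in this diff.

```python
class ToyAdversary(object):
    """Hypothetical, simplified stand-in for an adversary record."""

    def __init__(self, original, original_label):
        self.original = original
        self.original_label = original_label
        self.is_targeted_attack = False
        self.target_label = None
        self.adversarial_example = None
        self.adversarial_label = None

    def set_target(self, is_targeted_attack, target_label=None):
        # A targeted attack must reach one specific label.
        self.is_targeted_attack = is_targeted_attack
        self.target_label = target_label if is_targeted_attack else None

    def try_accept(self, adversarial_example, adversarial_label):
        # Non-targeted: any label other than the original counts.
        # Targeted: only the chosen target label counts.
        if self.is_targeted_attack:
            accepted = adversarial_label == self.target_label
        else:
            accepted = adversarial_label != self.original_label
        if accepted:
            self.adversarial_example = adversarial_example
            self.adversarial_label = adversarial_label
        return accepted

    def is_successful(self):
        return self.adversarial_label is not None


non_targeted = ToyAdversary(original=[0.0, 0.5], original_label=7)
rejected = non_targeted.try_accept([0.1, 0.5], 7)   # still classified as 7
accepted = non_targeted.try_accept([0.3, 0.5], 3)   # misclassified as 3

targeted = ToyAdversary(original=[0.0, 0.5], original_label=7)
targeted.set_target(is_targeted_attack=True, target_label=0)
wrong_target = targeted.try_accept([0.3, 0.5], 3)   # misclassified, but not as 0

print(rejected, accepted, wrong_target)
```

The targeted case is stricter, which is consistent with the lower targeted fooling rates reported for FGSM and DeepFool in the benchmark table.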
--- a/fluid/adversarial/advbox/attacks/gradient_method.py
+++ b/fluid/adversarial/advbox/attacks/gradient_method.py
@@ -32,7 +32,12 @@ class GradientMethodAttack(Attack):
        super(GradientMethodAttack, self).__init__(model)
        self.support_targeted = support_targeted
-    def _apply(self, adversary, norm_ord=np.inf, epsilons=0.01, steps=100):
+    def _apply(self,
+               adversary,
+               norm_ord=np.inf,
+               epsilons=0.01,
+               steps=1,
+               epsilon_steps=100):
        """
        Apply the gradient attack method.
        :param adversary(Adversary):
@@ -41,8 +46,11 @@ class GradientMethodAttack(Attack):
            Order of the norm, such as np.inf, 1, 2, etc. It can't be 0.
        :param epsilons(list|tuple|int):
            Attack step size (input variation).
+            Largest step size if epsilons is not iterable.
        :param steps:
-            The number of iterator steps.
+            The number of attack iterations for each epsilon.
+        :param epsilon_steps:
+            The number of epsilon values to try when epsilons is not iterable.
        :return:
            adversary(Adversary): The Adversary object.
        """
@@ -55,7 +63,7 @@ class GradientMethodAttack(Attack):
                    "This attack method doesn't support targeted attack!")
        if not isinstance(epsilons, Iterable):
-            epsilons = np.linspace(epsilons, epsilons + 1e-10, num=steps)
+            epsilons = np.linspace(0, epsilons, num=epsilon_steps)
        pre_label = adversary.original_label
        min_, max_ = self.model.bounds()
@@ -65,30 +73,33 @@ class GradientMethodAttack(Attack):
                self.model.channel_axis() == adversary.original.shape[0] or
                self.model.channel_axis() == adversary.original.shape[-1])
-        step = 1
-        adv_img = adversary.original
-        for epsilon in epsilons[:steps]:
-            if epsilon == 0.0:
-                continue
-            if adversary.is_targeted_attack:
-                gradient = -self.model.gradient(adv_img, adversary.target_label)
-            else:
-                gradient = self.model.gradient(adv_img,
-                                               adversary.original_label)
-            if norm_ord == np.inf:
-                gradient_norm = np.sign(gradient)
-            else:
-                gradient_norm = gradient / self._norm(gradient, ord=norm_ord)
-            adv_img = adv_img + epsilon * gradient_norm * (max_ - min_)
-            adv_img = np.clip(adv_img, min_, max_)
-            adv_label = np.argmax(self.model.predict(adv_img))
-            logging.info('step={}, epsilon = {:.5f}, pre_label = {}, '
-                         'adv_label={}'.format(step, epsilon, pre_label,
-                                               adv_label))
-            if adversary.try_accept_the_example(adv_img, adv_label):
-                return adversary
-            step += 1
+        for epsilon in epsilons[:]:
+            step = 1
+            adv_img = adversary.original
+            for i in range(steps):
+                if epsilon == 0.0:
+                    continue
+                if adversary.is_targeted_attack:
+                    gradient = -self.model.gradient(adv_img,
+                                                    adversary.target_label)
+                else:
+                    gradient = self.model.gradient(adv_img,
+                                                   adversary.original_label)
+                if norm_ord == np.inf:
+                    gradient_norm = np.sign(gradient)
+                else:
+                    gradient_norm = gradient / self._norm(
+                        gradient, ord=norm_ord)
+                adv_img = adv_img + epsilon * gradient_norm * (max_ - min_)
+                adv_img = np.clip(adv_img, min_, max_)
+                adv_label = np.argmax(self.model.predict(adv_img))
+                logging.info('step={}, epsilon = {:.5f}, pre_label = {}, '
+                             'adv_label={}'.format(step, epsilon, pre_label,
+                                                   adv_label))
+                if adversary.try_accept_the_example(adv_img, adv_label):
+                    return adversary
+                step += 1
        return adversary
    @staticmethod
@@ -113,7 +124,7 @@ class FastGradientSignMethodTargetedAttack(GradientMethodAttack):
    Paper link: https://arxiv.org/abs/1412.6572
    """
-    def _apply(self, adversary, epsilons=0.03):
+    def _apply(self, adversary, epsilons=0.01):
        return GradientMethodAttack._apply(
            self,
            adversary=adversary,
@@ -144,7 +155,7 @@ class IterativeLeastLikelyClassMethodAttack(GradientMethodAttack):
    Paper link: https://arxiv.org/abs/1607.02533
    """
-    def _apply(self, adversary, epsilons=0.001, steps=1000):
+    def _apply(self, adversary, epsilons=0.01, steps=1000):
        return GradientMethodAttack._apply(
            self,
            adversary=adversary,
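
The `gradient_method.py` hunks above change the search from a single pass with an effectively constant epsilon (`np.linspace(epsilons, epsilons + 1e-10, num=steps)`) to an outer sweep over epsilons from 0 up to the given maximum, with `steps` attack iterations restarted from the original input for each epsilon. The sketch below illustrates that control flow on a toy two-feature logistic model in pure Python. The model, its weights, and all helper names are assumptions made for illustration. Only the loop structure and the sign-gradient update mirror the diff.

```python
import math

# Toy linear model (hypothetical weights); inputs are bounded to [-1, 1].
W = [1.0, -2.0]
BOUNDS = (-1.0, 1.0)


def linspace(start, stop, num):
    """Pure-Python counterpart of np.linspace for num >= 2."""
    step = (stop - start) / float(num - 1)
    return [start + i * step for i in range(num)]


def predict(x):
    score = sum(w * v for w, v in zip(W, x))
    return 1 if score > 0 else 0


def loss_gradient(x, label):
    """Gradient of the logistic loss with respect to the input."""
    score = sum(w * v for w, v in zip(W, x))
    prob = 1.0 / (1.0 + math.exp(-score))
    return [(prob - label) * w for w in W]


def sign(value):
    return (value > 0) - (value < 0)


def gradient_sign_attack(x, label, epsilons=0.3, steps=1, epsilon_steps=100):
    """Return the smallest swept epsilon that flips the label, and the example."""
    min_, max_ = BOUNDS
    for epsilon in linspace(0.0, epsilons, epsilon_steps):
        if epsilon == 0.0:
            continue  # a zero-sized step cannot change the input
        adv = list(x)  # restart from the original input for every epsilon
        for _ in range(steps):
            grad = loss_gradient(adv, label)
            adv = [
                min(max(v + epsilon * sign(g) * (max_ - min_), min_), max_)
                for v, g in zip(adv, grad)
            ]
            if predict(adv) != label:
                return epsilon, adv
    return None, None


found_eps, adv = gradient_sign_attack([0.5, 0.1], label=1)
print(found_eps, adv, predict(adv))
```

Because the sweep starts near zero and stops at the first success, the returned epsilon approximates the smallest perturbation (on the chosen grid) that fools the model, which the previous constant-epsilon schedule could not report.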

--- a/fluid/adversarial/mnist_tutorial_fgsm.py
+++ b/fluid/adversarial/mnist_tutorial_fgsm.py
-"""
-FGSM demos on mnist using advbox tool.
-"""
-import matplotlib.pyplot as plt
-import paddle.v2 as paddle
-import paddle.fluid as fluid
-from advbox.adversary import Adversary
-from advbox.attacks.gradient_method import FGSM
-from advbox.models.paddle import PaddleModel
-def cnn_model(img):
-    """
-    Mnist cnn model
-    Args:
-        img(Varaible): the input image to be recognized
-    Returns:
-        Variable: the label prediction
-    """
-    # conv1 = fluid.nets.conv2d()
-    conv_pool_1 = fluid.nets.simple_img_conv_pool(
-        input=img,
-        num_filters=20,
-        filter_size=5,
-        pool_size=2,
-        pool_stride=2,
-        act='relu')
-    conv_pool_2 = fluid.nets.simple_img_conv_pool(
-        input=conv_pool_1,
-        num_filters=50,
-        filter_size=5,
-        pool_size=2,
-        pool_stride=2,
-        act='relu')
-    logits = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
-    return logits
-def main():
-    """
-    Advbox demo which demonstrate how to use advbox.
-    """
-    IMG_NAME = 'img'
-    LABEL_NAME = 'label'
-    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
-    # gradient should flow
-    img.stop_gradient = False
-    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
-    logits = cnn_model(img)
-    cost = fluid.layers.cross_entropy(input=logits, label=label)
-    avg_cost = fluid.layers.mean(x=cost)
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    BATCH_SIZE = 1
-    train_reader = paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.mnist.train(), buf_size=500),
-        batch_size=BATCH_SIZE)
-    feeder = fluid.DataFeeder(
-        feed_list=[IMG_NAME, LABEL_NAME],
-        place=place,
-        program=fluid.default_main_program())
-    fluid.io.load_params(
-        exe, "./mnist/", main_program=fluid.default_main_program())
-    # advbox demo
-    m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
-                    logits.name, avg_cost.name, (-1, 1))
-    att = FGSM(m)
-    for data in train_reader():
-        # fgsm attack
-        adversary = att(Adversary(data[0][0], data[0][1]))
-        if adversary.is_successful():
-            plt.imshow(adversary.target, cmap='Greys_r')
-            plt.show()
-            # np.save('adv_img', adversary.target)
-        break
-if __name__ == '__main__':
-    main()
--- a/fluid/adversarial/mnist_tutorial_jsma.py
+++ b/fluid/adversarial/mnist_tutorial_jsma.py
-"""
-FGSM demos on mnist using advbox tool.
-"""
-import matplotlib.pyplot as plt
-import paddle.v2 as paddle
-import paddle.fluid as fluid
-import numpy as np
-from advbox import Adversary
-from advbox.attacks.saliency import SaliencyMapAttack
-from advbox.models.paddle import PaddleModel
-def cnn_model(img):
-    """
-    Mnist cnn model
-    Args:
-        img(Varaible): the input image to be recognized
-    Returns:
-        Variable: the label prediction
-    """
-    # conv1 = fluid.nets.conv2d()
-    conv_pool_1 = fluid.nets.simple_img_conv_pool(
-        input=img,
-        num_filters=20,
-        filter_size=5,
-        pool_size=2,
-        pool_stride=2,
-        act='relu')
-    conv_pool_2 = fluid.nets.simple_img_conv_pool(
-        input=conv_pool_1,
-        num_filters=50,
-        filter_size=5,
-        pool_size=2,
-        pool_stride=2,
-        act='relu')
-    logits = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
-    return logits
-def main():
-    """
-    Advbox demo which demonstrate how to use advbox.
-    """
-    IMG_NAME = 'img'
-    LABEL_NAME = 'label'
-    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
-    # gradient should flow
-    img.stop_gradient = False
-    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
-    logits = cnn_model(img)
-    cost = fluid.layers.cross_entropy(input=logits, label=label)
-    avg_cost = fluid.layers.mean(x=cost)
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    BATCH_SIZE = 1
-    train_reader = paddle.batch(
-        paddle.reader.shuffle(
-            paddle.dataset.mnist.train(), buf_size=500),
-        batch_size=BATCH_SIZE)
-    feeder = fluid.DataFeeder(
-        feed_list=[IMG_NAME, LABEL_NAME],
-        place=place,
-        program=fluid.default_main_program())
-    fluid.io.load_params(
-        exe, "./mnist/", main_program=fluid.default_main_program())
-    # advbox demo
-    m = PaddleModel(fluid.default_main_program(), IMG_NAME, LABEL_NAME,
-                    logits.name, avg_cost.name, (-1, 1))
-    attack = SaliencyMapAttack(m)
-    total_num = 0
-    success_num = 0
-    for data in train_reader():
-        total_num += 1
-        # adversary.set_target(True, target_label=target_label)
-        jsma_attack = attack(Adversary(data[0][0], data[0][1]))
-        if jsma_attack is not None and jsma_attack.is_successful():
-            # plt.imshow(jsma_attack.target, cmap='Greys_r')
-            # plt.show()
-            success_num += 1
-            print('original_label=%d, adversary examples label =%d' %
-                  (data[0][1], jsma_attack.adversarial_label))
-            # np.save('adv_img', jsma_attack.adversarial_example)
-        print('total num = %d, success num = %d ' % (total_num, success_num))
-        if total_num == 100:
-            break
-if __name__ == '__main__':
-    main()
--- a/fluid/adversarial/tutorials/__init__.py
+++ b/fluid/adversarial/tutorials/__init__.py
+"""
+   A set of tutorials for generating adversarial examples with advbox.
+"""
\ No newline at end of file
--- a/fluid/adversarial/fluid_mnist.py
+++ b/fluid/adversarial/fluid_mnist.py
@@ -30,8 +30,9 @@ def mnist_cnn_model(img):
        pool_size=2,
        pool_stride=2,
        act='relu')
+    fc = fluid.layers.fc(input=conv_pool_2, size=50, act='relu')
-    logits = fluid.layers.fc(input=conv_pool_2, size=10, act='softmax')
+    logits = fluid.layers.fc(input=fc, size=10, act='softmax')
    return logits
@@ -60,7 +61,10 @@ def main():
            paddle.dataset.mnist.train(), buf_size=500),
        batch_size=BATCH_SIZE)
+    # use CPU
    place = fluid.CPUPlace()
+    # use GPU
+    # place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
    feeder = fluid.DataFeeder(feed_list=[img, label], place=place)
    exe.run(fluid.default_startup_program())
@@ -74,9 +78,11 @@ def main():
                feed=feeder.feed(data),
                fetch_list=[avg_cost, batch_acc, batch_size])
            pass_acc.add(value=acc, weight=b_size)
+            pass_acc_val = pass_acc.eval()[0]
            print("pass_id=" + str(pass_id) + " acc=" + str(acc[0]) +
-                  " pass_acc=" + str(pass_acc.eval()[0]))
+                  " pass_acc=" + str(pass_acc_val))
-            if loss < LOSS_THRESHOLD and pass_acc > ACC_THRESHOLD:
+            if loss < LOSS_THRESHOLD and pass_acc_val > ACC_THRESHOLD:
+                # early stop
                break
        print("pass_id=" + str(pass_id) + " pass_acc=" + str(pass_acc.eval()[
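
The last `fluid_mnist.py` hunk fixes the early-stop test: the old condition compared the accumulator object `pass_acc` with `ACC_THRESHOLD`, while the new one compares the evaluated number. A minimal sketch of the corrected pattern follows. `ToyAccuracy` is a hypothetical weighted-average accumulator whose `eval()` returns a one-element list to mimic the `pass_acc.eval()[0]` indexing in the hunk, and the threshold values are assumptions chosen for the example.

```python
class ToyAccuracy(object):
    """Hypothetical weighted-average accumulator (not the Paddle class)."""

    def __init__(self):
        self.total = 0.0
        self.weight = 0.0

    def add(self, value, weight):
        self.total += value * weight
        self.weight += weight

    def eval(self):
        # One-element list, mirroring the `.eval()[0]` indexing in the hunk.
        return [self.total / self.weight]


# Assumed thresholds, for illustration only.
LOSS_THRESHOLD = 10.0
ACC_THRESHOLD = 0.3


def should_stop(loss, pass_acc):
    # Evaluate once and compare the number, never the accumulator object.
    pass_acc_val = pass_acc.eval()[0]
    return loss < LOSS_THRESHOLD and pass_acc_val > ACC_THRESHOLD


acc = ToyAccuracy()
acc.add(value=0.5, weight=64)
acc.add(value=0.25, weight=64)

print(acc.eval()[0])
print(should_stop(1.0, acc))
print(should_stop(20.0, acc))
```

In Python 2, which this code targets, an ordering comparison between an arbitrary object and a float does not raise an error, so the old condition could go unnoticed while giving a result unrelated to the accuracy.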

--- a/fluid/adversarial/tutorials/mnist_tutorial_bim.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_bim.py
+"""
+BIM tutorial on mnist using advbox tool.
+The BIM method iteratively takes multiple small steps, adjusting the direction after each step.
+It only supports non-targeted attack.
+"""
+import sys
+sys.path.append("..")
+import matplotlib.pyplot as plt
+import paddle.fluid as fluid
+import paddle.v2 as paddle
+from advbox.adversary import Adversary
+from advbox.attacks.gradient_method import BIM
+from advbox.models.paddle import PaddleModel
+from tutorials.mnist_model import mnist_cnn_model
+def main():
+    """
+    Advbox demo which demonstrates how to use advbox.
+    """
+    TOTAL_NUM = 500
+    IMG_NAME = 'img'
+    LABEL_NAME = 'label'
+    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
+    # gradient should flow
+    img.stop_gradient = False
+    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
+    logits = mnist_cnn_model(img)
+    cost = fluid.layers.cross_entropy(input=logits, label=label)
+    avg_cost = fluid.layers.mean(x=cost)
+    # use CPU
+    place = fluid.CPUPlace()
+    # use GPU
+    # place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    BATCH_SIZE = 1
+    train_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.train(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    test_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.test(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    fluid.io.load_params(
+        exe, "./mnist/", main_program=fluid.default_main_program())
+    # advbox demo
+    m = PaddleModel(
+        fluid.default_main_program(),
+        IMG_NAME,
+        LABEL_NAME,
+        logits.name,
+        avg_cost.name, (-1, 1),
+        channel_axis=1)
+    attack = BIM(m)
+    attack_config = {"epsilons": 0.1, "steps": 100}
+    # use train data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in train_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # BIM non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    # use test data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in test_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # BIM non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    print("bim attack done")
+if __name__ == '__main__':
+    main()
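
The train and test loops in the tutorial above are identical apart from the reader and the log tag. A minimal sketch of how the counting logic could be shared is shown below. `fooling_rate`, `toy_reader` and `toy_attack` are hypothetical names introduced for illustration. The toy reader yields plain `(image, label)` pairs rather than the batched `data[0][0]`, `data[0][1]` layout used in the tutorial, and the toy attack returns a label directly instead of an `Adversary` object.

```python
def fooling_rate(reader, attack_fn, total_num):
    """Run attack_fn on up to total_num samples and count the successes."""
    total_count = 0
    fooling_count = 0
    for image, label in reader():
        total_count += 1
        adv_label = attack_fn(image, label)
        # Success means the attack produced a label that differs from the original.
        if adv_label is not None and adv_label != label:
            fooling_count += 1
        if total_count >= total_num:
            break
    rate = float(fooling_count) / total_count if total_count else 0.0
    return fooling_count, total_count, rate


def toy_reader():
    # Hypothetical dataset: the "image" is just an index, labels cycle 0..9.
    for i in range(20):
        yield i, i % 10


def toy_attack(image, label):
    # Hypothetical attack that only succeeds on even labels.
    return label + 1 if label % 2 == 0 else None


result = fooling_rate(toy_reader, toy_attack, total_num=10)
print("[TOY_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f" % result)
```

With a helper like this, each tutorial would call it once per reader and change only the tag in the final log line.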
--- a/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py
+"""
+DeepFool tutorial on mnist using advbox tool.
+DeepFool is a simple and accurate adversarial attack method.
+It supports both targeted attack and non-targeted attack.
+"""
+import sys
+sys.path.append("..")
+import matplotlib.pyplot as plt
+import paddle.fluid as fluid
+import paddle.v2 as paddle
+from advbox.adversary import Adversary
+from advbox.attacks.deepfool import DeepFoolAttack
+from advbox.models.paddle import PaddleModel
+from tutorials.mnist_model import mnist_cnn_model
+def main():
+    """
+    Advbox demo which demonstrates how to use advbox.
+    """
+    TOTAL_NUM = 500
+    IMG_NAME = 'img'
+    LABEL_NAME = 'label'
+    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
+    # gradient should flow
+    img.stop_gradient = False
+    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
+    logits = mnist_cnn_model(img)
+    cost = fluid.layers.cross_entropy(input=logits, label=label)
+    avg_cost = fluid.layers.mean(x=cost)
+    # use CPU
+    place = fluid.CPUPlace()
+    # use GPU
+    # place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    BATCH_SIZE = 1
+    train_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.train(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    test_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.test(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    fluid.io.load_params(
+        exe, "./mnist/", main_program=fluid.default_main_program())
+    # advbox demo
+    m = PaddleModel(
+        fluid.default_main_program(),
+        IMG_NAME,
+        LABEL_NAME,
+        logits.name,
+        avg_cost.name, (-1, 1),
+        channel_axis=1)
+    attack = DeepFoolAttack(m)
+    attack_config = {"iterations": 100, "overshoot": 9}
+    # use train data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in train_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # DeepFool non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        # DeepFool targeted attack
+        # tlabel = 0
+        # adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    # use test data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in test_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # DeepFool non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        # DeepFool targeted attack
+        # tlabel = 0
+        # adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    print("deepfool attack done")
+if __name__ == '__main__':
+    main()
--- a/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py
+"""
+FGSM tutorial on mnist using advbox tool.
+FGSM is a non-targeted attack, while FGSMT is a targeted attack.
+"""
+import sys
+sys.path.append("..")
+import matplotlib.pyplot as plt
+import numpy as np
+import paddle.fluid as fluid
+import paddle.v2 as paddle
+from advbox.adversary import Adversary
+from advbox.attacks.gradient_method import FGSM
+from advbox.attacks.gradient_method import FGSMT
+from advbox.models.paddle import PaddleModel
+from tutorials.mnist_model import mnist_cnn_model
+def main():
+    """
+    Advbox demo which demonstrates how to use advbox.
+    """
+    TOTAL_NUM = 500
+    IMG_NAME = 'img'
+    LABEL_NAME = 'label'
+    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
+    # gradient should flow
+    img.stop_gradient = False
+    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
+    logits = mnist_cnn_model(img)
+    cost = fluid.layers.cross_entropy(input=logits, label=label)
+    avg_cost = fluid.layers.mean(x=cost)
+    # use CPU
+    place = fluid.CPUPlace()
+    # use GPU
+    # place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    BATCH_SIZE = 1
+    train_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.train(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    test_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.test(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    fluid.io.load_params(
+        exe, "./mnist/", main_program=fluid.default_main_program())
+    # advbox demo
+    m = PaddleModel(
+        fluid.default_main_program(),
+        IMG_NAME,
+        LABEL_NAME,
+        logits.name,
+        avg_cost.name, (-1, 1),
+        channel_axis=1)
+    attack = FGSM(m)
+    # attack = FGSMT(m)
+    attack_config = {"epsilons": 0.3}
+    # use train data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in train_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # FGSM non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        # FGSMT targeted attack
+        # tlabel = 0
+        # adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    # use test data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in test_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # FGSM non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        # FGSMT targeted attack
+        # tlabel = 0
+        # adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    print("fgsm attack done")
+if __name__ == '__main__':
+    main()
--- a/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py
+"""
+ILCM tutorial on mnist using advbox tool.
+ILCM method extends "BIM" to support targeted attack.
+"""
+import sys
+sys.path.append("..")
+import matplotlib.pyplot as plt
+import paddle.fluid as fluid
+import paddle.v2 as paddle
+from advbox.adversary import Adversary
+from advbox.attacks.gradient_method import ILCM
+from advbox.models.paddle import PaddleModel
+from tutorials.mnist_model import mnist_cnn_model
+def main():
+    """
+    Advbox demo which demonstrates how to use advbox.
+    """
+    TOTAL_NUM = 500
+    IMG_NAME = 'img'
+    LABEL_NAME = 'label'
+    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
+    # gradient should flow
+    img.stop_gradient = False
+    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
+    logits = mnist_cnn_model(img)
+    cost = fluid.layers.cross_entropy(input=logits, label=label)
+    avg_cost = fluid.layers.mean(x=cost)
+    # use CPU
+    place = fluid.CPUPlace()
+    # use GPU
+    # place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    BATCH_SIZE = 1
+    train_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.train(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    test_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.test(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    fluid.io.load_params(
+        exe, "./mnist/", main_program=fluid.default_main_program())
+    # advbox demo
+    m = PaddleModel(
+        fluid.default_main_program(),
+        IMG_NAME,
+        LABEL_NAME,
+        logits.name,
+        avg_cost.name, (-1, 1),
+        channel_axis=1)
+    attack = ILCM(m)
+    attack_config = {"epsilons": 0.1, "steps": 100}
+    # use train data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in train_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        tlabel = 0
+        adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # ILCM targeted attack
+        adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    # use test data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in test_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        tlabel = 0
+        adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # ILCM targeted attack
+        adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    print("ilcm attack done")
+if __name__ == '__main__':
+    main()
--- a/fluid/adversarial/tutorials/mnist_tutorial_jsma.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_jsma.py
+"""
+JSMA tutorial on mnist using advbox tool.
+JSMA method supports both targeted attack and non-targeted attack.
+"""
+import sys
+sys.path.append("..")
+import matplotlib.pyplot as plt
+import paddle.fluid as fluid
+import paddle.v2 as paddle
+from advbox.adversary import Adversary
+from advbox.attacks.saliency import JSMA
+from advbox.models.paddle import PaddleModel
+from tutorials.mnist_model import mnist_cnn_model
+def main():
+    """
+    Advbox demo which demonstrates how to use advbox.
+    """
+    TOTAL_NUM = 500
+    IMG_NAME = 'img'
+    LABEL_NAME = 'label'
+    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
+    # gradient should flow
+    img.stop_gradient = False
+    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
+    logits = mnist_cnn_model(img)
+    cost = fluid.layers.cross_entropy(input=logits, label=label)
+    avg_cost = fluid.layers.mean(x=cost)
+    # use CPU
+    place = fluid.CPUPlace()
+    # use GPU
+    # place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    BATCH_SIZE = 1
+    train_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.train(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    test_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.test(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    fluid.io.load_params(
+        exe, "./mnist/", main_program=fluid.default_main_program())
+    # advbox demo
+    m = PaddleModel(
+        fluid.default_main_program(),
+        IMG_NAME,
+        LABEL_NAME,
+        logits.name,
+        avg_cost.name, (-1, 1),
+        channel_axis=1)
+    attack = JSMA(m)
+    attack_config = {
+        "max_iter": 2000,
+        "theta": 0.1,
+        "max_perturbations_per_pixel": 7
+    }
+    # use train data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in train_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # JSMA non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        # JSMA targeted attack
+        # tlabel = 0
+        # adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # adversary = attack(adversary, **attack_config)
+        # JSMA may return None
+        if adversary is not None and adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    # use test data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in test_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # JSMA non-targeted attack
+        adversary = attack(adversary, **attack_config)
+        # JSMA targeted attack
+        # tlabel = 0
+        # adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        # adversary = attack(adversary, **attack_config)
+        # JSMA may return None
+        if adversary is not None and adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    print("jsma attack done")
+if __name__ == '__main__':
+    main()
--- a/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py
+"""
+LBFGS tutorial on mnist using advbox tool.
+LBFGS method only supports targeted attack.
+"""
+import sys
+sys.path.append("..")
+import matplotlib.pyplot as plt
+import paddle.fluid as fluid
+import paddle.v2 as paddle
+from advbox.adversary import Adversary
+from advbox.attacks.lbfgs import LBFGS
+from advbox.models.paddle import PaddleModel
+from tutorials.mnist_model import mnist_cnn_model
+def main():
+    """
+    Advbox demo which demonstrates how to use advbox.
+    """
+    TOTAL_NUM = 500
+    IMG_NAME = 'img'
+    LABEL_NAME = 'label'
+    img = fluid.layers.data(name=IMG_NAME, shape=[1, 28, 28], dtype='float32')
+    # gradient should flow
+    img.stop_gradient = False
+    label = fluid.layers.data(name=LABEL_NAME, shape=[1], dtype='int64')
+    logits = mnist_cnn_model(img)
+    cost = fluid.layers.cross_entropy(input=logits, label=label)
+    avg_cost = fluid.layers.mean(x=cost)
+    # use CPU
+    place = fluid.CPUPlace()
+    # use GPU
+    # place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    BATCH_SIZE = 1
+    train_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.train(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    test_reader = paddle.batch(
+        paddle.reader.shuffle(
+            paddle.dataset.mnist.test(), buf_size=128 * 10),
+        batch_size=BATCH_SIZE)
+    fluid.io.load_params(
+        exe, "./mnist/", main_program=fluid.default_main_program())
+    # advbox demo
+    m = PaddleModel(
+        fluid.default_main_program(),
+        IMG_NAME,
+        LABEL_NAME,
+        logits.name,
+        avg_cost.name, (-1, 1),
+        channel_axis=1)
+    attack = LBFGS(m)
+    attack_config = {"epsilon": 0.001, }
+    # use train data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in train_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # LBFGS targeted attack
+        tlabel = 0
+        adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TRAIN_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    # use test data to generate adversarial examples
+    total_count = 0
+    fooling_count = 0
+    for data in test_reader():
+        total_count += 1
+        adversary = Adversary(data[0][0], data[0][1])
+        # LBFGS targeted attack
+        tlabel = 0
+        adversary.set_target(is_targeted_attack=True, target_label=tlabel)
+        adversary = attack(adversary, **attack_config)
+        if adversary.is_successful():
+            fooling_count += 1
+            print(
+                'attack success, original_label=%d, adversarial_label=%d, count=%d'
+                % (data[0][1], adversary.adversarial_label, total_count))
+            # plt.imshow(adversary.target, cmap='Greys_r')
+            # plt.show()
+            # np.save('adv_img', adversary.target)
+        else:
+            print('attack failed, original_label=%d, count=%d' %
+                  (data[0][1], total_count))
+        if total_count >= TOTAL_NUM:
+            print(
+                "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
+                % (fooling_count, total_count,
+                   float(fooling_count) / total_count))
+            break
+    print("lbfgs attack done")
+if __name__ == '__main__':
+    main()
--- a/fluid/image_classification/caffe2fluid/README.md
+++ b/fluid/image_classification/caffe2fluid/README.md
@@ -18,19 +18,19 @@ This tool is used to convert a Caffe model to Fluid model
 ### Tested models
- Lenet on mnist dataset
+- Lenet
 - ResNets:(ResNet-50, ResNet-101, ResNet-152)
-    model addr: `https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777`_
+[model addr](https://onedrive.live.com/?authkey=%21AAFW2-FVoxeVRck&id=4006CBB8476FF777%2117887&cid=4006CBB8476FF777)
 - GoogleNet:
-    model addr: `https://gist.github.com/jimmie33/7ea9f8ac0da259866b854460f4526034`_
+[model addr](https://gist.github.com/jimmie33/7ea9f8ac0da259866b854460f4526034)
 - VGG:
-    model addr: `https://gist.github.com/ksimonyan/211839e770f7b538e2d8`_
+[model addr](https://gist.github.com/ksimonyan/211839e770f7b538e2d8)
 - AlexNet:
-    model addr: `https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet`_
+[model addr](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet)
 ### Notes
 Some of this code come from here: https://github.com/ethereon/caffe-tensorflow
--- a/fluid/image_classification/caffe2fluid/examples/imagenet/compare.py
+++ b/fluid/image_classification/caffe2fluid/examples/imagenet/compare.py
+#!/usr/bin/python
+#
+#a tool to compare tensors in two files or two directories
+#
+import sys
+import os
+def walk_dir(rootdir):
+    for subdir, dirs, files in os.walk(rootdir):
+        for file in files:
+            yield file
+def calc_diff(f1, f2):
+    import numpy as np
+    d1 = np.load(f1).flatten()
+    d2 = np.load(f2).flatten()
+    d1_num = reduce(lambda x, y: x * y, d1.shape)
+    d2_num = reduce(lambda x, y: x * y, d2.shape)
+    if d1_num != d2_num:
+        print d1.shape
+        print d2.shape
+        assert (d1_num == d2_num), "their shapes are not consistent"
+    try:
+        df = np.abs(d1 - d2)
+        max_df = np.max(df)
+        sq_df = np.mean(df * df)
+        return max_df, sq_df
+    except Exception as e:
+        return -1.0, -1.0
+def compare(path1, path2):
+    def diff(f1, f2):
+        max_df, sq_df = calc_diff(f1, f2)
+        print('compare %s <=> %s with result[max_df:%.4e, sq_df:%.4e]' %
+              (f1, f2, max_df, sq_df))
+        assert (0 <= max_df < 1e-5), \
+                'max_df is invalid or too large with value[%.6e]' % (max_df)
+        assert (0 <= sq_df < 1e-10), \
+                'sq_df is invalid or too large with value[%.6e]' % (sq_df)
+    if os.path.exists(path1) is False:
+        print('not found %s' % (path1))
+        return 1
+    elif os.path.exists(path2) is False:
+        print('not found %s' % (path2))
+        return 1
+    if path1.find('.npy') > 0 and path2.find('.npy') > 0:
+        diff(path1, path2)
+        return 0
+    for f in walk_dir(path2):
+        if f.find('.npy') < 0:
+            continue
+        f1 = os.path.join(path1, f)
+        f2 = os.path.join(path2, f)
+        diff(f1, f2)
+    print('all checks passed')
+    return 0
+if __name__ == "__main__":
+    if len(sys.argv) == 1:
+        path1 = 'lenet.tf/results'
+        path2 = 'lenet.paddle/results'
+    elif len(sys.argv) == 3:
+        path1 = sys.argv[1]
+        path2 = sys.argv[2]
+    else:
+        print('usage:')
+        print(' %s [path1] [path2]' % (sys.argv[0]))
+        exit(1)
+    print('compare inner result in %s %s' % (path1, path2))
+    exit(compare(path1, path2))
--- a/fluid/image_classification/caffe2fluid/examples/imagenet/diff.sh
+++ b/fluid/image_classification/caffe2fluid/examples/imagenet/diff.sh
+#!/bin/bash
+#
+#function:
+#   a tool used to check the difference of models' results generated by caffe model and paddle model
+#
+#howto:
+#   bash diff.sh resnet50 #when this has been finished, you can get the difference in precision
+#
+#notes:
+#   0, in order to infer using caffe, we need pycaffe installed
+#   1, prepare your caffe model in 'models.caffe/', eg: 'models.caffe/resnet101/resnet101.[prototxt|caffemodel]'
+#   2, converted paddle model will be in 'models'
+#   3, results of layers will be stored in 'results/${model_name}.[paddle|caffe]'
+#   4, only the last layer will be checked by default
+model_name="resnet50"
+results_root="results/"
+if [[ -n $1 ]];then
+    if [ $1 = "-h" ];then
+        echo "usage:"
+        echo "  bash $0 [model_name]"
+        echo "  eg:bash $0 resnet50"
+        exit 0
+    fi
+    model_name=$1
+fi
+mkdir -p $results_root
+model_prototxt="models.caffe/$model_name/${model_name}.prototxt"
+model_caffemodel="models.caffe/${model_name}/${model_name}.caffemodel"
+#1, dump layers' results from paddle
+paddle_results="$results_root/${model_name}.paddle"
+rm -rf $paddle_results
+rm -rf "results.paddle"
+bash run.sh $model_name ./models.caffe/$model_name ./models/$model_name
+if [[ $? -ne 0 ]] || [[ ! -e "results.paddle" ]];then
+    echo "not found paddle's results, maybe failed to convert"
+    exit 1
+fi
+mv results.paddle $paddle_results
+#2, dump layers' results from caffe
+caffe_results="$results_root/${model_name}.caffe"
+rm -rf $caffe_results
+rm -rf "results.caffe"
+cfpython ./infer.py caffe $model_prototxt $model_caffemodel $paddle_results/data.npy
+if [[ $? -ne 0 ]] || [[ ! -e "results.caffe" ]];then
+    echo "not found caffe's results, maybe failed to do inference with caffe"
+    exit 1
+fi
+mv results.caffe $caffe_results
+#3, extract layer names
+cat $model_prototxt | grep name | perl -ne 'if(/^\s*name:\s+\"([^\"]+)/){ print $1."\n";}' >.layer_names
+#4, compare one by one
+for i in $(cat ".layer_names" | tail -n1);do
+    echo "process $i"
+    python compare.py $caffe_results/${i}.npy $paddle_results/${i}.npy
+done
--- a/fluid/image_classification/caffe2fluid/examples/imagenet/infer.py
+++ b/fluid/image_classification/caffe2fluid/examples/imagenet/infer.py
@@ -10,8 +10,11 @@ import os
 import sys
 import inspect
 import numpy as np
-import paddle.v2 as paddle
-import paddle.v2.fluid as fluid
+def import_fluid():
+    import paddle.fluid as fluid
+    return fluid
 def load_data(imgfile, shape):
@@ -52,8 +55,10 @@ def build_model(net_file, net_name):
        print(e)
        return None
-    input_name = 'data'
+    fluid = import_fluid()
-    input_shape = MyNet.input_shapes()[input_name]
+    inputs_dict = MyNet.input_shapes()
+    input_name = inputs_dict.keys()[0]
+    input_shape = inputs_dict[input_name]
    images = fluid.layers.data(name='image', shape=input_shape, dtype='float32')
    #label = fluid.layers.data(name='label', shape=[1], dtype='int64')
@@ -64,7 +69,7 @@ def build_model(net_file, net_name):
 def dump_results(results, names, root):
    if os.path.exists(root) is False:
-        os.path.mkdir(root)
+        os.mkdir(root)
    for i in range(len(names)):
        n = names[i]
@@ -73,9 +78,12 @@ def dump_results(results, names, root):
        np.save(filename + '.npy', res)
-def infer(net_file, net_name, model_file, imgfile, debug=False):
+def infer(net_file, net_name, model_file, imgfile, debug=True):
    """ do inference using a model which consist 'xxx.py' and 'xxx.npy'
    """
+    fluid = import_fluid()
    #1, build model
    net, input_shape = build_model(net_file, net_name)
    prediction = net.get_output()
@@ -109,34 +117,79 @@ def infer(net_file, net_name, model_file, imgfile, debug=False):
                      fetch_list=fetch_list_var)
    if debug is True:
-        dump_path = 'results.layers'
+        dump_path = 'results.paddle'
        dump_results(results, fetch_list_name, dump_path)
-        print('all results dumped to [%s]' % (dump_path))
+        print('all results of layers dumped to [%s]' % (dump_path))
    else:
        result = results[0]
        print('predicted class:', np.argmax(result))
+    return 0
+def caffe_infer(prototxt, caffemodel, datafile):
+    """ do inference using pycaffe for debug,
+        all intermediate results will be dumped to 'results.caffe'
+    """
+    import caffe
+    net = caffe.Net(prototxt, caffemodel, caffe.TEST)
+    input_layer = net.blobs.keys()[0]
+    print('name of input layer is: %s' % (input_layer))
+    input_shape = list(net.blobs[input_layer].data.shape[1:])
+    if '.npy' in datafile:
+        np_images = np.load(datafile)
+    else:
+        np_images = load_data(datafile, input_shape)
+    inputs = {input_layer: np_images}
+    net.forward_all(**inputs)
+    results = []
+    names = []
+    for k, v in net.blobs.items():
+        k = k[:-len('_output')] if k.endswith('_output') else k
+        k = k.replace('/', '_')
+        names.append(k)
+        results.append(v.data.copy())
+    dump_path = 'results.caffe'
+    dump_results(results, names, dump_path)
+    print('all results of layers dumped to [%s]' % (dump_path))
+    return 0
 if __name__ == "__main__":
    """ maybe more convenient to use 'run.sh' to call this tool
    """
    net_file = 'models/resnet50/resnet50.py'
    weight_file = 'models/resnet50/resnet50.npy'
-    imgfile = 'data/65.jpeg'
+    datafile = 'data/65.jpeg'
    net_name = 'ResNet50'
    argc = len(sys.argv)
-    if argc == 5:
+    if argc > 1 and sys.argv[1] == 'caffe':
+        if len(sys.argv) != 5:
+            print('usage:')
+            print('\tpython %s caffe [prototxt] [caffemodel] [datafile]' %
+                  (sys.argv[0]))
+            sys.exit(1)
+        prototxt = sys.argv[2]
+        caffemodel = sys.argv[3]
+        datafile = sys.argv[4]
+        sys.exit(caffe_infer(prototxt, caffemodel, datafile))
+    elif argc == 5:
        net_file = sys.argv[1]
        weight_file = sys.argv[2]
-        imgfile = sys.argv[3]
+        datafile = sys.argv[3]
        net_name = sys.argv[4]
    elif argc > 1:
        print('usage:')
-        print('\tpython %s [net_file] [weight_file] [imgfile] [net_name]' %
+        print('\tpython %s [net_file] [weight_file] [datafile] [net_name]' %
              (sys.argv[0]))
        print('\teg:python %s %s %s %s %s' % (sys.argv[0], net_file,
-                                              weight_file, imgfile, net_name))
+                                              weight_file, datafile, net_name))
        sys.exit(1)
-    infer(net_file, net_name, weight_file, imgfile)
+    infer(net_file, net_name, weight_file, datafile)
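
Note on trimming the `_output` suffix from blob names in `caffe_infer`: `str.rstrip` treats its argument as a set of characters, not as a suffix, so it can eat into names that merely end in those characters. The sketch below shows the difference; `strip_suffix` is a hypothetical helper introduced only for illustration and is not part of this commit.

```python
def strip_suffix(name, suffix):
    """Remove `suffix` from the end of `name` only if it is really there."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


if __name__ == "__main__":
    # rstrip removes any trailing run of the characters '_', 'o', 'u', 't', 'p'
    print('fc_top'.rstrip('_output'))
    # an explicit suffix check leaves unrelated names untouched
    print(strip_suffix('fc_top', '_output'))
    print(strip_suffix('prob_output', '_output'))
```

With an explicit suffix check, a blob name such as `fc_top` keeps its full name, so the dumped `.npy` file names on the caffe side still line up with the ones produced by the fluid side.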
--- a/fluid/image_classification/caffe2fluid/examples/imagenet/run.sh
+++ b/fluid/image_classification/caffe2fluid/examples/imagenet/run.sh
@@ -3,7 +3,7 @@
 #function:
 #   a tool used to:
 #       1, convert a caffe model
-#       2, do inference using this model
+#       2, do inference(only in fluid) using this model
 #
 #usage:
 #   bash run.sh resnet50 ./models.caffe/resnet50 ./models/resnet50
@@ -65,7 +65,12 @@ if [[ -z $only_convert ]];then
        PYTHON=`which python`
    fi
    imgfile="data/65.jpeg"
-    net_name=`grep "name" $proto_file | head -n1 | perl -ne 'if(/\"([^\"]+)\"/){ print $1."\n";}'`
+    #FIXME:
+    #   only the first "name" line in the prototxt file is used as the network name, which may not be correct
+    net_name=`grep "name" $proto_file | head -n1 | perl -ne 'if(/^\s*name\s*:\s*\"([^\"]+)\"/){ print $1."\n";}'`
+    if [[ -z $net_name ]];then
+        net_name="MyNet"
+    fi
    $PYTHON ./infer.py $net_file $weight_file $imgfile $net_name
    ret=$?
 fi

--- a/fluid/image_classification/caffe2fluid/kaffe/graph.py
+++ b/fluid/image_classification/caffe2fluid/kaffe/graph.py
@@ -52,7 +52,10 @@ class Graph(object):
    def __init__(self, nodes=None, name=None):
        self.nodes = nodes or []
        self.node_lut = {node.name: node for node in self.nodes}
-        self.name = name
+        if name is None or name == '':
+            self.name = 'MyNet'
+        else:
+            self.name = name
    def add_node(self, node):
        self.nodes.append(node)

--- a/fluid/image_classification/caffe2fluid/kaffe/paddle/network.py
+++ b/fluid/image_classification/caffe2fluid/kaffe/paddle/network.py
@@ -4,7 +4,7 @@ import numpy as np
 def import_fluid():
-    import paddle.v2.fluid as fluid
+    import paddle.fluid as fluid
    return fluid
@@ -64,7 +64,7 @@ class Network(object):
        if os.path.isdir(data_path):
            assert (exe is not None), \
                'must provide a executor to load fluid model'
-            fluid.io.load_persistables_if_exist(executor=exe, dirname=data_path)
+            fluid.io.load_persistables(executor=exe, dirname=data_path)
            return True
        #load model from a npy file
@@ -161,56 +161,28 @@ class Network(object):
        output = fluid.layers.relu(x=input)
        return output
-    def _adjust_pad_if_needed(self, i_hw, k_hw, s_hw, p_hw):
-        #adjust the padding if needed
-        i_h, i_w = i_hw
-        k_h, k_w = k_hw
-        s_h, s_w = s_hw
-        p_h, p_w = p_hw
-        def is_consistent(i, k, s, p):
-            o = i + 2 * p - k
-            if o % s == 0:
-                return True
-            else:
-                return False
-        real_p_h = 0
-        real_p_w = 0
-        if is_consistent(i_h, k_h, s_h, p_h) is False:
-            real_p_h = int(k_h / 2)
-        if is_consistent(i_w, k_w, s_w, p_w) is False:
-            real_p_w = int(k_w / 2)
-        return [real_p_h, real_p_w]
    def pool(self, pool_type, input, k_h, k_w, s_h, s_w, name, padding):
        # Get the number of channels in the input
        in_hw = input.shape[2:]
        k_hw = [k_h, k_w]
        s_hw = [s_h, s_w]
-        if padding is None:
-            #fix bug about the difference between conv and pool
-            #more info: https://github.com/BVLC/caffe/issues/1318
-            padding = self._adjust_pad_if_needed(in_hw, k_hw, s_hw, [0, 0])
        fluid = import_fluid()
        output = fluid.layers.pool2d(
            input=input,
            pool_size=k_hw,
            pool_stride=s_hw,
            pool_padding=padding,
+            ceil_mode=True,
            pool_type=pool_type)
        return output
    @layer
-    def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=None):
+    def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=[0, 0]):
        return self.pool('max', input, k_h, k_w, s_h, s_w, name, padding)
    @layer
-    def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=None):
+    def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=[0, 0]):
        return self.pool('avg', input, k_h, k_w, s_h, s_w, name, padding)
    @layer
@@ -258,7 +230,12 @@ class Network(object):
        return output
    @layer
-    def batch_normalization(self, input, name, scale_offset=True, relu=False):
+    def batch_normalization(self,
+                            input,
+                            name,
+                            scale_offset=True,
+                            eps=1e-5,
+                            relu=False):
        # NOTE: Currently, only inference is supported
        fluid = import_fluid()
        prefix = name + '_'
@@ -276,7 +253,7 @@ class Network(object):
            bias_attr=bias_attr,
            moving_mean_name=mean_name,
            moving_variance_name=variance_name,
-            epsilon=1e-5,
+            epsilon=eps,
            act='relu' if relu is True else None)
        return output
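
Note on the `ceil_mode=True` change in `pool` above: Caffe rounds the pooling output size up, while floor rounding is the usual default elsewhere, which is the mismatch the removed `_adjust_pad_if_needed` workaround tried to paper over. A minimal sketch of the arithmetic follows; `pool_output_size` is a hypothetical helper for illustration only, and it omits Caffe's extra clipping step that applies when padding is non-zero.

```python
import math


def pool_output_size(in_size, kernel, stride, pad=0, ceil_mode=False):
    """Spatial output size of a pooling window along one axis."""
    span = in_size + 2 * pad - kernel
    if ceil_mode:
        # Caffe-style: a partial window at the border still yields an output
        steps = int(math.ceil(span / float(stride)))
    else:
        # floor-style: a partial window at the border is dropped
        steps = span // stride
    return steps + 1


if __name__ == "__main__":
    # 3x3 pooling with stride 2 on a 112-wide feature map, no padding
    print(pool_output_size(112, 3, 2, ceil_mode=False))
    print(pool_output_size(112, 3, 2, ceil_mode=True))
```

When `in_size + 2 * pad - kernel` is divisible by the stride, both modes agree, so the difference only shows up on layers where the last window hangs over the border.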

--- a/fluid/image_classification/caffe2fluid/kaffe/paddle/transformer.py
+++ b/fluid/image_classification/caffe2fluid/kaffe/paddle/transformer.py
@@ -142,7 +142,13 @@ class TensorFlowMapper(NodeMapper):
    def map_batch_norm(self, node):
        scale_offset = len(node.data) == 4
-        kwargs = {} if scale_offset else {'scale_offset': False}
+        #this default value comes from caffe's param in batch_norm
+        default_eps = 1e-5
+        kwargs = {'scale_offset': scale_offset}
+        if node.parameters.eps != default_eps:
+            kwargs['eps'] = node.parameters.eps
        return MaybeActivated(
            node, default=False)('batch_normalization', **kwargs)
@@ -236,7 +242,7 @@ class TensorFlowEmitter(object):
        func_def = self.statement('@classmethod')
        func_def += self.statement('def convert(cls, npy_model, fluid_path):')
        self.indent()
-        func_def += self.statement('import paddle.v2.fluid as fluid')
+        func_def += self.statement('fluid = import_fluid()')
        for l in codes:
            func_def += self.statement(l)
        return '\n' + func_def

--- a/fluid/image_classification/se_resnext.py
+++ b/fluid/image_classification/se_resnext.py
-import os
-import numpy as np
-import time
-import sys
 import paddle.v2 as paddle
 import paddle.fluid as fluid
-import reader
 def conv_bn_layer(input, num_filters, filter_size, stride=1, groups=1,
@@ -124,164 +119,3 @@ def SE_ResNeXt(input, class_dim, infer=False, layers=50):
        drop = pool
    out = fluid.layers.fc(input=drop, size=class_dim, act='softmax')
    return out
-def train(learning_rate,
-          batch_size,
-          num_passes,
-          init_model=None,
-          model_save_dir='model',
-          parallel=True,
-          use_nccl=True,
-          lr_strategy=None,
-          layers=50):
-    class_dim = 1000
-    image_shape = [3, 224, 224]
-    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
-    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
-    if parallel:
-        places = fluid.layers.get_places()
-        pd = fluid.layers.ParallelDo(places, use_nccl=use_nccl)
-        with pd.do():
-            image_ = pd.read_input(image)
-            label_ = pd.read_input(label)
-            out = SE_ResNeXt(input=image_, class_dim=class_dim, layers=layers)
-            cost = fluid.layers.cross_entropy(input=out, label=label_)
-            avg_cost = fluid.layers.mean(x=cost)
-            acc_top1 = fluid.layers.accuracy(input=out, label=label_, k=1)
-            acc_top5 = fluid.layers.accuracy(input=out, label=label_, k=5)
-            pd.write_output(avg_cost)
-            pd.write_output(acc_top1)
-            pd.write_output(acc_top5)
-        avg_cost, acc_top1, acc_top5 = pd()
-        avg_cost = fluid.layers.mean(x=avg_cost)
-        acc_top1 = fluid.layers.mean(x=acc_top1)
-        acc_top5 = fluid.layers.mean(x=acc_top5)
-    else:
-        out = SE_ResNeXt(input=image, class_dim=class_dim, layers=layers)
-        cost = fluid.layers.cross_entropy(input=out, label=label)
-        avg_cost = fluid.layers.mean(x=cost)
-        acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
-        acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
-    if lr_strategy is None:
-        optimizer = fluid.optimizer.Momentum(
-            learning_rate=learning_rate,
-            momentum=0.9,
-            regularization=fluid.regularizer.L2Decay(1e-4))
-    else:
-        bd = lr_strategy["bd"]
-        lr = lr_strategy["lr"]
-        optimizer = fluid.optimizer.Momentum(
-            learning_rate=fluid.layers.piecewise_decay(
-                boundaries=bd, values=lr),
-            momentum=0.9,
-            regularization=fluid.regularizer.L2Decay(1e-4))
-    opts = optimizer.minimize(avg_cost)
-    fluid.memory_optimize(fluid.default_main_program())
-    inference_program = fluid.default_main_program().clone()
-    with fluid.program_guard(inference_program):
-        inference_program = fluid.io.get_inference_program(
-            [avg_cost, acc_top1, acc_top5])
-    place = fluid.CUDAPlace(0)
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-    if init_model is not None:
-        fluid.io.load_persistables(exe, init_model)
-    train_reader = paddle.batch(reader.train(), batch_size=batch_size)
-    test_reader = paddle.batch(reader.test(), batch_size=batch_size)
-    feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
-    for pass_id in range(num_passes):
-        train_info = [[], [], []]
-        test_info = [[], [], []]
-        for batch_id, data in enumerate(train_reader()):
-            t1 = time.time()
-            loss, acc1, acc5 = exe.run(
-                fluid.default_main_program(),
-                feed=feeder.feed(data),
-                fetch_list=[avg_cost, acc_top1, acc_top5])
-            t2 = time.time()
-            period = t2 - t1
-            train_info[0].append(loss[0])
-            train_info[1].append(acc1[0])
-            train_info[2].append(acc5[0])
-            if batch_id % 10 == 0:
-                print("Pass {0}, trainbatch {1}, loss {2}, \
-                       acc1 {3}, acc5 {4} time {5}"
-                                                   .format(pass_id, \
-                       batch_id, loss[0], acc1[0], acc5[0], \
-                       "%2.2f sec" % period))
-                sys.stdout.flush()
-        train_loss = np.array(train_info[0]).mean()
-        train_acc1 = np.array(train_info[1]).mean()
-        train_acc5 = np.array(train_info[2]).mean()
-        for data in test_reader():
-            t1 = time.time()
-            loss, acc1, acc5 = exe.run(
-                inference_program,
-                feed=feeder.feed(data),
-                fetch_list=[avg_cost, acc_top1, acc_top5])
-            t2 = time.time()
-            period = t2 - t1
-            test_info[0].append(loss[0])
-            test_info[1].append(acc1[0])
-            test_info[2].append(acc5[0])
-            if batch_id % 10 == 0:
-                print("Pass {0},testbatch {1},loss {2}, \
-                       acc1 {3},acc5 {4},time {5}"
-                                                  .format(pass_id, \
-                       batch_id, loss[0], acc1[0], acc5[0], \
-                       "%2.2f sec" % period))
-                sys.stdout.flush()
-        test_loss = np.array(test_info[0]).mean()
-        test_acc1 = np.array(test_info[1]).mean()
-        test_acc5 = np.array(test_info[2]).mean()
-        print("End pass {0}, train_loss {1}, train_acc1 {2}, train_acc5 {3}, \
-               test_loss {4}, test_acc1 {5}, test_acc5 {6}"
-                                                           .format(pass_id, \
-              train_loss, train_acc1, train_acc5, test_loss, test_acc1, \
-              test_acc5))
-        sys.stdout.flush()
-        model_path = os.path.join(model_save_dir, str(pass_id))
-        if not os.path.isdir(model_path):
-            os.makedirs(model_path)
-        fluid.io.save_persistables(exe, model_path)
-if __name__ == '__main__':
-    epoch_points = [30, 60, 90]
-    total_images = 1281167
-    batch_size = 256
-    step = int(total_images / batch_size + 1)
-    bd = [e * step for e in epoch_points]
-    lr = [0.1, 0.01, 0.001, 0.0001]
-    lr_strategy = {"bd": bd, "lr": lr}
-    use_nccl = True
-    # layers: 50, 152
-    layers = 50
-    train(
-        learning_rate=0.1,
-        batch_size=batch_size,
-        num_passes=120,
-        init_model=None,
-        parallel=True,
-        use_nccl=True,
-        lr_strategy=lr_strategy,
-        layers=layers)
--- a/fluid/image_classification/train.py
+++ b/fluid/image_classification/train.py
+import os
+import numpy as np
+import time
+import sys
+import paddle.v2 as paddle
+import paddle.fluid as fluid
+from se_resnext import SE_ResNeXt
+import reader
+import argparse
+import functools
+from utility import add_arguments, print_arguments
+parser = argparse.ArgumentParser(description=__doc__)
+add_arg = functools.partial(add_arguments, argparser=parser)
+# yapf: disable
+add_arg('batch_size',   int,  256, "Minibatch size.")
+add_arg('num_layers',   int,  50,  "How many layers for SE-ResNeXt model.")
+add_arg('with_mem_opt', bool, True, "Whether to use memory optimization or not.")
+add_arg('parallel_exe', bool, True, "Whether to use ParallelExecutor to train or not.")
+def train_parallel_do(args,
+                      learning_rate,
+                      batch_size,
+                      num_passes,
+                      init_model=None,
+                      model_save_dir='model',
+                      parallel=True,
+                      use_nccl=True,
+                      lr_strategy=None,
+                      layers=50):
+    class_dim = 1000
+    image_shape = [3, 224, 224]
+    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
+    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+    if parallel:
+        places = fluid.layers.get_places()
+        pd = fluid.layers.ParallelDo(places, use_nccl=use_nccl)
+        with pd.do():
+            image_ = pd.read_input(image)
+            label_ = pd.read_input(label)
+            out = SE_ResNeXt(input=image_, class_dim=class_dim, layers=layers)
+            cost = fluid.layers.cross_entropy(input=out, label=label_)
+            avg_cost = fluid.layers.mean(x=cost)
+            acc_top1 = fluid.layers.accuracy(input=out, label=label_, k=1)
+            acc_top5 = fluid.layers.accuracy(input=out, label=label_, k=5)
+            pd.write_output(avg_cost)
+            pd.write_output(acc_top1)
+            pd.write_output(acc_top5)
+        avg_cost, acc_top1, acc_top5 = pd()
+        avg_cost = fluid.layers.mean(x=avg_cost)
+        acc_top1 = fluid.layers.mean(x=acc_top1)
+        acc_top5 = fluid.layers.mean(x=acc_top5)
+    else:
+        out = SE_ResNeXt(input=image, class_dim=class_dim, layers=layers)
+        cost = fluid.layers.cross_entropy(input=out, label=label)
+        avg_cost = fluid.layers.mean(x=cost)
+        acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
+        acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
+    if lr_strategy is None:
+        optimizer = fluid.optimizer.Momentum(
+            learning_rate=learning_rate,
+            momentum=0.9,
+            regularization=fluid.regularizer.L2Decay(1e-4))
+    else:
+        bd = lr_strategy["bd"]
+        lr = lr_strategy["lr"]
+        optimizer = fluid.optimizer.Momentum(
+            learning_rate=fluid.layers.piecewise_decay(
+                boundaries=bd, values=lr),
+            momentum=0.9,
+            regularization=fluid.regularizer.L2Decay(1e-4))
+    inference_program = fluid.default_main_program().clone(for_test=True)
+    opts = optimizer.minimize(avg_cost)
+    if args.with_mem_opt:
+        fluid.memory_optimize(fluid.default_main_program())
+        fluid.memory_optimize(inference_program)
+    place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    if init_model is not None:
+        fluid.io.load_persistables(exe, init_model)
+    train_reader = paddle.batch(reader.train(), batch_size=batch_size)
+    test_reader = paddle.batch(reader.test(), batch_size=batch_size)
+    feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
+    for pass_id in range(num_passes):
+        train_info = [[], [], []]
+        test_info = [[], [], []]
+        for batch_id, data in enumerate(train_reader()):
+            t1 = time.time()
+            loss, acc1, acc5 = exe.run(
+                fluid.default_main_program(),
+                feed=feeder.feed(data),
+                fetch_list=[avg_cost, acc_top1, acc_top5])
+            t2 = time.time()
+            period = t2 - t1
+            train_info[0].append(loss[0])
+            train_info[1].append(acc1[0])
+            train_info[2].append(acc5[0])
+            if batch_id % 10 == 0:
+                print("Pass {0}, trainbatch {1}, loss {2}, \
+                       acc1 {3}, acc5 {4} time {5}"
+                                                   .format(pass_id, \
+                       batch_id, loss[0], acc1[0], acc5[0], \
+                       "%2.2f sec" % period))
+                sys.stdout.flush()
+        train_loss = np.array(train_info[0]).mean()
+        train_acc1 = np.array(train_info[1]).mean()
+        train_acc5 = np.array(train_info[2]).mean()
+        for batch_id, data in enumerate(test_reader()):
+            t1 = time.time()
+            loss, acc1, acc5 = exe.run(
+                inference_program,
+                feed=feeder.feed(data),
+                fetch_list=[avg_cost, acc_top1, acc_top5])
+            t2 = time.time()
+            period = t2 - t1
+            test_info[0].append(loss[0])
+            test_info[1].append(acc1[0])
+            test_info[2].append(acc5[0])
+            if batch_id % 10 == 0:
+                print("Pass {0},testbatch {1},loss {2}, \
+                       acc1 {3},acc5 {4},time {5}"
+                                                  .format(pass_id, \
+                       batch_id, loss[0], acc1[0], acc5[0], \
+                       "%2.2f sec" % period))
+                sys.stdout.flush()
+        test_loss = np.array(test_info[0]).mean()
+        test_acc1 = np.array(test_info[1]).mean()
+        test_acc5 = np.array(test_info[2]).mean()
+        print("End pass {0}, train_loss {1}, train_acc1 {2}, train_acc5 {3}, \
+               test_loss {4}, test_acc1 {5}, test_acc5 {6}"
+                                                           .format(pass_id, \
+              train_loss, train_acc1, train_acc5, test_loss, test_acc1, \
+              test_acc5))
+        sys.stdout.flush()
+        model_path = os.path.join(model_save_dir, str(pass_id))
+        if not os.path.isdir(model_path):
+            os.makedirs(model_path)
+        fluid.io.save_persistables(exe, model_path)
+def train_parallel_exe(args,
+                       learning_rate,
+                       batch_size,
+                       num_passes,
+                       init_model=None,
+                       model_save_dir='model',
+                       parallel=True,
+                       use_nccl=True,
+                       lr_strategy=None,
+                       layers=50):
+    class_dim = 1000
+    image_shape = [3, 224, 224]
+    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
+    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
+    out = SE_ResNeXt(input=image, class_dim=class_dim, layers=layers)
+    cost = fluid.layers.cross_entropy(input=out, label=label)
+    acc_top1 = fluid.layers.accuracy(input=out, label=label, k=1)
+    acc_top5 = fluid.layers.accuracy(input=out, label=label, k=5)
+    avg_cost = fluid.layers.mean(x=cost)
+    test_program = fluid.default_main_program().clone(for_test=True)
+    if lr_strategy is None:
+        optimizer = fluid.optimizer.Momentum(
+            learning_rate=learning_rate,
+            momentum=0.9,
+            regularization=fluid.regularizer.L2Decay(1e-4))
+    else:
+        bd = lr_strategy["bd"]
+        lr = lr_strategy["lr"]
+        optimizer = fluid.optimizer.Momentum(
+            learning_rate=fluid.layers.piecewise_decay(
+                boundaries=bd, values=lr),
+            momentum=0.9,
+            regularization=fluid.regularizer.L2Decay(1e-4))
+    opts = optimizer.minimize(avg_cost)
+    if args.with_mem_opt:
+        fluid.memory_optimize(fluid.default_main_program())
+        fluid.memory_optimize(test_program)
+    place = fluid.CUDAPlace(0)
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    if init_model is not None:
+        fluid.io.load_persistables(exe, init_model)
+    train_reader = paddle.batch(reader.train(), batch_size=batch_size)
+    test_reader = paddle.batch(reader.test(), batch_size=batch_size)
+    feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
+    train_exe = fluid.ParallelExecutor(use_cuda=True, loss_name=avg_cost.name)
+    test_exe = fluid.ParallelExecutor(
+        use_cuda=True,
+        main_program=test_program,
+        share_vars_from=train_exe)
+    fetch_list = [avg_cost.name, acc_top1.name, acc_top5.name]
+    for pass_id in range(num_passes):
+        train_info = [[], [], []]
+        test_info = [[], [], []]
+        for batch_id, data in enumerate(train_reader()):
+            t1 = time.time()
+            loss, acc1, acc5 = train_exe.run(
+                fetch_list,
+                feed_dict=feeder.feed(data))
+            t2 = time.time()
+            period = t2 - t1
+            loss = np.mean(np.array(loss))
+            acc1 = np.mean(np.array(acc1))
+            acc5 = np.mean(np.array(acc5))
+            train_info[0].append(loss)
+            train_info[1].append(acc1)
+            train_info[2].append(acc5)
+            if batch_id % 10 == 0:
+                print("Pass {0}, trainbatch {1}, loss {2}, \
+                       acc1 {3}, acc5 {4} time {5}"
+                                                   .format(pass_id, \
+                       batch_id, loss, acc1, acc5, \
+                       "%2.2f sec" % period))
+                sys.stdout.flush()
+        train_loss = np.array(train_info[0]).mean()
+        train_acc1 = np.array(train_info[1]).mean()
+        train_acc5 = np.array(train_info[2]).mean()
+        for batch_id, data in enumerate(test_reader()):
+            t1 = time.time()
+            loss, acc1, acc5 = test_exe.run(
+                fetch_list,
+                feed_dict=feeder.feed(data))
+            t2 = time.time()
+            period = t2 - t1
+            loss = np.mean(np.array(loss))
+            acc1 = np.mean(np.array(acc1))
+            acc5 = np.mean(np.array(acc5))
+            test_info[0].append(loss)
+            test_info[1].append(acc1)
+            test_info[2].append(acc5)
+            if batch_id % 10 == 0:
+                print("Pass {0},testbatch {1},loss {2}, \
+                       acc1 {3},acc5 {4},time {5}"
+                                                  .format(pass_id, \
+                       batch_id, loss, acc1, acc5, \
+                       "%2.2f sec" % period))
+                sys.stdout.flush()
+        test_loss = np.array(test_info[0]).mean()
+        test_acc1 = np.array(test_info[1]).mean()
+        test_acc5 = np.array(test_info[2]).mean()
+        print("End pass {0}, train_loss {1}, train_acc1 {2}, train_acc5 {3}, \
+               test_loss {4}, test_acc1 {5}, test_acc5 {6}"
+                                                           .format(pass_id, \
+              train_loss, train_acc1, train_acc5, test_loss, test_acc1, \
+              test_acc5))
+        sys.stdout.flush()
+        model_path = os.path.join(model_save_dir, str(pass_id))
+        if not os.path.isdir(model_path):
+            os.makedirs(model_path)
+        fluid.io.save_persistables(exe, model_path)
+if __name__ == '__main__':
+    args = parser.parse_args()
+    print_arguments(args)
+    epoch_points = [30, 60, 90]
+    total_images = 1281167
+    batch_size = args.batch_size
+    step = int(total_images / batch_size + 1)
+    bd = [e * step for e in epoch_points]
+    lr = [0.1, 0.01, 0.001, 0.0001]
+    lr_strategy = {"bd": bd, "lr": lr}
+    use_nccl = True
+    # layers: 50, 152
+    layers = args.num_layers
+    method = train_parallel_exe if args.parallel_exe else train_parallel_do
+    method(args,
+           learning_rate=0.1,
+           batch_size=batch_size,
+           num_passes=120,
+           init_model=None,
+           parallel=True,
+           use_nccl=True,
+           lr_strategy=lr_strategy,
+           layers=layers)
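
Review note: the `__main__` block above converts epoch milestones into step boundaries for `piecewise_decay`. The sketch below reproduces that arithmetic in plain Python and looks up the rate for a given step. `piecewise_lr` is a hypothetical helper, and its behaviour at a step exactly equal to a boundary (switching to the next value) is an assumption, not a statement about the Fluid implementation.

```python
import bisect


def piecewise_lr(step_id, boundaries, values):
    """Return the learning rate in effect at a given global step.

    Assumes len(values) == len(boundaries) + 1 and that a step equal to a
    boundary already uses the next value.
    """
    return values[bisect.bisect_right(boundaries, step_id)]


# Same numbers as in the __main__ block of train.py.
epoch_points = [30, 60, 90]
total_images = 1281167
batch_size = 256
steps_per_epoch = int(total_images / batch_size + 1)
boundaries = [e * steps_per_epoch for e in epoch_points]
values = [0.1, 0.01, 0.001, 0.0001]

print(steps_per_epoch, boundaries)
print(piecewise_lr(0, boundaries, values))
print(piecewise_lr(boundaries[0], boundaries, values))
```

Because `steps_per_epoch` depends on `batch_size`, passing a different `--batch_size` shifts every boundary, which is why the script recomputes `bd` from `args.batch_size`.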
--- a/fluid/image_classification/utility.py
+++ b/fluid/image_classification/utility.py
+"""Contains common utility functions."""
+#  Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import distutils.util
+import numpy as np
+from paddle.fluid import core
+def print_arguments(args):
+    """Print argparse's arguments.
+    Usage:
+    .. code-block:: python
+        parser = argparse.ArgumentParser()
+        parser.add_argument("name", default="John", type=str, help="User name.")
+        args = parser.parse_args()
+        print_arguments(args)
+    :param args: Input argparse.Namespace for printing.
+    :type args: argparse.Namespace
+    """
+    print("-----------  Configuration Arguments -----------")
+    for arg, value in sorted(vars(args).items()):
+        print("%s: %s" % (arg, value))
+    print("------------------------------------------------")
+def add_arguments(argname, type, default, help, argparser, **kwargs):
+    """Add argparse's argument.
+    Usage:
+    .. code-block:: python
+        parser = argparse.ArgumentParser()
+        add_arguments("name", str, "John", "User name.", parser)
+        args = parser.parse_args()
+    """
+    type = distutils.util.strtobool if type == bool else type
+    argparser.add_argument(
+        "--" + argname,
+        default=default,
+        type=type,
+        help=help + ' Default: %(default)s.',
+        **kwargs)
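
Review note: `add_arguments` swaps `bool` for a string-to-boolean parser because `bool("False")` is `True` in Python, so a plain `type=bool` flag could never be switched off from the command line. The sketch below shows the pattern together with the `functools.partial` binding used in train.py. `str_to_bool` is a hypothetical stand-in for `distutils.util.strtobool` (it returns `True`/`False` rather than `1`/`0`), used here so the example does not depend on `distutils`.

```python
import argparse
import functools


def str_to_bool(text):
    """Hypothetical stand-in for distutils.util.strtobool."""
    value = text.strip().lower()
    if value in ("y", "yes", "t", "true", "on", "1"):
        return True
    if value in ("n", "no", "f", "false", "off", "0"):
        return False
    raise argparse.ArgumentTypeError("invalid boolean value: %r" % text)


def add_arguments(argname, type, default, help, argparser, **kwargs):
    """Register --argname on argparser, parsing booleans from text."""
    type = str_to_bool if type == bool else type
    argparser.add_argument(
        "--" + argname,
        default=default,
        type=type,
        help=help + " Default: %(default)s.",
        **kwargs)


parser = argparse.ArgumentParser()
# Bind the parser once so each option is declared on a single line.
add_arg = functools.partial(add_arguments, argparser=parser)
add_arg("batch_size", int, 256, "Minibatch size.")
add_arg("parallel_exe", bool, True, "Whether to use ParallelExecutor.")

# An explicit argument list is passed here instead of reading sys.argv.
args = parser.parse_args(["--batch_size", "64", "--parallel_exe", "False"])
print(args.batch_size, args.parallel_exe)
```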
--- a/fluid/neural_machine_translation/transformer/config.py
+++ b/fluid/neural_machine_translation/transformer/config.py
@@ -15,6 +15,9 @@ class TrainTaskConfig(object):
    # the parameters for learning rate scheduling.
    warmup_steps = 4000
+    # the flag indicating whether to use the average loss (True) or the
+    # summed loss (False) when training.
+    use_avg_cost = False
    # the directory for saving trained models.
    model_dir = "trained_models"
@@ -22,8 +25,7 @@ class TrainTaskConfig(object):
 class InferTaskConfig(object):
    use_gpu = False
    # the number of examples in one run for sequence generation.
-    # currently the batch size can only be set to 1.
-    batch_size = 1
+    batch_size = 10
    # the parameters for beam search.
    beam_size = 5
@@ -31,6 +33,11 @@ class InferTaskConfig(object):
    # the number of decoded sentences to output.
    n_best = 1
+    # the flags indicating whether to output the special tokens.
+    output_bos = False
+    output_eos = False
+    output_unk = False
    # the directory for loading the trained model.
    model_path = "trained_models/pass_1.infer.model"
@@ -56,6 +63,8 @@ class ModelHyperParams(object):
    bos_idx = 0
    # index for <eos> token
    eos_idx = 1
+    # index for <unk> token
+    unk_idx = 2
    # position value corresponding to the <pad> token.
    pos_pad_idx = 0
@@ -93,6 +102,7 @@ encoder_input_data_names = (
    "src_word",
    "src_pos",
    "src_slf_attn_bias",
+    "src_data_shape",
    "src_slf_attn_pre_softmax_shape",
    "src_slf_attn_post_softmax_shape", )
@@ -102,6 +112,7 @@ decoder_input_data_names = (
    "trg_pos",
    "trg_slf_attn_bias",
    "trg_src_attn_bias",
+    "trg_data_shape",
    "trg_slf_attn_pre_softmax_shape",
    "trg_slf_attn_post_softmax_shape",
    "trg_src_attn_pre_softmax_shape",

--- a/fluid/neural_machine_translation/transformer/infer.py
+++ b/fluid/neural_machine_translation/transformer/infer.py
 import numpy as np
-import paddle.v2 as paddle
+import paddle
 import paddle.fluid as fluid
 import model
@@ -11,10 +11,26 @@ from config import InferTaskConfig, ModelHyperParams, \
 from train import pad_batch_data
-def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
-                    decoder, dec_in_names, dec_out_names, beam_size, max_length,
-                    n_best, batch_size, n_head, src_pad_idx, trg_pad_idx,
-                    bos_idx, eos_idx):
+def translate_batch(exe,
+                    src_words,
+                    encoder,
+                    enc_in_names,
+                    enc_out_names,
+                    decoder,
+                    dec_in_names,
+                    dec_out_names,
+                    beam_size,
+                    max_length,
+                    n_best,
+                    batch_size,
+                    n_head,
+                    d_model,
+                    src_pad_idx,
+                    trg_pad_idx,
+                    bos_idx,
+                    eos_idx,
+                    unk_idx,
+                    output_unk=True):
    """
    Run the encoder program once and run the decoder program multiple times to
    implement beam search externally.
@@ -28,6 +44,11 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
        return_pos=True,
        return_attn_bias=True,
        return_max_len=False)
+    # Append the data shape input to reshape the output of embedding layer.
+    enc_in_data = enc_in_data + [
+        np.array(
+            [-1, enc_in_data[2].shape[-1], d_model], dtype="int32")
+    ]
    # Append the shape inputs to reshape before and after softmax in encoder
    # self attention.
    enc_in_data = enc_in_data + [
@@ -44,11 +65,16 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
    scores = np.zeros((batch_size, beam_size), dtype="float32")
    prev_branchs = [[] for i in range(batch_size)]
    next_ids = [[] for i in range(batch_size)]
-    # Use beam_map to map the instance idx in batch to beam idx, since the
+    # Use beam_inst_map to map beam idx to the instance idx in batch, since the
    # size of the fed batch changes.
-    beam_map = range(batch_size)
+    beam_inst_map = {
+        beam_idx: inst_idx
+        for inst_idx, beam_idx in enumerate(range(batch_size))
+    }
+    # Use active_beams to record the beams that are still active.
+    active_beams = range(batch_size)
-    def beam_backtrace(prev_branchs, next_ids, n_best=beam_size, add_bos=True):
+    def beam_backtrace(prev_branchs, next_ids, n_best=beam_size):
        """
        Decode and select n_best sequences for one instance by backtrace.
        """
@@ -60,7 +86,8 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
                seq.append(next_ids[j][k])
                k = prev_branchs[j][k]
            seq = seq[::-1]
-            seq = [bos_idx] + seq if add_bos else seq
+            # Add the <bos>, since next_ids doesn't include the <bos>.
+            seq = [bos_idx] + seq
            seqs.append(seq)
        return seqs
@@ -82,8 +109,14 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
                             [-1e9]).astype("float32")
        # This is used to remove attention on the paddings of source sequences.
        trg_src_attn_bias = np.tile(
-            src_slf_attn_bias[:, :, ::src_max_length, :],
-            [beam_size, 1, trg_max_len, 1])
+            src_slf_attn_bias[:, :, ::src_max_length, :][:, np.newaxis],
+            [1, beam_size, 1, trg_max_len, 1]).reshape([
+                -1, src_slf_attn_bias.shape[1], trg_max_len,
+                src_slf_attn_bias.shape[-1]
+            ])
+        # Append the shape input to reshape the output of embedding layer.
+        trg_data_shape = np.array(
+            [batch_size * beam_size, trg_max_len, d_model], dtype="int32")
        # Append the shape inputs to reshape before and after softmax in
        # decoder self attention.
        trg_slf_attn_pre_softmax_shape = np.array(
@@ -96,26 +129,27 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
            [-1, trg_src_attn_bias.shape[-1]], dtype="int32")
        trg_src_attn_post_softmax_shape = np.array(
            trg_src_attn_bias.shape, dtype="int32")
-        enc_output = np.tile(enc_output, [beam_size, 1, 1])
+        enc_output = np.tile(
+            enc_output[:, np.newaxis], [1, beam_size, 1, 1]).reshape(
+                [-1, enc_output.shape[-2], enc_output.shape[-1]])
        return trg_words, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, \
-            trg_slf_attn_pre_softmax_shape, trg_slf_attn_post_softmax_shape, \
-            trg_src_attn_pre_softmax_shape, trg_src_attn_post_softmax_shape, \
-            enc_output
+            trg_data_shape, trg_slf_attn_pre_softmax_shape, \
+            trg_slf_attn_post_softmax_shape, trg_src_attn_pre_softmax_shape, \
+            trg_src_attn_post_softmax_shape, enc_output
-    def update_dec_in_data(dec_in_data, next_ids, active_beams):
+    def update_dec_in_data(dec_in_data, next_ids, active_beams, beam_inst_map):
        """
        Update the input data of decoder mainly by slicing from the previous
        input data and dropping the finished instance beams.
        """
        trg_words, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, \
-            trg_slf_attn_pre_softmax_shape, trg_slf_attn_post_softmax_shape, \
-            trg_src_attn_pre_softmax_shape, trg_src_attn_post_softmax_shape, \
-            enc_output = dec_in_data
+            trg_data_shape, trg_slf_attn_pre_softmax_shape, \
+            trg_slf_attn_post_softmax_shape, trg_src_attn_pre_softmax_shape, \
+            trg_src_attn_post_softmax_shape, enc_output = dec_in_data
-        trg_cur_len = len(next_ids[0]) + 1  # include the <bos>
+        trg_cur_len = trg_slf_attn_bias.shape[-1] + 1
        trg_words = np.array(
            [
-                beam_backtrace(
-                    prev_branchs[beam_idx], next_ids[beam_idx], add_bos=True)
+                beam_backtrace(prev_branchs[beam_idx], next_ids[beam_idx])
                for beam_idx in active_beams
            ],
            dtype="int64")
@@ -123,6 +157,7 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
        trg_pos = np.array(
            [range(1, trg_cur_len + 1)] * len(active_beams) * beam_size,
            dtype="int64").reshape([-1, 1])
+        active_beams = [beam_inst_map[beam_idx] for beam_idx in active_beams]
        active_beams_indice = (
            (np.array(active_beams) * beam_size)[:, np.newaxis] +
            np.array(range(beam_size))[np.newaxis, :]).flatten()
@@ -137,6 +172,10 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
        trg_src_attn_bias = np.tile(trg_src_attn_bias[
            active_beams_indice, :, ::trg_src_attn_bias.shape[2], :],
                                    [1, 1, trg_cur_len, 1])
+        # Append the shape input to reshape the output of embedding layer.
+        trg_data_shape = np.array(
+            [len(active_beams) * beam_size, trg_cur_len, d_model],
+            dtype="int32")
        # Append the shape inputs to reshape before and after softmax in
        # decoder self attention.
        trg_slf_attn_pre_softmax_shape = np.array(
@@ -151,9 +190,9 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
            trg_src_attn_bias.shape, dtype="int32")
        enc_output = enc_output[active_beams_indice, :, :]
        return trg_words, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, \
-            trg_slf_attn_pre_softmax_shape, trg_slf_attn_post_softmax_shape, \
-            trg_src_attn_pre_softmax_shape, trg_src_attn_post_softmax_shape, \
-            enc_output
+            trg_data_shape, trg_slf_attn_pre_softmax_shape, \
+            trg_slf_attn_post_softmax_shape, trg_src_attn_pre_softmax_shape, \
+            trg_src_attn_post_softmax_shape, enc_output
    dec_in_data = init_dec_in_data(batch_size, beam_size, enc_in_data,
                                   enc_output)
@@ -162,13 +201,18 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
                              feed=dict(zip(dec_in_names, dec_in_data)),
                              fetch_list=dec_out_names)[0]
        predict_all = np.log(
-            predict_all.reshape([len(beam_map) * beam_size, i + 1, -1])[:,
-                                                                        -1, :])
-        predict_all = (predict_all + scores[beam_map].reshape(
-            [len(beam_map) * beam_size, -1])).reshape(
-                [len(beam_map), beam_size, -1])
+            predict_all.reshape([len(beam_inst_map) * beam_size, i + 1, -1])
+            [:, -1, :])
+        predict_all = (predict_all + scores[active_beams].reshape(
+            [len(beam_inst_map) * beam_size, -1])).reshape(
+                [len(beam_inst_map), beam_size, -1])
+        if not output_unk:  # To exclude the <unk> token.
+            predict_all[:, :, unk_idx] = -1e9
        active_beams = []
-        for inst_idx, beam_idx in enumerate(beam_map):
+        for beam_idx in range(batch_size):
+            if beam_idx not in beam_inst_map:
+                continue
+            inst_idx = beam_inst_map[beam_idx]
            predict = (predict_all[inst_idx, :, :]
                       if i != 0 else predict_all[inst_idx, 0, :]).flatten()
            top_k_indice = np.argpartition(predict, -beam_size)[-beam_size:]
@@ -181,13 +225,20 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
            next_ids[beam_idx].append(top_scores_ids % predict_all.shape[-1])
            if next_ids[beam_idx][-1][0] != eos_idx:
                active_beams.append(beam_idx)
-        beam_map = active_beams
+        if len(active_beams) == 0:
-        if len(beam_map) == 0:
            break
-        dec_in_data = update_dec_in_data(dec_in_data, next_ids, active_beams)
+        dec_in_data = update_dec_in_data(dec_in_data, next_ids, active_beams,
+                                         beam_inst_map)
+        beam_inst_map = {
+            beam_idx: inst_idx
+            for inst_idx, beam_idx in enumerate(active_beams)
+        }
    # Decode beams and select n_best sequences for each instance by backtrace.
-    seqs = [beam_backtrace(prev_branchs[beam_idx], next_ids[beam_idx], n_best)]
+    seqs = [
+        beam_backtrace(prev_branchs[beam_idx], next_ids[beam_idx], n_best)
+        for beam_idx in range(batch_size)
+    ]
    return seqs, scores[:, :n_best].tolist()
@@ -195,10 +246,8 @@ def translate_batch(exe, src_words, encoder, enc_in_names, enc_out_names,
 def main():
    place = fluid.CUDAPlace(0) if InferTaskConfig.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
-    # The current program desc is coupled with batch_size and the only
-    # supported batch size is 1 currently.
    encoder_program = fluid.Program()
-    model.batch_size = InferTaskConfig.batch_size
    with fluid.program_guard(main_program=encoder_program):
        enc_output = encoder(
            ModelHyperParams.src_vocab_size + 1,
@@ -208,7 +257,6 @@ def main():
            ModelHyperParams.d_inner_hid, ModelHyperParams.dropout,
            ModelHyperParams.src_pad_idx, ModelHyperParams.pos_pad_idx)
-    model.batch_size = InferTaskConfig.batch_size * InferTaskConfig.beam_size
    decoder_program = fluid.Program()
    with fluid.program_guard(main_program=decoder_program):
        predict = decoder(
@@ -253,18 +301,52 @@ def main():
    trg_idx2word = paddle.dataset.wmt16.get_dict(
        "de", dict_size=ModelHyperParams.trg_vocab_size, reverse=True)
+    # Append the <pad> token since the dict provided by dataset.wmt16 does
+    # not include it.
+    trg_idx2word[ModelHyperParams.trg_pad_idx] = "<pad>"
+    def post_process_seq(seq,
+                         bos_idx=ModelHyperParams.bos_idx,
+                         eos_idx=ModelHyperParams.eos_idx,
+                         output_bos=InferTaskConfig.output_bos,
+                         output_eos=InferTaskConfig.output_eos):
+        """
+        Post-process a beam-search decoded sequence: truncate it at the first
+        <eos>, then drop <bos>/<eos> unless output_bos/output_eos is set.
+        """
+        eos_pos = len(seq) - 1
+        for i, idx in enumerate(seq):
+            if idx == eos_idx:
+                eos_pos = i
+                break
+        seq = seq[:eos_pos + 1]
+        return filter(
+            lambda idx: (output_bos or idx != bos_idx) and \
+                (output_eos or idx != eos_idx),
+            seq)
    for batch_id, data in enumerate(test_data()):
        batch_seqs, batch_scores = translate_batch(
-            exe, [item[0] for item in data], encoder_program,
+            exe, [item[0] for item in data],
-            encoder_input_data_names, [enc_output.name], decoder_program,
+            encoder_program,
-            decoder_input_data_names, [predict.name], InferTaskConfig.beam_size,
+            encoder_input_data_names, [enc_output.name],
-            InferTaskConfig.max_length, InferTaskConfig.n_best,
+            decoder_program,
-            len(data), ModelHyperParams.n_head, ModelHyperParams.src_pad_idx,
+            decoder_input_data_names, [predict.name],
-            ModelHyperParams.trg_pad_idx, ModelHyperParams.bos_idx,
+            InferTaskConfig.beam_size,
-            ModelHyperParams.eos_idx)
+            InferTaskConfig.max_length,
+            InferTaskConfig.n_best,
+            len(data),
+            ModelHyperParams.n_head,
+            ModelHyperParams.d_model,
+            ModelHyperParams.src_pad_idx,
+            ModelHyperParams.trg_pad_idx,
+            ModelHyperParams.bos_idx,
+            ModelHyperParams.eos_idx,
+            ModelHyperParams.unk_idx,
+            output_unk=InferTaskConfig.output_unk)
        for i in range(len(batch_seqs)):
-            seqs = batch_seqs[i]
+            # Post-process the beam-search decoded sequences.
+            seqs = map(post_process_seq, batch_seqs[i])
            scores = batch_scores[i]
            for seq in seqs:
                print(" ".join([trg_idx2word[idx] for idx in seq]))
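
The post-processing step added in this diff can be tried in isolation. The following is a minimal sketch, not the repository's code: the token ids `bos_idx=0` and `eos_idx=1` are assumptions chosen for illustration, and a list comprehension stands in for `filter` so that the result is a list on both Python 2 and Python 3.

```python
def post_process_seq(seq, bos_idx=0, eos_idx=1,
                     output_bos=False, output_eos=False):
    """Truncate at the first <eos>, then optionally drop <bos>/<eos>.

    bos_idx and eos_idx are illustrative defaults, not the real config values.
    """
    # Default to the last position so a sequence without <eos> is kept whole.
    eos_pos = len(seq) - 1
    for i, idx in enumerate(seq):
        if idx == eos_idx:
            eos_pos = i
            break
    # Keep everything up to and including the first <eos>.
    seq = seq[:eos_pos + 1]
    return [
        idx for idx in seq
        if (output_bos or idx != bos_idx) and (output_eos or idx != eos_idx)
    ]


# Tokens after the first <eos> (id 1) are discarded.
decoded = post_process_seq([0, 5, 7, 1, 9, 9])
```

Returning a list rather than a lazy iterator keeps the later `" ".join(...)` loop working unchanged if the script is ever run under Python 3.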

--- a/fluid/neural_machine_translation/transformer/model.py
+++ b/fluid/neural_machine_translation/transformer/model.py
@@ -7,9 +7,6 @@ import paddle.fluid.layers as layers
 from config import TrainTaskConfig, pos_enc_param_names, \
    encoder_input_data_names, decoder_input_data_names, label_data_names
-# FIXME(guosheng): Remove out the batch_size from the model.
-batch_size = TrainTaskConfig.batch_size
 def position_encoding_init(n_position, d_pos_vec):
    """
@@ -85,9 +82,10 @@ def multi_head_attention(queries,
            return x
        hidden_size = x.shape[-1]
-        # FIXME(guosheng): Decouple the program desc with batch_size.
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
        reshaped = layers.reshape(
-            x=x, shape=[batch_size, -1, n_head, hidden_size // n_head])
+            x=x, shape=[0, -1, n_head, hidden_size // n_head])
        # permute the dimensions into:
        # [batch_size, n_head, max_sequence_len, hidden_size_per_head]
@@ -103,11 +101,11 @@ def multi_head_attention(queries,
            raise ValueError("Input(x) should be a 4-D Tensor.")
        trans_x = layers.transpose(x, perm=[0, 2, 1, 3])
-        # FIXME(guosheng): Decouple the program desc with batch_size.
+        # The value 0 in shape attr means copying the corresponding dimension
+        # size of the input as the output dimension size.
        return layers.reshape(
            x=trans_x,
-            shape=map(int,
+            shape=map(int, [0, -1, trans_x.shape[2] * trans_x.shape[3]]))
-                      [batch_size, -1, trans_x.shape[2] * trans_x.shape[3]]))
    def scaled_dot_product_attention(q, k, v, attn_bias, d_model, dropout_rate):
        """
@@ -205,6 +203,7 @@ def prepare_encoder(src_word,
                    src_max_len,
                    dropout_rate=0.,
                    pos_pad_idx=0,
+                    src_data_shape=None,
                    pos_enc_param_name=None):
    """Add word embeddings and position encodings.
    The output tensor has a shape of:
@@ -224,9 +223,10 @@ def prepare_encoder(src_word,
        param_attr=fluid.ParamAttr(
            name=pos_enc_param_name, trainable=False))
    enc_input = src_word_emb + src_pos_enc
+    enc_input = layers.reshape(
-    # FIXME(guosheng): Decouple the program desc with batch_size.
+        x=enc_input,
-    enc_input = layers.reshape(x=enc_input, shape=[batch_size, -1, src_emb_dim])
+        shape=[-1, src_max_len, src_emb_dim],
+        actual_shape=src_data_shape)
    return layers.dropout(
        enc_input, dropout_prob=dropout_rate,
        is_test=False) if dropout_rate else enc_input
@@ -401,20 +401,23 @@ def decoder(dec_input,
 def make_inputs(input_data_names,
                n_head,
                d_model,
-                batch_size,
                max_length,
                is_pos,
                slf_attn_bias_flag,
                src_attn_bias_flag,
                enc_output_flag=False,
+                data_shape_flag=True,
                slf_attn_shape_flag=True,
                src_attn_shape_flag=True):
    """
    Define the input data layers for the transformer model.
    """
    input_layers = []
-    # The shapes here act as placeholder.
+    batch_size = 1  # Only for the infer-shape in compile time.
-    # The shapes set here is to pass the infer-shape in compile time.
+    # The shapes here act as placeholders and are set only to pass shape
+    # inference at compile time.
+    # The actual data shape of word is:
+    # [batch_size * max_len_in_batch, 1]
    word = layers.data(
        name=input_data_names[len(input_layers)],
        shape=[batch_size * max_length, 1],
@@ -422,6 +425,8 @@ def make_inputs(input_data_names,
        append_batch_size=False)
    input_layers += [word]
    # This is used for position data or label weight.
+    # The actual data shape of pos is:
+    # [batch_size * max_len_in_batch, 1]
    pos = layers.data(
        name=input_data_names[len(input_layers)],
        shape=[batch_size * max_length, 1],
@@ -432,6 +437,8 @@ def make_inputs(input_data_names,
        # This input is used to remove attention weights on paddings for the
        # encoder and to remove attention weights on subsequent words for the
        # decoder.
+        # The actual data shape of slf_attn_bias is:
+        # [batch_size, n_head, max_len_in_batch, max_len_in_batch]
        slf_attn_bias = layers.data(
            name=input_data_names[len(input_layers)],
            shape=[batch_size, n_head, max_length, max_length],
@@ -439,40 +446,56 @@ def make_inputs(input_data_names,
            append_batch_size=False)
        input_layers += [slf_attn_bias]
    if src_attn_bias_flag:
-        # This input is used to remove attention weights on paddings.
+        # This input is used to remove attention weights on paddings. It's used
+        # in encoder-decoder attention.
+        # The actual data shape of src_attn_bias is:
+        # [batch_size, n_head, trg_max_len_in_batch, src_max_len_in_batch]
        src_attn_bias = layers.data(
            name=input_data_names[len(input_layers)],
            shape=[batch_size, n_head, max_length, max_length],
            dtype="float32",
            append_batch_size=False)
        input_layers += [src_attn_bias]
+    if data_shape_flag:
+        # This input is used to reshape the output of embedding layer.
+        data_shape = layers.data(
+            name=input_data_names[len(input_layers)],
+            shape=[3],
+            dtype="int32",
+            append_batch_size=False)
+        input_layers += [data_shape]
    if slf_attn_shape_flag:
+        # This shape input is used to reshape before softmax in self attention.
        slf_attn_pre_softmax_shape = layers.data(
            name=input_data_names[len(input_layers)],
-            shape=[3],
+            shape=[2],
            dtype="int32",
            append_batch_size=False)
        input_layers += [slf_attn_pre_softmax_shape]
+        # This shape input is used to reshape after softmax in self attention.
        slf_attn_post_softmax_shape = layers.data(
            name=input_data_names[len(input_layers)],
-            shape=[3],
+            shape=[4],
            dtype="int32",
            append_batch_size=False)
        input_layers += [slf_attn_post_softmax_shape]
    if src_attn_shape_flag:
        src_attn_pre_softmax_shape = layers.data(
            name=input_data_names[len(input_layers)],
-            shape=[3],
+            shape=[2],
            dtype="int32",
            append_batch_size=False)
        input_layers += [src_attn_pre_softmax_shape]
        src_attn_post_softmax_shape = layers.data(
            name=input_data_names[len(input_layers)],
-            shape=[3],
+            shape=[4],
            dtype="int32",
            append_batch_size=False)
        input_layers += [src_attn_post_softmax_shape]
    if enc_output_flag:
+        # This input is used in the independent decoder program for inference.
+        # The actual data shape of enc_output is:
+        # [batch_size, max_len_in_batch, d_model]
        enc_output = layers.data(
            name=input_data_names[len(input_layers)],
            shape=[batch_size, max_length, d_model],
@@ -497,16 +520,16 @@ def transformer(
        src_pad_idx,
        trg_pad_idx,
        pos_pad_idx, ):
-    enc_input_layers = make_inputs(
+    enc_inputs = make_inputs(
        encoder_input_data_names,
        n_head,
        d_model,
-        batch_size,
        max_length,
        is_pos=True,
        slf_attn_bias_flag=True,
        src_attn_bias_flag=False,
        enc_output_flag=False,
+        data_shape_flag=True,
        slf_attn_shape_flag=True,
        src_attn_shape_flag=False)
@@ -522,18 +545,18 @@ def transformer(
        dropout_rate,
        src_pad_idx,
        pos_pad_idx,
-        enc_input_layers, )
+        enc_inputs, )
-    dec_input_layers = make_inputs(
+    dec_inputs = make_inputs(
        decoder_input_data_names,
        n_head,
        d_model,
-        batch_size,
        max_length,
        is_pos=True,
        slf_attn_bias_flag=True,
        src_attn_bias_flag=True,
        enc_output_flag=False,
+        data_shape_flag=True,
        slf_attn_shape_flag=True,
        src_attn_shape_flag=True)
@@ -549,7 +572,7 @@ def transformer(
        dropout_rate,
        trg_pad_idx,
        pos_pad_idx,
-        dec_input_layers,
+        dec_inputs,
        enc_output, )
    # Padding indices do not contribute to the total loss. The weights are used to
@@ -558,17 +581,20 @@ def transformer(
        label_data_names,
        n_head,
        d_model,
-        batch_size,
        max_length,
        is_pos=False,
        slf_attn_bias_flag=False,
        src_attn_bias_flag=False,
        enc_output_flag=False,
+        data_shape_flag=False,
        slf_attn_shape_flag=False,
        src_attn_shape_flag=False)
    cost = layers.softmax_with_cross_entropy(logits=predict, label=gold)
    weighted_cost = cost * weights
-    return layers.reduce_sum(weighted_cost), predict
+    sum_cost = layers.reduce_sum(weighted_cost)
+    token_num = layers.reduce_sum(weights)
+    avg_cost = sum_cost / token_num
+    return sum_cost, avg_cost, predict, token_num
 def wrap_encoder(src_vocab_size,
@@ -582,28 +608,30 @@ def wrap_encoder(src_vocab_size,
                 dropout_rate,
                 src_pad_idx,
                 pos_pad_idx,
-                 enc_input_layers=None):
+                 enc_inputs=None):
    """
    The wrapper assembles together all needed layers for the encoder.
    """
-    if enc_input_layers is None:
+    if enc_inputs is None:
        # This is used to implement an independent encoder program for inference.
-        src_word, src_pos, src_slf_attn_bias, slf_attn_pre_softmax_shape, \
+        src_word, src_pos, src_slf_attn_bias, src_data_shape, \
-            slf_attn_post_softmax_shape = make_inputs(
+            slf_attn_pre_softmax_shape, slf_attn_post_softmax_shape = \
+            make_inputs(
                encoder_input_data_names,
                n_head,
                d_model,
-                batch_size,
                max_length,
                is_pos=True,
                slf_attn_bias_flag=True,
                src_attn_bias_flag=False,
                enc_output_flag=False,
+                data_shape_flag=True,
                slf_attn_shape_flag=True,
                src_attn_shape_flag=False)
    else:
-        src_word, src_pos, src_slf_attn_bias, slf_attn_pre_softmax_shape, \
+        src_word, src_pos, src_slf_attn_bias, src_data_shape, \
-            slf_attn_post_softmax_shape = enc_input_layers
+            slf_attn_pre_softmax_shape, slf_attn_post_softmax_shape = \
+            enc_inputs
    enc_input = prepare_encoder(
        src_word,
        src_pos,
@@ -611,7 +639,9 @@ def wrap_encoder(src_vocab_size,
        d_model,
        src_pad_idx,
        max_length,
-        dropout_rate, )
+        dropout_rate,
+        pos_pad_idx,
+        src_data_shape, )
    enc_output = encoder(
        enc_input,
        src_slf_attn_bias,
@@ -638,33 +668,33 @@ def wrap_decoder(trg_vocab_size,
                 dropout_rate,
                 trg_pad_idx,
                 pos_pad_idx,
-                 dec_input_layers=None,
+                 dec_inputs=None,
                 enc_output=None):
    """
    The wrapper assembles together all needed layers for the decoder.
    """
-    if dec_input_layers is None:
+    if dec_inputs is None:
        # This is used to implement an independent decoder program for inference.
        trg_word, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, \
-            slf_attn_pre_softmax_shape, slf_attn_post_softmax_shape, \
+            trg_data_shape, slf_attn_pre_softmax_shape, \
-            src_attn_pre_softmax_shape, src_attn_post_softmax_shape, \
+            slf_attn_post_softmax_shape, src_attn_pre_softmax_shape, \
-            enc_output = make_inputs(
+            src_attn_post_softmax_shape, enc_output = make_inputs(
                decoder_input_data_names,
                n_head,
                d_model,
-                batch_size,
                max_length,
                is_pos=True,
                slf_attn_bias_flag=True,
                src_attn_bias_flag=True,
                enc_output_flag=True,
+                data_shape_flag=True,
                slf_attn_shape_flag=True,
                src_attn_shape_flag=True)
    else:
        trg_word, trg_pos, trg_slf_attn_bias, trg_src_attn_bias, \
-            slf_attn_pre_softmax_shape, slf_attn_post_softmax_shape, \
+            trg_data_shape, slf_attn_pre_softmax_shape, \
-            src_attn_pre_softmax_shape, src_attn_post_softmax_shape = \
+            slf_attn_post_softmax_shape, src_attn_pre_softmax_shape, \
-                dec_input_layers
+            src_attn_post_softmax_shape = dec_inputs
    dec_input = prepare_decoder(
        trg_word,
@@ -673,7 +703,9 @@ def wrap_decoder(trg_vocab_size,
        d_model,
        trg_pad_idx,
        max_length,
-        dropout_rate, )
+        dropout_rate,
+        pos_pad_idx,
+        trg_data_shape, )
    dec_output = decoder(
        dec_input,
        enc_output,
@@ -697,5 +729,5 @@ def wrap_decoder(trg_vocab_size,
                    bias_attr=False,
                    num_flatten_dims=2),
        shape=[-1, trg_vocab_size],
-        act="softmax" if dec_input_layers is None else None)
+        act="softmax" if dec_inputs is None else None)
    return predict
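
The model.py changes above drop the hard-coded `batch_size` by passing `0` in the `shape` argument of `layers.reshape`; the comment in the diff states that `0` copies the corresponding input dimension. The sketch below emulates that rule in plain Python so the resulting shapes can be checked by hand. `resolve_reshape` is a hypothetical helper written for this illustration and is not part of PaddlePaddle.

```python
def resolve_reshape(in_shape, target):
    """Resolve a reshape target in which 0 means "copy the input dim at this
    index" and a single -1 means "infer from the remaining elements".

    Hypothetical helper for illustration only.
    """
    # Step 1: substitute every 0 with the input dimension at the same index.
    out = [in_shape[i] if d == 0 else d for i, d in enumerate(target)]
    # Step 2: infer the -1 entry from the total element count.
    total = 1
    for d in in_shape:
        total *= d
    known = 1
    for d in out:
        if d != -1:
            known *= d
    return [total // known if d == -1 else d for d in out]


# Splitting heads: [batch, seq_len, hidden] -> [batch, seq_len, n_head, hidden // n_head]
n_head, hidden = 4, 16
small_batch = resolve_reshape([3, 7, hidden], [0, -1, n_head, hidden // n_head])
large_batch = resolve_reshape([5, 7, hidden], [0, -1, n_head, hidden // n_head])
```

The same target shape works for any batch size, which is why the program description no longer needs to be rebuilt per batch.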
--- a/fluid/neural_machine_translation/transformer/train.py
+++ b/fluid/neural_machine_translation/transformer/train.py
 import os
+import time
 import numpy as np
-import paddle.v2 as paddle
+import paddle
 import paddle.fluid as fluid
 from model import transformer, position_encoding_init
@@ -56,7 +57,7 @@ def pad_batch_data(insts,
 def prepare_batch_input(insts, input_data_names, src_pad_idx, trg_pad_idx,
-                        max_length, n_head):
+                        n_head, d_model):
    """
    Put all padded data needed by training into a dict.
    """
@@ -66,6 +67,10 @@ def prepare_batch_input(insts, input_data_names, src_pad_idx, trg_pad_idx,
        [inst[1] for inst in insts], trg_pad_idx, n_head, is_target=True)
    trg_src_attn_bias = np.tile(src_slf_attn_bias[:, :, ::src_max_len, :],
                                [1, 1, trg_max_len, 1]).astype("float32")
+    # These shape tensors are used in reshape_op.
+    src_data_shape = np.array([len(insts), src_max_len, d_model], dtype="int32")
+    trg_data_shape = np.array([len(insts), trg_max_len, d_model], dtype="int32")
    src_slf_attn_pre_softmax_shape = np.array(
        [-1, src_slf_attn_bias.shape[-1]], dtype="int32")
    src_slf_attn_post_softmax_shape = np.array(
@@ -78,17 +83,19 @@ def prepare_batch_input(insts, input_data_names, src_pad_idx, trg_pad_idx,
        [-1, trg_src_attn_bias.shape[-1]], dtype="int32")
    trg_src_attn_post_softmax_shape = np.array(
        trg_src_attn_bias.shape, dtype="int32")
    lbl_word = pad_batch_data([inst[2] for inst in insts], trg_pad_idx, n_head,
                              False, False, False, False)
    lbl_weight = (lbl_word != trg_pad_idx).astype("float32").reshape([-1, 1])
    input_dict = dict(
        zip(input_data_names, [
-            src_word, src_pos, src_slf_attn_bias,
+            src_word, src_pos, src_slf_attn_bias, src_data_shape,
            src_slf_attn_pre_softmax_shape, src_slf_attn_post_softmax_shape,
            trg_word, trg_pos, trg_slf_attn_bias, trg_src_attn_bias,
-            trg_slf_attn_pre_softmax_shape, trg_slf_attn_post_softmax_shape,
+            trg_data_shape, trg_slf_attn_pre_softmax_shape,
-            trg_src_attn_pre_softmax_shape, trg_src_attn_post_softmax_shape,
+            trg_slf_attn_post_softmax_shape, trg_src_attn_pre_softmax_shape,
-            lbl_word, lbl_weight
+            trg_src_attn_post_softmax_shape, lbl_word, lbl_weight
        ]))
    return input_dict
@@ -97,7 +104,7 @@ def main():
    place = fluid.CUDAPlace(0) if TrainTaskConfig.use_gpu else fluid.CPUPlace()
    exe = fluid.Executor(place)
-    cost, predict = transformer(
+    sum_cost, avg_cost, predict, token_num = transformer(
        ModelHyperParams.src_vocab_size + 1,
        ModelHyperParams.trg_vocab_size + 1, ModelHyperParams.max_length + 1,
        ModelHyperParams.n_layer, ModelHyperParams.n_head,
@@ -114,7 +121,7 @@ def main():
        beta1=TrainTaskConfig.beta1,
        beta2=TrainTaskConfig.beta2,
        epsilon=TrainTaskConfig.eps)
-    optimizer.minimize(cost)
+    optimizer.minimize(avg_cost if TrainTaskConfig.use_avg_cost else sum_cost)
    train_data = paddle.batch(
        paddle.reader.shuffle(
@@ -126,27 +133,31 @@ def main():
    # Program to do validation.
    test_program = fluid.default_main_program().clone()
    with fluid.program_guard(test_program):
-        test_program = fluid.io.get_inference_program([cost])
+        test_program = fluid.io.get_inference_program([avg_cost])
    val_data = paddle.batch(
        paddle.dataset.wmt16.validation(ModelHyperParams.src_vocab_size,
                                        ModelHyperParams.trg_vocab_size),
        batch_size=TrainTaskConfig.batch_size)
    def test(exe):
-        test_costs = []
+        test_total_cost = 0
+        test_total_token = 0
        for batch_id, data in enumerate(val_data()):
-            if len(data) != TrainTaskConfig.batch_size:
-                continue
            data_input = prepare_batch_input(
                data, encoder_input_data_names + decoder_input_data_names[:-1] +
                label_data_names, ModelHyperParams.src_pad_idx,
-                ModelHyperParams.trg_pad_idx, ModelHyperParams.max_length,
+                ModelHyperParams.trg_pad_idx, ModelHyperParams.n_head,
-                ModelHyperParams.n_head)
+                ModelHyperParams.d_model)
-            test_cost = exe.run(test_program,
+            test_sum_cost, test_token_num = exe.run(
-                                feed=data_input,
+                test_program,
-                                fetch_list=[cost])[0]
+                feed=data_input,
-            test_costs.append(test_cost)
+                fetch_list=[sum_cost, token_num],
-        return np.mean(test_costs)
+                use_program_cache=True)
+            test_total_cost += test_sum_cost
+            test_total_token += test_token_num
+        test_avg_cost = test_total_cost / test_total_token
+        test_ppl = np.exp([min(test_avg_cost, 100)])
+        return test_avg_cost, test_ppl
    # Initialize the parameters.
    exe.run(fluid.framework.default_startup_program())
@@ -158,27 +169,28 @@ def main():
                                   ModelHyperParams.d_model), place)
    for pass_id in xrange(TrainTaskConfig.pass_num):
+        pass_start_time = time.time()
        for batch_id, data in enumerate(train_data()):
-            # The current program desc is coupled with batch_size, thus all
-            # mini-batches must have the same number of instances currently.
-            if len(data) != TrainTaskConfig.batch_size:
-                continue
            data_input = prepare_batch_input(
                data, encoder_input_data_names + decoder_input_data_names[:-1] +
                label_data_names, ModelHyperParams.src_pad_idx,
-                ModelHyperParams.trg_pad_idx, ModelHyperParams.max_length,
+                ModelHyperParams.trg_pad_idx, ModelHyperParams.n_head,
-                ModelHyperParams.n_head)
+                ModelHyperParams.d_model)
            lr_scheduler.update_learning_rate(data_input)
            outs = exe.run(fluid.framework.default_main_program(),
                           feed=data_input,
-                           fetch_list=[cost],
+                           fetch_list=[sum_cost, avg_cost],
                           use_program_cache=True)
-            cost_val = np.array(outs[0])
+            sum_cost_val, avg_cost_val = np.array(outs[0]), np.array(outs[1])
-            print("pass_id = " + str(pass_id) + " batch = " + str(batch_id) +
+            print("epoch: %d, batch: %d, sum loss: %f, avg loss: %f, ppl: %f" %
-                  " cost = " + str(cost_val))
+                  (pass_id, batch_id, sum_cost_val, avg_cost_val,
+                   np.exp([min(avg_cost_val[0], 100)])))
        # Validate and save the model for inference.
-        val_cost = test(exe)
+        val_avg_cost, val_ppl = test(exe)
-        print("pass_id = " + str(pass_id) + " val_cost = " + str(val_cost))
+        pass_end_time = time.time()
+        time_consumed = pass_end_time - pass_start_time
+        print("epoch: %d, val avg loss: %f, val ppl: %f, "
+              "consumed %fs" % (pass_id, val_avg_cost, val_ppl, time_consumed))
        fluid.io.save_inference_model(
            os.path.join(TrainTaskConfig.model_dir,
                         "pass_" + str(pass_id) + ".infer.model"),

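The validation loop in this diff replaces the mean of per-batch costs with a token-weighted average, and reports perplexity with the exponent capped at 100. A minimal sketch of that arithmetic, assuming each batch yields a `(sum_cost, token_num)` pair; the function name and the sample numbers are made up for illustration.

```python
import math


def corpus_avg_loss_and_ppl(batches, max_exponent=100):
    """Accumulate summed cost and token counts over batches, then return the
    per-token average cost and the capped perplexity.

    `batches` is an iterable of (sum_cost, token_num) pairs (an assumption).
    """
    total_cost = 0.0
    total_tokens = 0.0
    for sum_cost, token_num in batches:
        total_cost += sum_cost
        total_tokens += token_num
    avg_cost = total_cost / total_tokens
    # Cap the exponent so a diverged loss cannot overflow exp().
    ppl = math.exp(min(avg_cost, max_exponent))
    return avg_cost, ppl


# Two batches with different token counts: per-token costs of 2.0 and 3.0.
avg_cost, ppl = corpus_avg_loss_and_ppl([(20.0, 10), (60.0, 20)])
```

Here the token-weighted average is 80 / 30, whereas averaging the two per-batch means would give 2.5; the weighted form does not depend on how instances happen to be grouped into batches.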
--- a/fluid/object_detection/mobilenet_ssd.py
+++ b/fluid/object_detection/mobilenet_ssd.py
@@ -13,7 +13,7 @@ def conv_bn(input,
            num_groups=1,
            act='relu',
            use_cudnn=True):
-    parameter_attr = ParamAttr(learning_rate=0.1, initializer=MSRA())
+    parameter_attr = ParamAttr(initializer=MSRA())
    conv = fluid.layers.conv2d(
        input=input,
        num_filters=num_filters,
@@ -25,14 +25,11 @@ def conv_bn(input,
        use_cudnn=use_cudnn,
        param_attr=parameter_attr,
        bias_attr=False)
-    parameter_attr = ParamAttr(learning_rate=0.1, initializer=MSRA())
+    #parameter_attr = ParamAttr(learning_rate=0.1, initializer=MSRA())
-    bias_attr = ParamAttr(learning_rate=0.2)
+    #bias_attr = ParamAttr(learning_rate=0.2)
-    return fluid.layers.batch_norm(
+    return fluid.layers.batch_norm(input=conv, act=act, epsilon=0.00001)
-        input=conv,
+    #param_attr=parameter_attr,
-        act=act,
+    #bias_attr=bias_attr)
-        epsilon=0.00001,
-        param_attr=parameter_attr,
-        bias_attr=bias_attr)
 def depthwise_separable(input, num_filters1, num_filters2, num_groups, stride,
@@ -76,7 +73,7 @@ def extra_block(input, num_filters1, num_filters2, num_groups, stride, scale):
    return normal_conv
-def mobile_net(img, img_shape, scale=1.0):
+def mobile_net(num_classes, img, img_shape, scale=1.0):
    # 300x300
    tmp = conv_bn(img, 3, int(32 * scale), 2, 1, 3)
    # 150x150
@@ -104,10 +101,11 @@ def mobile_net(img, img_shape, scale=1.0):
    module16 = extra_block(module15, 128, 256, 1, 2, scale)
    # 2x2
    module17 = extra_block(module16, 64, 128, 1, 2, scale)
    mbox_locs, mbox_confs, box, box_var = fluid.layers.multi_box_head(
        inputs=[module11, module13, module14, module15, module16, module17],
        image=img,
-        num_classes=21,
+        num_classes=num_classes,
        min_ratio=20,
        max_ratio=90,
        min_sizes=[60.0, 105.0, 150.0, 195.0, 240.0, 285.0],

--- a/fluid/object_detection/reader.py
+++ b/fluid/object_detection/reader.py
@@ -16,19 +16,29 @@ import image_util
 from paddle.utils.image_util import *
 import random
 from PIL import Image
+from PIL import ImageDraw
 import numpy as np
 import xml.etree.ElementTree
 import os
+import time
+import copy
+# COCO API (pycocotools)
+from pycocotools.coco import COCO
+from pycocotools.cocoeval import COCOeval
 class Settings(object):
-    def __init__(self, data_dir, label_file, resize_h, resize_w, mean_value,
+    def __init__(self, dataset, toy, data_dir, label_file, resize_h, resize_w,
-                 apply_distort, apply_expand):
+                 mean_value, apply_distort, apply_expand):
+        self._dataset = dataset
+        self._toy = toy
        self._data_dir = data_dir
-        self._label_list = []
+        if dataset == "pascalvoc":
-        label_fpath = os.path.join(data_dir, label_file)
+            self._label_list = []
-        for line in open(label_fpath):
+            label_fpath = os.path.join(data_dir, label_file)
-            self._label_list.append(line.strip())
+            for line in open(label_fpath):
+                self._label_list.append(line.strip())
        self._apply_distort = apply_distort
        self._apply_expand = apply_expand
@@ -47,6 +57,14 @@ class Settings(object):
        self._brightness_prob = 0.5
        self._brightness_delta = 0.125
+    @property
+    def dataset(self):
+        return self._dataset
+    @property
+    def toy(self):
+        return self._toy
    @property
    def apply_distort(self):
        return self._apply_distort
@@ -59,6 +77,10 @@ class Settings(object):
    def data_dir(self):
        return self._data_dir
+    @data_dir.setter
+    def data_dir(self, data_dir):
+        self._data_dir = data_dir
    @property
    def label_list(self):
        return self._label_list
@@ -78,23 +100,72 @@ class Settings(object):
 def _reader_creator(settings, file_list, mode, shuffle):
    def reader():
-        with open(file_list) as flist:
+        if settings.dataset == 'coco':
-            lines = [line.strip() for line in flist]
+            coco = COCO(file_list)
-            if shuffle:
+            image_ids = coco.getImgIds()
-                random.shuffle(lines)
+            images = coco.loadImgs(image_ids)
-            for line in lines:
+            category_ids = coco.getCatIds()
+            category_names = [
+                item['name'] for item in coco.loadCats(category_ids)
+            ]
+        elif settings.dataset == 'pascalvoc':
+            flist = open(file_list)
+            images = [line.strip() for line in flist]
+        if settings.toy != 0:
+            # Keep only the first `toy` images for a quick run.
+            images = images[:settings.toy]
+        print("{} on {} with {} images".format(mode, settings.dataset,
+                                               len(images)))
+        if shuffle:
+            random.shuffle(images)
+        for image in images:
+            if settings.dataset == 'coco':
+                image_name = image['file_name']
+                image_path = os.path.join(settings.data_dir, image_name)
+            elif settings.dataset == 'pascalvoc':
                if mode == 'train' or mode == 'test':
-                    img_path, label_path = line.split()
+                    image_path, label_path = image.split()
-                    img_path = os.path.join(settings.data_dir, img_path)
+                    image_path = os.path.join(settings.data_dir, image_path)
                    label_path = os.path.join(settings.data_dir, label_path)
                elif mode == 'infer':
-                    img_path = os.path.join(settings.data_dir, line)
+                    image_path = os.path.join(settings.data_dir, image)
-                img = Image.open(img_path)
+            img = Image.open(image_path)
-                img_width, img_height = img.size
+            if img.mode == 'L':
+                img = img.convert('RGB')
+            img_width, img_height = img.size
-                # layout: label | xmin | ymin | xmax | ymax | difficult
+            if mode == 'train' or mode == 'test':
-                if mode == 'train' or mode == 'test':
+                if settings.dataset == 'coco':
+                    # layout: category_id | xmin | ymin | xmax | ymax | iscrowd | origin_coco_bbox | segmentation | area | image_id | annotation_id
+                    bbox_labels = []
+                    annIds = coco.getAnnIds(imgIds=image['id'])
+                    anns = coco.loadAnns(annIds)
+                    for ann in anns:
+                        bbox_sample = []
+                        # class indices start from 1; 0 is reserved for background
+                        bbox_sample.append(
+                            float(category_ids.index(ann['category_id'])) + 1)
+                        bbox = ann['bbox']
+                        xmin, ymin, w, h = bbox
+                        xmax = xmin + w
+                        ymax = ymin + h
+                        bbox_sample.append(float(xmin) / img_width)
+                        bbox_sample.append(float(ymin) / img_height)
+                        bbox_sample.append(float(xmax) / img_width)
+                        bbox_sample.append(float(ymax) / img_height)
+                        bbox_sample.append(float(ann['iscrowd']))
+                        #bbox_sample.append(ann['bbox'])
+                        #bbox_sample.append(ann['segmentation'])
+                        #bbox_sample.append(ann['area'])
+                        #bbox_sample.append(ann['image_id'])
+                        #bbox_sample.append(ann['id'])
+                        bbox_labels.append(bbox_sample)
+                elif settings.dataset == 'pascalvoc':
+                    # layout: label | xmin | ymin | xmax | ymax | difficult
                    bbox_labels = []
                    root = xml.etree.ElementTree.parse(label_path).getroot()
                    for object in root.findall('object'):
@@ -117,91 +188,138 @@ def _reader_creator(settings, file_list, mode, shuffle):
                        bbox_sample.append(difficult)
                        bbox_labels.append(bbox_sample)
-                    sample_labels = bbox_labels
+                sample_labels = bbox_labels
-                    if mode == 'train':
-                        if settings._apply_distort:
-                            img = image_util.distort_image(img, settings)
-                        if settings._apply_expand:
-                            img, bbox_labels = image_util.expand_image(
-                                img, bbox_labels, img_width, img_height,
-                                settings)
-                        batch_sampler = []
-                        # hard-code here
-                        batch_sampler.append(
-                            image_util.sampler(1, 1, 1.0, 1.0, 1.0, 1.0, 0.0,
-                                               0.0))
-                        batch_sampler.append(
-                            image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.1,
-                                               0.0))
-                        batch_sampler.append(
-                            image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.3,
-                                               0.0))
-                        batch_sampler.append(
-                            image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.5,
-                                               0.0))
-                        batch_sampler.append(
-                            image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.7,
-                                               0.0))
-                        batch_sampler.append(
-                            image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.9,
-                                               0.0))
-                        batch_sampler.append(
-                            image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.0,
-                                               1.0))
-                        """ random crop """
-                        sampled_bbox = image_util.generate_batch_samples(
-                            batch_sampler, bbox_labels, img_width, img_height)
-                        img = np.array(img)
-                        if len(sampled_bbox) > 0:
-                            idx = int(random.uniform(0, len(sampled_bbox)))
-                            img, sample_labels = image_util.crop_image(
-                                img, bbox_labels, sampled_bbox[idx], img_width,
-                                img_height)
-                        img = Image.fromarray(img)
-                img = img.resize((settings.resize_w, settings.resize_h),
-                                 Image.ANTIALIAS)
-                img = np.array(img)
                if mode == 'train':
-                    mirror = int(random.uniform(0, 2))
+                    if settings._apply_distort:
-                    if mirror == 1:
+                        img = image_util.distort_image(img, settings)
-                        img = img[:, ::-1, :]
+                    if settings._apply_expand:
-                        for i in xrange(len(sample_labels)):
+                        img, bbox_labels = image_util.expand_image(
-                            tmp = sample_labels[i][1]
+                            img, bbox_labels, img_width, img_height, settings)
-                            sample_labels[i][1] = 1 - sample_labels[i][3]
+                    batch_sampler = []
-                            sample_labels[i][3] = 1 - tmp
+                    # hard-code here
+                    batch_sampler.append(
-                if len(img.shape) == 3:
+                        image_util.sampler(1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0))
-                    img = np.swapaxes(img, 1, 2)
+                    batch_sampler.append(
-                    img = np.swapaxes(img, 1, 0)
+                        image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 0.0))
+                    batch_sampler.append(
-                img = img[[2, 1, 0], :, :]
+                        image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 0.0))
-                img = img.astype('float32')
+                    batch_sampler.append(
-                img -= settings.img_mean
+                        image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 0.0))
-                img = img.flatten()
+                    batch_sampler.append(
-                img = img * 0.007843
+                        image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 0.0))
+                    batch_sampler.append(
-                sample_labels = np.array(sample_labels)
+                        image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 0.0))
-                if mode == 'train' or mode == 'test':
+                    batch_sampler.append(
-                    if mode == 'train' and len(sample_labels) == 0: continue
+                        image_util.sampler(1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0))
-                    yield img.astype(
+                    """ random crop """
-                        'float32'
+                    sampled_bbox = image_util.generate_batch_samples(
-                    ), sample_labels[:, 1:5], sample_labels[:, 0].astype(
+                        batch_sampler, bbox_labels, img_width, img_height)
-                        'int32'), sample_labels[:, -1].astype('int32')
-                elif mode == 'infer':
+                    img = np.array(img)
-                    yield img.astype('float32')
+                    if len(sampled_bbox) > 0:
+                        idx = int(random.uniform(0, len(sampled_bbox)))
+                        img, sample_labels = image_util.crop_image(
+                            img, bbox_labels, sampled_bbox[idx], img_width,
+                            img_height)
+                    img = Image.fromarray(img)
+            img = img.resize((settings.resize_w, settings.resize_h),
+                             Image.ANTIALIAS)
+            img = np.array(img)
+            if mode == 'train':
+                mirror = int(random.uniform(0, 2))
+                if mirror == 1:
+                    img = img[:, ::-1, :]
+                    for i in xrange(len(sample_labels)):
+                        tmp = sample_labels[i][1]
+                        sample_labels[i][1] = 1 - sample_labels[i][3]
+                        sample_labels[i][3] = 1 - tmp
+            #draw_bounding_box_on_image(img, sample_labels, image_name, category_names, normalized=True)
+            # HWC to CHW
+            if len(img.shape) == 3:
+                img = np.swapaxes(img, 1, 2)
+                img = np.swapaxes(img, 1, 0)
+            # RGB to BGR
+            img = img[[2, 1, 0], :, :]
+            img = img.astype('float32')
+            img -= settings.img_mean
+            img = img.flatten()
+            img = img * 0.007843
+            sample_labels = np.array(sample_labels)
+            if mode == 'train' or mode == 'test':
+                if len(sample_labels) == 0: continue
+                yield img.astype(
+                    'float32'
+                ), sample_labels[:, 1:5], sample_labels[:, 0].astype(
+                    'int32'), sample_labels[:, -1].astype('int32')
+            elif mode == 'infer':
+                yield img.astype('float32')
    return reader
+def draw_bounding_box_on_image(image,
+                               sample_labels,
+                               image_name,
+                               category_names,
+                               color='red',
+                               thickness=4,
+                               with_text=True,
+                               normalized=True):
+    image = Image.fromarray(image)
+    draw = ImageDraw.Draw(image)
+    im_width, im_height = image.size
+    if not normalized:
+        im_width, im_height = 1, 1
+    for item in sample_labels:
+        label = item[0]
+        category_name = category_names[int(label)]
+        bbox = item[1:5]
+        xmin, ymin, xmax, ymax = bbox
+        (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
+                                      ymin * im_height, ymax * im_height)
+        draw.line(
+            [(left, top), (left, bottom), (right, bottom), (right, top),
+             (left, top)],
+            width=thickness,
+            fill=color)
+        #draw.rectangle([xmin, ymin, xmax, ymax], outline=color)
+        if with_text:
+            if image.mode == 'RGB':
+                draw.text((left, top), category_name, (255, 255, 0))
+    image.save(image_name)
 def train(settings, file_list, shuffle=True):
-    return _reader_creator(settings, file_list, 'train', shuffle)
+    if settings.dataset == 'coco':
+        train_settings = copy.copy(settings)
+        if '2014' in file_list:
+            sub_dir = "train2014"
+        elif '2017' in file_list:
+            sub_dir = "train2017"
+        train_settings.data_dir = os.path.join(settings.data_dir, sub_dir)
+        file_list = os.path.join(settings.data_dir, file_list)
+        return _reader_creator(train_settings, file_list, 'train', shuffle)
+    elif settings.dataset == 'pascalvoc':
+        return _reader_creator(settings, file_list, 'train', shuffle)
 def test(settings, file_list):
-    return _reader_creator(settings, file_list, 'test', False)
+    if settings.dataset == 'coco':
+        test_settings = copy.copy(settings)
+        if '2014' in file_list:
+            sub_dir = "val2014"
+        elif '2017' in file_list:
+            sub_dir = "val2017"
+        test_settings.data_dir = os.path.join(settings.data_dir, sub_dir)
+        file_list = os.path.join(settings.data_dir, file_list)
+        return _reader_creator(test_settings, file_list, 'test', False)
+    elif settings.dataset == 'pascalvoc':
+        return _reader_creator(settings, file_list, 'test', False)
 def infer(settings, file_list):
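
The COCO branch of `_reader_creator` above converts each annotation's `[xmin, ymin, w, h]` pixel box into corner coordinates normalized by the image size, and the training path mirrors the boxes when the image is flipped horizontally. A minimal standalone sketch of both steps follows. The function names `coco_bbox_to_normalized_corners` and `mirror_box` are hypothetical and are not part of `reader.py`; the arithmetic is taken from the diff.

```python
def coco_bbox_to_normalized_corners(bbox, img_width, img_height):
    """Convert a COCO [xmin, ymin, width, height] pixel box to
    [xmin, ymin, xmax, ymax] normalized by the image size."""
    xmin, ymin, w, h = bbox
    xmax = xmin + w
    ymax = ymin + h
    return [
        float(xmin) / img_width,
        float(ymin) / img_height,
        float(xmax) / img_width,
        float(ymax) / img_height,
    ]


def mirror_box(box):
    """Horizontally flip a normalized [xmin, ymin, xmax, ymax] box.

    The new xmin is 1 - old xmax and the new xmax is 1 - old xmin,
    which matches the swap done in the reader's mirror step.
    """
    xmin, ymin, xmax, ymax = box
    return [1.0 - xmax, ymin, 1.0 - xmin, ymax]


if __name__ == "__main__":
    box = coco_bbox_to_normalized_corners([50, 20, 100, 40], 200, 100)
    print(box)
    print(mirror_box(box))
```

Normalizing before cropping and resizing means the labels stay valid regardless of the final `resize_w` and `resize_h`.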

--- a/fluid/object_detection/train.py
+++ b/fluid/object_detection/train.py
-import paddle.v2 as paddle
+import paddle
 import paddle.fluid as fluid
 import reader
 import load_model as load_model
 from mobilenet_ssd import mobile_net
 from utility import add_arguments, print_arguments
 import os
+import time
 import numpy as np
 import argparse
 import functools
 parser = argparse.ArgumentParser(description=__doc__)
 add_arg = functools.partial(add_arguments, argparser=parser)
-# yapf: disable
+add_arg('learning_rate', float, 0.001, "Learning rate.")
-add_arg('batch_size',   int,    32,       "Minibatch size.")
+add_arg('batch_size', int, 32, "Minibatch size.")
-add_arg('parallel',     bool,   True,     "Whether use parallel training.")
+add_arg('num_passes', int, 25, "Epoch number.")
-add_arg('use_gpu',      bool,   True,     "Whether use GPU.")
+add_arg('parallel', bool, True, "Whether to use parallel training.")
-# yapf: disable
+add_arg('use_gpu', bool, True, "Whether to use GPU.")
+add_arg('data_dir', str, './data/COCO17', "Root path of data")
+add_arg('train_file_list', str, 'annotations/instances_train2017.json',
+        "train file list")
+add_arg('val_file_list', str, 'annotations/instances_val2017.json',
+        "validation file list")
+add_arg('model_save_dir', str, 'model_COCO17', "where to save model")
+add_arg('dataset', str, 'coco', "coco or pascalvoc")
+add_arg(
+    'is_toy', int, 0,
+    "Toy mode for quick debugging: 0 uses all data, n uses only the first n samples."
+)
+add_arg('label_file', str, 'label_list',
+        "Label file which lists all label names")
+add_arg('apply_distort', bool, True, "Whether to apply distort")
+add_arg('apply_expand', bool, False, "Whether to apply expand")
+add_arg('resize_h', int, 300, "Height to resize images to.")
+add_arg('resize_w', int, 300, "Width to resize images to.")
+add_arg('mean_value_B', float, 127.5,
+        "mean value which will be subtracted")  #123.68
+add_arg('mean_value_G', float, 127.5,
+        "mean value which will be subtracted")  #116.78
+add_arg('mean_value_R', float, 127.5,
+        "mean value which will be subtracted")  #103.94
 def train(args,
@@ -28,6 +53,10 @@ def train(args,
          model_save_dir='model',
          init_model_path=None):
    image_shape = [3, data_args.resize_h, data_args.resize_w]
+    if data_args.dataset == 'coco':
+        num_classes = 81
+    elif data_args.dataset == 'pascalvoc':
+        num_classes = 21
    image = fluid.layers.data(name='image', shape=image_shape, dtype='float32')
    gt_box = fluid.layers.data(
@@ -45,9 +74,10 @@ def train(args,
            gt_box_ = pd.read_input(gt_box)
            gt_label_ = pd.read_input(gt_label)
            difficult_ = pd.read_input(difficult)
-            locs, confs, box, box_var = mobile_net(image_, image_shape)
+            locs, confs, box, box_var = mobile_net(num_classes, image_,
-            loss = fluid.layers.ssd_loss(locs, confs, gt_box_, gt_label_,
+                                                   image_shape)
-                                         box, box_var)
+            loss = fluid.layers.ssd_loss(locs, confs, gt_box_, gt_label_, box,
+                                         box_var)
            nmsed_out = fluid.layers.detection_output(
                locs, confs, box, box_var, nms_threshold=0.45)
            loss = fluid.layers.reduce_sum(loss)
@@ -57,11 +87,11 @@ def train(args,
        loss, nmsed_out = pd()
        loss = fluid.layers.mean(loss)
    else:
-        locs, confs, box, box_var = mobile_net(image, image_shape)
+        locs, confs, box, box_var = mobile_net(num_classes, image, image_shape)
        nmsed_out = fluid.layers.detection_output(
            locs, confs, box, box_var, nms_threshold=0.45)
-        loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label,
+        loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
-                                     box, box_var)
+                                     box_var)
        loss = fluid.layers.reduce_sum(loss)
    test_program = fluid.default_main_program().clone(for_test=True)
@@ -71,13 +101,20 @@ def train(args,
            gt_label,
            gt_box,
            difficult,
-            21,
+            num_classes,
            overlap_threshold=0.5,
            evaluate_difficult=False,
-            ap_version='11point')
+            ap_version='integral')
-    boundaries = [40000, 60000]
+    if data_args.dataset == 'coco':
-    values = [0.001, 0.0005, 0.00025]
+        # learning rate decays at the 12th and 19th pass, respectively
+        if '2014' in train_file_list:
+            boundaries = [82783 / batch_size * 12, 82783 / batch_size * 19]
+        elif '2017' in train_file_list:
+            boundaries = [118287 / batch_size * 12, 118287 / batch_size * 19]
+    elif data_args.dataset == 'pascalvoc':
+        boundaries = [40000, 60000]
+    values = [learning_rate, learning_rate * 0.5, learning_rate * 0.25]
    optimizer = fluid.optimizer.RMSProp(
        learning_rate=fluid.layers.piecewise_decay(boundaries, values),
        regularization=fluid.regularizer.L2Decay(0.00005), )
@@ -88,8 +125,8 @@ def train(args,
    exe = fluid.Executor(place)
    exe.run(fluid.default_startup_program())
-    load_model.load_and_set_vars(place)
+    #load_model.load_and_set_vars(place)
-    #load_model.load_paddlev1_vars(place)
+    load_model.load_paddlev1_vars(place)
    train_reader = paddle.batch(
        reader.train(data_args, train_file_list), batch_size=batch_size)
    test_reader = paddle.batch(
@@ -108,16 +145,23 @@ def train(args,
        print("Test {0}, map {1}".format(pass_id, test_map[0]))
    for pass_id in range(num_passes):
+        start_time = time.time()
+        prev_start_time = start_time
+        end_time = 0
        for batch_id, data in enumerate(train_reader()):
+            prev_start_time = start_time
+            start_time = time.time()
+            #print("Batch {} start at {:.2f}".format(batch_id, start_time))
            loss_v = exe.run(fluid.default_main_program(),
                             feed=feeder.feed(data),
                             fetch_list=[loss])
+            end_time = time.time()
            if batch_id % 20 == 0:
-                print("Pass {0}, batch {1}, loss {2}"
+                print("Pass {0}, batch {1}, loss {2}, time {3}".format(
-                      .format(pass_id, batch_id, loss_v[0]))
+                    pass_id, batch_id, loss_v[0], start_time - prev_start_time))
        test(pass_id)
-        if pass_id % 10 == 0:
+        if pass_id % 10 == 0 or pass_id == num_passes - 1:
            model_path = os.path.join(model_save_dir, str(pass_id))
            print 'save models to %s' % (model_path)
            fluid.io.save_inference_model(model_path, ['image'], [nmsed_out],
@@ -128,17 +172,21 @@ if __name__ == '__main__':
    args = parser.parse_args()
    print_arguments(args)
    data_args = reader.Settings(
-        data_dir='./data',
+        dataset=args.dataset,  # coco or pascalvoc
-        label_file='label_list',
+        toy=args.is_toy,
-        apply_distort=True,
+        data_dir=args.data_dir,
-        apply_expand=True,
+        label_file=args.label_file,
-        resize_h=300,
+        apply_distort=args.apply_distort,
-        resize_w=300,
+        apply_expand=args.apply_expand,
-        mean_value=[127.5, 127.5, 127.5])
+        resize_h=args.resize_h,
-    train(args,
+        resize_w=args.resize_w,
-          train_file_list='./data/trainval.txt',
+        mean_value=[args.mean_value_B, args.mean_value_G, args.mean_value_R])
-          val_file_list='./data/test.txt',
+    train(
-          data_args=data_args,
+        args,
-          learning_rate=0.001,
+        train_file_list=args.train_file_list,
-          batch_size=args.batch_size,
+        val_file_list=args.val_file_list,
-          num_passes=300)
+        data_args=data_args,
+        learning_rate=args.learning_rate,
+        batch_size=args.batch_size,
+        num_passes=args.num_passes,
+        model_save_dir=args.model_save_dir)
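
The COCO learning-rate schedule in `train.py` above places the decay boundaries at the iteration counts for passes 12 and 19, computed from the training-set size and the batch size, and scales the base learning rate by 1, 0.5 and 0.25. A minimal sketch of that arithmetic follows. The helper names `decay_boundaries` and `piecewise_values` are hypothetical; the sketch uses explicit floor division (`//`), assuming the integer behaviour that `/` has under Python 2, which the script targets.

```python
def decay_boundaries(num_images, batch_size, decay_passes):
    """Return the iteration indices at which the learning rate drops.

    num_images // batch_size is the number of full batches per pass.
    """
    iters_per_pass = num_images // batch_size
    return [iters_per_pass * p for p in decay_passes]


def piecewise_values(base_lr, factors=(1.0, 0.5, 0.25)):
    """Return the learning rate used in each segment between boundaries."""
    return [base_lr * f for f in factors]


if __name__ == "__main__":
    # 118287 is the COCO 2017 training-set size used in the diff.
    print(decay_boundaries(118287, 32, [12, 19]))
    print(piecewise_values(0.001))
```

A piecewise schedule needs one more value than it has boundaries, which is why two boundaries pair with three learning rates.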
--- a/fluid/policy_gradient/brain.py
+++ b/fluid/policy_gradient/brain.py
@@ -30,32 +30,28 @@ class PolicyGradient:
        acts = fluid.layers.data(name='acts', shape=[1], dtype='int64')
        vt = fluid.layers.data(name='vt', shape=[1], dtype='float32')
        # fc1
-        fc1 = fluid.layers.fc(
+        fc1 = fluid.layers.fc(input=obs, size=10, act="tanh")  # tanh activation
-            input=obs,
-            size=10,
-            act="tanh"  # tanh activation
-        )
        # fc2
-        self.all_act_prob = fluid.layers.fc(input=fc1,
+        all_act_prob = fluid.layers.fc(input=fc1,
-                                            size=self.n_actions,
+                                       size=self.n_actions,
-                                            act="softmax")
+                                       act="softmax")
+        self.all_act_prob = all_act_prob
+        self.inferece_program = fluid.default_main_program().clone()
        # to maximize total reward (log_p * R) is to minimize -(log_p * R)
        neg_log_prob = fluid.layers.cross_entropy(
            input=self.all_act_prob,
            label=acts)  # this is negative log of chosen action
        neg_log_prob_weight = fluid.layers.elementwise_mul(x=neg_log_prob, y=vt)
        loss = fluid.layers.reduce_mean(
-            x=neg_log_prob_weight)  # reward guided loss
+            neg_log_prob_weight)  # reward guided loss
        sgd_optimizer = fluid.optimizer.SGD(self.lr)
        sgd_optimizer.minimize(loss)
        self.exe.run(fluid.default_startup_program())
    def choose_action(self, observation):
-        prob_weights = self.exe.run(
+        prob_weights = self.exe.run(self.inferece_program,
-            fluid.default_main_program().prune(self.all_act_prob),
+                                    feed={"obs": observation[np.newaxis, :]},
-            feed={"obs": observation[np.newaxis, :]},
+                                    fetch_list=[self.all_act_prob])
-            fetch_list=[self.all_act_prob])
        prob_weights = np.array(prob_weights[0])
        action = np.random.choice(
            range(prob_weights.shape[1]),