Merge branch 'develop' of https://github.com/PaddlePaddle/models into cloudtest2

05403680 · guosheng · 2c52abfa · b8cbdd33 · 05403680 · 05403680
407 changed files
--- a/.gitmodules
+++ b/.gitmodules
+[submodule "fluid/SimNet"]
+	path = fluid/SimNet
+	url = https://github.com/baidu/AnyQ.git
+[submodule "fluid/LAC"]
+	path = fluid/LAC
+	url = https://github.com/baidu/lac
+[submodule "fluid/Senta"]
+	path = fluid/Senta
+	url = https://github.com/baidu/Senta
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ PaddlePaddle provides a rich set of computational units to enable users to adopt
 - [fluid models](fluid): use PaddlePaddle's Fluid APIs. We especially recommend that users use Fluid models.
- [v2 models](v2): use PaddlePaddle's v2 APIs.
+- [legacy models](legacy): use PaddlePaddle's v2 APIs.
 ## License

--- a/fluid/DeepASR/model_utils/model.py
+++ b/fluid/DeepASR/model_utils/model.py
@@ -2,7 +2,6 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
-import paddle.v2 as paddle
 import paddle.fluid as fluid

--- a/LAC @ 66660503
+++ b/LAC @ 66660503
+Subproject commit 66660503bb6e8f34adc4715ccf42cad77ed46ded
--- a/fluid/README.cn.rst
+++ b/fluid/README.cn.rst
@@ -49,14 +49,46 @@ Network,ICNet)进行语义分割，相比其他分割算法，ICNet兼顾了准
 -  `ICNet <https://github.com/PaddlePaddle/models/tree/develop/fluid/icnet>`__
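+Image Generation
+----------------
+Image generation produces a target image from an input vector, which may be random noise or a user-specified condition vector. Typical applications include handwriting generation, face synthesis, style transfer, and image inpainting. Current image generation work relies mainly on generative adversarial networks (GANs).
+A GAN consists of two sub-networks: a generator and a discriminator. The generator takes random noise or a condition vector as input and outputs the target image. The discriminator is a classifier that takes an image as input and outputs whether that image is real. During training, the generator and the discriminator improve by competing against each other.
+For the image generation task, we show how to generate handwritten digits with DCGAN and ConditionalGAN, and also introduce CycleGAN for style transfer.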
+- `DCGAN & ConditionalGAN <https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/c_gan>`__
+- `CycleGAN <https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/cycle_gan>`__
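 Scene Text Recognition
 ----------------------
 Many scene images contain rich text that plays an important role in understanding the image and greatly helps people perceive and interpret its content. Scene text recognition converts image information into a text sequence under conditions such as complex backgrounds, low resolution, varied fonts, and arbitrary layout; it can be seen as a special kind of translation, from image input to natural-language output. Advances in scene text recognition have also enabled new applications, such as automatically recognizing text on street signs to help street-view applications obtain more accurate address information.
-For the scene text recognition task, we show how to combine CNN-based image feature extraction with RNN-based sequence translation, removing the need for hand-crafted features and character segmentation, and using automatically learned image features to perform end-to-end, unconstrained character localization and recognition. At present we introduce the CRNN-CTC model; a sequence-to-sequence model based on the attention mechanism will be added later.
+For the scene text recognition task, we show how to combine CNN-based image feature extraction with RNN-based sequence translation, removing the need for hand-crafted features and character segmentation, and using automatically learned image features to perform character recognition. At present we introduce the CRNN-CTC model and a sequence-to-sequence model based on the attention mechanism.
 -  `CRNN-CTC model <https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition>`__
+-  `Attention model <https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition>`__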
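+Metric Learning
+---------------
+Metric learning, also called distance metric learning or similarity learning, learns distances between objects and can be used to analyze how objects relate and compare. It is widely used in practice: it supports classification and clustering, and is common in image retrieval, face recognition, and related fields. Previously, each task required choosing suitable features and hand-building a distance function; metric learning instead learns a task-specific distance function automatically. Combined with deep learning, metric learning performs well in face recognition/verification, person re-identification (human Re-ID), image retrieval, and other areas. In this task we mainly introduce Fluid-based deep metric learning models, including loss functions such as triplet and quadruplet loss.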
+- `Metric Learning <https://github.com/PaddlePaddle/models/tree/develop/fluid/metric_learning>`__
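+Video Classification
+--------------------
+Video classification is the foundation of video understanding. Unlike image classification, the object being classified is no longer a still image but a video made up of many frames, carrying audio data, motion information, and more. Understanding a video therefore requires more context: not only what each frame is and contains, but also how different frames relate to one another. Video classification methods are mainly based on convolutional neural networks, recurrent neural networks, or a combination of the two. In this task we introduce Fluid-based video classification models, currently the Temporal Segment Network (TSN) model; more models will be added over time.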
+- `TSN <https://github.com/PaddlePaddle/models/tree/develop/fluid/video_classification>`__
 Speech Recognition
 ------------------
@@ -124,6 +156,15 @@ DQN 及其变体，并测试了它们在 Atari 游戏中的表现。
 - `Senta <https://github.com/baidu/Senta/blob/master/README.md>`__
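+Semantic Matching
+-----------------
+Many natural language processing scenarios need to measure the semantic similarity of two texts; such tasks are usually called semantic matching. Examples include ranking search results by the similarity between the query and candidate documents, computing text-to-text similarity for deduplication, and matching candidate answers to questions in automatic question answering.
+DAM (Deep Attention Matching Network), released in this example, is work by Baidu's Natural Language Processing Department published at ACL 2018 for selecting responses in multi-turn dialogue for retrieval-based chatbots. Inspired by the Transformer, DAM's network structure is based entirely on the attention mechanism: it uses stacked self-attention to learn semantic representations of the response and the context at different granularities, then uses cross-attention to capture the relevance between response and context. It outperforms other models on two large-scale multi-turn dialogue datasets.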
+- `Deep Attention Matching Network <https://github.com/PaddlePaddle/models/tree/develop/fluid/deep_attention_matching_net>`__ 
 AnyQ
 ----
@@ -135,3 +176,12 @@ SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架
 -  `SimNet in PaddlePaddle
   Fluid <https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md>`__
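+Machine Reading Comprehension
+-----------------------------
+Machine reading comprehension (MRC) is one of the core tasks in natural language processing (NLP); the ultimate goal is for machines to read text as humans do, distill its information, and answer related questions. Deep learning has been widely adopted in NLP in recent years and has greatly improved machine reading comprehension, but current MRC research uses artificially constructed datasets and answers relatively simple questions, which differs markedly from the data humans handle. Large-scale real-world training data is therefore urgently needed to advance MRC.
+The Baidu reading comprehension dataset is a real-world dataset open-sourced by Baidu's Natural Language Processing Department. All questions and source documents come from real data (Baidu search engine data and the Baidu Zhidao Q&A community), and the answers are written by humans. Each question has multiple answers; the dataset contains 200k questions, 1000k source documents, and 420k answers, making it the largest Chinese MRC dataset to date. Baidu has also open-sourced the corresponding reading comprehension model, called DuReader, which adopts the commonly used layered network structure, captures the interaction between question and document through a bidirectional attention mechanism to produce a query-aware document representation, and finally predicts the answer span from that representation with a pointer network.
+-  `DuReader in PaddlePaddle Fluid <https://github.com/PaddlePaddle/models/blob/develop/fluid/machine_reading_comprehension/README.md>`__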
--- a/fluid/README.md
+++ b/fluid/README.md
@@ -28,8 +28,11 @@ Fluid模型配置和参数文件的工具。
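 Detecting faces in unconstrained environments, especially small, blurry, and partially occluded faces, is also a challenging task. We also show how to train PyramidBox, a face detection model developed in-house at Baidu, on [WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace) data; in March 2018 the algorithm took [first place](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) in multiple WIDER FACE evaluations.
+Faster RCNN is a typical two-stage object detector. Compared with traditional region-extraction methods, its RPN network shares convolutional layer parameters, which greatly improves the efficiency of region extraction and yields high-quality region proposals.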
 -  [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/blob/develop/fluid/object_detection/README_cn.md)
 -  [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/face_detection/README_cn.md)
+-  [Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/faster_rcnn/README_cn.md)
 Image Semantic Segmentation
 ---------------------------
@@ -41,14 +44,45 @@ Network,ICNet)进行语义分割，相比其他分割算法，ICNet兼顾了准
 -  [ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/icnet)
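+Image Generation
+----------------
+Image generation produces a target image from an input vector, which may be random noise or a user-specified condition vector. Typical applications include handwriting generation, face synthesis, style transfer, and image inpainting. Current image generation work relies mainly on generative adversarial networks (GANs).
+A GAN consists of two sub-networks: a generator and a discriminator. The generator takes random noise or a condition vector as input and outputs the target image. The discriminator is a classifier that takes an image as input and outputs whether that image is real. During training, the generator and the discriminator improve by competing against each other.
+For the image generation task, we show how to generate handwritten digits with DCGAN and ConditionalGAN, and also introduce CycleGAN for style transfer.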
+- [DCGAN & ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/c_gan)
+- [CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/cycle_gan)
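 Scene Text Recognition
 ----------------------
 Many scene images contain rich text that plays an important role in understanding the image and greatly helps people perceive and interpret its content. Scene text recognition converts image information into a text sequence under conditions such as complex backgrounds, low resolution, varied fonts, and arbitrary layout; it can be seen as a special kind of translation, from image input to natural-language output. Advances in scene text recognition have also enabled new applications, such as automatically recognizing text on street signs to help street-view applications obtain more accurate address information.
-For the scene text recognition task, we show how to combine CNN-based image feature extraction with RNN-based sequence translation, removing the need for hand-crafted features and character segmentation, and using automatically learned image features to perform end-to-end, unconstrained character localization and recognition. At present we introduce the CRNN-CTC model; a sequence-to-sequence model based on the attention mechanism will be added later.
+For the scene text recognition task, we show how to combine CNN-based image feature extraction with RNN-based sequence translation, removing the need for hand-crafted features and character segmentation, and using automatically learned image features to perform character recognition. At present we introduce the CRNN-CTC model and a sequence-to-sequence model based on the attention mechanism.
+-  [CRNN-CTC model](https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition)
+-  [Attention model](https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition)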
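+Metric Learning
+---------------
+Metric learning, also called distance metric learning or similarity learning, learns distances between objects and can be used to analyze how objects relate and compare. It is widely used in practice: it supports classification and clustering, and is common in image retrieval, face recognition, and related fields. Previously, each task required choosing suitable features and hand-building a distance function; metric learning instead learns a task-specific distance function automatically. Combined with deep learning, metric learning performs well in face recognition/verification, person re-identification (human Re-ID), image retrieval, and other areas. In this task we mainly introduce Fluid-based deep metric learning models, including loss functions such as triplet and quadruplet loss.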
+- [Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/metric_learning)
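+Video Classification
+--------------------
+Video classification is the foundation of video understanding. Unlike image classification, the object being classified is no longer a still image but a video made up of many frames, carrying audio data, motion information, and more. Understanding a video therefore requires more context: not only what each frame is and contains, but also how different frames relate to one another. Video classification methods are mainly based on convolutional neural networks, recurrent neural networks, or a combination of the two. In this task we introduce Fluid-based video classification models, currently the Temporal Segment Network (TSN) model; more models will be added over time.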
+- [TSN](https://github.com/PaddlePaddle/models/tree/develop/fluid/video_classification)
-  [CRNN-CTC模](https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition)
 Speech Recognition
 ------------------
@@ -94,6 +128,15 @@ Machine Translation, NMT)等阶段。在 NMT 成熟后，机器翻译才真正
 - [Senta](https://github.com/baidu/Senta/blob/master/README.md)
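+Semantic Matching
+-----------------
+Many natural language processing scenarios need to measure the semantic similarity of two texts; such tasks are usually called semantic matching. Examples include ranking search results by the similarity between the query and candidate documents, computing text-to-text similarity for deduplication, and matching candidate answers to questions in automatic question answering.
+DAM (Deep Attention Matching Network), released in this example, is work by Baidu's Natural Language Processing Department published at ACL 2018 for selecting responses in multi-turn dialogue for retrieval-based chatbots. Inspired by the Transformer, DAM's network structure is based entirely on the attention mechanism: it uses stacked self-attention to learn semantic representations of the response and the context at different granularities, then uses cross-attention to capture the relevance between response and context. It outperforms other models on two large-scale multi-turn dialogue datasets.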
+- [Deep Attention Matching Network](https://github.com/PaddlePaddle/models/tree/develop/fluid/deep_attention_matching_net)
 AnyQ
 ----
@@ -102,3 +145,12 @@ AnyQ
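 SimNet is a semantic matching framework developed in-house by Baidu's Natural Language Processing Department in 2013. It is widely used across Baidu products and mainly includes core network structures such as BOW, CNN, RNN, and MM-DNN; the framework also integrates mainstream semantic matching models from academia, such as MatchPyramid, MV-LSTM, and K-NRM. Models built with SimNet can be easily added to the AnyQ system to strengthen its semantic matching capability.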
 -  [SimNet in PaddlePaddle Fluid](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)
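+Machine Reading Comprehension
+-----------------------------
+Machine reading comprehension (MRC) is one of the core tasks in natural language processing (NLP); the ultimate goal is for machines to read text as humans do, distill its information, and answer related questions. Deep learning has been widely adopted in NLP in recent years and has greatly improved machine reading comprehension, but current MRC research uses artificially constructed datasets and answers relatively simple questions, which differs markedly from the data humans handle. Large-scale real-world training data is therefore urgently needed to advance MRC.
+The Baidu reading comprehension dataset is a real-world dataset open-sourced by Baidu's Natural Language Processing Department. All questions and source documents come from real data (Baidu search engine data and the Baidu Zhidao Q&A community), and the answers are written by humans. Each question has multiple answers; the dataset contains 200k questions, 1000k source documents, and 420k answers, making it the largest Chinese MRC dataset to date. Baidu has also open-sourced the corresponding reading comprehension model, called DuReader, which adopts the commonly used layered network structure, captures the interaction between question and document through a bidirectional attention mechanism to produce a query-aware document representation, and finally predicts the answer span from that representation with a pointer network.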
+-  [DuReader in PaddlePaddle Fluid](https://github.com/PaddlePaddle/models/blob/develop/fluid/machine_reading_comprehension/README.md)
--- a/Senta @ 870651e2
+++ b/Senta @ 870651e2
+Subproject commit 870651e257750f2c237f0b0bc9a27e5d062d1909
--- a/SimNet @ 4dbe7f7b
+++ b/SimNet @ 4dbe7f7b
+Subproject commit 4dbe7f7b0e76c188eb7f448d104f0165f0a12229
--- a/fluid/adversarial/tutorials/mnist_model.py
+++ b/fluid/adversarial/tutorials/mnist_model.py
 """
 CNN on mnist data using fluid api of paddlepaddle
 """
-import paddle.v2 as paddle
+import paddle
 import paddle.fluid as fluid

--- a/fluid/adversarial/tutorials/mnist_tutorial_bim.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_bim.py
@@ -8,7 +8,7 @@ sys.path.append("..")
 import matplotlib.pyplot as plt
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from advbox.adversary import Adversary
 from advbox.attacks.gradient_method import BIM

--- a/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_deepfool.py
@@ -8,7 +8,7 @@ sys.path.append("..")
 import matplotlib.pyplot as plt
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from advbox.adversary import Adversary
 from advbox.attacks.deepfool import DeepFoolAttack

--- a/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_fgsm.py
@@ -8,7 +8,7 @@ sys.path.append("..")
 import matplotlib.pyplot as plt
 import numpy as np
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from advbox.adversary import Adversary
 from advbox.attacks.gradient_method import FGSM

--- a/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_ilcm.py
@@ -7,7 +7,7 @@ sys.path.append("..")
 import matplotlib.pyplot as plt
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from advbox.adversary import Adversary
 from advbox.attacks.gradient_method import ILCM

--- a/fluid/adversarial/tutorials/mnist_tutorial_jsma.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_jsma.py
@@ -7,7 +7,7 @@ sys.path.append("..")
 import matplotlib.pyplot as plt
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from advbox.adversary import Adversary
 from advbox.attacks.saliency import JSMA

--- a/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_lbfgs.py
@@ -7,7 +7,7 @@ sys.path.append("..")
 import matplotlib.pyplot as plt
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from advbox.adversary import Adversary
 from advbox.attacks.lbfgs import LBFGS

--- a/fluid/adversarial/tutorials/mnist_tutorial_mifgsm.py
+++ b/fluid/adversarial/tutorials/mnist_tutorial_mifgsm.py
@@ -9,7 +9,7 @@ sys.path.append("..")
 import matplotlib.pyplot as plt
 import numpy as np
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from advbox.adversary import Adversary
 from advbox.attacks.gradient_method import MIFGSM

--- a/fluid/deep_attention_matching_net/model.py
+++ b/fluid/deep_attention_matching_net/model.py
-import cPickle as pickle
+import six
 import numpy as np
 import paddle.fluid as fluid
 import utils.layers as layers
@@ -29,7 +29,7 @@ class Net(object):
        mask_cache = dict() if self.use_mask_cache else None
        turns_data = []
-        for i in xrange(self._max_turn_num):
+        for i in six.moves.xrange(self._max_turn_num):
            turn = fluid.layers.data(
                name="turn_%d" % i,
                shape=[self._max_turn_len, 1],
@@ -37,7 +37,7 @@ class Net(object):
            turns_data.append(turn)
        turns_mask = []
-        for i in xrange(self._max_turn_num):
+        for i in six.moves.xrange(self._max_turn_num):
            turn_mask = fluid.layers.data(
                name="turn_mask_%d" % i,
                shape=[self._max_turn_len, 1],
@@ -64,7 +64,7 @@ class Net(object):
        Hr = response_emb
        Hr_stack = [Hr]
-        for index in range(self._stack_num):
+        for index in six.moves.xrange(self._stack_num):
            Hr = layers.block(
                name="response_self_stack" + str(index),
                query=Hr,
@@ -78,7 +78,7 @@ class Net(object):
        # context part
        sim_turns = []
-        for t in xrange(self._max_turn_num):
+        for t in six.moves.xrange(self._max_turn_num):
            Hu = fluid.layers.embedding(
                input=turns_data[t],
                size=[self._vocab_size + 1, self._emb_size],
@@ -88,7 +88,7 @@ class Net(object):
                    initializer=fluid.initializer.Normal(scale=0.1)))
            Hu_stack = [Hu]
-            for index in range(self._stack_num):
+            for index in six.moves.xrange(self._stack_num):
                # share parameters
                Hu = layers.block(
                    name="turn_self_stack" + str(index),
@@ -104,7 +104,7 @@ class Net(object):
            # cross attention 
            r_a_t_stack = []
            t_a_r_stack = []
-            for index in range(self._stack_num + 1):
+            for index in six.moves.xrange(self._stack_num + 1):
                t_a_r = layers.block(
                    name="t_attend_r_" + str(index),
                    query=Hu_stack[index],
@@ -134,7 +134,7 @@ class Net(object):
                t_a_r = fluid.layers.stack(t_a_r_stack, axis=1)
                r_a_t = fluid.layers.stack(r_a_t_stack, axis=1)
            else:
-                for index in xrange(len(t_a_r_stack)):
+                for index in six.moves.xrange(len(t_a_r_stack)):
                    t_a_r_stack[index] = fluid.layers.unsqueeze(
                        input=t_a_r_stack[index], axes=[1])
                    r_a_t_stack[index] = fluid.layers.unsqueeze(
@@ -151,7 +151,7 @@ class Net(object):
        if self.use_stack_op:
            sim = fluid.layers.stack(sim_turns, axis=2)
        else:
-            for index in xrange(len(sim_turns)):
+            for index in six.moves.xrange(len(sim_turns)):
                sim_turns[index] = fluid.layers.unsqueeze(
                    input=sim_turns[index], axes=[2])
            # sim shape: [batch_size, 2*(stack_num+1), max_turn_num, max_turn_len, max_turn_len]

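Review note: the `xrange` to `six.moves.xrange` substitutions in this file keep iteration lazy on both interpreters. The following is a minimal sketch of what that alias resolves to, written without `six` so it runs on a bare interpreter. `lazy_range` and `turn_names` are names chosen for this illustration and do not appear in the commit.

```python
# Stand-in for six.moves.xrange (illustrative; the commit itself uses six).
try:
    lazy_range = xrange  # Python 2: the lazy built-in
except NameError:
    lazy_range = range   # Python 3: range is already lazy


def turn_names(max_turn_num):
    """Build per-turn feed names in the same "turn_%d" style as the model."""
    return ["turn_%d" % i for i in lazy_range(max_turn_num)]


print(turn_names(3))  # ['turn_0', 'turn_1', 'turn_2']
```

Using the `six` alias rather than a local try/except keeps the call sites uniform, at the cost of one extra dependency.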
--- a/fluid/deep_attention_matching_net/test_and_evaluate.py
+++ b/fluid/deep_attention_matching_net/test_and_evaluate.py
 import os
+import six
 import numpy as np
 import time
 import argparse
@@ -6,8 +7,12 @@ import multiprocessing
 import paddle
 import paddle.fluid as fluid
 import utils.reader as reader
-import cPickle as pickle
+from utils.util import print_arguments, mkdir
-from utils.util import print_arguments
+try:
+    import cPickle as pickle  #python 2
+except ImportError as e:
+    import pickle  #python 3
 from model import Net
@@ -107,7 +112,7 @@ def parse_args():
 def test(args):
    if not os.path.exists(args.save_path):
-        raise ValueError("Invalid save path %s" % args.save_path)
+        mkdir(args.save_path)
    if not os.path.exists(args.model_path):
        raise ValueError("Invalid model init path %s" % args.model_path)
    # data data_config
@@ -158,7 +163,11 @@ def test(args):
        use_cuda=args.use_cuda, main_program=test_program)
    print("start loading data ...")
-    train_data, val_data, test_data = pickle.load(open(args.data_path, 'rb'))
+    with open(args.data_path, 'rb') as f:
+        if six.PY2:
+            train_data, val_data, test_data = pickle.load(f)
+        else:
+            train_data, val_data, test_data = pickle.load(f, encoding="bytes")
    print("finish loading data ...")
    if args.ext_eval:
@@ -178,9 +187,9 @@ def test(args):
    score_path = os.path.join(args.save_path, 'score.txt')
    score_file = open(score_path, 'w')
-    for it in xrange(test_batch_num // dev_count):
+    for it in six.moves.xrange(test_batch_num // dev_count):
        feed_list = []
-        for dev in xrange(dev_count):
+        for dev in six.moves.xrange(dev_count):
            index = it * dev_count + dev
            feed_dict = reader.make_one_batch_input(test_batches, index)
            feed_list.append(feed_dict)
@@ -190,9 +199,9 @@ def test(args):
        scores = np.array(predicts[0])
        print("step = %d" % it)
-        for dev in xrange(dev_count):
+        for dev in six.moves.xrange(dev_count):
            index = it * dev_count + dev
-            for i in xrange(args.batch_size):
+            for i in six.moves.xrange(args.batch_size):
                score_file.write(
                    str(scores[args.batch_size * dev + i][0]) + '\t' + str(
                        test_batches["label"][index][i]) + '\n')

--- a/fluid/deep_attention_matching_net/train_and_evaluate.py
+++ b/fluid/deep_attention_matching_net/train_and_evaluate.py
 import os
+import six
 import numpy as np
 import time
 import argparse
@@ -6,9 +7,13 @@ import multiprocessing
 import paddle
 import paddle.fluid as fluid
 import utils.reader as reader
-import cPickle as pickle
 from utils.util import print_arguments
+try:
+    import cPickle as pickle  #python 2
+except ImportError as e:
+    import pickle  #python 3
 from model import Net
@@ -164,35 +169,45 @@ def train(args):
    if args.word_emb_init is not None:
        print("start loading word embedding init ...")
-        word_emb = np.array(pickle.load(open(args.word_emb_init, 'rb'))).astype(
+        if six.PY2:
-            'float32')
+            word_emb = np.array(pickle.load(open(args.word_emb_init,
+                                                 'rb'))).astype('float32')
+        else:
+            word_emb = np.array(
+                pickle.load(
+                    open(args.word_emb_init, 'rb'), encoding="bytes")).astype(
+                        'float32')
        dam.set_word_embedding(word_emb, place)
        print("finish init word embedding  ...")
    print("start loading data ...")
-    train_data, val_data, test_data = pickle.load(open(args.data_path, 'rb'))
+    with open(args.data_path, 'rb') as f:
+        if six.PY2:
+            train_data, val_data, test_data = pickle.load(f)
+        else:
+            train_data, val_data, test_data = pickle.load(f, encoding="bytes")
    print("finish loading data ...")
    val_batches = reader.build_batches(val_data, data_conf)
-    batch_num = len(train_data['y']) / args.batch_size
+    batch_num = len(train_data[six.b('y')]) // args.batch_size
    val_batch_num = len(val_batches["response"])
-    print_step = max(1, batch_num / (dev_count * 100))
+    print_step = max(1, batch_num // (dev_count * 100))
-    save_step = max(1, batch_num / (dev_count * 10))
+    save_step = max(1, batch_num // (dev_count * 10))
    print("begin model training ...")
    print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
    step = 0
-    for epoch in xrange(args.num_scan_data):
+    for epoch in six.moves.xrange(args.num_scan_data):
        shuffle_train = reader.unison_shuffle(train_data)
        train_batches = reader.build_batches(shuffle_train, data_conf)
        ave_cost = 0.0
-        for it in xrange(batch_num // dev_count):
+        for it in six.moves.xrange(batch_num // dev_count):
            feed_list = []
-            for dev in xrange(dev_count):
+            for dev in six.moves.xrange(dev_count):
                index = it * dev_count + dev
                feed_dict = reader.make_one_batch_input(train_batches, index)
                feed_list.append(feed_dict)
@@ -215,9 +230,9 @@ def train(args):
                score_path = os.path.join(args.save_path, 'score.' + str(step))
                score_file = open(score_path, 'w')
-                for it in xrange(val_batch_num // dev_count):
+                for it in six.moves.xrange(val_batch_num // dev_count):
                    feed_list = []
-                    for dev in xrange(dev_count):
+                    for dev in six.moves.xrange(dev_count):
                        val_index = it * dev_count + dev
                        feed_dict = reader.make_one_batch_input(val_batches,
                                                                val_index)
@@ -227,9 +242,9 @@ def train(args):
                                            fetch_list=[logits.name])
                    scores = np.array(predicts[0])
-                    for dev in xrange(dev_count):
+                    for dev in six.moves.xrange(dev_count):
                        val_index = it * dev_count + dev
-                        for i in xrange(args.batch_size):
+                        for i in six.moves.xrange(args.batch_size):
                            score_file.write(
                                str(scores[args.batch_size * dev + i][0]) + '\t'
                                + str(val_batches["label"][val_index][

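Review note: the `encoding="bytes"` argument and the `six.b('y')` lookups in this commit belong together. The sketch below (Python 3 only) shows why. The byte string is a hand-assembled protocol-2 pickle with the same layout Python 2 would write for `{'y': [1]}`; it is an assumption made for illustration, not data from the repository.

```python
import pickle

# Protocol-2 stream for {'y': [1]} where 'y' is a Python 2 str
# (SHORT_BINSTRING opcode). Hand-assembled for this sketch.
PY2_PICKLE = b"\x80\x02}q\x00U\x01yq\x01]q\x02K\x01as."

# Default decoding turns Python 2 str objects into text keys.
as_text = pickle.loads(PY2_PICKLE)

# encoding="bytes" leaves them as bytes, so dictionary lookups must use
# b'y' (which is what six.b('y') returns on Python 3).
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(sorted(as_text))   # ['y']
print(sorted(as_bytes))  # [b'y']
```

On Python 2, `six.b('y')` is just `'y'`, so the same lookup works against data loaded without the `encoding` argument.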
--- a/fluid/deep_attention_matching_net/utils/douban_evaluation.py
+++ b/fluid/deep_attention_matching_net/utils/douban_evaluation.py
 import sys
+import six
 import numpy as np
 from sklearn.metrics import average_precision_score
@@ -7,7 +8,7 @@ def mean_average_precision(sort_data):
    #to do
    count_1 = 0
    sum_precision = 0
-    for index in range(len(sort_data)):
+    for index in six.moves.xrange(len(sort_data)):
        if sort_data[index][1] == 1:
            count_1 += 1
            sum_precision += 1.0 * count_1 / (index + 1)

--- a/fluid/deep_attention_matching_net/utils/evaluation.py
+++ b/fluid/deep_attention_matching_net/utils/evaluation.py
 import sys
+import six
 def get_p_at_n_in_m(data, n, m, ind):
@@ -30,9 +31,9 @@ def evaluate(file_path):
    p_at_2_in_10 = 0.0
    p_at_5_in_10 = 0.0
-    length = len(data) / 10
+    length = len(data) // 10
-    for i in xrange(0, length):
+    for i in six.moves.xrange(0, length):
        ind = i * 10
        assert data[ind][1] == 1

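Review note: the `/` to `//` changes matter because `/` returns a float on Python 3 (and on Python 2 under `from __future__ import division`), and a float cannot be passed to `range`. This is a minimal sketch; `batch_count` is a hypothetical helper, not a function in this repository.

```python
from __future__ import division  # "/" becomes true division on Python 2 too


def batch_count(num_samples, batch_size):
    """Number of full batches; floor division keeps the result an int."""
    return num_samples // batch_size


print(batch_count(1050, 100))            # 10
print(1050 / 100)                        # 10.5 under true division
print(list(range(batch_count(35, 10))))  # [0, 1, 2]
```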
--- a/fluid/deep_attention_matching_net/utils/reader.py
+++ b/fluid/deep_attention_matching_net/utils/reader.py
-import cPickle as pickle
+import six
 import numpy as np
+try:
+    import cPickle as pickle  #python 2
+except ImportError as e:
+    import pickle  #python 3
 def unison_shuffle(data, seed=None):
    if seed is not None:
        np.random.seed(seed)
-    y = np.array(data['y'])
+    y = np.array(data[six.b('y')])
-    c = np.array(data['c'])
+    c = np.array(data[six.b('c')])
-    r = np.array(data['r'])
+    r = np.array(data[six.b('r')])
    assert len(y) == len(c) == len(r)
    p = np.random.permutation(len(y))
-    shuffle_data = {'y': y[p], 'c': c[p], 'r': r[p]}
+    shuffle_data = {six.b('y'): y[p], six.b('c'): c[p], six.b('r'): r[p]}
    return shuffle_data
@@ -65,9 +70,9 @@ def produce_one_sample(data,
       max_turn_len=50
       return y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len
    '''
-    c = data['c'][index]
+    c = data[six.b('c')][index]
-    r = data['r'][index][:]
+    r = data[six.b('r')][index][:]
-    y = data['y'][index]
+    y = data[six.b('y')][index]
    turns = split_c(c, split_id)
    #normalize turns_c length, nor_turns length is max_turn_num
@@ -101,7 +106,7 @@ def build_one_batch(data,
    _label = []
-    for i in range(conf['batch_size']):
+    for i in six.moves.xrange(conf['batch_size']):
        index = batch_index * conf['batch_size'] + i
        y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len = produce_one_sample(
            data, index, conf['_EOS_'], conf['max_turn_num'],
@@ -145,8 +150,8 @@ def build_batches(data, conf, turn_cut_type='tail', term_cut_type='tail'):
    _label_batches = []
-    batch_len = len(data['y']) / conf['batch_size']
+    batch_len = len(data[six.b('y')]) // conf['batch_size']
-    for batch_index in range(batch_len):
+    for batch_index in six.moves.range(batch_len):
        _turns, _tt_turns_len, _every_turn_len, _response, _response_len, _label = build_one_batch(
            data, batch_index, conf, turn_cut_type='tail', term_cut_type='tail')
@@ -192,8 +197,10 @@ def make_one_batch_input(data_batches, index):
    max_turn_num = turns.shape[1]
    max_turn_len = turns.shape[2]
-    turns_list = [turns[:, i, :] for i in xrange(max_turn_num)]
+    turns_list = [turns[:, i, :] for i in six.moves.xrange(max_turn_num)]
-    every_turn_len_list = [every_turn_len[:, i] for i in xrange(max_turn_num)]
+    every_turn_len_list = [
+        every_turn_len[:, i] for i in six.moves.xrange(max_turn_num)
+    ]
    feed_dict = {}
    for i, turn in enumerate(turns_list):
@@ -204,7 +211,7 @@ def make_one_batch_input(data_batches, index):
    for i, turn_len in enumerate(every_turn_len_list):
        feed_dict["turn_mask_%d" % i] = np.ones(
            (batch_size, max_turn_len, 1)).astype("float32")
-        for row in xrange(batch_size):
+        for row in six.moves.xrange(batch_size):
            feed_dict["turn_mask_%d" % i][row, turn_len[row]:, 0] = 0
    feed_dict["response"] = response
@@ -212,7 +219,7 @@ def make_one_batch_input(data_batches, index):
    feed_dict["response_mask"] = np.ones(
        (batch_size, max_turn_len, 1)).astype("float32")
-    for row in xrange(batch_size):
+    for row in six.moves.xrange(batch_size):
        feed_dict["response_mask"][row, response_len[row]:, 0] = 0
    feed_dict["label"] = np.array([data_batches["label"][index]]).reshape(
@@ -228,14 +235,14 @@ if __name__ == '__main__':
        "max_turn_len": 50,
        "_EOS_": 28270,
    }
-    train, val, test = pickle.load(open('../data/ubuntu/data_small.pkl', 'rb'))
+    with open('../ubuntu/data/data_small.pkl', 'rb') as f:
+        if six.PY2:
+            train, val, test = pickle.load(f)
+        else:
+            train, val, test = pickle.load(f, encoding="bytes")
    print('load data success')
    train_batches = build_batches(train, conf)
    val_batches = build_batches(val, conf)
    test_batches = build_batches(test, conf)
    print('build batches success')
-    pickle.dump([train_batches, val_batches, test_batches],
-                open('../data/ubuntu/data_small_xxx.pkl', 'wb'))
-    print('dump success')
--- a/fluid/deep_attention_matching_net/utils/util.py
+++ b/fluid/deep_attention_matching_net/utils/util.py
+import six
+import os
 def print_arguments(args):
    print('-----------  Configuration Arguments -----------')
-    for arg, value in sorted(vars(args).iteritems()):
+    for arg, value in sorted(six.iteritems(vars(args))):
        print('%s: %s' % (arg, value))
    print('------------------------------------------------')
+def mkdir(path):
+    if not os.path.isdir(path):
+        mkdir(os.path.split(path)[0])
+    else:
+        return
+    os.mkdir(path)
 def pos_encoding_init():
    pass

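Review note: the recursive `mkdir` helper added in this file does not appear to terminate for a bare relative name such as `output`. `os.path.split('output')[0]` is the empty string, `os.path.isdir('')` is false, and the function then calls itself with `''` again. Below is a minimal alternative sketch, assuming the intent is "create the directory and any missing parents". `ensure_dir` is a hypothetical name, not part of the commit.

```python
import errno
import os


def ensure_dir(path):
    """Create path and any missing parents; succeed if it already exists."""
    try:
        os.makedirs(path)
    except OSError as err:
        # Swallow only "already exists", and only when it is a directory.
        if err.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

The try/except form works on both Python 2 and Python 3; on Python 3 alone, `os.makedirs(path, exist_ok=True)` is the shorter equivalent.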
--- a/fluid/deeplabv3+/.gitignore
+++ b/fluid/deeplabv3+/.gitignore
+deeplabv3plus_xception65_initialize.params
+deeplabv3plus.params
+deeplabv3plus.tar.gz
--- a/fluid/deeplabv3+/README.md
+++ b/fluid/deeplabv3+/README.md
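-Running the DeepLab examples in this directory requires the latest PaddlePaddle develop version. If your installed PaddlePaddle version is lower than this, please update it following the instructions in the [installation documentation](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
+Running the DeepLab examples in this directory requires PaddlePaddle Fluid v1.0.0 or later. If your installed PaddlePaddle version is lower than this, please update it following the instructions in the installation documentation. When using a GPU, the program requires cuDNN v7.
 ## Code structure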
@@ -41,10 +41,12 @@ data/cityscape/
 To train the model from scratch, download our initialization model:
 ```
 wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz
+tar -xf deeplabv3plus_xception65_initialize.tar.gz && rm deeplabv3plus_xception65_initialize.tar.gz
 ```
 To fine-tune from the final trained model or use it directly for prediction, download our final model:
 ```
 wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz
+tar -xf deeplabv3plus.tar.gz && rm deeplabv3plus.tar.gz
 ```
@@ -70,11 +72,11 @@ python train.py --help
 ```
 python ./train.py \
    --batch_size=8 \
-    --parallel=true
+    --parallel=true \
    --train_crop_size=769 \
    --total_step=90000 \
-    --init_weights_path=$INIT_WEIGHTS_PATH \
+    --init_weights_path=deeplabv3plus_xception65_initialize.params \
-    --save_weights_path=$SAVE_WEIGHTS_PATH \
+    --save_weights_path=output \
    --dataset_path=$DATASET_PATH
 ```
@@ -82,11 +84,10 @@ python ./train.py \
 Run the following command to evaluate on the `Cityscape` test dataset:
 ```
 python ./eval.py \
-    --init_weights_path=$INIT_WEIGHTS_PATH \
+    --init_weights=deeplabv3plus.params \
    --dataset_path=$DATASET_PATH
 ```
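-The model file must be specified with the `--model_path` option.
+The model file must be specified with the `--model_path` option. The evaluation metric output by the test script is mean IoU.
-The evaluation metric output by the test script is [mean IoU]().
 ## Experimental results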

--- a/fluid/deeplabv3+/eval.py
+++ b/fluid/deeplabv3+/eval.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import os
 os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
@@ -91,7 +94,7 @@ exe = fluid.Executor(place)
 exe.run(sp)
 if args.init_weights_path:
-    print "load from:", args.init_weights_path
+    print("load from:", args.init_weights_path)
    load_model()
 dataset = CityscapeDataset(args.dataset_path, 'val')
@@ -118,7 +121,7 @@ for i, imgs, labels, names in batches:
    mp = (wrong + right) != 0
    miou2 = np.mean((right[mp] * 1.0 / (right[mp] + wrong[mp])))
    if args.verbose:
-        print 'step: %s, mIoU: %s' % (i + 1, miou2)
+        print('step: %s, mIoU: %s' % (i + 1, miou2))
    else:
-        print '\rstep: %s, mIoU: %s' % (i + 1, miou2),
+        print('\rstep: %s, mIoU: %s' % (i + 1, miou2), end='')
        sys.stdout.flush()
--- a/fluid/deeplabv3+/models.py
+++ b/fluid/deeplabv3+/models.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import paddle
 import paddle.fluid as fluid
@@ -50,7 +53,7 @@ def append_op_result(result, name):
 def conv(*args, **kargs):
    kargs['param_attr'] = name_scope + 'weights'
-    if kargs.has_key('bias_attr') and kargs['bias_attr']:
+    if 'bias_attr' in kargs and kargs['bias_attr']:
        kargs['bias_attr'] = name_scope + 'biases'
    else:
        kargs['bias_attr'] = False
@@ -62,7 +65,7 @@ def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
    N, C, H, W = input.shape
    if C % G != 0:
-        print "group can not divide channle:", C, G
+        print("group can not divide channle:", C, G)
        for d in range(10):
            for t in [d, -d]:
                if G + t <= 0: continue
@@ -70,7 +73,7 @@ def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
                    G = G + t
                    break
            if C % G == 0:
-                print "use group size:", G
+                print("use group size:", G)
                break
    assert C % G == 0
    param_shape = (G, )
@@ -139,7 +142,7 @@ def seq_conv(input, channel, stride, filter, dilation=1, act=None):
            filter,
            stride,
            groups=input.shape[1],
-            padding=(filter / 2) * dilation,
+            padding=(filter // 2) * dilation,
            dilation=dilation)
        input = bn(input)
        if act: input = act(input)

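Review note: `dict.has_key` was removed in Python 3, while the `in` operator works on both interpreters. The sketch below isolates the keyword-argument rewrite performed by the `conv` wrapper in this file. `scope_conv_kwargs` is a hypothetical helper written for illustration, and passing `name_scope` as a parameter (rather than reading a module-level variable) is an assumption of the sketch.

```python
def scope_conv_kwargs(kargs, name_scope):
    """Rewrite conv keyword arguments in the style of the conv wrapper."""
    kargs = dict(kargs)  # copy so the caller's dict is left untouched
    kargs['param_attr'] = name_scope + 'weights'
    # "key in dict" replaces dict.has_key(key), which Python 3 removed.
    if 'bias_attr' in kargs and kargs['bias_attr']:
        kargs['bias_attr'] = name_scope + 'biases'
    else:
        kargs['bias_attr'] = False
    return kargs


print(scope_conv_kwargs({'bias_attr': True}, 'block1/'))
```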
--- a/fluid/deeplabv3+/reader.py
+++ b/fluid/deeplabv3+/reader.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import cv2
 import numpy as np
+import os
+import six
 default_config = {
    "shuffle": True,
@@ -30,7 +35,7 @@ def slice_with_pad(a, s, value=0):
                pr = 0
            pads.append([pl, pr])
            slices.append([l, r])
-    slices = map(lambda x: slice(x[0], x[1], 1), slices)
+    slices = tuple(map(lambda x: slice(x[0], x[1], 1), slices))
    a = a[slices]
    a = np.pad(a, pad_width=pads, mode='constant', constant_values=value)
    return a
@@ -38,11 +43,17 @@ def slice_with_pad(a, s, value=0):
 class CityscapeDataset:
    def __init__(self, dataset_dir, subset='train', config=default_config):
-        import commands
+        label_dirname = os.path.join(dataset_dir, 'gtFine/' + subset)
-        label_dirname = dataset_dir + 'gtFine/' + subset
+        if six.PY2:
-        label_files = commands.getoutput(
+            import commands
-            "find %s -type f | grep labelTrainIds | sort" %
+            label_files = commands.getoutput(
-            label_dirname).splitlines()
+                "find %s -type f | grep labelTrainIds | sort" %
+                label_dirname).splitlines()
+        else:
+            import subprocess
+            label_files = subprocess.getoutput(
+                "find %s -type f | grep labelTrainIds | sort" %
+                label_dirname).splitlines()
        self.label_files = label_files
        self.label_dirname = label_dirname
        self.index = 0
@@ -50,7 +61,7 @@ class CityscapeDataset:
        self.dataset_dir = dataset_dir
        self.config = config
        self.reset()
-        print "total number", len(label_files)
+        print("total number", len(label_files))
    def reset(self, shuffle=False):
        self.index = 0
@@ -66,13 +77,14 @@ class CityscapeDataset:
        shape = self.config["crop_size"]
        while True:
            ln = self.label_files[self.index]
-            img_name = self.dataset_dir + 'leftImg8bit/' + self.subset + ln[len(
+            img_name = os.path.join(
-                self.label_dirname):]
+                self.dataset_dir,
+                'leftImg8bit/' + self.subset + ln[len(self.label_dirname):])
            img_name = img_name.replace('gtFine_labelTrainIds', 'leftImg8bit')
            label = cv2.imread(ln)
            img = cv2.imread(img_name)
            if img is None:
-                print "load img failed:", img_name
+                print("load img failed:", img_name)
                self.next_img()
            else:
                break
@@ -128,5 +140,7 @@ class CityscapeDataset:
            from prefetch_generator import BackgroundGenerator
            batches = BackgroundGenerator(batches, 100)
        except:
-            print "You can install 'prefetch_generator' for acceleration of data reading."
+            print(
+                "You can install 'prefetch_generator' for acceleration of data reading."
+            )
        return batches
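
The `six.PY2` branch above exists only to run `find | grep | sort` through a shell on both Python versions. As a possible alternative, the same file list can be built with `os.walk`, which behaves the same on Python 2 and 3 and needs no shell. This is a sketch under assumptions: `list_label_files` is a hypothetical helper, not part of this patch, and `sorted()` orders by code point, which may differ from the locale-dependent order of `sort`.

```python
import os


def list_label_files(label_dirname, keyword='labelTrainIds'):
    """Return sorted file paths under label_dirname that contain keyword."""
    matches = []
    for root, _dirs, files in os.walk(label_dirname):
        for name in files:
            path = os.path.join(root, name)
            # `grep` in the original pipeline matches against the whole
            # path, so the check is done on the joined path as well.
            if keyword in path:
                matches.append(path)
    return sorted(matches)
```

Usage would be `label_files = list_label_files(label_dirname)` in place of both branches.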
--- a/fluid/deeplabv3+/train.py
+++ b/fluid/deeplabv3+/train.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import os
 os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
@@ -126,13 +129,12 @@ exe = fluid.Executor(place)
 exe.run(sp)
 if args.init_weights_path:
-    print "load from:", args.init_weights_path
+    print("load from:", args.init_weights_path)
    load_model()
 dataset = CityscapeDataset(args.dataset_path, 'train')
 if args.parallel:
-    print "Using ParallelExecutor."
    exe_p = fluid.ParallelExecutor(
        use_cuda=True, loss_name=loss_mean.name, main_program=tp)
@@ -149,9 +151,9 @@ for i, imgs, labels, names in batches:
                             'label': labels},
                       fetch_list=[pred, loss_mean])
    if i % 100 == 0:
-        print "Model is saved to", args.save_weights_path
+        print("Model is saved to", args.save_weights_path)
        save_model()
-    print "step %s, loss: %s" % (i, np.mean(retv[1]))
+    print("step %s, loss: %s" % (i, np.mean(retv[1])))
-print "Training done. Model is saved to", args.save_weights_path
+print("Training done. Model is saved to", args.save_weights_path)
 save_model()
--- a/fluid/face_detection/.gitignore
+++ b/fluid/face_detection/.gitignore
@@ -10,3 +10,4 @@ output*
 pred
 eval_tools
 box*
+PyramidBox_WiderFace*
--- a/fluid/face_detection/data_util.py
+++ b/fluid/face_detection/data_util.py
@@ -9,6 +9,7 @@ import time
 import numpy as np
 import threading
 import multiprocessing
+import traceback
 try:
    import queue
 except ImportError:
@@ -71,6 +72,7 @@ class GeneratorEnqueuer(object):
                        try:
                            task()
                        except Exception:
+                            traceback.print_exc()
                            self._stop_event.set()
                            break
            else:
@@ -78,6 +80,7 @@ class GeneratorEnqueuer(object):
                    try:
                        task()
                    except Exception:
+                        traceback.print_exc()
                        self._stop_event.set()
                        break
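
The two `traceback.print_exc()` additions address a common problem with worker threads: an exception that is caught only to set a stop event disappears silently, and the consumer just sees the queue dry up. A minimal, self-contained sketch of the pattern follows; `run_tasks` is a hypothetical stand-in for the enqueuer loop, not code from this file.

```python
import threading
import traceback


def run_tasks(tasks, stop_event):
    """Run tasks in order; on failure, report the error and signal stop."""
    for task in tasks:
        if stop_event.is_set():
            break
        try:
            task()
        except Exception:
            # Print the full stack trace before stopping, so the cause is
            # visible to whoever is watching the training log.
            traceback.print_exc()
            stop_event.set()
            break


def failing_task():
    raise ValueError("bad sample")


stop = threading.Event()
worker = threading.Thread(target=run_tasks, args=([failing_task], stop))
worker.start()
worker.join()
```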

--- a/fluid/face_detection/pyramidbox.py
+++ b/fluid/face_detection/pyramidbox.py
@@ -427,6 +427,7 @@ class PyramidBox(object):
            overlap_threshold=0.35,
            neg_overlap=0.35)
        loss = fluid.layers.reduce_sum(loss)
+        loss.persistable = True
        return loss
    def train(self):

--- a/fluid/face_detection/reader.py
+++ b/fluid/face_detection/reader.py
@@ -250,6 +250,10 @@ def train_generator(settings, file_list, batch_size, shuffle=True):
                    ymin = float(temp_info_box[1])
                    w = float(temp_info_box[2])
                    h = float(temp_info_box[3])
+                    # Filter out wrong labels
+                    if w < 0 or h < 0:
+                        continue
                    xmax = xmin + w
                    ymax = ymin + h
@@ -294,7 +298,7 @@ def train(settings,
                        generator_output = enqueuer.queue.get()
                        break
                    else:
-                        time.sleep(0.02)
+                        time.sleep(0.01)
                yield generator_output
                generator_output = None
        finally:

--- a/fluid/face_detection/train.py
+++ b/fluid/face_detection/train.py
@@ -167,7 +167,7 @@ def train(args, config, train_params, train_file_list):
            shutil.rmtree(model_path)
        print('save models to %s' % (model_path))
-        fluid.io.save_persistables(exe, model_path)
+        fluid.io.save_persistables(exe, model_path, main_program=program)
    train_py_reader.start()
    try:
@@ -189,13 +189,13 @@ def train(args, config, train_params, train_file_list):
                fetch_vars = [np.mean(np.array(v)) for v in fetch_vars]
                if batch_id % 10 == 0:
                    if not args.use_pyramidbox:
-                        print("Pass {0}, batch {1}, loss {2}, time {3}".format(
+                        print("Pass {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format(
                            pass_id, batch_id, fetch_vars[0],
                            start_time - prev_start_time))
                    else:
-                        print("Pass {0}, batch {1}, face loss {2}, " \
+                        print("Pass {:d}, batch {:d}, face loss {:.6f}, " \
-                              "head loss {3}, " \
+                              "head loss {:.6f}, " \
-                              "time {4}".format(pass_id,
+                              "time {:.5f}".format(pass_id,
                               batch_id, fetch_vars[0], fetch_vars[1],
                               start_time - prev_start_time))
            if pass_id % 1 == 0 or pass_id == epoc_num - 1:

--- a/fluid/face_detection/widerface_eval.py
+++ b/fluid/face_detection/widerface_eval.py
@@ -82,9 +82,6 @@ def save_widerface_bboxes(image_path, bboxes_scores, output_dir):
    image_name = image_path.split('/')[-1]
    image_class = image_path.split('/')[-2]
-    image_name = image_name.encode('utf-8')
-    image_class = image_class.encode('utf-8')
    odir = os.path.join(output_dir, image_class)
    if not os.path.exists(odir):
        os.makedirs(odir)

--- a/fluid/faster_rcnn/README.md
+++ b/fluid/faster_rcnn/README.md
@@ -43,7 +43,7 @@ After data preparation, one can start the training step by:
    python train.py \
       --max_size=1333 \
-       --scales=800 \
+       --scales=[800] \
       --batch_size=8 \
       --model_save_dir=output/
@@ -57,6 +57,22 @@ After data preparation, one can start the training step by:
    sh ./pretrained/download.sh
 Set `pretrained_model` to load a pre-trained model. This parameter is also used to load a trained model when finetuning.
+Please make sure that the pre-trained model is downloaded and loaded correctly; otherwise, the loss may become NaN during training.
+**Install the [cocoapi](https://github.com/cocodataset/cocoapi):**
+Training requires [cocoapi](https://github.com/cocodataset/cocoapi). Install it as follows:
+    # COCOAPI=/path/to/clone/cocoapi
+    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
+    cd $COCOAPI/PythonAPI
+    # if cython is not installed
+    pip install Cython
+    # Install into global site-packages
+    make install
+    # Alternatively, if you do not have permissions or prefer
+    # not to install the COCO API into global site-packages
+    python2 setup.py install --user
 **data reader introduction:**
@@ -103,18 +119,7 @@ Finetuning is to finetune model weights in a specific task by loading pretrained
 ## Evaluation
-Evaluation is to evaluate the performance of a trained model. This sample provides `eval_coco_map.py` which uses a COCO-specific mAP metric defined by [COCO committee](http://cocodataset.org/#detections-eval). To use `eval_coco_map.py` , [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install the cocoapi:
+Evaluation measures the performance of a trained model. This sample provides `eval_coco_map.py`, which uses the COCO-specific mAP metric defined by the [COCO committee](http://cocodataset.org/#detections-eval).
-    # COCOAPI=/path/to/clone/cocoapi
-    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
-    cd $COCOAPI/PythonAPI
-    # if cython is not installed
-    pip install Cython
-    # Install into global site-packages
-    make install
-    # Alternatively, if you do not have permissions or prefer
-    # not to install the COCO API into global site-packages
-    python2 setup.py install --user
 `eval_coco_map.py` is the main executor for evaluation. Start the evaluation step by:
@@ -136,7 +141,7 @@ Faster RCNN mAP
 | Detectron                 | 8            |    180000        | 0.315 |
 | Fluid minibatch padding | 8            |    180000        | 0.314 |
 | Fluid all padding         | 8            |    180000        | 0.308 |
-| Fluid no padding         |6            |    240000        | 0.317 |
+| Fluid no padding         | 8            |    180000        | 0.316 |
 * Fluid all padding: Each image is padded to 1333\*1333.
 * Fluid minibatch padding: Images in one batch are padded to the same size. This method is the same as Detectron's.

--- a/fluid/faster_rcnn/README_cn.md
+++ b/fluid/faster_rcnn/README_cn.md
@@ -42,7 +42,7 @@ Faster RCNN object detection model
    python train.py \
       --max_size=1333 \
-       --scales=800 \
+       --scales=[800] \
       --batch_size=8 \
       --model_save_dir=output/ \
       --pretrained_model=${path_to_pretrain_model}
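@@ -57,6 +57,22 @@ Faster RCNN object detection model
    sh ./pretrained/download.sh
 Set `pretrained_model` to load the pre-trained model. The same setting is used to load a trained model when finetuning.
+Before training, make sure the pre-trained model is downloaded and loaded correctly; otherwise the loss may become NaN during training.
+**Install the [cocoapi](https://github.com/cocodataset/cocoapi):**
+[cocoapi](https://github.com/cocodataset/cocoapi) must be installed before training: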
+    # COCOAPI=/path/to/clone/cocoapi
+    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
+    cd $COCOAPI/PythonAPI
+    # if cython is not installed
+    pip install Cython
+    # Install into global site-packages
+    make install
+    # Alternatively, if you do not have permissions or prefer
+    # not to install the COCO API into global site-packages
+    python2 setup.py install --user
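 **Data reader introduction:** The data reader is defined in reader.py. All images are resized so that the short side equals `scales`; if the long side then exceeds `max_size`, the image is resized again so that the long side equals `max_size`. During training, images are flipped horizontally. Padding the images in one batch to the same size is supported.
@@ -87,18 +103,7 @@ Faster RCNN training loss
 ## Evaluation
-Evaluation measures the performance of a trained model. This sample uses the [official COCO evaluation](http://cocodataset.org/#detections-eval); [cocoapi](https://github.com/cocodataset/cocoapi) must be installed first:
+Evaluation measures the performance of a trained model. This sample uses the [official COCO evaluation](http://cocodataset.org/#detections-eval).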
-    # COCOAPI=/path/to/clone/cocoapi
-    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
-    cd $COCOAPI/PythonAPI
-    # if cython is not installed
-    pip install Cython
-    # Install into global site-packages
-    make install
-    # Alternatively, if you do not have permissions or prefer
-    # not to install the COCO API into global site-packages
-    python2 setup.py install --user
 `eval_coco_map.py` is the main executable of the evaluation module. An example invocation:
@@ -120,7 +125,7 @@ Faster RCNN mAP
 | Detectron                 | 8            |    180000        | 0.315 |
 | Fluid minibatch padding | 8            |    180000        | 0.314 |
 | Fluid all padding         | 8            |    180000        | 0.308 |
-| Fluid no padding            |6            |    240000        | 0.317 |
+| Fluid no padding            | 8            |    180000        | 0.316 |
 * Fluid all padding: each image is padded to 1333\*1333.
 * Fluid minibatch padding: images in one batch are padded to the same size. This matches Detectron's processing.

--- a/fluid/faster_rcnn/config.py
+++ b/fluid/faster_rcnn/config.py
+#  Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#    http://www.apache.org/licenses/LICENSE-2.0
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License. 
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+from edict import AttrDict
+import six
+import numpy as np
+_C = AttrDict()
+cfg = _C
+#
+# Training options
+#
+_C.TRAIN = AttrDict()
+# scales an image's shortest side
+_C.TRAIN.scales = [800]
+# max size of longest side
+_C.TRAIN.max_size = 1333
+# images per GPU in minibatch
+_C.TRAIN.im_per_batch = 1
+# roi minibatch size per image
+_C.TRAIN.batch_size_per_im = 512
+# target fraction of foreground roi minibatch 
+_C.TRAIN.fg_fractrion = 0.25
+# overlap threshold for a foreground roi
+_C.TRAIN.fg_thresh = 0.5
+# overlap threshold for a background roi
+_C.TRAIN.bg_thresh_hi = 0.5
+_C.TRAIN.bg_thresh_lo = 0.0
+# If False, only resize image and not pad, image shape is different between
+# GPUs in one mini-batch. If True, image shape is the same in one mini-batch.
+_C.TRAIN.padding_minibatch = False
+# Snapshot period
+_C.TRAIN.snapshot_iter = 10000
+# number of RPN proposals to keep before NMS
+_C.TRAIN.rpn_pre_nms_top_n = 12000
+# number of RPN proposals to keep after NMS
+_C.TRAIN.rpn_post_nms_top_n = 2000
+# NMS threshold used on RPN proposals
+_C.TRAIN.rpn_nms_thresh = 0.7
+# min size in RPN proposals
+_C.TRAIN.rpn_min_size = 0.0
+# eta for adaptive NMS in RPN
+_C.TRAIN.rpn_eta = 1.0
+# number of RPN examples per image
+_C.TRAIN.rpn_batch_size_per_im = 256
+# remove anchors out of the image
+_C.TRAIN.rpn_straddle_thresh = 0.
+# target fraction of foreground examples per RPN minibatch
+_C.TRAIN.rpn_fg_fraction = 0.5
+# min overlap between anchor and gt box to be a positive example
+_C.TRAIN.rpn_positive_overlap = 0.7
+# max overlap between anchor and gt box to be a negative example
+_C.TRAIN.rpn_negative_overlap = 0.3
+# stopgrad at a specified stage
+_C.TRAIN.freeze_at = 2
+# min area of ground truth box
+_C.TRAIN.gt_min_area = -1
+#
+# Inference options
+#
+_C.TEST = AttrDict()
+# scales an image's shortest side
+_C.TEST.scales = [800]
+# max size of longest side
+_C.TEST.max_size = 1333
+# eta for adaptive NMS in RPN
+_C.TEST.rpn_eta = 1.0
+# min score threshold to infer
+_C.TEST.score_thresh = 0.05
+# overlap threshold used for NMS
+_C.TEST.nms_thresh = 0.5
+# number of RPN proposals to keep before NMS
+_C.TEST.rpn_pre_nms_top_n = 6000
+# number of RPN proposals to keep after NMS
+_C.TEST.rpn_post_nms_top_n = 1000
+# min size in RPN proposals
+_C.TEST.rpn_min_size = 0.0
+# max number of detections
+_C.TEST.detectiions_per_im = 100
+# NMS threshold used on RPN proposals
+_C.TEST.rpn_nms_thresh = 0.7
+#
+# Model options
+#
+# weight for bbox regression targets
+_C.bbox_reg_weights = [0.1, 0.1, 0.2, 0.2]
+# RPN anchor sizes
+_C.anchor_sizes = [32, 64, 128, 256, 512]
+# RPN anchor ratio
+_C.aspect_ratio = [0.5, 1, 2]
+# variance of anchors
+_C.variances = [1., 1., 1., 1.]
+# stride of feature map
+_C.rpn_stride = [16.0, 16.0]
+# Use roi pool or roi align, 'RoIPool' or 'RoIAlign'
+_C.roi_func = 'RoIAlign'
+# sampling ratio for roi align
+_C.sampling_ratio = 0
+# pooled width and pooled height 
+_C.roi_resolution = 14
+# spatial scale 
+_C.spatial_scale = 1. / 16.
+#
+# SOLVER options
+#
+# base learning rate, from which the final learning rate is derived
+_C.learning_rate = 0.01
+# maximum number of iterations
+_C.max_iter = 180000
+# warm up to learning rate 
+_C.warm_up_iter = 500
+_C.warm_up_factor = 1. / 3.
+# lr steps_with_decay
+_C.lr_steps = [120000, 160000]
+_C.lr_gamma = 0.1
+# L2 regularization hyperparameter
+_C.weight_decay = 0.0001
+# momentum with SGD
+_C.momentum = 0.9
+#
+# ENV options
+#
+# support both CPU and GPU
+_C.use_gpu = True
+# whether to run in parallel
+_C.parallel = True
+# Class number
+_C.class_num = 81
+# support pyreader
+_C.use_pyreader = True
+# pixel mean values
+_C.pixel_means = [102.9801, 115.9465, 122.7717]
+# clip box to prevent overflowing
+_C.bbox_clip = np.log(1000. / 16.)
+# dataset path
+_C.train_file_list = 'annotations/instances_train2017.json'
+_C.train_data_dir = 'train2017'
+_C.val_file_list = 'annotations/instances_val2017.json'
+_C.val_data_dir = 'val2017'
+def merge_cfg_from_args(args, mode):
+    """Merge config keys, values in args into the global config."""
+    if mode == 'train':
+        sub_d = _C.TRAIN
+    else:
+        sub_d = _C.TEST
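+    for k, v in sorted(six.iteritems(vars(args))):
+        try:
+            # Interpret strings such as "[800]" or "0.5" as Python values.
+            value = eval(v)
+        except Exception:
+            # Non-string values and plain strings (e.g. paths) are kept as-is.
+            value = v
+        if k in sub_d:
+            sub_d[k] = value
+        else:
+            _C[k] = value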
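
`merge_cfg_from_args` relies on `eval` to turn command-line strings into Python values, which also evaluates names and expressions (for example, the string `max` would become the built-in function). A possible safer variant is sketched below with `ast.literal_eval`, which accepts only literals. The helper names `parse_value` and `merge_args` are hypothetical, and plain dicts stand in for `AttrDict`.

```python
import ast


def parse_value(raw):
    """Interpret a command-line string as a Python literal when possible."""
    if not isinstance(raw, str):
        return raw
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError, TypeError):
        # Not a literal (e.g. a path or a dataset name): keep the string.
        return raw


def merge_args(cfg, sub_cfg, args_dict):
    """Write each argument into sub_cfg if the key exists there, else cfg."""
    for key, raw in sorted(args_dict.items()):
        target = sub_cfg if key in sub_cfg else cfg
        target[key] = parse_value(raw)


cfg = {'TRAIN': {'scales': [800], 'max_size': 1333}, 'use_gpu': True}
merge_args(cfg, cfg['TRAIN'], {'scales': '[600, 800]',
                               'model_save_dir': 'output/'})
```

After the call, `cfg['TRAIN']['scales']` is the list `[600, 800]`, and `model_save_dir` lands at the top level as an unchanged string.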
--- a/fluid/faster_rcnn/data_utils.py
+++ b/fluid/faster_rcnn/data_utils.py
@@ -27,21 +27,27 @@ from __future__ import unicode_literals
 import cv2
 import numpy as np
+from config import cfg
-def get_image_blob(roidb, settings):
+def get_image_blob(roidb, mode):
    """Builds an input blob from the images in the roidb at the specified
    scales.
    """
-    scale_ind = np.random.randint(0, high=len(settings.scales))
+    if mode == 'train':
+        scales = cfg.TRAIN.scales
+        scale_ind = np.random.randint(0, high=len(scales))
+        target_size = scales[scale_ind]
+        max_size = cfg.TRAIN.max_size
+    else:
+        target_size = cfg.TEST.scales[0]
+        max_size = cfg.TEST.max_size
    im = cv2.imread(roidb['image'])
    assert im is not None, \
        'Failed to read image \'{}\''.format(roidb['image'])
    if roidb['flipped']:
        im = im[:, ::-1, :]
-    target_size = settings.scales[scale_ind]
+    im, im_scale = prep_im_for_blob(im, cfg.pixel_means, target_size, max_size)
-    im, im_scale = prep_im_for_blob(im, settings.mean_value, target_size,
-                                    settings.max_size)
    return im, im_scale
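
The body of `prep_im_for_blob` is not part of this diff. Assuming it follows the rule stated in the README (resize the short side to `target_size`, then shrink further if the long side would exceed `max_size`), the scale factor could be computed as in this sketch. `compute_im_scale` is a hypothetical name.

```python
def compute_im_scale(height, width, target_size, max_size):
    """Scale factor that maps the short side to target_size, capped so
    that the long side does not exceed max_size."""
    im_size_min = min(height, width)
    im_size_max = max(height, width)
    im_scale = float(target_size) / im_size_min
    # If the long side would overshoot, scale by the long side instead.
    if round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / im_size_max
    return im_scale


print(compute_im_scale(480, 640, 800, 1333))
print(compute_im_scale(400, 1000, 800, 1333))
```

In the second call the short-side rule alone would give a long side of 2000, so the cap applies and the long side becomes 1333.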

--- a/fluid/faster_rcnn/edict.py
+++ b/fluid/faster_rcnn/edict.py
+# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+#
+#Licensed under the Apache License, Version 2.0 (the "License");
+#you may not use this file except in compliance with the License.
+#You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#Unless required by applicable law or agreed to in writing, software
+#distributed under the License is distributed on an "AS IS" BASIS,
+#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#See the License for the specific language governing permissions and
+#limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+from __future__ import unicode_literals
+class AttrDict(dict):
+    def __init__(self, *args, **kwargs):
+        super(AttrDict, self).__init__(*args, **kwargs)
+    def __getattr__(self, name):
+        if name in self.__dict__:
+            return self.__dict__[name]
+        elif name in self:
+            return self[name]
+        else:
+            raise AttributeError(name)
+    def __setattr__(self, name, value):
+        if name in self.__dict__:
+            self.__dict__[name] = value
+        else:
+            self[name] = value
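
To illustrate how `AttrDict` is meant to be used by `config.py`, here is a short sketch. It uses a condensed class that should behave the same as the one above for ordinary keys (the instance `__dict__` stays empty, so every attribute is stored as a dict item); treat it as an illustration rather than a replacement.

```python
class AttrDict(dict):
    """Dict whose items can also be read and written as attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


cfg = AttrDict()
cfg.TRAIN = AttrDict()
cfg.TRAIN.scales = [800]

# Attribute access and key access see the same data.
print(cfg['TRAIN']['scales'])
print('scales' in cfg.TRAIN)
```

Because the values live in the dict itself, `k in sub_d` in `merge_cfg_from_args` works without any extra bookkeeping.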
--- a/fluid/faster_rcnn/eval_coco_map.py
+++ b/fluid/faster_rcnn/eval_coco_map.py
@@ -29,18 +29,20 @@ import models.resnet as resnet
 import json
 from pycocotools.coco import COCO
 from pycocotools.cocoeval import COCOeval, Params
+from config import cfg
-def eval(cfg):
+def eval():
    if '2014' in cfg.dataset:
        test_list = 'annotations/instances_val2014.json'
    elif '2017' in cfg.dataset:
        test_list = 'annotations/instances_val2017.json'
-    image_shape = [3, cfg.max_size, cfg.max_size]
+    image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
    class_nums = cfg.class_num
-    batch_size = cfg.batch_size
+    devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
+    devices_num = len(devices.split(","))
+    total_batch_size = devices_num * cfg.TRAIN.im_per_batch
    cocoGt = COCO(os.path.join(cfg.data_dir, test_list))
    numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())}
    category_ids = cocoGt.getCatIds()
@@ -51,7 +53,6 @@ def eval(cfg):
    label_list[0] = ['background']
    model = model_builder.FasterRCNN(
-        cfg=cfg,
        add_conv_body_func=resnet.add_ResNet50_conv4_body,
        add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
        use_pyreader=False,
@@ -66,7 +67,7 @@ def eval(cfg):
            return os.path.exists(os.path.join(cfg.pretrained_model, var.name))
        fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist)
    # yapf: enable
-    test_reader = reader.test(cfg, batch_size)
+    test_reader = reader.test(total_batch_size)
    feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
    dts_res = []
@@ -80,11 +81,11 @@ def eval(cfg):
            fetch_list=[v.name for v in fetch_list],
            feed=feeder.feed(batch_data),
            return_numpy=False)
-        new_lod, nmsed_out = get_nmsed_box(cfg, rpn_rois_v, confs_v, locs_v,
+        new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v,
                                           class_nums, im_info,
                                           numId_to_catId_map)
-        dts_res += get_dt_res(batch_size, new_lod, nmsed_out, batch_data)
+        dts_res += get_dt_res(total_batch_size, new_lod, nmsed_out, batch_data)
        end = time.time()
        print('batch id: {}, time: {}'.format(batch_id, end - start))
    with open("detection_result.json", 'w') as outfile:
@@ -100,6 +101,4 @@ def eval(cfg):
 if __name__ == '__main__':
    args = parse_args()
    print_arguments(args)
+    eval()
-    data_args = reader.Settings(args)
-    eval(data_args)
--- a/fluid/faster_rcnn/eval_helper.py
+++ b/fluid/faster_rcnn/eval_helper.py
@@ -20,6 +20,7 @@ import box_utils
 from PIL import Image
 from PIL import ImageDraw
 from PIL import ImageFont
+from config import cfg
 def box_decoder(target_box, prior_box, prior_box_var):
@@ -31,10 +32,8 @@ def box_decoder(target_box, prior_box, prior_box_var):
    prior_box_loc[:, 3] = (prior_box[:, 3] + prior_box[:, 1]) / 2
    pred_bbox = np.zeros_like(target_box, dtype=np.float32)
    for i in range(prior_box.shape[0]):
-        dw = np.minimum(prior_box_var[2] * target_box[i, 2::4],
+        dw = np.minimum(prior_box_var[2] * target_box[i, 2::4], cfg.bbox_clip)
-                        np.log(1000. / 16.))
+        dh = np.minimum(prior_box_var[3] * target_box[i, 3::4], cfg.bbox_clip)
-        dh = np.minimum(prior_box_var[3] * target_box[i, 3::4],
-                        np.log(1000. / 16.))
        pred_bbox[i, 0::4] = prior_box_var[0] * target_box[
            i, 0::4] * prior_box_loc[i, 0] + prior_box_loc[i, 2]
        pred_bbox[i, 1::4] = prior_box_var[1] * target_box[
@@ -67,11 +66,11 @@ def clip_tiled_boxes(boxes, im_shape):
    return boxes
-def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info,
+def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info,
                  numId_to_catId_map):
    lod = rpn_rois.lod()[0]
    rpn_rois_v = np.array(rpn_rois)
-    variance_v = np.array([0.1, 0.1, 0.2, 0.2])
+    variance_v = np.array(cfg.bbox_reg_weights)
    confs_v = np.array(confs)
    locs_v = np.array(locs)
    rois = box_decoder(locs_v, rpn_rois_v, variance_v)
@@ -89,12 +88,12 @@ def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info,
        cls_boxes = [[] for _ in range(class_nums)]
        scores_n = confs_v[start:end, :]
        for j in range(1, class_nums):
-            inds = np.where(scores_n[:, j] > args.score_threshold)[0]
+            inds = np.where(scores_n[:, j] > cfg.TEST.score_thresh)[0]
            scores_j = scores_n[inds, j]
            rois_j = rois_n[inds, j * 4:(j + 1) * 4]
            dets_j = np.hstack((rois_j, scores_j[:, np.newaxis])).astype(
                np.float32, copy=False)
-            keep = box_utils.nms(dets_j, args.nms_threshold)
+            keep = box_utils.nms(dets_j, cfg.TEST.nms_thresh)
            nms_dets = dets_j[keep, :]
            #add labels
            cat_id = numId_to_catId_map[j]
@@ -105,8 +104,8 @@ def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info,
    # Limit to max_per_image detections **over all classes**
        image_scores = np.hstack(
            [cls_boxes[j][:, -2] for j in range(1, class_nums)])
-        if len(image_scores) > 100:
+        if len(image_scores) > cfg.TEST.detectiions_per_im:
-            image_thresh = np.sort(image_scores)[-100]
+            image_thresh = np.sort(image_scores)[-cfg.TEST.detectiions_per_im]
            for j in range(1, class_nums):
                keep = np.where(cls_boxes[j][:, -2] >= image_thresh)[0]
                cls_boxes[j] = cls_boxes[j][keep, :]
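
The per-image cap above works by finding the score of the K-th best detection across all classes and dropping everything below it. A standalone sketch of that step, assuming each class array has its score in column `-2` as in `get_nmsed_box`; `limit_detections` is a hypothetical helper name.

```python
import numpy as np


def limit_detections(cls_boxes, max_per_image, score_col=-2):
    """Keep at most max_per_image detections over all classes (ties at the
    threshold score are all kept)."""
    scores = np.hstack([boxes[:, score_col] for boxes in cls_boxes])
    if len(scores) <= max_per_image:
        return cls_boxes
    # Score of the K-th highest detection over all classes.
    thresh = np.sort(scores)[-max_per_image]
    return [boxes[boxes[:, score_col] >= thresh] for boxes in cls_boxes]


# Rows are [x1, y1, x2, y2, score, label].
class_a = np.array([[0, 0, 1, 1, 0.9, 1], [0, 0, 1, 1, 0.2, 1]])
class_b = np.array([[0, 0, 1, 1, 0.5, 2], [0, 0, 1, 1, 0.7, 2]])
kept = limit_detections([class_a, class_b], max_per_image=2)
```

With `max_per_image=2` the threshold is 0.7, so one detection survives in each class.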

--- a/fluid/faster_rcnn/image/mAP.jpg
+++ b/fluid/faster_rcnn/image/mAP.jpg
--- a/fluid/faster_rcnn/image/train_loss.jpg
+++ b/fluid/faster_rcnn/image/train_loss.jpg
--- a/fluid/faster_rcnn/infer.py
+++ b/fluid/faster_rcnn/infer.py
+import os
+import time
+import numpy as np
+from eval_helper import get_nmsed_box
+from eval_helper import get_dt_res
+from eval_helper import draw_bounding_box_on_image
+import paddle
+import paddle.fluid as fluid
+import reader
+from utility import print_arguments, parse_args
+import models.model_builder as model_builder
+import models.resnet as resnet
+import json
+from pycocotools.coco import COCO
+from pycocotools.cocoeval import COCOeval, Params
+from config import cfg
+def infer():
+    if '2014' in cfg.dataset:
+        test_list = 'annotations/instances_val2014.json'
+    elif '2017' in cfg.dataset:
+        test_list = 'annotations/instances_val2017.json'
+    cocoGt = COCO(os.path.join(cfg.data_dir, test_list))
+    numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())}
+    category_ids = cocoGt.getCatIds()
+    label_list = {
+        item['id']: item['name']
+        for item in cocoGt.loadCats(category_ids)
+    }
+    label_list[0] = ['background']
+    image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
+    class_nums = cfg.class_num
+    model = model_builder.FasterRCNN(
+        add_conv_body_func=resnet.add_ResNet50_conv4_body,
+        add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
+        use_pyreader=False,
+        is_train=False)
+    model.build_model(image_shape)
+    rpn_rois, confs, locs = model.eval_out()
+    place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    # yapf: disable
+    if cfg.pretrained_model:
+        def if_exist(var):
+            return os.path.exists(os.path.join(cfg.pretrained_model, var.name))
+        fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist)
+    # yapf: enable
+    infer_reader = reader.infer()
+    feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
+    dts_res = []
+    fetch_list = [rpn_rois, confs, locs]
+    data = next(infer_reader())
+    im_info = [data[0][1]]
+    rpn_rois_v, confs_v, locs_v = exe.run(
+        fetch_list=[v.name for v in fetch_list],
+        feed=feeder.feed(data),
+        return_numpy=False)
+    new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v, class_nums,
+                                       im_info, numId_to_catId_map)
+    path = os.path.join(cfg.image_path, cfg.image_name)
+    draw_bounding_box_on_image(path, nmsed_out, cfg.draw_threshold, label_list)
+if __name__ == '__main__':
+    args = parse_args()
+    print_arguments(args)
+    infer()
--- a/fluid/faster_rcnn/models/model_builder.py
+++ b/fluid/faster_rcnn/models/model_builder.py
@@ -17,11 +17,11 @@ from paddle.fluid.param_attr import ParamAttr
 from paddle.fluid.initializer import Constant
 from paddle.fluid.initializer import Normal
 from paddle.fluid.regularizer import L2Decay
+from config import cfg
 class FasterRCNN(object):
    def __init__(self,
-                 cfg=None,
                 add_conv_body_func=None,
                 add_roi_box_head_func=None,
                 is_train=True,
@@ -29,7 +29,6 @@ class FasterRCNN(object):
                 use_random=True):
        self.add_conv_body_func = add_conv_body_func
        self.add_roi_box_head_func = add_roi_box_head_func
-        self.cfg = cfg
        self.is_train = is_train
        self.use_pyreader = use_pyreader
        self.use_random = use_random
@@ -111,10 +110,10 @@ class FasterRCNN(object):
                name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
        self.anchor, self.var = fluid.layers.anchor_generator(
            input=rpn_conv,
-            anchor_sizes=self.cfg.anchor_sizes,
+            anchor_sizes=cfg.anchor_sizes,
-            aspect_ratios=self.cfg.aspect_ratios,
+            aspect_ratios=cfg.aspect_ratio,
-            variance=self.cfg.variance,
+            variance=cfg.variances,
-            stride=[16.0, 16.0])
+            stride=cfg.rpn_stride)
        num_anchor = self.anchor.shape[2]
        # Proposal classification scores
        self.rpn_cls_score = fluid.layers.conv2d(
@@ -152,8 +151,12 @@ class FasterRCNN(object):
        rpn_cls_score_prob = fluid.layers.sigmoid(
            self.rpn_cls_score, name='rpn_cls_score_prob')
-        pre_nms_top_n = 12000 if self.is_train else 6000
+        param_obj = cfg.TRAIN if self.is_train else cfg.TEST
-        post_nms_top_n = 2000 if self.is_train else 1000
+        pre_nms_top_n = param_obj.rpn_pre_nms_top_n
+        post_nms_top_n = param_obj.rpn_post_nms_top_n
+        nms_thresh = param_obj.rpn_nms_thresh
+        min_size = param_obj.rpn_min_size
+        eta = param_obj.rpn_eta
        rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
            scores=rpn_cls_score_prob,
            bbox_deltas=self.rpn_bbox_pred,
@@ -162,9 +165,9 @@ class FasterRCNN(object):
            variances=self.var,
            pre_nms_top_n=pre_nms_top_n,
            post_nms_top_n=post_nms_top_n,
-            nms_thresh=0.7,
+            nms_thresh=nms_thresh,
-            min_size=0.0,
+            min_size=min_size,
-            eta=1.0)
+            eta=eta)
        self.rpn_rois = rpn_rois
        if self.is_train:
            outs = fluid.layers.generate_proposal_labels(
@@ -173,13 +176,13 @@ class FasterRCNN(object):
                is_crowd=self.is_crowd,
                gt_boxes=self.gt_box,
                im_info=self.im_info,
-                batch_size_per_im=self.cfg.batch_size_per_im,
+                batch_size_per_im=cfg.TRAIN.batch_size_per_im,
-                fg_fraction=0.25,
+                fg_fraction=cfg.TRAIN.fg_fractrion,
-                fg_thresh=0.5,
+                fg_thresh=cfg.TRAIN.fg_thresh,
-                bg_thresh_hi=0.5,
+                bg_thresh_hi=cfg.TRAIN.bg_thresh_hi,
-                bg_thresh_lo=0.0,
+                bg_thresh_lo=cfg.TRAIN.bg_thresh_lo,
-                bbox_reg_weights=[0.1, 0.1, 0.2, 0.2],
+                bbox_reg_weights=cfg.bbox_reg_weights,
-                class_nums=self.cfg.class_num,
+                class_nums=cfg.class_num,
                use_random=self.use_random)
            self.rois = outs[0]
@@ -193,15 +196,24 @@ class FasterRCNN(object):
            pool_rois = self.rois
        else:
            pool_rois = self.rpn_rois
-        pool = fluid.layers.roi_pool(
+        if cfg.roi_func == 'RoIPool':
-            input=roi_input,
+            pool = fluid.layers.roi_pool(
-            rois=pool_rois,
+                input=roi_input,
-            pooled_height=14,
+                rois=pool_rois,
-            pooled_width=14,
+                pooled_height=cfg.roi_resolution,
-            spatial_scale=0.0625)
+                pooled_width=cfg.roi_resolution,
+                spatial_scale=cfg.spatial_scale)
+        elif cfg.roi_func == 'RoIAlign':
+            pool = fluid.layers.roi_align(
+                input=roi_input,
+                rois=pool_rois,
+                pooled_height=cfg.roi_resolution,
+                pooled_width=cfg.roi_resolution,
+                spatial_scale=cfg.spatial_scale,
+                sampling_ratio=cfg.sampling_ratio)
        rcnn_out = self.add_roi_box_head_func(pool)
        self.cls_score = fluid.layers.fc(input=rcnn_out,
-                                         size=self.cfg.class_num,
+                                         size=cfg.class_num,
                                         act=None,
                                         name='cls_score',
                                         param_attr=ParamAttr(
@@ -213,7 +225,7 @@ class FasterRCNN(object):
                                             learning_rate=2.,
                                             regularizer=L2Decay(0.)))
        self.bbox_pred = fluid.layers.fc(input=rcnn_out,
-                                         size=4 * self.cfg.class_num,
+                                         size=4 * cfg.class_num,
                                         act=None,
                                         name='bbox_pred',
                                         param_attr=ParamAttr(
@@ -257,8 +269,7 @@ class FasterRCNN(object):
            x=rpn_cls_score_reshape, shape=(0, -1, 1))
        rpn_bbox_pred_reshape = fluid.layers.reshape(
            x=rpn_bbox_pred_reshape, shape=(0, -1, 4))
+        score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
-        score_pred, loc_pred, score_tgt, loc_tgt = \
            fluid.layers.rpn_target_assign(
                bbox_pred=rpn_bbox_pred_reshape,
                cls_logits=rpn_cls_score_reshape,
@@ -267,11 +278,11 @@ class FasterRCNN(object):
                gt_boxes=self.gt_box,
                is_crowd=self.is_crowd,
                im_info=self.im_info,
-                rpn_batch_size_per_im=256,
+                rpn_batch_size_per_im=cfg.TRAIN.rpn_batch_size_per_im,
-                rpn_straddle_thresh=0.0,
+                rpn_straddle_thresh=cfg.TRAIN.rpn_straddle_thresh,
-                rpn_fg_fraction=0.5,
+                rpn_fg_fraction=cfg.TRAIN.rpn_fg_fraction,
-                rpn_positive_overlap=0.7,
+                rpn_positive_overlap=cfg.TRAIN.rpn_positive_overlap,
-                rpn_negative_overlap=0.3,
+                rpn_negative_overlap=cfg.TRAIN.rpn_negative_overlap,
                use_random=self.use_random)
        score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
        rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
@@ -279,7 +290,12 @@ class FasterRCNN(object):
        rpn_cls_loss = fluid.layers.reduce_mean(
            rpn_cls_loss, name='loss_rpn_cls')
-        rpn_reg_loss = fluid.layers.smooth_l1(x=loc_pred, y=loc_tgt, sigma=3.0)
+        rpn_reg_loss = fluid.layers.smooth_l1(
+            x=loc_pred,
+            y=loc_tgt,
+            sigma=3.0,
+            inside_weight=bbox_weight,
+            outside_weight=bbox_weight)
        rpn_reg_loss = fluid.layers.reduce_sum(
            rpn_reg_loss, name='loss_rpn_bbox')
        score_shape = fluid.layers.shape(score_tgt)

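Review note: the hunks above drop the per-instance `self.cfg` and the hard-coded RPN constants in favour of a module-level `cfg` with `TRAIN` and `TEST` sections. The lookup pattern can be sketched as follows, assuming a `SimpleNamespace` stand-in for the real `config.cfg` (its implementation is not part of this diff). The numbers mirror the literals removed above, and `rpn_proposal_params` is a hypothetical helper name.

```python
from types import SimpleNamespace

# Hypothetical stand-in for config.cfg; the real object lives in config.py,
# which this diff does not show. Values mirror the removed literals.
cfg = SimpleNamespace(
    TRAIN=SimpleNamespace(rpn_pre_nms_top_n=12000, rpn_post_nms_top_n=2000),
    TEST=SimpleNamespace(rpn_pre_nms_top_n=6000, rpn_post_nms_top_n=1000),
)


def rpn_proposal_params(is_train):
    """Pick the mode-specific section once, then read fields from it."""
    param_obj = cfg.TRAIN if is_train else cfg.TEST
    return param_obj.rpn_pre_nms_top_n, param_obj.rpn_post_nms_top_n
```

Selecting the section once keeps each call site free of repeated `if self.is_train` branches.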
--- a/fluid/faster_rcnn/models/resnet.py
+++ b/fluid/faster_rcnn/models/resnet.py
@@ -16,6 +16,7 @@ import paddle.fluid as fluid
 from paddle.fluid.param_attr import ParamAttr
 from paddle.fluid.initializer import Constant
 from paddle.fluid.regularizer import L2Decay
+from config import cfg
 def conv_bn_layer(input,
@@ -88,8 +89,7 @@ def conv_affine_layer(input,
        default_initializer=Constant(0.))
    bias.stop_gradient = True
-    elt_mul = fluid.layers.elementwise_mul(x=conv, y=scale, axis=1)
+    out = fluid.layers.affine_channel(x=conv, scale=scale, bias=bias)
-    out = fluid.layers.elementwise_add(x=elt_mul, y=bias, axis=1)
    if act == 'relu':
        out = fluid.layers.relu(x=out)
    return out
@@ -137,7 +137,7 @@ ResNet_cfg = {
 }
-def add_ResNet50_conv4_body(body_input, freeze_at=2):
+def add_ResNet50_conv4_body(body_input):
    stages, block_func = ResNet_cfg[50]
    stages = stages[0:3]
    conv1 = conv_affine_layer(
@@ -149,13 +149,13 @@ def add_ResNet50_conv4_body(body_input, freeze_at=2):
        pool_stride=2,
        pool_padding=1)
    res2 = layer_warp(block_func, pool1, 64, stages[0], 1, name="res2")
-    if freeze_at == 2:
+    if cfg.TRAIN.freeze_at == 2:
        res2.stop_gradient = True
    res3 = layer_warp(block_func, res2, 128, stages[1], 2, name="res3")
-    if freeze_at == 3:
+    if cfg.TRAIN.freeze_at == 3:
        res3.stop_gradient = True
    res4 = layer_warp(block_func, res3, 256, stages[2], 2, name="res4")
-    if freeze_at == 4:
+    if cfg.TRAIN.freeze_at == 4:
        res4.stop_gradient = True
    return res4

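Review note: `conv_affine_layer` now uses a single `affine_channel` call instead of `elementwise_mul` followed by `elementwise_add` on axis 1. A pure-Python sketch of why the two forms agree, assuming the input is laid out as `[channel][flattened positions]`; both function names are illustrative and are not Paddle APIs.

```python
def mul_then_add_ref(x, scale, bias):
    """Two-step form: per-channel multiply, then per-channel add."""
    scaled = [[v * scale[c] for v in row] for c, row in enumerate(x)]
    return [[v + bias[c] for v in row] for c, row in enumerate(scaled)]


def affine_channel_ref(x, scale, bias):
    """Fused form: one pass computing x * scale + bias per channel."""
    return [[v * scale[c] + bias[c] for v in row] for c, row in enumerate(x)]
```

For any input both helpers return the same values; the fused form only avoids building the intermediate product.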
--- a/fluid/faster_rcnn/pretrained/download.sh
+++ b/fluid/faster_rcnn/pretrained/download.sh
-DIR="$( cd "$(dirname "$0")" ; pwd -P )"
+DIR="$( cd "$(dirname "$0")" && pwd -P )"
 cd "$DIR"
 # Download the data.

--- a/fluid/faster_rcnn/profile.py
+++ b/fluid/faster_rcnn/profile.py
@@ -26,19 +26,18 @@ import paddle.fluid.profiler as profiler
 import models.model_builder as model_builder
 import models.resnet as resnet
 from learning_rate import exponential_with_warmup_decay
+from config import cfg
-def train(cfg):
+def train():
-    batch_size = cfg.batch_size
    learning_rate = cfg.learning_rate
-    image_shape = [3, cfg.max_size, cfg.max_size]
+    image_shape = [3, cfg.TRAIN.max_size, cfg.TRAIN.max_size]
    num_iterations = cfg.max_iter
    devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
    devices_num = len(devices.split(","))
+    total_batch_size = devices_num * cfg.TRAIN.im_per_batch
    model = model_builder.FasterRCNN(
-        cfg=cfg,
        add_conv_body_func=resnet.add_ResNet50_conv4_body,
        add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
        use_pyreader=cfg.use_pyreader,
@@ -51,8 +50,10 @@ def train(cfg):
    rpn_reg_loss.persistable = True
    loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss
-    boundaries = [120000, 160000]
+    boundaries = cfg.lr_steps
-    values = [learning_rate, learning_rate * 0.1, learning_rate * 0.01]
+    gamma = cfg.lr_gamma
+    step_num = len(cfg.lr_steps)
+    values = [learning_rate * (gamma**i) for i in range(step_num + 1)]
    optimizer = fluid.optimizer.Momentum(
        learning_rate=exponential_with_warmup_decay(
@@ -82,22 +83,16 @@ def train(cfg):
        train_exe = fluid.ParallelExecutor(
            use_cuda=bool(cfg.use_gpu), loss_name=loss.name)
-    assert cfg.batch_size % devices_num == 0, \
-        "batch_size = %d, devices_num = %d" %(cfg.batch_size, devices_num)
-    batch_size_per_dev = cfg.batch_size / devices_num
    if cfg.use_pyreader:
        train_reader = reader.train(
-            cfg,
+            batch_size=cfg.TRAIN.im_per_batch,
-            batch_size=batch_size_per_dev,
+            total_batch_size=total_batch_size,
-            total_batch_size=cfg.batch_size,
+            padding_total=cfg.TRAIN.padding_minibatch,
-            padding_total=cfg.padding_minibatch,
            shuffle=False)
        py_reader = model.py_reader
        py_reader.decorate_paddle_reader(train_reader)
    else:
-        train_reader = reader.train(
+        train_reader = reader.train(batch_size=total_batch_size, shuffle=False)
-            cfg, batch_size=cfg.batch_size, shuffle=False)
        feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
    fetch_list = [loss, loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss]
@@ -109,7 +104,7 @@ def train(cfg):
        for batch_id in range(iterations):
            start_time = time.time()
-            data = train_reader().next()
+            data = next(train_reader())
            end_time = time.time()
            reader_time.append(end_time - start_time)
            start_time = time.time()
@@ -163,8 +158,7 @@ def train(cfg):
    run_func(2)
    # profiling
    start = time.time()
-    use_profile = False
+    if cfg.use_profile:
-    if use_profile:
        with profiler.profiler('GPU', 'total', '/tmp/profile_file'):
            reader_time, run_time, total_images = run_func(num_iterations)
    else:
@@ -181,6 +175,4 @@ def train(cfg):
 if __name__ == '__main__':
    args = parse_args()
    print_arguments(args)
+    train()
-    data_args = reader.Settings(args)
-    train(data_args)
--- a/fluid/faster_rcnn/reader.py
+++ b/fluid/faster_rcnn/reader.py
@@ -26,58 +26,45 @@ from collections import deque
 from roidbs import JsonDataset
 import data_utils
+from config import cfg
-class Settings(object):
+def coco(mode,
-    def __init__(self, args=None):
-        for arg, value in sorted(six.iteritems(vars(args))):
-            setattr(self, arg, value)
-        if 'coco2014' in args.dataset:
-            self.class_nums = 81
-            self.train_file_list = 'annotations/instances_train2014.json'
-            self.train_data_dir = 'train2014'
-            self.val_file_list = 'annotations/instances_val2014.json'
-            self.val_data_dir = 'val2014'
-        elif 'coco2017' in args.dataset:
-            self.class_nums = 81
-            self.train_file_list = 'annotations/instances_train2017.json'
-            self.train_data_dir = 'train2017'
-            self.val_file_list = 'annotations/instances_val2017.json'
-            self.val_data_dir = 'val2017'
-        else:
-            raise NotImplementedError('Dataset {} not supported'.format(
-                self.dataset))
-        self.mean_value = np.array(self.mean_value)[
-            np.newaxis, np.newaxis, :].astype('float32')
-def coco(settings,
-         mode,
         batch_size=None,
         total_batch_size=None,
         padding_total=False,
         shuffle=False):
+    if 'coco2014' in cfg.dataset:
+        cfg.train_file_list = 'annotations/instances_train2014.json'
+        cfg.train_data_dir = 'train2014'
+        cfg.val_file_list = 'annotations/instances_val2014.json'
+        cfg.val_data_dir = 'val2014'
+    elif 'coco2017' in cfg.dataset:
+        cfg.train_file_list = 'annotations/instances_train2017.json'
+        cfg.train_data_dir = 'train2017'
+        cfg.val_file_list = 'annotations/instances_val2017.json'
+        cfg.val_data_dir = 'val2017'
+    else:
+        raise NotImplementedError('Dataset {} not supported'.format(
+            cfg.dataset))
+    cfg.mean_value = np.array(cfg.pixel_means)[np.newaxis,
+                                               np.newaxis, :].astype('float32')
    total_batch_size = total_batch_size if total_batch_size else batch_size
    if mode != 'infer':
        assert total_batch_size % batch_size == 0
    if mode == 'train':
-        settings.train_file_list = os.path.join(settings.data_dir,
+        cfg.train_file_list = os.path.join(cfg.data_dir, cfg.train_file_list)
-                                                settings.train_file_list)
+        cfg.train_data_dir = os.path.join(cfg.data_dir, cfg.train_data_dir)
-        settings.train_data_dir = os.path.join(settings.data_dir,
-                                               settings.train_data_dir)
    elif mode == 'test' or mode == 'infer':
-        settings.val_file_list = os.path.join(settings.data_dir,
+        cfg.val_file_list = os.path.join(cfg.data_dir, cfg.val_file_list)
-                                              settings.val_file_list)
+        cfg.val_data_dir = os.path.join(cfg.data_dir, cfg.val_data_dir)
-        settings.val_data_dir = os.path.join(settings.data_dir,
+    json_dataset = JsonDataset(train=(mode == 'train'))
-                                             settings.val_data_dir)
-    json_dataset = JsonDataset(settings, train=(mode == 'train'))
    roidbs = json_dataset.get_roidb()
-    print("{} on {} with {} roidbs".format(mode, settings.dataset, len(roidbs)))
+    print("{} on {} with {} roidbs".format(mode, cfg.dataset, len(roidbs)))
    def roidb_reader(roidb, mode):
-        im, im_scales = data_utils.get_image_blob(roidb, settings)
+        im, im_scales = data_utils.get_image_blob(roidb, mode)
        im_id = roidb['id']
        im_height = np.round(roidb['height'] * im_scales)
        im_width = np.round(roidb['width'] * im_scales)
@@ -150,7 +137,7 @@ def coco(settings,
        else:
            for roidb in roidbs:
-                if settings.image_name not in roidb['image']:
+                if cfg.image_name not in roidb['image']:
                    continue
                im, im_info, im_id = roidb_reader(roidb, mode)
                batch_out = [(im, im_info, im_id)]
@@ -159,23 +146,14 @@ def coco(settings,
    return reader
-def train(settings,
+def train(batch_size, total_batch_size=None, padding_total=False, shuffle=True):
-          batch_size,
-          total_batch_size=None,
-          padding_total=False,
-          shuffle=True):
    return coco(
-        settings,
+        'train', batch_size, total_batch_size, padding_total, shuffle=shuffle)
-        'train',
-        batch_size,
-        total_batch_size,
-        padding_total,
-        shuffle=shuffle)
-def test(settings, batch_size, total_batch_size=None, padding_total=False):
+def test(batch_size, total_batch_size=None, padding_total=False):
-    return coco(settings, 'test', batch_size, total_batch_size, shuffle=False)
+    return coco('test', batch_size, total_batch_size, padding_total, shuffle=False)
-def infer(settings):
+def infer():
-    return coco(settings, 'infer')
+    return coco('infer')
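
Review note: `coco()` above writes the resolved dataset paths back into the shared `cfg`. A side-effect-free variant could look like the sketch below. This is an alternative design, not what the diff does; `resolve_coco_paths` and `_COCO_LAYOUT` are hypothetical names, while the relative paths are copied from the hunk above.

```python
import os

# Relative COCO layout taken from the hunk above.
_COCO_LAYOUT = {
    'coco2014': ('annotations/instances_{}2014.json', '{}2014'),
    'coco2017': ('annotations/instances_{}2017.json', '{}2017'),
}


def resolve_coco_paths(dataset, data_dir, mode):
    """Return (annotation_file, image_dir) without writing to shared state."""
    # 'test' and 'infer' both read the validation split, as in coco().
    split = 'train' if mode == 'train' else 'val'
    for key, (ann, img) in _COCO_LAYOUT.items():
        if key in dataset:
            return (os.path.join(data_dir, ann.format(split)),
                    os.path.join(data_dir, img.format(split)))
    raise NotImplementedError('Dataset {} not supported'.format(dataset))
```

Returning the paths instead of storing them makes repeated calls with different modes independent of each other.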
--- a/fluid/faster_rcnn/roidbs.py
+++ b/fluid/faster_rcnn/roidbs.py
@@ -26,7 +26,6 @@ from __future__ import print_function
 from __future__ import unicode_literals
 import copy
-import cPickle as pickle
 import logging
 import numpy as np
 import os
@@ -37,6 +36,7 @@ import matplotlib
 matplotlib.use('Agg')
 from pycocotools.coco import COCO
 import box_utils
+from config import cfg
 logger = logging.getLogger(__name__)
@@ -44,16 +44,16 @@ logger = logging.getLogger(__name__)
 class JsonDataset(object):
    """A class representing a COCO json dataset."""
-    def __init__(self, args, train=False):
+    def __init__(self, train=False):
-        print('Creating: {}'.format(args.dataset))
+        print('Creating: {}'.format(cfg.dataset))
-        self.name = args.dataset
+        self.name = cfg.dataset
        self.is_train = train
        if self.is_train:
-            data_dir = args.train_data_dir
+            data_dir = cfg.train_data_dir
-            file_list = args.train_file_list
+            file_list = cfg.train_file_list
        else:
-            data_dir = args.val_data_dir
+            data_dir = cfg.val_data_dir
-            file_list = args.val_file_list
+            file_list = cfg.val_file_list
        self.image_directory = data_dir
        self.COCO = COCO(file_list)
        # Set up dataset classes
@@ -91,7 +91,6 @@ class JsonDataset(object):
            end_time = time.time()
            print('_add_gt_annotations took {:.3f}s'.format(end_time -
                                                            start_time))
            print('Appending horizontally-flipped training examples...')
            self._extend_with_flipped_entries(roidb)
        print('Loaded dataset: {:s}'.format(self.name))
@@ -130,7 +129,7 @@ class JsonDataset(object):
        width = entry['width']
        height = entry['height']
        for obj in objs:
-            if obj['area'] < -1:  #cfg.TRAIN.GT_MIN_AREA:
+            if obj['area'] < cfg.TRAIN.gt_min_area:
                continue
            if 'ignore' in obj and obj['ignore'] == 1:
                continue

--- a/fluid/faster_rcnn/train.py
+++ b/fluid/faster_rcnn/train.py
@@ -28,11 +28,12 @@ import reader
 import models.model_builder as model_builder
 import models.resnet as resnet
 from learning_rate import exponential_with_warmup_decay
+from config import cfg
-def train(cfg):
+def train():
    learning_rate = cfg.learning_rate
-    image_shape = [3, cfg.max_size, cfg.max_size]
+    image_shape = [3, cfg.TRAIN.max_size, cfg.TRAIN.max_size]
    if cfg.debug:
        fluid.default_startup_program().random_seed = 1000
@@ -43,9 +44,9 @@ def train(cfg):
    devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
    devices_num = len(devices.split(","))
+    total_batch_size = devices_num * cfg.TRAIN.im_per_batch
    model = model_builder.FasterRCNN(
-        cfg=cfg,
        add_conv_body_func=resnet.add_ResNet50_conv4_body,
        add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
        use_pyreader=cfg.use_pyreader,
@@ -58,18 +59,20 @@ def train(cfg):
    rpn_reg_loss.persistable = True
    loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss
-    boundaries = [120000, 160000]
+    boundaries = cfg.lr_steps
-    values = [learning_rate, learning_rate * 0.1, learning_rate * 0.01]
+    gamma = cfg.lr_gamma
+    step_num = len(cfg.lr_steps)
+    values = [learning_rate * (gamma**i) for i in range(step_num + 1)]
    optimizer = fluid.optimizer.Momentum(
        learning_rate=exponential_with_warmup_decay(
            learning_rate=learning_rate,
            boundaries=boundaries,
            values=values,
-            warmup_iter=500,
+            warmup_iter=cfg.warm_up_iter,
-            warmup_factor=1.0 / 3.0),
+            warmup_factor=cfg.warm_up_factor),
-        regularization=fluid.regularizer.L2Decay(0.0001),
+        regularization=fluid.regularizer.L2Decay(cfg.weight_decay),
-        momentum=0.9)
+        momentum=cfg.momentum)
    optimizer.minimize(loss)
    fluid.memory_optimize(fluid.default_main_program())
@@ -89,20 +92,16 @@ def train(cfg):
        train_exe = fluid.ParallelExecutor(
            use_cuda=bool(cfg.use_gpu), loss_name=loss.name)
-    assert cfg.batch_size % devices_num == 0
-    batch_size_per_dev = cfg.batch_size / devices_num
    if cfg.use_pyreader:
        train_reader = reader.train(
-            cfg,
+            batch_size=cfg.TRAIN.im_per_batch,
-            batch_size=batch_size_per_dev,
+            total_batch_size=total_batch_size,
-            total_batch_size=cfg.batch_size,
+            padding_total=cfg.TRAIN.padding_minibatch,
-            padding_total=cfg.padding_minibatch,
            shuffle=True)
        py_reader = model.py_reader
        py_reader.decorate_paddle_reader(train_reader)
    else:
-        train_reader = reader.train(
+        train_reader = reader.train(batch_size=total_batch_size, shuffle=True)
-            cfg, batch_size=cfg.batch_size, shuffle=True)
        feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
    def save_model(postfix):
@@ -133,7 +132,7 @@ def train(cfg):
                    smoothed_loss.get_median_value(
                    ), start_time - prev_start_time))
                sys.stdout.flush()
-                if (iter_id + 1) % cfg.snapshot_stride == 0:
+                if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0:
                    save_model("model_iter{}".format(iter_id))
        except fluid.core.EOFException:
            py_reader.reset()
@@ -159,7 +158,7 @@ def train(cfg):
                iter_id, lr[0],
                smoothed_loss.get_median_value(), start_time - prev_start_time))
            sys.stdout.flush()
-            if (iter_id + 1) % cfg.snapshot_stride == 0:
+            if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0:
                save_model("model_iter{}".format(iter_id))
            if (iter_id + 1) == cfg.max_iter:
                break
@@ -175,6 +174,4 @@ def train(cfg):
 if __name__ == '__main__':
    args = parse_args()
    print_arguments(args)
+    train()
-    data_args = reader.Settings(args)
-    train(data_args)
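
Review note: the learning-rate values are now derived from `cfg.lr_steps` and `cfg.lr_gamma` instead of being listed by hand. A minimal sketch of that derivation follows. The `lr_at` lookup is an assumption about how a piecewise schedule consumes the boundaries; it does not reproduce `exponential_with_warmup_decay`, whose body is not shown in this diff.

```python
import bisect


def piecewise_values(base_lr, gamma, lr_steps):
    """One value per interval: base_lr * gamma**i for i = 0..len(lr_steps)."""
    return [base_lr * (gamma ** i) for i in range(len(lr_steps) + 1)]


def lr_at(iteration, lr_steps, values):
    """Assumed lookup: lr_steps[i] is the first iteration using values[i + 1]."""
    return values[bisect.bisect_right(lr_steps, iteration)]
```

With the previous literals (`[120000, 160000]`, base rate 0.01, gamma 0.1) this yields three values that match the removed list up to floating-point rounding.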
--- a/fluid/faster_rcnn/utility.py
+++ b/fluid/faster_rcnn/utility.py
@@ -18,7 +18,7 @@ Contains common utility functions.
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
+import sys
 import distutils.util
 import numpy as np
 import six
@@ -26,6 +26,7 @@ from collections import deque
 from paddle.fluid import core
 import argparse
 import functools
+from config import *
 def print_arguments(args):
@@ -96,31 +97,33 @@ def parse_args():
    add_arg('model_save_dir',   str,    'output',     "The path to save model.")
    add_arg('pretrained_model', str,    'imagenet_resnet50_fusebn', "The init model path.")
    add_arg('dataset',          str,   'coco2017',  "coco2014, coco2017.")
-    add_arg('data_dir',         str,   'data/COCO17',        "The data root path.")
    add_arg('class_num',        int,   81,          "Class number.")
+    add_arg('data_dir',         str,   'data/COCO17',        "The data root path.")
    add_arg('use_pyreader',     bool,   True,           "Use pyreader.")
+    add_arg('use_profile',         bool,   False,       "Whether to use the profiler.")
    add_arg('padding_minibatch',bool,   False,
        "If False, only resize image and not pad, image shape is different between"
        " GPUs in one mini-batch. If True, image shape is the same in one mini-batch.")
    #SOLVER
    add_arg('learning_rate',    float,  0.01,     "Learning rate.")
    add_arg('max_iter',         int,    180000,   "Iter number.")
-    add_arg('log_window',       int,    1,        "Log smooth window, set 1 for debug, set 20 for train.")
+    add_arg('log_window',       int,    20,        "Log smooth window, set 1 for debug, set 20 for train.")
-    add_arg('snapshot_stride',  int,    10000,    "save model every snapshot stride.")
+    # FAST RCNN
    # RPN
    add_arg('anchor_sizes',     int,    [32,64,128,256,512],  "The size of anchors.")
    add_arg('aspect_ratios',    float,  [0.5,1.0,2.0],    "The ratio of anchors.")
    add_arg('variance',         float,  [1.,1.,1.,1.],    "The variance of anchors.")
-    add_arg('rpn_stride',       float,  16.,    "Stride of the feature map that RPN is attached.")
+    add_arg('rpn_stride',       float,  [16.,16.],    "Stride of the feature map that RPN is attached.")
-    # FAST RCNN
+    add_arg('rpn_nms_thresh',    float,   0.7,          "NMS threshold used on RPN proposals")
    # TRAIN TEST INFER
-    add_arg('batch_size',       int,   1,        "Minibatch size.")
+    add_arg('im_per_batch',       int,   1,        "Minibatch size.")
    add_arg('max_size',         int,   1333,    "The resized image height.")
    add_arg('scales', int,  [800],    "The resized image height.")
    add_arg('batch_size_per_im',int,    512,    "fast rcnn head batch size")
-    add_arg('mean_value',     float,   [102.9801, 115.9465, 122.7717], "pixel mean")
+    add_arg('pixel_means',     float,   [102.9801, 115.9465, 122.7717], "pixel mean")
-    add_arg('nms_threshold',    float, 0.5,    "NMS threshold.")
+    add_arg('nms_thresh',    float, 0.5,    "NMS threshold.")
-    add_arg('score_threshold',    float, 0.05,    "score threshold for NMS.")
+    add_arg('score_thresh',    float, 0.05,    "score threshold for NMS.")
+    add_arg('snapshot_stride',  int,    10000,    "save model every snapshot stride.")
    add_arg('debug',            bool,   False,   "Debug mode")
    # SINGLE EVAL AND DRAW
    add_arg('draw_threshold',  float, 0.8,    "Confidence threshold to draw bbox.")
@@ -128,4 +131,9 @@ def parse_args():
    add_arg('image_name',        str,    '',       "The single image used to inference and visualize.")
    # yapf: enable
    args = parser.parse_args()
+    file_name = sys.argv[0]
+    if 'train' in file_name or 'profile' in file_name:
+        merge_cfg_from_args(args, 'train')
+    else:
+        merge_cfg_from_args(args, 'test')
    return args
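
Review note: `parse_args()` now calls `merge_cfg_from_args`, which comes from `config.py` and is not shown in this diff. One plausible shape for such a merge is sketched below, assuming a `SimpleNamespace` stand-in for `cfg`. The name `merge_args_into_cfg` and its routing rule are hypothetical and may differ from the real function.

```python
import argparse
from types import SimpleNamespace

# Hypothetical stand-in for config.cfg.
cfg = SimpleNamespace(
    learning_rate=0.01,
    TRAIN=SimpleNamespace(im_per_batch=1),
    TEST=SimpleNamespace(im_per_batch=1),
)


def merge_args_into_cfg(args, mode):
    """Copy parsed args into cfg; keys already in the mode section go there."""
    section = cfg.TRAIN if mode == 'train' else cfg.TEST
    for key, value in sorted(vars(args).items()):
        target = section if hasattr(section, key) else cfg
        setattr(target, key, value)
```

Under this assumed rule, an argument such as `im_per_batch` overrides only the section for the current mode, while unknown keys land on the top-level object.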
--- a/fluid/gan/c_gan/c_gan.py
+++ b/fluid/gan/c_gan/c_gan.py
@@ -12,8 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import sys
 import os
+import six
 import argparse
 import functools
 import matplotlib
@@ -102,7 +106,7 @@ def train(args):
            noise_data = np.random.uniform(
                low=-1.0, high=1.0,
                size=[args.batch_size, NOISE_SIZE]).astype('float32')
-            real_image = np.array(map(lambda x: x[0], data)).reshape(
+            real_image = np.array(list(map(lambda x: x[0], data))).reshape(
                -1, 784).astype('float32')
            conditions_data = np.array([x[1] for x in data]).reshape(
                [-1, 1]).astype("float32")
@@ -138,7 +142,7 @@ def train(args):
            d_loss_np = [d_loss_1[0][0], d_loss_2[0][0]]
-            for _ in xrange(NUM_TRAIN_TIMES_OF_DG):
+            for _ in six.moves.xrange(NUM_TRAIN_TIMES_OF_DG):
                noise_data = np.random.uniform(
                    low=-1.0, high=1.0,
                    size=[args.batch_size, NOISE_SIZE]).astype('float32')
@@ -159,8 +163,9 @@ def train(args):
                total_images = np.concatenate([real_image, generated_images])
                fig = plot(total_images)
                msg = "Epoch ID={0}\n Batch ID={1}\n D-Loss={2}\n DG-Loss={3}\n gen={4}".format(
-                    pass_id, batch_id, d_loss_np, dg_loss_np,
+                    pass_id, batch_id,
-                    check(generated_images))
+                    np.sum(d_loss_np),
+                    np.sum(dg_loss_np), check(generated_images))
                print(msg)
                plt.title(msg)
                plt.savefig(

--- a/fluid/gan/c_gan/dc_gan.py
+++ b/fluid/gan/c_gan/dc_gan.py
@@ -12,11 +12,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import sys
 import os
 import argparse
 import functools
 import matplotlib
+import six
 import numpy as np
 import paddle
 import paddle.fluid as fluid
@@ -98,7 +102,7 @@ def train(args):
            noise_data = np.random.uniform(
                low=-1.0, high=1.0,
                size=[args.batch_size, NOISE_SIZE]).astype('float32')
-            real_image = np.array(map(lambda x: x[0], data)).reshape(
+            real_image = np.array(list(map(lambda x: x[0], data))).reshape(
                -1, 784).astype('float32')
            real_labels = np.ones(
                shape=[real_image.shape[0], 1], dtype='float32')
@@ -128,7 +132,7 @@ def train(args):
            d_loss_np = [d_loss_1[0][0], d_loss_2[0][0]]
-            for _ in xrange(NUM_TRAIN_TIMES_OF_DG):
+            for _ in six.moves.xrange(NUM_TRAIN_TIMES_OF_DG):
                noise_data = np.random.uniform(
                    low=-1.0, high=1.0,
                    size=[args.batch_size, NOISE_SIZE]).astype('float32')
@@ -146,7 +150,8 @@ def train(args):
                fig = plot(total_images)
                msg = "Epoch ID={0} Batch ID={1} D-Loss={2} DG-Loss={3}\n gen={4}".format(
                    pass_id, batch_id,
-                    np.sum(d_loss_np), dg_loss_np, check(generated_images))
+                    np.sum(d_loss_np),
+                    np.sum(dg_loss_np), check(generated_images))
                print(msg)
                plt.title(msg)
                plt.savefig(

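Review note: the `list(map(...))` change in both GAN scripts matters because `map` returns a lazy iterator on Python 3, and `np.array` does not expand such an iterator into rows. A small pure-Python illustration, with made-up batch values:

```python
# Batch items shaped like (image, label); the values are illustrative.
data = [([0.1, 0.2], 3), ([0.3, 0.4], 7)]

lazy = map(lambda x: x[0], data)           # an iterator on Python 3
images = list(map(lambda x: x[0], data))   # a concrete list on Python 2 and 3
images_alt = [x[0] for x in data]          # equivalent list comprehension
```

The comprehension form is what the neighbouring `conditions_data` line already uses, so either spelling is consistent with the file.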
--- a/fluid/gan/c_gan/network.py
+++ b/fluid/gan/c_gan/network.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import paddle
 import paddle.fluid as fluid
 from utility import get_parent_function_name
@@ -98,19 +101,19 @@ def D_cond(image, y):
    h2 = bn(fc(h1, dfc_dim), act='leaky_relu')
    h2 = fluid.layers.concat([h2, y], 1)
-    h3 = fc(h2, 1)
+    h3 = fc(h2, 1, act='sigmoid')
    return h3
 def G_cond(z, y):
    s_h, s_w = output_height, output_width
-    s_h2, s_h4 = int(s_h / 2), int(s_h / 4)
+    s_h2, s_h4 = int(s_h // 2), int(s_h // 4)
-    s_w2, s_w4 = int(s_w / 2), int(s_w / 4)
+    s_w2, s_w4 = int(s_w // 2), int(s_w // 4)
    yb = fluid.layers.reshape(y, [-1, y_dim, 1, 1])  #NCHW
    z = fluid.layers.concat([z, y], 1)
-    h0 = bn(fc(z, gfc_dim / 2), act='relu')
+    h0 = bn(fc(z, gfc_dim // 2), act='relu')
    h0 = fluid.layers.concat([h0, y], 1)
    h1 = bn(fc(h0, gf_dim * 2 * s_h4 * s_w4), act='relu')
@@ -128,14 +131,14 @@ def D(x):
    x = conv(x, df_dim, act='leaky_relu')
    x = bn(conv(x, df_dim * 2), act='leaky_relu')
    x = bn(fc(x, dfc_dim), act='leaky_relu')
-    x = fc(x, 1, act=None)
+    x = fc(x, 1, act='sigmoid')
    return x
 def G(x):
    x = bn(fc(x, gfc_dim))
-    x = bn(fc(x, gf_dim * 2 * img_dim / 4 * img_dim / 4))
+    x = bn(fc(x, gf_dim * 2 * (img_dim // 4) * (img_dim // 4)))
-    x = fluid.layers.reshape(x, [-1, gf_dim * 2, img_dim / 4, img_dim / 4])
+    x = fluid.layers.reshape(x, [-1, gf_dim * 2, img_dim // 4, img_dim // 4])
    x = deconv(x, gf_dim * 2, act='relu', output_size=[14, 14])
    x = deconv(x, 1, filter_size=5, padding=2, act='tanh', output_size=[28, 28])
    x = fluid.layers.reshape(x, shape=[-1, 28 * 28])

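Review note: with `from __future__ import division` added at the top of `network.py`, `/` yields a float even for integer operands, so layer sizes and reshape dimensions need `//`. A short illustration follows; `gf_dim = 64` is a hypothetical value, since the real constant is defined outside these hunks.

```python
# Python 3 semantics (the same as Python 2 with `from __future__ import division`).
img_dim = 28  # illustrative; matches the 28x28 output size used in G
gf_dim = 64   # hypothetical value for the sketch

true_div = img_dim / 4     # float result
floor_div = img_dim // 4   # int result, usable as a layer size
fc_size = gf_dim * 2 * (img_dim // 4) * (img_dim // 4)
```

Parenthesising each `img_dim // 4` keeps the `fc` size equal to the product of the reshape dimensions even when `img_dim` is not a multiple of 4.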
--- a/fluid/gan/c_gan/utility.py
+++ b/fluid/gan/c_gan/utility.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import math
 import distutils.util
 import numpy as np
 import inspect
 import matplotlib
+import six
 matplotlib.use('agg')
 import matplotlib.pyplot as plt
 import matplotlib.gridspec as gridspec
@@ -54,7 +58,7 @@ def print_arguments(args):
    :type args: argparse.Namespace
    """
    print("-----------  Configuration Arguments -----------")
-    for arg, value in sorted(vars(args).iteritems()):
+    for arg, value in sorted(six.iteritems(vars(args))):
        print("%s: %s" % (arg, value))
    print("------------------------------------------------")

--- a/fluid/gan/cycle_gan/data_reader.py
+++ b/fluid/gan/cycle_gan/data_reader.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import os
 from PIL import Image
 import numpy as np
-from itertools import izip
 A_LIST_FILE = "./data/horse2zebra/trainA.txt"
 B_LIST_FILE = "./data/horse2zebra/trainB.txt"
@@ -70,11 +72,3 @@ def b_test_reader():
    Reader of images with B style for test.
    """
    return reader_creater(B_TEST_LIST_FILE, cycle=False, return_name=True)
-if __name__ == "__main__":
-    for A, B in izip(a_test_reader()(), a_test_reader()()):
-        print A[0].shape
-        print A[1]
-        print B[0].shape
-        print B[1]
--- a/fluid/gan/cycle_gan/train.py
+++ b/fluid/gan/cycle_gan/train.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 import data_reader
 import os
 import random
@@ -9,7 +12,6 @@ import paddle.fluid as fluid
 import numpy as np
 from paddle.fluid import core
 from trainer import *
-from itertools import izip
 from scipy.misc import imsave
 import paddle.fluid.profiler as profiler
 from utility import add_arguments, print_arguments, ImagePool
@@ -66,7 +68,7 @@ def train(args):
        if not os.path.exists(out_path):
            os.makedirs(out_path)
        i = 0
-        for data_A, data_B in izip(A_test_reader(), B_test_reader()):
+        for data_A, data_B in zip(A_test_reader(), B_test_reader()):
            A_name = data_A[1]
            B_name = data_B[1]
            tensor_A = core.LoDTensor()
@@ -114,7 +116,7 @@ def train(args):
            exe, out_path + "/d_a", main_program=d_A_trainer.program)
        fluid.io.save_persistables(
            exe, out_path + "/d_b", main_program=d_B_trainer.program)
-        print "saved checkpoint to [%s]" % out_path
+        print("saved checkpoint to {}".format(out_path))
        sys.stdout.flush()
    def init_model():
@@ -128,7 +130,7 @@ def train(args):
            exe, args.init_model + "/d_a", main_program=d_A_trainer.program)
        fluid.io.load_persistables(
            exe, args.init_model + "/d_b", main_program=d_B_trainer.program)
-        print "Load model from [%s]" % args.init_model
+        print("Load model from {}".format(args.init_model))
    if args.init_model:
        init_model()
@@ -136,8 +138,8 @@ def train(args):
    for epoch in range(args.epoch):
        batch_id = 0
        for i in range(max_images_num):
-            data_A = A_reader.next()
+            data_A = next(A_reader)
-            data_B = B_reader.next()
+            data_B = next(B_reader)
            tensor_A = core.LoDTensor()
            tensor_B = core.LoDTensor()
            tensor_A.set(data_A, place)
@@ -174,9 +176,9 @@ def train(args):
                feed={"input_A": tensor_A,
                      "fake_pool_A": fake_pool_A})
-            print "epoch[%d]; batch[%d]; g_A_loss: %s; d_B_loss: %s; g_B_loss: %s; d_A_loss: %s;" % (
+            print("epoch{}; batch{}; g_A_loss: {}; d_B_loss: {}; g_B_loss: {}; d_A_loss: {};".format(
                epoch, batch_id, g_A_loss[0], d_B_loss[0], g_B_loss[0],
-                d_A_loss[0])
+                d_A_loss[0]))
            sys.stdout.flush()
            batch_id += 1
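
Note on the iterator changes in this file: the built-in `next(it)` works on both Python 2.6+ and Python 3, while the `it.next()` method exists only on Python 2. Likewise `zip` is lazy on Python 3 but builds a full list on Python 2, so it only suits finite readers there. A minimal sketch follows; `toy_reader` is a hypothetical stand-in for the project's readers, not code from the repository.

```python
def toy_reader(names):
    """Hypothetical stand-in for a reader: yields (index, name) pairs."""
    for index, name in enumerate(names):
        yield index, name


a_reader = toy_reader(["a0", "a1", "a2"])
b_reader = toy_reader(["b0", "b1"])

# Portable replacement for a_reader.next().
first_a = next(a_reader)

# zip stops as soon as the shorter reader is exhausted.
pairs = [(a[1], b[1]) for a, b in zip(a_reader, b_reader)]
```

Because `first_a` already consumed one item, the pairing starts from the second element of `a_reader`.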

--- a/fluid/gan/cycle_gan/trainer.py
+++ b/fluid/gan/cycle_gan/trainer.py
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 from model import *
 import paddle.fluid as fluid

--- a/fluid/gan/cycle_gan/utility.py
+++ b/fluid/gan/cycle_gan/utility.py
@@ -17,6 +17,7 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 import distutils.util
+import six
 import random
 import glob
 import numpy as np
@@ -39,7 +40,7 @@ def print_arguments(args):
    :type args: argparse.Namespace
    """
    print("-----------  Configuration Arguments -----------")
-    for arg, value in sorted(vars(args).iteritems()):
+    for arg, value in sorted(six.iteritems(vars(args))):
        print("%s: %s" % (arg, value))
    print("------------------------------------------------")

--- a/fluid/icnet/infer.py
+++ b/fluid/icnet/infer.py
@@ -8,7 +8,7 @@ import os
 import cv2
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 from icnet import icnet
 from utils import add_arguments, print_arguments, get_feeder_data
 from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
@@ -111,10 +111,10 @@ def infer(args):
    for line in open(args.images_list):
        image_file = args.images_path + "/" + line.strip()
        filename = os.path.basename(image_file)
-        image = paddle.image.load_image(
+        image = paddle.dataset.image.load_image(
            image_file, is_color=True).astype("float32")
        image -= IMG_MEAN
-        img = paddle.image.to_chw(image)[np.newaxis, :]
+        img = paddle.dataset.image.to_chw(image)[np.newaxis, :]
        image_t = fluid.core.LoDTensor()
        image_t.set(img, place)
        result = exe.run(inference_program,

--- a/fluid/icnet/train.py
+++ b/fluid/icnet/train.py
@@ -17,6 +17,7 @@ from paddle.fluid.initializer import init_on_cpu
 if 'ce_mode' in os.environ:
    np.random.seed(10)
+    fluid.default_startup_program().random_seed = 90
 parser = argparse.ArgumentParser(description=__doc__)
 add_arg = functools.partial(add_arguments, argparser=parser)
@@ -91,9 +92,6 @@ def train(args):
        place = fluid.CUDAPlace(0)
    exe = fluid.Executor(place)
-    if 'ce_mode' in os.environ:
-        fluid.default_startup_program().random_seed = 90
    exe.run(fluid.default_startup_program())
    if args.init_model is not None:
@@ -126,9 +124,10 @@ def train(args):
            sub124_loss += results[3]
            # training log
            if iter_id % LOG_PERIOD == 0:
-                print("Iter[%d]; train loss: %.3f; sub4_loss: %.3f; sub24_loss: %.3f; sub124_loss: %.3f" % (
-                    iter_id, t_loss / LOG_PERIOD, sub4_loss / LOG_PERIOD,
-                    sub24_loss / LOG_PERIOD, sub124_loss / LOG_PERIOD))
+                print(
+                    "Iter[%d]; train loss: %.3f; sub4_loss: %.3f; sub24_loss: %.3f; sub124_loss: %.3f"
+                    % (iter_id, t_loss / LOG_PERIOD, sub4_loss / LOG_PERIOD,
+                       sub24_loss / LOG_PERIOD, sub124_loss / LOG_PERIOD))
                print("kpis	train_cost	%f" % (t_loss / LOG_PERIOD))
                t_loss = 0.

--- a/fluid/image_classification/README_cn.md
+++ b/fluid/image_classification/README_cn.md
@@ -14,7 +14,7 @@
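 ## Installation
-Running the sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle version in your environment is older, please update PaddlePaddle following the instructions in the [installation document](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
+Running the sample code in this directory requires PaddlePaddle Fluid v0.13.0 or later. If the PaddlePaddle version in your environment is older, please update PaddlePaddle following the instructions in the installation document.
 ## Data preparation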

--- a/fluid/image_classification/_ce.py
+++ b/fluid/image_classification/_ce.py
@@ -19,7 +19,7 @@ test_acc_top1_kpi = AccKpi(
 test_acc_top5_kpi = AccKpi(
    'test_acc_top5', 0.02, 0, actived=True, desc='TOP5 ACC')
 test_cost_kpi = CostKpi('test_cost', 0.02, 0, actived=True, desc='train cost')
-train_speed_kpi = AccKpi(
+train_speed_kpi = DurationKpi(
    'train_speed',
    0.05,
    0,
@@ -38,7 +38,7 @@ test_acc_top5_card4_kpi = AccKpi(
    'test_acc_top5_card4', 0.02, 0, actived=True, desc='TOP5 ACC')
 test_cost_card4_kpi = CostKpi(
    'test_cost_card4', 0.02, 0, actived=True, desc='train cost')
-train_speed_card4_kpi = AccKpi(
+train_speed_card4_kpi = DurationKpi(
    'train_speed_card4',
    0.05,
    0,

--- a/fluid/image_classification/caffe2fluid/README.md
+++ b/fluid/image_classification/caffe2fluid/README.md
@@ -19,7 +19,7 @@ This tool is used to convert a Caffe model to a Fluid model
    - Download one from github directly
        ```
-        cd proto/ && wget https://github.com/ethereon/caffe-tensorflow/blob/master/kaffe/caffe/caffepb.py
+        cd proto/ && wget https://raw.githubusercontent.com/ethereon/caffe-tensorflow/master/kaffe/caffe/caffepb.py
        ```
 2. Convert the Caffe model to Fluid model

--- a/fluid/image_classification/caffe2fluid/examples/mnist/evaluate.py
+++ b/fluid/image_classification/caffe2fluid/examples/mnist/evaluate.py
@@ -8,7 +8,7 @@ import sys
 import os
 import numpy as np
 import paddle.fluid as fluid
-import paddle.v2 as paddle
+import paddle
 def test_model(exe, test_program, fetch_list, test_reader, feeder):

--- a/fluid/language_model/train.py
+++ b/fluid/language_model/train.py
@@ -21,6 +21,11 @@ def parse_args():
        action='store_true',
        help='If set, run \
        the task with continuous evaluation logs.')
+    parser.add_argument(
+        '--num_devices',
+        type=int,
+        default=1,
+        help='Number of GPU devices')
    args = parser.parse_args()
    return args
@@ -132,7 +137,7 @@ def train(train_reader,
                "src_wordseq": lod_src_wordseq,
                "dst_wordseq": lod_dst_wordseq
            },
-                                         fetch_list=fetch_list)
+                fetch_list=fetch_list)
            avg_ppl = np.exp(ret_avg_cost[0])
            newest_ppl = np.mean(avg_ppl)
            if i % 100 == 0:
@@ -153,7 +158,7 @@ def train(train_reader,
                print("kpis	imikolov_20_avg_ppl	%s" % newest_ppl)
            else:
                print("kpis	imikolov_20_pass_duration_card%s	%s" % \
-                                (gpu_num, total_time / epoch_idx))
+                      (gpu_num, total_time / epoch_idx))
                print("kpis	imikolov_20_avg_ppl_card%s	%s" %
                      (gpu_num, newest_ppl))
        save_dir = "%s/epoch_%d" % (model_dir, epoch_idx)
@@ -165,13 +170,13 @@ def train(train_reader,
    print("finish training")
-def get_cards(enable_ce):
+def get_cards(args):
-    if enable_ce:
+    if args.enable_ce:
        cards = os.environ.get('CUDA_VISIBLE_DEVICES')
        num = len(cards.split(","))
        return num
    else:
-        return fluid.core.get_cuda_device_count()
+        return args.num_devices
 def train_net():
@@ -179,7 +184,7 @@ def train_net():
    batch_size = 20
    args = parse_args()
    vocab, train_reader, test_reader = utils.prepare_data(
-        batch_size=batch_size * get_cards(args.enable_ce), buffer_size=1000, \
+        batch_size=batch_size * get_cards(args), buffer_size=1000, \
        word_freq_threshold=0, enable_ce = args.enable_ce)
    train(
        train_reader=train_reader,
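
Note on `get_cards`: the CE branch calls `cards.split(",")` directly, which raises an `AttributeError` if `CUDA_VISIBLE_DEVICES` is unset. A defensive variant could look like the sketch below. `count_visible_devices` and its `environ` parameter are hypothetical names introduced for illustration; this is not the function in the diff.

```python
import os


def count_visible_devices(default_num_devices=1, environ=None):
    """Hypothetical helper: count entries in CUDA_VISIBLE_DEVICES.

    Falls back to default_num_devices when the variable is unset or empty.
    """
    environ = os.environ if environ is None else environ
    cards = environ.get("CUDA_VISIBLE_DEVICES")
    if not cards:
        return default_num_devices
    # Ignore empty entries such as a trailing comma.
    return len([card for card in cards.split(",") if card.strip()])
```

The returned count can then multiply the per-device batch size in the same way as `batch_size * get_cards(args)`.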

--- a/fluid/machine_reading_comprehension/README.md
+++ b/fluid/machine_reading_comprehension/README.md
+# Abstract
+DuReader is an end-to-end neural network model for machine reading comprehension style question answering, which aims to answer questions from given passages. We first match the question and passages with a bidirectional attention flow network to obtain the question-aware passage representation. Then we employ a pointer network to locate the positions of answers in the passages. Our experimental evaluations show that the DuReader model achieves state-of-the-art results on the DuReader dataset.
+# Dataset
+The DuReader dataset is a new large-scale, real-world, human-sourced MRC dataset in Chinese. DuReader focuses on real-world open-domain question answering. Its advantages over existing datasets are summarized as follows:
+ - Real question
+ - Real article
+ - Real answer
+ - Real application scenario
+ - Rich annotation
+# Network
+The DuReader model is inspired by three classic reading comprehension models ([BiDAF](https://arxiv.org/abs/1611.01603), [Match-LSTM](https://arxiv.org/abs/1608.07905), [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)).
+The DuReader model is a hierarchical multi-stage process and consists of five layers:
+- **Word Embedding Layer** maps each word to a vector using a pre-trained word embedding model.
+- **Encoding Layer** extracts context information for each position in the question and passages with a bi-directional LSTM network.
+- **Attention Flow Layer** couples the query and context vectors and produces a set of query-aware feature vectors for each word in the context. Please refer to [BiDAF](https://arxiv.org/abs/1611.01603) for more details.
+- **Fusion Layer** employs a layer of bi-directional LSTM to capture the interaction among context words independent of the query.
+- **Decode Layer** employs an answer pointer network with attention pooling of the question to locate the positions of answers in the passages. Please refer to [Match-LSTM](https://arxiv.org/abs/1608.07905) and [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf) for more details.
+## How to Run
+### Download the Dataset
+To download the DuReader dataset:
+```
+cd data && bash download.sh
+```
+For more details about the DuReader dataset, please refer to the [DuReader Dataset Homepage](https://ai.baidu.com//broad/subordinate?dataset=dureader).
+### Download Thirdparty Dependencies
+We use Bleu and Rouge as evaluation metrics. Computing these metrics relies on the scoring scripts under [coco-caption](https://github.com/tylin/coco-caption). To download them, run:
+```
+cd utils && bash download_thirdparty.sh
+```
+### Environment Requirements
+For now we've only tested on PaddlePaddle v1.0. To install PaddlePaddle, and for more details about it, see the [PaddlePaddle Homepage](http://paddlepaddle.org).
+### Preparation
+Before training the model, we have to make sure that the data is ready. The preparation step checks the data files, creates the directories, and extracts a vocabulary for later use. You can run the following command to do this:
+```
+sh run.sh --prepare
+```
+You can specify the files for train/dev/test by setting `--trainset`/`--devset`/`--testset`.
+### Training
+To train the model, pass `--train`. You can also set hyper-parameters such as the learning rate by using `--learning_rate NUM`. For example, to train the model for 10 passes, you can run:
+```
+sh run.sh --train --pass_num 10
+```
+The training process includes an evaluation on the dev set after each training epoch. By default, the model with the highest Bleu-4 score on the dev set will be saved.
+### Evaluation
+To conduct a single evaluation on the dev set with an already trained model, you can run the following command:
+```
+sh run.sh --evaluate --load_dir models/1
+```
+### Prediction
+You can also predict answers for the samples in some files using the following command:
+```
+sh run.sh --predict --load_dir models/1 --testset ../data/preprocessed/testset/search.dev.json
+```
+By default, the results are saved in the `../data/results/` folder. You can change this by specifying `--result_dir DIR_PATH`.
--- a/fluid/machine_reading_comprehension/args.py
+++ b/fluid/machine_reading_comprehension/args.py
+#   Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import argparse
+import distutils.util
+def parse_args():
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        '--prepare',
+        action='store_true',
+        help='create the directories, prepare the vocabulary and embeddings')
+    parser.add_argument('--train', action='store_true', help='train the model')
+    parser.add_argument(
+        '--evaluate', action='store_true', help='evaluate the model on dev set')
+    parser.add_argument(
+        '--predict',
+        action='store_true',
+        help='predict the answers for test set with trained model')
+    parser.add_argument(
+        "--embed_size",
+        type=int,
+        default=300,
+        help="The dimension of embedding table. (default: %(default)d)")
+    parser.add_argument(
+        "--hidden_size",
+        type=int,
+        default=300,
+        help="The size of rnn hidden unit. (default: %(default)d)")
+    parser.add_argument(
+        "--batch_size",
+        type=int,
+        default=32,
+        help="The sequence number of a mini-batch data. (default: %(default)d)")
+    parser.add_argument(
+        "--pass_num",
+        type=int,
+        default=5,
+        help="The pass number to train. (default: %(default)d)")
+    parser.add_argument(
+        "--learning_rate",
+        type=float,
+        default=0.001,
+        help="Learning rate used to train the model. (default: %(default)f)")
+    parser.add_argument(
+        "--weight_decay",
+        type=float,
+        default=0.0001,
+        help="Weight decay. (default: %(default)f)")
+    parser.add_argument(
+        "--use_gpu",
+        type=distutils.util.strtobool,
+        default=True,
+        help="Whether to use gpu. (default: %(default)d)")
+    parser.add_argument(
+        "--save_dir",
+        type=str,
+        default="model",
+        help="Specify the path to save trained models.")
+    parser.add_argument(
+        "--load_dir",
+        type=str,
+        default="",
+        help="Specify the path to load trained models.")
+    parser.add_argument(
+        "--save_interval",
+        type=int,
+        default=1,
+        help="Save the trained model every n passes. "
+        "(default: %(default)d)")
+    parser.add_argument(
+        "--log_interval",
+        type=int,
+        default=50,
+        help="Log the train loss every n batches. "
+        "(default: %(default)d)")
+    parser.add_argument(
+        "--dev_interval",
+        type=int,
+        default=1000,
+        help="Calculate the dev loss every n batches. "
+        "(default: %(default)d)")
+    parser.add_argument('--optim', default='adam', help='optimizer type')
+    parser.add_argument('--trainset', nargs='+', help='train dataset')
+    parser.add_argument('--devset', nargs='+', help='dev dataset')
+    parser.add_argument('--testset', nargs='+', help='test dataset')
+    parser.add_argument('--vocab_dir', help='dict')
+    parser.add_argument('--max_p_num', type=int, default=5)
+    parser.add_argument('--max_a_len', type=int, default=200)
+    parser.add_argument('--max_p_len', type=int, default=500)
+    parser.add_argument('--max_q_len', type=int, default=9)
+    parser.add_argument('--doc_num', type=int, default=5)
+    parser.add_argument('--para_print', action='store_true')
+    parser.add_argument('--drop_rate', type=float, default=0.0)
+    parser.add_argument('--random_seed', type=int, default=123)
+    parser.add_argument(
+        '--log_path',
+        help='path of the log file. If not set, logs are printed to console')
+    parser.add_argument(
+        '--result_dir',
+        default='../data/results/',
+        help='the dir to output the results')
+    parser.add_argument(
+        '--result_name',
+        default='test_result',
+        help='the file name of the results')
+    args = parser.parse_args()
+    return args
--- a/fluid/machine_reading_comprehension/data/download.sh
+++ b/fluid/machine_reading_comprehension/data/download.sh
+#!/bin/bash
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+if [[ -d preprocessed ]] && [[ -d raw ]]; then
+    echo "data exist"
+    exit 0
+else
+    wget -c --no-check-certificate http://dureader.gz.bcebos.com/dureader_preprocessed.zip 
+fi
+if md5sum --status -c md5sum.txt; then
+    unzip dureader_preprocessed.zip
+else
+    echo "download data error!" >&2
+    exit 1
+fi
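
Note on the checksum step: `download.sh` relies on `md5sum --status -c md5sum.txt`, which is not available on every platform. The same check can be sketched in Python with the standard library. `md5_of_file` and `parse_md5sum_line` are hypothetical helper names for this illustration and are not part of the repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_line(line):
    """Split one md5sum-style line into (expected hex digest, file name)."""
    expected, name = line.split(None, 1)
    return expected, name.strip()
```

A caller would compare `md5_of_file(name)` with the expected digest parsed from `md5sum.txt` before unzipping the archive.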
--- a/fluid/machine_reading_comprehension/data/md5sum.txt
+++ b/fluid/machine_reading_comprehension/data/md5sum.txt
+7a4c28026f7dc94e8135d17203c63664  dureader_preprocessed.zip
--- a/fluid/machine_reading_comprehension/dataset.py
+++ b/fluid/machine_reading_comprehension/dataset.py
+# -*- coding:utf8 -*-
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""
+This module implements data process strategies.
+"""
+import os
+import json
+import logging
+import numpy as np
+from collections import Counter
+class BRCDataset(object):
+    """
+    This module implements the APIs for loading and using baidu reading comprehension dataset
+    """
+    def __init__(self,
+                 max_p_num,
+                 max_p_len,
+                 max_q_len,
+                 train_files=[],
+                 dev_files=[],
+                 test_files=[]):
+        self.logger = logging.getLogger("brc")
+        self.max_p_num = max_p_num
+        self.max_p_len = max_p_len
+        self.max_q_len = max_q_len
+        self.train_set, self.dev_set, self.test_set = [], [], []
+        if train_files:
+            for train_file in train_files:
+                self.train_set += self._load_dataset(train_file, train=True)
+            self.logger.info('Train set size: {} questions.'.format(
+                len(self.train_set)))
+        if dev_files:
+            for dev_file in dev_files:
+                self.dev_set += self._load_dataset(dev_file)
+            self.logger.info('Dev set size: {} questions.'.format(
+                len(self.dev_set)))
+        if test_files:
+            for test_file in test_files:
+                self.test_set += self._load_dataset(test_file)
+            self.logger.info('Test set size: {} questions.'.format(
+                len(self.test_set)))
+    def _load_dataset(self, data_path, train=False):
+        """
+        Loads the dataset
+        Args:
+            data_path: the data file to load
+        """
+        with open(data_path) as fin:
+            data_set = []
+            for lidx, line in enumerate(fin):
+                sample = json.loads(line.strip())
+                if train:
+                    if len(sample['answer_spans']) == 0:
+                        continue
+                    if sample['answer_spans'][0][1] >= self.max_p_len:
+                        continue
+                if 'answer_docs' in sample:
+                    sample['answer_passages'] = sample['answer_docs']
+                sample['question_tokens'] = sample['segmented_question']
+                sample['passages'] = []
+                for d_idx, doc in enumerate(sample['documents']):
+                    if train:
+                        most_related_para = doc['most_related_para']
+                        sample['passages'].append({
+                            'passage_tokens':
+                            doc['segmented_paragraphs'][most_related_para],
+                            'is_selected': doc['is_selected']
+                        })
+                    else:
+                        para_infos = []
+                        for para_tokens in doc['segmented_paragraphs']:
+                            question_tokens = sample['segmented_question']
+                            common_with_question = Counter(
+                                para_tokens) & Counter(question_tokens)
+                            correct_preds = sum(common_with_question.values())
+                            if correct_preds == 0:
+                                recall_wrt_question = 0
+                            else:
+                                recall_wrt_question = float(
+                                    correct_preds) / len(question_tokens)
+                            para_infos.append((para_tokens, recall_wrt_question,
+                                               len(para_tokens)))
+                        para_infos.sort(key=lambda x: (-x[1], x[2]))
+                        fake_passage_tokens = []
+                        for para_info in para_infos[:1]:
+                            fake_passage_tokens += para_info[0]
+                        sample['passages'].append({
+                            'passage_tokens': fake_passage_tokens
+                        })
+                data_set.append(sample)
+        return data_set
+    def _one_mini_batch(self, data, indices, pad_id):
+        """
+        Get one mini batch
+        Args:
+            data: all data
+            indices: the indices of the samples to be selected
+            pad_id: the token id used to pad short sequences
+        Returns:
+            one batch of data
+        """
+        batch_data = {
+            'raw_data': [data[i] for i in indices],
+            'question_token_ids': [],
+            'question_length': [],
+            'passage_token_ids': [],
+            'passage_length': [],
+            'start_id': [],
+            'end_id': []
+        }
+        max_passage_num = max(
+            [len(sample['passages']) for sample in batch_data['raw_data']])
+        #max_passage_num = min(self.max_p_num, max_passage_num)
+        max_passage_num = self.max_p_num
+        for sidx, sample in enumerate(batch_data['raw_data']):
+            for pidx in range(max_passage_num):
+                if pidx < len(sample['passages']):
+                    batch_data['question_token_ids'].append(sample[
+                        'question_token_ids'])
+                    batch_data['question_length'].append(
+                        len(sample['question_token_ids']))
+                    passage_token_ids = sample['passages'][pidx][
+                        'passage_token_ids']
+                    batch_data['passage_token_ids'].append(passage_token_ids)
+                    batch_data['passage_length'].append(
+                        min(len(passage_token_ids), self.max_p_len))
+                else:
+                    batch_data['question_token_ids'].append([])
+                    batch_data['question_length'].append(0)
+                    batch_data['passage_token_ids'].append([])
+                    batch_data['passage_length'].append(0)
+        batch_data, padded_p_len, padded_q_len = self._dynamic_padding(
+            batch_data, pad_id)
+        for sample in batch_data['raw_data']:
+            if 'answer_passages' in sample and len(sample['answer_passages']):
+                gold_passage_offset = padded_p_len * sample['answer_passages'][
+                    0]
+                batch_data['start_id'].append(gold_passage_offset + sample[
+                    'answer_spans'][0][0])
+                batch_data['end_id'].append(gold_passage_offset + sample[
+                    'answer_spans'][0][1])
+            else:
+                # fake span for some samples, only valid for testing
+                batch_data['start_id'].append(0)
+                batch_data['end_id'].append(0)
+        return batch_data
+    def _dynamic_padding(self, batch_data, pad_id):
+        """
+        Dynamically pads the batch_data with pad_id
+        """
+        pad_p_len = min(self.max_p_len, max(batch_data['passage_length']))
+        pad_q_len = min(self.max_q_len, max(batch_data['question_length']))
+        batch_data['passage_token_ids'] = [
+            (ids + [pad_id] * (pad_p_len - len(ids)))[:pad_p_len]
+            for ids in batch_data['passage_token_ids']
+        ]
+        batch_data['question_token_ids'] = [
+            (ids + [pad_id] * (pad_q_len - len(ids)))[:pad_q_len]
+            for ids in batch_data['question_token_ids']
+        ]
+        return batch_data, pad_p_len, pad_q_len
+    def word_iter(self, set_name=None):
+        """
+        Iterates over all the words in the dataset
+        Args:
+            set_name: if it is set, then the specific set will be used
+        Returns:
+            a generator
+        """
+        if set_name is None:
+            data_set = self.train_set + self.dev_set + self.test_set
+        elif set_name == 'train':
+            data_set = self.train_set
+        elif set_name == 'dev':
+            data_set = self.dev_set
+        elif set_name == 'test':
+            data_set = self.test_set
+        else:
+            raise NotImplementedError('No data set named as {}'.format(
+                set_name))
+        if data_set is not None:
+            for sample in data_set:
+                for token in sample['question_tokens']:
+                    yield token
+                for passage in sample['passages']:
+                    for token in passage['passage_tokens']:
+                        yield token
+    def convert_to_ids(self, vocab):
+        """
+        Convert the question and passage in the original dataset to ids
+        Args:
+            vocab: the vocabulary on this dataset
+        """
+        for data_set in [self.train_set, self.dev_set, self.test_set]:
+            if data_set is None:
+                continue
+            for sample in data_set:
+                sample['question_token_ids'] = vocab.convert_to_ids(sample[
+                    'question_tokens'])
+                for passage in sample['passages']:
+                    passage['passage_token_ids'] = vocab.convert_to_ids(passage[
+                        'passage_tokens'])
+    def gen_mini_batches(self, set_name, batch_size, pad_id, shuffle=True):
+        """
+        Generate data batches for a specific dataset (train/dev/test)
+        Args:
+            set_name: train/dev/test to indicate the set
+            batch_size: number of samples in one batch
+            pad_id: pad id
+            shuffle: if set to be true, the data is shuffled.
+        Returns:
+            a generator for all batches
+        """
+        if set_name == 'train':
+            data = self.train_set
+        elif set_name == 'dev':
+            data = self.dev_set
+        elif set_name == 'test':
+            data = self.test_set
+        else:
+            raise NotImplementedError('No data set named as {}'.format(
+                set_name))
+        data_size = len(data)
+        indices = np.arange(data_size)
+        if shuffle:
+            np.random.shuffle(indices)
+        for batch_start in np.arange(0, data_size, batch_size):
+            batch_indices = indices[batch_start:batch_start + batch_size]
+            yield self._one_mini_batch(data, batch_indices, pad_id)
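
Note on batching in `dataset.py`: `_dynamic_padding` pads or truncates every sequence to the smaller of the configured maximum and the longest sequence in the batch, and `_one_mini_batch` then shifts each gold span by `padded_p_len * passage_index` so that it indexes into the concatenated passages. The sketch below restates both steps as standalone functions; `pad_batch` and `flatten_span` are hypothetical names used only for this illustration.

```python
def pad_batch(sequences, max_len, pad_id):
    """Pad or truncate each sequence to min(max_len, longest sequence)."""
    target_len = min(max_len, max(len(seq) for seq in sequences))
    padded = [
        # Pad first, then slice; a negative pad count yields an empty list.
        (seq + [pad_id] * (target_len - len(seq)))[:target_len]
        for seq in sequences
    ]
    return padded, target_len


def flatten_span(padded_p_len, passage_index, span):
    """Offset a (start, end) span by the padded length of earlier passages."""
    start, end = span
    offset = padded_p_len * passage_index
    return offset + start, offset + end
```

Empty placeholder passages (the `[]` entries appended for missing passages) become rows made entirely of `pad_id`, which keeps every sample at the same number of passages.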
--- a/fluid/machine_reading_comprehension/rc_model.py
+++ b/fluid/machine_reading_comprehension/rc_model.py
+#   Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import paddle.fluid.layers as layers
+import paddle.fluid as fluid
+import numpy as np
+def dropout(input, args):
+    if args.drop_rate:
+        return layers.dropout(
+            input,
+            dropout_prob=args.drop_rate,
+            seed=args.random_seed,
+            is_test=False)
+    else:
+        return input
+def bi_lstm_encoder(input_seq, gate_size, para_name, args):
+    # A bi-directional LSTM encoder implementation.
+    # The linear transformations for the input gate, output gate, forget gate
+    # and cell activation vectors need to be done outside of dynamic_lstm,
+    # so the projection size is 4 times gate_size.
+    input_forward_proj = layers.fc(
+        input=input_seq,
+        param_attr=fluid.ParamAttr(name=para_name + '_fw_gate_w'),
+        size=gate_size * 4,
+        act=None,
+        bias_attr=False)
+    input_reversed_proj = layers.fc(
+        input=input_seq,
+        param_attr=fluid.ParamAttr(name=para_name + '_bw_gate_w'),
+        size=gate_size * 4,
+        act=None,
+        bias_attr=False)
+    forward, _ = layers.dynamic_lstm(
+        input=input_forward_proj,
+        size=gate_size * 4,
+        use_peepholes=False,
+        param_attr=fluid.ParamAttr(name=para_name + '_fw_lstm_w'),
+        bias_attr=fluid.ParamAttr(name=para_name + '_fw_lstm_b'))
+    reversed, _ = layers.dynamic_lstm(
+        input=input_reversed_proj,
+        param_attr=fluid.ParamAttr(name=para_name + '_bw_lstm_w'),
+        bias_attr=fluid.ParamAttr(name=para_name + '_bw_lstm_b'),
+        size=gate_size * 4,
+        is_reverse=True,
+        use_peepholes=False)
+    encoder_out = layers.concat(input=[forward, reversed], axis=1)
+    return encoder_out
+def encoder(input_name, para_name, shape, hidden_size, args):
+    input_ids = layers.data(
+        name=input_name, shape=[1], dtype='int64', lod_level=1)
+    input_embedding = layers.embedding(
+        input=input_ids,
+        size=shape,
+        dtype='float32',
+        is_sparse=True,
+        param_attr=fluid.ParamAttr(name='embedding_para'))
+    encoder_out = bi_lstm_encoder(
+        input_seq=input_embedding,
+        gate_size=hidden_size,
+        para_name=para_name,
+        args=args)
+    return dropout(encoder_out, args)
+def attn_flow(q_enc, p_enc, p_ids_name, args):
+    tag = p_ids_name + "::"
+    drnn = layers.DynamicRNN()
+    with drnn.block():
+        h_cur = drnn.step_input(p_enc)
+        u_all = drnn.static_input(q_enc)
+        h_expd = layers.sequence_expand(x=h_cur, y=u_all)
+        s_t_mul = layers.elementwise_mul(x=u_all, y=h_expd, axis=0)
+        s_t_sum = layers.reduce_sum(input=s_t_mul, dim=1, keep_dim=True)
+        s_t_re = layers.reshape(s_t_sum, shape=[-1, 0])
+        s_t = layers.sequence_softmax(input=s_t_re)
+        u_expr = layers.elementwise_mul(x=u_all, y=s_t, axis=0)
+        u_expr = layers.sequence_pool(input=u_expr, pool_type='sum')
+        b_t = layers.sequence_pool(input=s_t_sum, pool_type='max')
+        drnn.output(u_expr, b_t)
+    U_expr, b = drnn()
+    b_norm = layers.sequence_softmax(input=b)
+    h_expr = layers.elementwise_mul(x=p_enc, y=b_norm, axis=0)
+    h_expr = layers.sequence_pool(input=h_expr, pool_type='sum')
+    H_expr = layers.sequence_expand(x=h_expr, y=p_enc)
+    H_expr = layers.lod_reset(x=H_expr, y=p_enc)
+    h_u = layers.elementwise_mul(x=p_enc, y=U_expr, axis=0)
+    h_h = layers.elementwise_mul(x=p_enc, y=H_expr, axis=0)
+    g = layers.concat(input=[p_enc, U_expr, h_u, h_h], axis=1)
+    return dropout(g, args)
+def lstm_step(x_t, hidden_t_prev, cell_t_prev, size, para_name, args):
+    def linear(inputs, para_name, args):
+        return layers.fc(input=inputs,
+                         size=size,
+                         param_attr=fluid.ParamAttr(name=para_name + '_w'),
+                         bias_attr=fluid.ParamAttr(name=para_name + '_b'))
+    input_cat = layers.concat([hidden_t_prev, x_t], axis=1)
+    forget_gate = layers.sigmoid(x=linear(input_cat, para_name + '_lstm_f',
+                                          args))
+    input_gate = layers.sigmoid(x=linear(input_cat, para_name + '_lstm_i',
+                                         args))
+    output_gate = layers.sigmoid(x=linear(input_cat, para_name + '_lstm_o',
+                                          args))
+    cell_tilde = layers.tanh(x=linear(input_cat, para_name + '_lstm_c', args))
+    cell_t = layers.sums(input=[
+        layers.elementwise_mul(
+            x=forget_gate, y=cell_t_prev), layers.elementwise_mul(
+                x=input_gate, y=cell_tilde)
+    ])
+    hidden_t = layers.elementwise_mul(x=output_gate, y=layers.tanh(x=cell_t))
+    return hidden_t, cell_t
+# pointer network
+def point_network_decoder(p_vec, q_vec, hidden_size, args):
+    tag = 'pn_decoder:'
+    init_random = fluid.initializer.Normal(loc=0.0, scale=1.0)
+    random_attn = layers.create_parameter(
+        shape=[1, hidden_size],
+        dtype='float32',
+        default_initializer=init_random)
+    random_attn = layers.fc(
+        input=random_attn,
+        size=hidden_size,
+        act=None,
+        param_attr=fluid.ParamAttr(name=tag + 'random_attn_fc_w'),
+        bias_attr=fluid.ParamAttr(name=tag + 'random_attn_fc_b'))
+    random_attn = layers.reshape(random_attn, shape=[-1])
+    U = layers.fc(input=q_vec,
+                  param_attr=fluid.ParamAttr(name=tag + 'q_vec_fc_w'),
+                  bias_attr=False,
+                  size=hidden_size,
+                  act=None) + random_attn
+    U = layers.tanh(U)
+    logits = layers.fc(input=U,
+                       param_attr=fluid.ParamAttr(name=tag + 'logits_fc_w'),
+                       bias_attr=fluid.ParamAttr(name=tag + 'logits_fc_b'),
+                       size=1,
+                       act=None)
+    scores = layers.sequence_softmax(input=logits)
+    pooled_vec = layers.elementwise_mul(x=q_vec, y=scores, axis=0)
+    pooled_vec = layers.sequence_pool(input=pooled_vec, pool_type='sum')
+    init_state = layers.fc(
+        input=pooled_vec,
+        param_attr=fluid.ParamAttr(name=tag + 'init_state_fc_w'),
+        bias_attr=fluid.ParamAttr(name=tag + 'init_state_fc_b'),
+        size=hidden_size,
+        act=None)
+    def custom_dynamic_rnn(p_vec, init_state, hidden_size, para_name, args):
+        tag = para_name + "custom_dynamic_rnn:"
+        def static_rnn(step,
+                       p_vec=p_vec,
+                       init_state=None,
+                       para_name='',
+                       args=args):
+            tag = para_name + "static_rnn:"
+            ctx = layers.fc(
+                input=p_vec,
+                param_attr=fluid.ParamAttr(name=tag + 'context_fc_w'),
+                bias_attr=fluid.ParamAttr(name=tag + 'context_fc_b'),
+                size=hidden_size,
+                act=None)
+            beta = []
+            c_prev = init_state
+            m_prev = init_state
+            for i in range(step):
+                m_prev0 = layers.fc(
+                    input=m_prev,
+                    size=hidden_size,
+                    act=None,
+                    param_attr=fluid.ParamAttr(name=tag + 'm_prev0_fc_w'),
+                    bias_attr=fluid.ParamAttr(name=tag + 'm_prev0_fc_b'))
+                m_prev1 = layers.sequence_expand(x=m_prev0, y=ctx)
+                Fk = ctx + m_prev1
+                Fk = layers.tanh(Fk)
+                logits = layers.fc(
+                    input=Fk,
+                    size=1,
+                    act=None,
+                    param_attr=fluid.ParamAttr(name=tag + 'logits_fc_w'),
+                    bias_attr=fluid.ParamAttr(name=tag + 'logits_fc_b'))
+                scores = layers.sequence_softmax(input=logits)
+                attn_ctx = layers.elementwise_mul(x=p_vec, y=scores, axis=0)
+                attn_ctx = layers.sequence_pool(input=attn_ctx, pool_type='sum')
+                hidden_t, cell_t = lstm_step(
+                    attn_ctx,
+                    hidden_t_prev=m_prev,
+                    cell_t_prev=c_prev,
+                    size=hidden_size,
+                    para_name=tag,
+                    args=args)
+                m_prev = hidden_t
+                c_prev = cell_t
+                beta.append(scores)
+            return beta
+        return static_rnn(
+            2, p_vec=p_vec, init_state=init_state, para_name=para_name)
+    fw_outputs = custom_dynamic_rnn(p_vec, init_state, hidden_size, tag + "fw:",
+                                    args)
+    bw_outputs = custom_dynamic_rnn(p_vec, init_state, hidden_size, tag + "bw:",
+                                    args)
+    start_prob = layers.elementwise_add(
+        x=fw_outputs[0], y=bw_outputs[1], axis=0) / 2
+    end_prob = layers.elementwise_add(
+        x=fw_outputs[1], y=bw_outputs[0], axis=0) / 2
+    return start_prob, end_prob
+def fusion(g, args):
+    m = bi_lstm_encoder(
+        input_seq=g, gate_size=args.hidden_size, para_name='fusion', args=args)
+    return dropout(m, args)
+def rc_model(hidden_size, vocab, args):
+    emb_shape = [vocab.size(), vocab.embed_dim]
+    # stage 1: encode
+    p_ids_names = []
+    q_ids_names = []
+    ms = []
+    gs = []
+    qs = []
+    for i in range(args.doc_num):
+        p_ids_name = "pids_%d" % i
+        p_ids_names.append(p_ids_name)
+        p_enc_i = encoder(p_ids_name, 'p_enc', emb_shape, hidden_size, args)
+        q_ids_name = "qids_%d" % i
+        q_ids_names.append(q_ids_name)
+        q_enc_i = encoder(q_ids_name, 'q_enc', emb_shape, hidden_size, args)
+        # stage 2:match
+        g_i = attn_flow(q_enc_i, p_enc_i, p_ids_name, args)
+        # stage 3:fusion
+        m_i = fusion(g_i, args)
+        ms.append(m_i)
+        gs.append(g_i)
+        qs.append(q_enc_i)
+    m = layers.sequence_concat(input=ms)
+    g = layers.sequence_concat(input=gs)
+    q_vec = layers.sequence_concat(input=qs)
+    # stage 4: decode
+    start_probs, end_probs = point_network_decoder(
+        p_vec=m, q_vec=q_vec, hidden_size=hidden_size, args=args)
+    start_labels = layers.data(
+        name="start_labels", shape=[1], dtype='float32', lod_level=1)
+    end_labels = layers.data(
+        name="end_labels", shape=[1], dtype='float32', lod_level=1)
+    cost0 = layers.sequence_pool(
+        layers.cross_entropy(
+            input=start_probs, label=start_labels, soft_label=True),
+        'sum')
+    cost1 = layers.sequence_pool(
+        layers.cross_entropy(
+            input=end_probs, label=end_labels, soft_label=True),
+        'sum')
+    cost0 = layers.mean(cost0)
+    cost1 = layers.mean(cost1)
+    cost = cost0 + cost1
+    cost.persistable = True
+    feeding_list = q_ids_names + ["start_labels", "end_labels"] + p_ids_names
+    return cost, start_probs, end_probs, feeding_list
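Review note on `lstm_step`: the gate arithmetic it builds from Fluid layers can be checked against a plain NumPy version. The sketch below is an illustration under stated assumptions, not the model's implementation: `lstm_step_np`, `sigmoid`, and the `params` dictionary layout (gate name mapped to a weight matrix and bias) are hypothetical names introduced here. As in `lstm_step`, each gate applies its own affine map to the concatenation of the previous hidden state and the current input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step_np(x_t, h_prev, c_prev, params):
    """One LSTM step. params maps 'f', 'i', 'o', 'c' to (W, b) pairs (assumed layout)."""
    # Concatenate previous hidden state and current input, as lstm_step does.
    z = np.concatenate([h_prev, x_t], axis=1)
    forget_gate = sigmoid(z.dot(params['f'][0]) + params['f'][1])
    input_gate = sigmoid(z.dot(params['i'][0]) + params['i'][1])
    output_gate = sigmoid(z.dot(params['o'][0]) + params['o'][1])
    cell_tilde = np.tanh(z.dot(params['c'][0]) + params['c'][1])
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = forget_gate * c_prev + input_gate * cell_tilde
    h_t = output_gate * np.tanh(c_t)
    return h_t, c_t


rng = np.random.RandomState(0)
batch, in_dim, hidden = 2, 3, 4
params = {g: (rng.randn(hidden + in_dim, hidden), np.zeros(hidden))
          for g in 'fioc'}
h_t, c_t = lstm_step_np(rng.randn(batch, in_dim),
                        np.zeros((batch, hidden)),
                        np.zeros((batch, hidden)), params)
print(h_t.shape, c_t.shape)  # (2, 4) (2, 4)
```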
--- a/fluid/machine_reading_comprehension/run.py
+++ b/fluid/machine_reading_comprehension/run.py
+#   Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+import numpy as np
+import time
+import os
+import random
+import json
+import six
+import paddle
+import paddle.fluid as fluid
+import paddle.fluid.core as core
+import paddle.fluid.framework as framework
+from paddle.fluid.executor import Executor
+import sys
+if sys.version[0] == '2':
+    reload(sys)
+    sys.setdefaultencoding("utf-8")
+sys.path.append('..')
+from args import *
+import rc_model
+from dataset import BRCDataset
+import logging
+import pickle
+from utils import normalize
+from utils import compute_bleu_rouge
+from vocab import Vocab
+def prepare_batch_input(insts, args):
+    doc_num = args.doc_num
+    batch_size = len(insts['raw_data'])
+    new_insts = []
+    for i in range(batch_size):
+        p_id = []
+        q_id = []
+        p_ids = []
+        q_ids = []
+        p_len = 0
+        for j in range(i * doc_num, (i + 1) * doc_num):
+            p_ids.append(insts['passage_token_ids'][j])
+            p_id = p_id + insts['passage_token_ids'][j]
+            q_ids.append(insts['question_token_ids'][j])
+            q_id = q_id + insts['question_token_ids'][j]
+        p_len = len(p_id)
+        def _get_label(idx, ref_len):
+            ret = [0.0] * ref_len
+            if idx >= 0 and idx < ref_len:
+                ret[idx] = 1.0
+            return [[x] for x in ret]
+        start_label = _get_label(insts['start_id'][i], p_len)
+        end_label = _get_label(insts['end_id'][i], p_len)
+        new_inst = q_ids + [start_label, end_label] + p_ids
+        new_insts.append(new_inst)
+    return new_insts
+def LodTensor_Array(lod_tensor):
+    lod = lod_tensor.lod()
+    array = np.array(lod_tensor)
+    new_array = []
+    for i in range(len(lod[0]) - 1):
+        new_array.append(array[lod[0][i]:lod[0][i + 1]])
+    return new_array
+def print_para(train_prog, train_exe, logger, args):
+    if args.para_print:
+        param_list = train_prog.block(0).all_parameters()
+        param_name_list = [p.name for p in param_list]
+        num_sum = 0
+        for p_name in param_name_list:
+            p_array = np.array(train_exe.scope.find_var(p_name).get_tensor())
+            param_num = np.prod(p_array.shape)
+            num_sum = num_sum + param_num
+            logger.info(
+                "param: {0},  mean={1}  max={2}  min={3}  num={4} {5}".format(
+                    p_name,
+                    p_array.mean(),
+                    p_array.max(), p_array.min(), p_array.shape, param_num))
+        logger.info("total param num: {0}".format(num_sum))
+def find_best_answer_for_passage(start_probs, end_probs, passage_len, args):
+    """
+    Finds the best answer with the maximum start_prob * end_prob from a single passage
+    """
+    if passage_len is None:
+        passage_len = len(start_probs)
+    else:
+        passage_len = min(len(start_probs), passage_len)
+    best_start, best_end, max_prob = -1, -1, 0
+    for start_idx in range(passage_len):
+        for ans_len in range(args.max_a_len):
+            end_idx = start_idx + ans_len
+            if end_idx >= passage_len:
+                continue
+            prob = start_probs[start_idx] * end_probs[end_idx]
+            if prob > max_prob:
+                best_start = start_idx
+                best_end = end_idx
+                max_prob = prob
+    return (best_start, best_end), max_prob
+def find_best_answer(sample, start_prob, end_prob, padded_p_len, args):
+    """
+    Finds the best answer for a sample given start_prob and end_prob for each position.
+    This will call find_best_answer_for_passage because there are multiple passages in a sample
+    """
+    best_p_idx, best_span, best_score = None, None, 0
+    for p_idx, passage in enumerate(sample['passages']):
+        if p_idx >= args.max_p_num:
+            continue
+        passage_len = min(args.max_p_len, len(passage['passage_tokens']))
+        answer_span, score = find_best_answer_for_passage(
+            start_prob[p_idx * padded_p_len:(p_idx + 1) * padded_p_len],
+            end_prob[p_idx * padded_p_len:(p_idx + 1) * padded_p_len],
+            passage_len, args)
+        if score > best_score:
+            best_score = score
+            best_p_idx = p_idx
+            best_span = answer_span
+    if best_p_idx is None or best_span is None:
+        best_answer = ''
+    else:
+        best_answer = ''.join(sample['passages'][best_p_idx]['passage_tokens'][
+            best_span[0]:best_span[1] + 1])
+    return best_answer
+def validation(inference_program, avg_cost, s_probs, e_probs, feed_order, place,
+               vocab, brc_data, logger, args):
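+    """
+    Runs the inference program on the dev set, optionally saves the predicted
+    answers, and returns the average loss and the BLEU/ROUGE scores.
+    """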
+    parallel_executor = fluid.ParallelExecutor(
+        main_program=inference_program,
+        use_cuda=bool(args.use_gpu),
+        loss_name=avg_cost.name)
+    print_para(inference_program, parallel_executor, logger, args)
+    # Use the dev set for validation
+    total_loss = 0.0
+    count = 0
+    pred_answers, ref_answers = [], []
+    val_feed_list = [
+        inference_program.global_block().var(var_name)
+        for var_name in feed_order
+    ]
+    val_feeder = fluid.DataFeeder(val_feed_list, place)
+    pad_id = vocab.get_id(vocab.pad_token)
+    dev_batches = brc_data.gen_mini_batches(
+        'dev', args.batch_size, pad_id, shuffle=False)
+    for batch_id, batch in enumerate(dev_batches, 1):
+        feed_data = prepare_batch_input(batch, args)
+        val_fetch_outs = parallel_executor.run(
+            feed=val_feeder.feed(feed_data),
+            fetch_list=[avg_cost.name, s_probs.name, e_probs.name],
+            return_numpy=False)
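+        batch_loss = np.array(val_fetch_outs[0])[0]
+        # Weight the batch-mean loss by the batch size, since the total is
+        # divided by the sample count below (this matches train()).
+        total_loss += batch_loss * len(batch['raw_data'])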
+        start_probs = LodTensor_Array(val_fetch_outs[1])
+        end_probs = LodTensor_Array(val_fetch_outs[2])
+        count += len(batch['raw_data'])
+        padded_p_len = len(batch['passage_token_ids'][0])
+        for sample, start_prob, end_prob in zip(batch['raw_data'], start_probs,
+                                                end_probs):
+            best_answer = find_best_answer(sample, start_prob, end_prob,
+                                           padded_p_len, args)
+            pred_answers.append({
+                'question_id': sample['question_id'],
+                'question_type': sample['question_type'],
+                'answers': [best_answer],
+                'entity_answers': [[]],
+                'yesno_answers': []
+            })
+            if 'answers' in sample:
+                ref_answers.append({
+                    'question_id': sample['question_id'],
+                    'question_type': sample['question_type'],
+                    'answers': sample['answers'],
+                    'entity_answers': [[]],
+                    'yesno_answers': []
+                })
+    if args.result_dir is not None and args.result_name is not None:
+        if not os.path.exists(args.result_dir):
+            os.makedirs(args.result_dir)
+        result_file = os.path.join(args.result_dir, args.result_name + '.json')
+        with open(result_file, 'w') as fout:
+            for pred_answer in pred_answers:
+                fout.write(json.dumps(pred_answer, ensure_ascii=False) + '\n')
+        logger.info('Saving {} results to {}'.format(args.result_name,
+                                                     result_file))
+    ave_loss = 1.0 * total_loss / count
+    # compute the bleu and rouge scores if reference answers are provided
+    if len(ref_answers) > 0:
+        pred_dict, ref_dict = {}, {}
+        for pred, ref in zip(pred_answers, ref_answers):
+            question_id = ref['question_id']
+            if len(ref['answers']) > 0:
+                pred_dict[question_id] = normalize(pred['answers'])
+                ref_dict[question_id] = normalize(ref['answers'])
+        bleu_rouge = compute_bleu_rouge(pred_dict, ref_dict)
+    else:
+        bleu_rouge = None
+    return ave_loss, bleu_rouge
+def train(logger, args):
+    logger.info('Load data_set and vocab...')
+    with open(os.path.join(args.vocab_dir, 'vocab.data'), 'rb') as fin:
+        if six.PY2:
+            vocab = pickle.load(fin)
+        else:
+            vocab = pickle.load(fin, encoding='bytes')
+        logger.info('vocab size is {} and embed dim is {}'.format(vocab.size(
+        ), vocab.embed_dim))
+    brc_data = BRCDataset(args.max_p_num, args.max_p_len, args.max_q_len,
+                          args.trainset, args.devset)
+    logger.info('Converting text into ids...')
+    brc_data.convert_to_ids(vocab)
+    logger.info('Initialize the model...')
+    # build model
+    main_program = fluid.Program()
+    startup_prog = fluid.Program()
+    main_program.random_seed = args.random_seed
+    startup_prog.random_seed = args.random_seed
+    with fluid.program_guard(main_program, startup_prog):
+        with fluid.unique_name.guard():
+            avg_cost, s_probs, e_probs, feed_order = rc_model.rc_model(
+                args.hidden_size, vocab, args)
+            # clone from default main program and use it as the validation program
+            inference_program = main_program.clone(for_test=True)
+            # build optimizer
+            if args.optim == 'sgd':
+                optimizer = fluid.optimizer.SGD(
+                    learning_rate=args.learning_rate,
+                    regularization=fluid.regularizer.L2DecayRegularizer(
+                        regularization_coeff=args.weight_decay))
+            elif args.optim == 'adam':
+                optimizer = fluid.optimizer.Adam(
+                    learning_rate=args.learning_rate,
+                    regularization=fluid.regularizer.L2DecayRegularizer(
+                        regularization_coeff=args.weight_decay))
+            elif args.optim == 'rprop':
+                # NOTE: despite its name, the 'rprop' option selects RMSProp.
+                optimizer = fluid.optimizer.RMSPropOptimizer(
+                    learning_rate=args.learning_rate,
+                    regularization=fluid.regularizer.L2DecayRegularizer(
+                        regularization_coeff=args.weight_decay))
+            else:
+                logger.error('Unsupported optimizer: {}'.format(args.optim))
+                sys.exit(-1)
+            optimizer.minimize(avg_cost)
+            # initialize parameters
+            place = core.CUDAPlace(0) if args.use_gpu else core.CPUPlace()
+            exe = Executor(place)
+            if args.load_dir:
+                logger.info('load from {}'.format(args.load_dir))
+                fluid.io.load_persistables(
+                    exe, args.load_dir, main_program=main_program)
+            else:
+                exe.run(startup_prog)
+                embedding_para = fluid.global_scope().find_var(
+                    'embedding_para').get_tensor()
+                embedding_para.set(vocab.embeddings.astype(np.float32), place)
+            # prepare data
+            feed_list = [
+                main_program.global_block().var(var_name)
+                for var_name in feed_order
+            ]
+            feeder = fluid.DataFeeder(feed_list, place)
+            logger.info('Training the model...')
+            parallel_executor = fluid.ParallelExecutor(
+                main_program=main_program,
+                use_cuda=bool(args.use_gpu),
+                loss_name=avg_cost.name)
+            print_para(main_program, parallel_executor, logger, args)
+            for pass_id in range(1, args.pass_num + 1):
+                pass_start_time = time.time()
+                pad_id = vocab.get_id(vocab.pad_token)
+                train_batches = brc_data.gen_mini_batches(
+                    'train', args.batch_size, pad_id, shuffle=True)
+                log_every_n_batch, n_batch_loss = args.log_interval, 0
+                total_num, total_loss = 0, 0
+                for batch_id, batch in enumerate(train_batches, 1):
+                    input_data_dict = prepare_batch_input(batch, args)
+                    fetch_outs = parallel_executor.run(
+                        feed=feeder.feed(input_data_dict),
+                        fetch_list=[avg_cost.name],
+                        return_numpy=False)
+                    cost_train = np.array(fetch_outs[0])[0]
+                    total_num += len(batch['raw_data'])
+                    n_batch_loss += cost_train
+                    total_loss += cost_train * len(batch['raw_data'])
+                    if log_every_n_batch > 0 and batch_id % log_every_n_batch == 0:
+                        print_para(main_program, parallel_executor, logger,
+                                   args)
+                        logger.info(
+                            'Average loss from batch {} to {} is {}'.format(
+                                batch_id - log_every_n_batch + 1, batch_id,
+                                "%.10f" % (n_batch_loss / log_every_n_batch)))
+                        n_batch_loss = 0
+                    if args.dev_interval > 0 and batch_id % args.dev_interval == 0:
+                        eval_loss, bleu_rouge = validation(
+                            inference_program, avg_cost, s_probs, e_probs,
+                            feed_order, place, vocab, brc_data, logger, args)
+                        logger.info('Dev eval loss {}'.format(eval_loss))
+                        logger.info('Dev eval result: {}'.format(bleu_rouge))
+                pass_end_time = time.time()
+                logger.info('Evaluating the model after epoch {}'.format(
+                    pass_id))
+                if brc_data.dev_set is not None:
+                    eval_loss, bleu_rouge = validation(
+                        inference_program, avg_cost, s_probs, e_probs,
+                        feed_order, place, vocab, brc_data, logger, args)
+                    logger.info('Dev eval loss {}'.format(eval_loss))
+                    logger.info('Dev eval result: {}'.format(bleu_rouge))
+                else:
+                    logger.warning(
+                        'No dev set is loaded for evaluation in the dataset!')
+                time_consumed = pass_end_time - pass_start_time
+                logger.info('Average train loss for epoch {} is {}'.format(
+                    pass_id, "%.10f" % (1.0 * total_loss / total_num)))
+                if pass_id % args.save_interval == 0:
+                    model_path = os.path.join(args.save_dir, str(pass_id))
+                    if not os.path.isdir(model_path):
+                        os.makedirs(model_path)
+                    fluid.io.save_persistables(
+                        executor=exe,
+                        dirname=model_path,
+                        main_program=main_program)
+def evaluate(logger, args):
+    logger.info('Load data_set and vocab...')
+    with open(os.path.join(args.vocab_dir, 'vocab.data'), 'rb') as fin:
+        if six.PY2:
+            vocab = pickle.load(fin)
+        else:
+            vocab = pickle.load(fin, encoding='bytes')
+        logger.info('vocab size is {} and embed dim is {}'.format(vocab.size(
+        ), vocab.embed_dim))
+    brc_data = BRCDataset(
+        args.max_p_num, args.max_p_len, args.max_q_len, dev_files=args.devset)
+    logger.info('Converting text into ids...')
+    brc_data.convert_to_ids(vocab)
+    logger.info('Initialize the model...')
+    # build model
+    main_program = fluid.Program()
+    startup_prog = fluid.Program()
+    main_program.random_seed = args.random_seed
+    startup_prog.random_seed = args.random_seed
+    with fluid.program_guard(main_program, startup_prog):
+        with fluid.unique_name.guard():
+            avg_cost, s_probs, e_probs, feed_order = rc_model.rc_model(
+                args.hidden_size, vocab, args)
+            # initialize parameters
+            place = core.CUDAPlace(0) if args.use_gpu else core.CPUPlace()
+            exe = Executor(place)
+            if args.load_dir:
+                logger.info('load from {}'.format(args.load_dir))
+                fluid.io.load_persistables(
+                    exe, args.load_dir, main_program=main_program)
+            else:
+                logger.error('No model file to load ...')
+                return
+            # prepare data
+            feed_list = [
+                main_program.global_block().var(var_name)
+                for var_name in feed_order
+            ]
+            feeder = fluid.DataFeeder(feed_list, place)
+            inference_program = main_program.clone(for_test=True)
+            eval_loss, bleu_rouge = validation(
+                inference_program, avg_cost, s_probs, e_probs, feed_order,
+                place, vocab, brc_data, logger, args)
+            logger.info('Dev eval loss {}'.format(eval_loss))
+            logger.info('Dev eval result: {}'.format(bleu_rouge))
+            logger.info('Predicted answers are saved to {}'.format(
+                args.result_dir))
+def predict(logger, args):
+    logger.info('Load data_set and vocab...')
+    with open(os.path.join(args.vocab_dir, 'vocab.data'), 'rb') as fin:
+        if six.PY2:
+            vocab = pickle.load(fin)
+        else:
+            vocab = pickle.load(fin, encoding='bytes')
+        logger.info('vocab size is {} and embed dim is {}'.format(vocab.size(
+        ), vocab.embed_dim))
+    brc_data = BRCDataset(
+        args.max_p_num, args.max_p_len, args.max_q_len, dev_files=args.testset)
+    logger.info('Converting text into ids...')
+    brc_data.convert_to_ids(vocab)
+    logger.info('Initialize the model...')
+    # build model
+    main_program = fluid.Program()
+    startup_prog = fluid.Program()
+    main_program.random_seed = args.random_seed
+    startup_prog.random_seed = args.random_seed
+    with fluid.program_guard(main_program, startup_prog):
+        with fluid.unique_name.guard():
+            avg_cost, s_probs, e_probs, feed_order = rc_model.rc_model(
+                args.hidden_size, vocab, args)
+            # initialize parameters
+            place = core.CUDAPlace(0) if args.use_gpu else core.CPUPlace()
+            exe = Executor(place)
+            if args.load_dir:
+                logger.info('load from {}'.format(args.load_dir))
+                fluid.io.load_persistables(
+                    exe, args.load_dir, main_program=main_program)
+            else:
+                logger.error('No model file to load ...')
+                return
+            # prepare data
+            feed_list = [
+                main_program.global_block().var(var_name)
+                for var_name in feed_order
+            ]
+            feeder = fluid.DataFeeder(feed_list, place)
+            inference_program = main_program.clone(for_test=True)
+            eval_loss, bleu_rouge = validation(
+                inference_program, avg_cost, s_probs, e_probs, feed_order,
+                place, vocab, brc_data, logger, args)
+def prepare(logger, args):
+    """
+    Checks the data, creates the directories, and prepares the vocabulary and embeddings.
+    """
+    logger.info('Checking the data files...')
+    for data_path in args.trainset + args.devset + args.testset:
+        assert os.path.exists(data_path), '{} file does not exist.'.format(
+            data_path)
+    logger.info('Preparing the directories...')
+    for dir_path in [args.vocab_dir, args.save_dir, args.result_dir]:
+        if not os.path.exists(dir_path):
+            os.makedirs(dir_path)
+    logger.info('Building vocabulary...')
+    brc_data = BRCDataset(args.max_p_num, args.max_p_len, args.max_q_len,
+                          args.trainset, args.devset, args.testset)
+    vocab = Vocab(lower=True)
+    for word in brc_data.word_iter('train'):
+        vocab.add(word)
+    unfiltered_vocab_size = vocab.size()
+    vocab.filter_tokens_by_cnt(min_cnt=2)
+    filtered_num = unfiltered_vocab_size - vocab.size()
+    logger.info('After filtering {} tokens, the final vocab size is {}'.format(
+        filtered_num, vocab.size()))
+    logger.info('Assigning embeddings...')
+    vocab.randomly_init_embeddings(args.embed_size)
+    logger.info('Saving vocab...')
+    with open(os.path.join(args.vocab_dir, 'vocab.data'), 'wb') as fout:
+        pickle.dump(vocab, fout)
+    logger.info('Done with preparing!')
+if __name__ == '__main__':
+    args = parse_args()
+    random.seed(args.random_seed)
+    np.random.seed(args.random_seed)
+    logger = logging.getLogger("brc")
+    logger.setLevel(logging.INFO)
+    formatter = logging.Formatter(
+        '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+    if args.log_path:
+        file_handler = logging.FileHandler(args.log_path)
+        file_handler.setLevel(logging.INFO)
+        file_handler.setFormatter(formatter)
+        logger.addHandler(file_handler)
+    else:
+        console_handler = logging.StreamHandler()
+        console_handler.setLevel(logging.INFO)
+        console_handler.setFormatter(formatter)
+        logger.addHandler(console_handler)
+    logger.info('Running with args : {}'.format(args))
+    if args.prepare:
+        prepare(logger, args)
+    if args.train:
+        train(logger, args)
+    if args.evaluate:
+        evaluate(logger, args)
+    if args.predict:
+        predict(logger, args)
--- a/fluid/machine_reading_comprehension/run.sh
+++ b/fluid/machine_reading_comprehension/run.sh
+export CUDA_VISIBLE_DEVICES=1
+python run.py   \
+--trainset 'data/preprocessed/trainset/search.train.json' \
+           'data/preprocessed/trainset/zhidao.train.json' \
+--devset 'data/preprocessed/devset/search.dev.json' \
+         'data/preprocessed/devset/zhidao.dev.json' \
+--testset 'data/preprocessed/testset/search.test.json' \
+          'data/preprocessed/testset/zhidao.test.json' \
+--vocab_dir 'data/vocab' \
+--use_gpu true \
+--save_dir ./models \
+--pass_num 10 \
+--learning_rate 0.001 \
+--batch_size 8 \
+--embed_size 300 \
+--hidden_size 150 \
+--max_p_num 5 \
+--max_p_len 500 \
+--max_q_len 60 \
+--max_a_len 200 \
+--drop_rate 0.2 "$@"
--- a/fluid/machine_reading_comprehension/utils/__init__.py
+++ b/fluid/machine_reading_comprehension/utils/__init__.py
+# coding:utf8
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""
+This package implements some utility functions shared by PaddlePaddle
+and TensorFlow model implementations.
+Authors: liuyuan(liuyuan04@baidu.com)
+Date:    2017/10/06 18:23:06
+"""
+from .dureader_eval import compute_bleu_rouge
+from .dureader_eval import normalize
+from .preprocess import find_fake_answer
+from .preprocess import find_best_question_match
+__all__ = [
+    'compute_bleu_rouge',
+    'normalize',
+    'find_fake_answer',
+    'find_best_question_match',
+]
--- a/fluid/machine_reading_comprehension/utils/download_thirdparty.sh
+++ b/fluid/machine_reading_comprehension/utils/download_thirdparty.sh
+#!/bin/bash
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+# We use Bleu and Rouge as evaluation metrics. The calculation of these metrics
+# relies on the scoring scripts under "https://github.com/tylin/coco-caption"
+bleu_base_url='https://raw.githubusercontent.com/tylin/coco-caption/master/pycocoevalcap/bleu'
+bleu_files=("LICENSE" "__init__.py" "bleu.py" "bleu_scorer.py")
+rouge_base_url="https://raw.githubusercontent.com/tylin/coco-caption/master/pycocoevalcap/rouge"
+rouge_files=("__init__.py" "rouge.py")
+download() {
+    local metric=$1; shift;
+    local base_url=$1; shift;
+    local fnames=("$@");
+    mkdir -p "${metric}"
+    for fname in "${fnames[@]}";
+    do
+        printf "downloading: %s\n" "${base_url}/${fname}"
+        wget --no-check-certificate "${base_url}/${fname}" -O "${metric}/${fname}"
+    done
+}
+# prepare rouge
+download "rouge_metric" "${rouge_base_url}" "${rouge_files[@]}"
+# prepare bleu
+download "bleu_metric" "${bleu_base_url}" "${bleu_files[@]}"
+# convert python 2.x source code to python 3.x
+2to3 -w "bleu_metric/bleu_scorer.py"
+2to3 -w "bleu_metric/bleu.py"
--- a/fluid/machine_reading_comprehension/utils/dureader_eval.py
+++ b/fluid/machine_reading_comprehension/utils/dureader_eval.py
+# -*- coding:utf8 -*-
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""
+This module computes evaluation metrics for DuReader dataset.
+"""
+import argparse
+import json
+import sys
+import zipfile
+from collections import Counter
+from .bleu_metric.bleu import Bleu
+from .rouge_metric.rouge import Rouge
+EMPTY = ''
+YESNO_LABELS = set(['Yes', 'No', 'Depends'])
+def normalize(s):
+    """
+    Normalize strings to space joined chars.
+    Args:
+        s: a list of strings.
+    Returns:
+        A list of normalized strings.
+    """
+    if not s:
+        return s
+    normalized = []
+    for ss in s:
+        tokens = [c for c in list(ss) if len(c.strip()) != 0]
+        normalized.append(' '.join(tokens))
+    return normalized
+def data_check(obj, task):
+    """
+    Check data.
+    Raises:
+        Raises AssertionError when data is not legal.
+    """
+    assert 'question_id' in obj, "Missing 'question_id' field."
+    assert 'question_type' in obj, \
+            "Missing 'question_type' field. question_id: {}".format(obj['question_id'])
+    assert 'yesno_answers' in obj, \
+            "Missing 'yesno_answers' field. question_id: {}".format(obj['question_id'])
+    assert isinstance(obj['yesno_answers'], list), \
+            r"""'yesno_answers' field must be a list, if the 'question_type' is not
+            'YES_NO', then this field should be an empty list.
+            question_id: {}""".format(obj['question_id'])
+    assert 'entity_answers' in obj, \
+            "Missing 'entity_answers' field. question_id: {}".format(obj['question_id'])
+    assert isinstance(obj['entity_answers'], list) \
+            and len(obj['entity_answers']) > 0, \
+            r"""'entity_answers' field must be a list with at least one element,
+            which can be an empty list. question_id: {}""".format(obj['question_id'])
+def read_file(file_name, task, is_ref=False):
+    """
+    Read predicted answers or reference answers from file.
+    Args:
+        file_name: the name of the file containing the predicted results or
+                   the reference results.
+    Returns:
+        A dictionary mapping question_id to the result information. The result
+        information itself is also a dictionary with four keys:
+        - question_type: type of the query.
+        - yesno_answers: A list of yesno answers corresponding to 'answers'.
+        - answers: A list of predicted answers.
+        - entity_answers: A list, each element is also a list containing the entities
+                    tagged out from the corresponding answer string.
+    """
+    def _open(file_name, mode, zip_obj=None):
+        if zip_obj is not None:
+            return zip_obj.open(file_name, mode)
+        return open(file_name, mode)
+    results = {}
+    keys = ['answers', 'yesno_answers', 'entity_answers', 'question_type']
+    if is_ref:
+        keys += ['source']
+    zf = zipfile.ZipFile(file_name, 'r') if file_name.endswith('.zip') else None
+    file_list = [file_name] if zf is None else zf.namelist()
+    for fn in file_list:
+        for line in _open(fn, 'r', zip_obj=zf):
+            try:
+                obj = json.loads(line.strip())
+            except ValueError:
+                raise ValueError("Every line of data should be legal json")
+            data_check(obj, task)
+            qid = obj['question_id']
+            assert qid not in results, "Duplicate question_id: {}".format(qid)
+            results[qid] = {}
+            for k in keys:
+                results[qid][k] = obj[k]
+    return results
+def compute_bleu_rouge(pred_dict, ref_dict, bleu_order=4):
+    """
+    Compute bleu and rouge scores.
+    """
+    assert set(pred_dict.keys()) == set(ref_dict.keys()), \
+            "missing keys: {}".format(set(ref_dict.keys()) - set(pred_dict.keys()))
+    scores = {}
+    bleu_scores, _ = Bleu(bleu_order).compute_score(ref_dict, pred_dict)
+    for i, bleu_score in enumerate(bleu_scores):
+        scores['Bleu-%d' % (i + 1)] = bleu_score
+    rouge_score, _ = Rouge().compute_score(ref_dict, pred_dict)
+    scores['Rouge-L'] = rouge_score
+    return scores
+def local_prf(pred_list, ref_list):
+    """
+    Compute local precision recall and f1-score,
+    given only one prediction list and one reference list
+    """
+    common = Counter(pred_list) & Counter(ref_list)
+    num_same = sum(common.values())
+    if num_same == 0:
+        return 0, 0, 0
+    p = 1.0 * num_same / len(pred_list)
+    r = 1.0 * num_same / len(ref_list)
+    f1 = (2 * p * r) / (p + r)
+    return p, r, f1
+def compute_prf(pred_dict, ref_dict):
+    """
+    Compute precision recall and f1-score.
+    """
+    pred_question_ids = set(pred_dict.keys())
+    ref_question_ids = set(ref_dict.keys())
+    correct_preds, total_correct, total_preds = 0, 0, 0
+    for question_id in ref_question_ids:
+        pred_entity_list = pred_dict.get(question_id, [[]])
+        assert len(pred_entity_list) == 1, \
+            'the number of entity list for question_id {} is not 1.'.format(question_id)
+        pred_entity_list = pred_entity_list[0]
+        all_ref_entity_lists = ref_dict[question_id]
+        best_local_f1 = 0
+        best_ref_entity_list = None
+        for ref_entity_list in all_ref_entity_lists:
+            local_f1 = local_prf(pred_entity_list, ref_entity_list)[2]
+            if local_f1 > best_local_f1:
+                best_ref_entity_list = ref_entity_list
+                best_local_f1 = local_f1
+        if best_ref_entity_list is None:
+            if len(all_ref_entity_lists) > 0:
+                best_ref_entity_list = sorted(
+                    all_ref_entity_lists, key=lambda x: len(x))[0]
+            else:
+                best_ref_entity_list = []
+        gold_entities = set(best_ref_entity_list)
+        pred_entities = set(pred_entity_list)
+        correct_preds += len(gold_entities & pred_entities)
+        total_preds += len(pred_entities)
+        total_correct += len(gold_entities)
+    p = float(correct_preds) / total_preds if correct_preds > 0 else 0
+    r = float(correct_preds) / total_correct if correct_preds > 0 else 0
+    f1 = 2 * p * r / (p + r) if correct_preds > 0 else 0
+    return {'Precision': p, 'Recall': r, 'F1': f1}
+def prepare_prf(pred_dict, ref_dict):
+    """
+    Prepares data for calculation of prf scores.
+    """
+    preds = {k: v['entity_answers'] for k, v in pred_dict.items()}
+    refs = {k: v['entity_answers'] for k, v in ref_dict.items()}
+    return preds, refs
+def filter_dict(result_dict, key_tag):
+    """
+    Filter a subset of the result_dict whose keys end with 'key_tag'.
+    """
+    filtered = {}
+    for k, v in result_dict.items():
+        if k.endswith(key_tag):
+            filtered[k] = v
+    return filtered
+def get_metrics(pred_result, ref_result, task, source):
+    """
+    Computes metrics.
+    """
+    metrics = {}
+    ref_result_filtered = {}
+    pred_result_filtered = {}
+    if source == 'both':
+        ref_result_filtered = ref_result
+        pred_result_filtered = pred_result
+    else:
+        for question_id, info in ref_result.items():
+            if info['source'] == source:
+                ref_result_filtered[question_id] = info
+                if question_id in pred_result:
+                    pred_result_filtered[question_id] = pred_result[question_id]
+    if task == 'main' or task == 'all' \
+            or task == 'description':
+        pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
+                                           ref_result_filtered, task)
+        metrics = compute_bleu_rouge(pred_dict, ref_dict)
+    elif task == 'yesno':
+        pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
+                                           ref_result_filtered, task)
+        keys = ['Yes', 'No', 'Depends']
+        preds = [filter_dict(pred_dict, k) for k in keys]
+        refs = [filter_dict(ref_dict, k) for k in keys]
+        metrics = compute_bleu_rouge(pred_dict, ref_dict)
+        for k, pred, ref in zip(keys, preds, refs):
+            m = compute_bleu_rouge(pred, ref)
+            k_metric = [(k + '|' + key, v) for key, v in m.items()]
+            metrics.update(k_metric)
+    elif task == 'entity':
+        pred_dict, ref_dict = prepare_prf(pred_result_filtered,
+                                          ref_result_filtered)
+        pred_dict_bleu, ref_dict_bleu = prepare_bleu(pred_result_filtered,
+                                                     ref_result_filtered, task)
+        metrics = compute_prf(pred_dict, ref_dict)
+        metrics.update(compute_bleu_rouge(pred_dict_bleu, ref_dict_bleu))
+    else:
+        raise ValueError("Illegal task name: {}".format(task))
+    return metrics
+def prepare_bleu(pred_result, ref_result, task):
+    """
+    Prepares data for calculation of bleu and rouge scores.
+    """
+    pred_list, ref_list = [], []
+    qids = ref_result.keys()
+    for qid in qids:
+        if task == 'main':
+            pred, ref = get_main_result(qid, pred_result, ref_result)
+        elif task == 'yesno':
+            pred, ref = get_yesno_result(qid, pred_result, ref_result)
+        elif task == 'all':
+            pred, ref = get_all_result(qid, pred_result, ref_result)
+        elif task == 'entity':
+            pred, ref = get_entity_result(qid, pred_result, ref_result)
+        elif task == 'description':
+            pred, ref = get_desc_result(qid, pred_result, ref_result)
+        else:
+            raise ValueError("Illegal task name: {}".format(task))
+        if pred and ref:
+            pred_list += pred
+            ref_list += ref
+    pred_dict = dict(pred_list)
+    ref_dict = dict(ref_list)
+    # iterate over a snapshot, since entries are deleted inside the loop
+    for qid, ans in list(ref_dict.items()):
+        ref_dict[qid] = normalize(ref_dict[qid])
+        pred_dict[qid] = normalize(pred_dict.get(qid, [EMPTY]))
+        if not ans or ans == [EMPTY]:
+            del ref_dict[qid]
+            del pred_dict[qid]
+    for k, v in pred_dict.items():
+        assert len(v) == 1, \
+            "There should be only one predict answer. question_id: {}".format(k)
+    return pred_dict, ref_dict
+def get_main_result(qid, pred_result, ref_result):
+    """
+    Prepare answers for task 'main'.
+    Args:
+        qid: question_id.
+        pred_result: A dict with the result information of every question_id,
+                     read from args.pred_file.
+        ref_result: A dict with the result information of every question_id,
+                    read from args.ref_file.
+    Returns:
+        Two lists, the first one contains predict result, the second
+        one contains reference result of the same question_id. Each list has
+        elements of tuple (question_id, answers), 'answers' is a list of strings.
+    """
+    ref_ans = ref_result[qid]['answers']
+    if not ref_ans:
+        ref_ans = [EMPTY]
+    pred_ans = pred_result.get(qid, {}).get('answers', [])[:1]
+    if not pred_ans:
+        pred_ans = [EMPTY]
+    return [(qid, pred_ans)], [(qid, ref_ans)]
+def get_entity_result(qid, pred_result, ref_result):
+    """
+    Prepare answers for task 'entity'.
+    Args:
+        qid: question_id.
+        pred_result: A dict with the result information of every question_id,
+                     read from args.pred_file.
+        ref_result: A dict with the result information of every question_id,
+                    read from args.ref_file.
+    Returns:
+        Two lists, the first one contains predict result, the second
+        one contains reference result of the same question_id. Each list has
+        elements of tuple (question_id, answers), 'answers' is a list of strings.
+    """
+    if ref_result[qid]['question_type'] != 'ENTITY':
+        return None, None
+    return get_main_result(qid, pred_result, ref_result)
+def get_desc_result(qid, pred_result, ref_result):
+    """
+    Prepare answers for task 'description'.
+    Args:
+        qid: question_id.
+        pred_result: A dict with the result information of every question_id,
+                     read from args.pred_file.
+        ref_result: A dict with the result information of every question_id,
+                    read from args.ref_file.
+    Returns:
+        Two lists, the first one contains predict result, the second
+        one contains reference result of the same question_id. Each list has
+        elements of tuple (question_id, answers), 'answers' is a list of strings.
+    """
+    if ref_result[qid]['question_type'] != 'DESCRIPTION':
+        return None, None
+    return get_main_result(qid, pred_result, ref_result)
+def get_yesno_result(qid, pred_result, ref_result):
+    """
+    Prepare answers for task 'yesno'.
+    Args:
+        qid: question_id.
+        pred_result: A dict with the result information of every question_id,
+                     read from args.pred_file.
+        ref_result: A dict with the result information of every question_id,
+                    read from args.ref_file.
+    Returns:
+        Two lists, the first one contains predict result, the second
+        one contains reference result of the same question_id. Each list has
+        elements of tuple (question_id, answers), 'answers' is a list of strings.
+    """
+    def _uniq(li, is_ref):
+        uniq_li = []
+        left = []
+        keys = set()
+        for k, v in li:
+            if k not in keys:
+                uniq_li.append((k, v))
+                keys.add(k)
+            else:
+                left.append((k, v))
+        if is_ref:
+            dict_li = dict(uniq_li)
+            for k, v in left:
+                dict_li[k] += v
+            uniq_li = [(k, v) for k, v in dict_li.items()]
+        return uniq_li
+    def _expand_result(uniq_li):
+        expanded = uniq_li[:]
+        keys = set([x[0] for x in uniq_li])
+        for k in YESNO_LABELS - keys:
+            expanded.append((k, [EMPTY]))
+        return expanded
+    def _get_yesno_ans(qid, result_dict, is_ref=False):
+        if qid not in result_dict:
+            return [(str(qid) + '_' + k, v) for k, v in _expand_result([])]
+        yesno_answers = result_dict[qid]['yesno_answers']
+        answers = result_dict[qid]['answers']
+        lbl_ans = _uniq([(k, [v]) for k, v in zip(yesno_answers, answers)],
+                        is_ref)
+        ret = [(str(qid) + '_' + k, v) for k, v in _expand_result(lbl_ans)]
+        return ret
+    if ref_result[qid]['question_type'] != 'YES_NO':
+        return None, None
+    ref_ans = _get_yesno_ans(qid, ref_result, is_ref=True)
+    pred_ans = _get_yesno_ans(qid, pred_result)
+    return pred_ans, ref_ans
+def get_all_result(qid, pred_result, ref_result):
+    """
+    Prepare answers for task 'all'.
+    Args:
+        qid: question_id.
+        pred_result: A dict with the result information of every question_id,
+                     read from args.pred_file.
+        ref_result: A dict with the result information of every question_id,
+                    read from args.ref_file.
+    Returns:
+        Two lists, the first one contains predict result, the second
+        one contains reference result of the same question_id. Each list has
+        elements of tuple (question_id, answers), 'answers' is a list of strings.
+    """
+    if ref_result[qid]['question_type'] == 'YES_NO':
+        return get_yesno_result(qid, pred_result, ref_result)
+    return get_main_result(qid, pred_result, ref_result)
+def format_metrics(metrics, task, err_msg):
+    """
+    Format metrics. The 'errorMsg' field reports any error that occurred during evaluation.
+    Args:
+        metrics: A dict object contains metrics for different tasks.
+        task: Task name.
+        err_msg: Exception raised during evaluation.
+    Returns:
+        Formatted result.
+    """
+    result = {}
+    sources = ["both", "search", "zhidao"]
+    if err_msg is not None:
+        return {'errorMsg': str(err_msg), 'errorCode': 1, 'data': []}
+    data = []
+    if task != 'all' and task != 'main':
+        sources = ["both"]
+    if task == 'entity':
+        metric_names = ["Bleu-4", "Rouge-L"]
+        metric_names_prf = ["F1", "Precision", "Recall"]
+        for name in metric_names + metric_names_prf:
+            for src in sources:
+                obj = {
+                    "name": name,
+                    "value": round(metrics[src].get(name, 0) * 100, 2),
+                    "type": src,
+                }
+                data.append(obj)
+    elif task == 'yesno':
+        metric_names = ["Bleu-4", "Rouge-L"]
+        details = ["Yes", "No", "Depends"]
+        src = sources[0]
+        for name in metric_names:
+            obj = {
+                "name": name,
+                "value": round(metrics[src].get(name, 0) * 100, 2),
+                "type": 'All',
+            }
+            data.append(obj)
+            for d in details:
+                obj = {
+                    "name": name,
+                    "value": \
+                        round(metrics[src].get(d + '|' + name, 0) * 100, 2),
+                    "type": d,
+                    }
+                data.append(obj)
+    else:
+        metric_names = ["Bleu-4", "Rouge-L"]
+        for name in metric_names:
+            for src in sources:
+                obj = {
+                    "name": name,
+                    "value": \
+                        round(metrics[src].get(name, 0) * 100, 2),
+                    "type": src,
+                    }
+                data.append(obj)
+    result["data"] = data
+    result["errorCode"] = 0
+    result["errorMsg"] = "success"
+    return result
+def main(args):
+    """
+    Do evaluation.
+    """
+    err = None
+    metrics = {}
+    try:
+        pred_result = read_file(args.pred_file, args.task)
+        ref_result = read_file(args.ref_file, args.task, is_ref=True)
+        sources = ['both', 'search', 'zhidao']
+        if args.task not in set(['main', 'all']):
+            sources = sources[:1]
+        for source in sources:
+            metrics[source] = get_metrics(pred_result, ref_result, args.task,
+                                          source)
+    except ValueError as ve:
+        err = ve
+    except AssertionError as ae:
+        err = ae
+    print(json.dumps(
+        format_metrics(metrics, args.task, err), ensure_ascii=False).encode(
+            'utf8'))
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('pred_file', help='predict file')
+    parser.add_argument('ref_file', help='reference file')
+    parser.add_argument(
+        'task', help='task name: Main|Yes_No|All|Entity|Description')
+    args = parser.parse_args()
+    args.task = args.task.lower().replace('_', '')
+    main(args)
--- a/fluid/machine_reading_comprehension/utils/get_vocab.py
+++ b/fluid/machine_reading_comprehension/utils/get_vocab.py
+# -*- coding:utf8 -*-
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""
+Utility function to generate vocabulary file.
+"""
+import argparse
+import sys
+import json
+from itertools import chain
+def get_vocab(files, vocab_file):
+    """
+    Builds vocabulary file from field 'segmented_paragraphs'
+    and 'segmented_question'.
+    Args:
+        files: A list of file names.
+        vocab_file: The file that stores the vocabulary.
+    """
+    vocab = {}
+    for f in files:
+        with open(f, 'r') as fin:
+            for line in fin:
+                obj = json.loads(line.strip())
+                paras = [
+                    chain(*d['segmented_paragraphs']) for d in obj['documents']
+                ]
+                doc_tokens = chain(*paras)
+                question_tokens = obj['segmented_question']
+                for t in list(doc_tokens) + question_tokens:
+                    vocab[t] = vocab.get(t, 0) + 1
+    # output
+    sorted_vocab = sorted(
+        [(v, c) for v, c in vocab.items()], key=lambda x: x[1], reverse=True)
+    # write UTF-8 bytes explicitly so the output is the same on Python 2 and 3
+    with open(vocab_file, 'wb') as outf:
+        for w, c in sorted_vocab:
+            outf.write(u'{}\t{}\n'.format(w, c).encode('utf8'))
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        '--files',
+        nargs='+',
+        required=True,
+        help='file list to count vocab from.')
+    parser.add_argument(
+        '--vocab', required=True, help='file to store counted vocab.')
+    args = parser.parse_args()
+    get_vocab(args.files, args.vocab)
--- a/fluid/machine_reading_comprehension/utils/marco_tokenize_data.py
+++ b/fluid/machine_reading_comprehension/utils/marco_tokenize_data.py
+#coding=utf8
+import os, sys, json
+import nltk
+def _nltk_tokenize(sequence):
+    tokens = nltk.word_tokenize(sequence)
+    cur_char_offset = 0
+    token_offsets = []
+    token_words = []
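+    for token in tokens:
+        start = sequence.find(token, cur_char_offset)
+        if start < 0:
+            # nltk may rewrite a token (e.g. quotes); keep the current position
+            start = cur_char_offset
+        token_offsets.append([start, start + len(token) - 1])
+        token_words.append(token)
+        # resume after this token so repeated tokens get distinct offsets
+        cur_char_offset = start + len(token)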
+    return token_offsets, token_words
+def segment(input_js):
+    _, input_js['segmented_question'] = _nltk_tokenize(input_js['question'])
+    for doc_id, doc in enumerate(input_js['documents']):
+        doc['segmented_title'] = []
+        doc['segmented_paragraphs'] = []
+        for para_id, para in enumerate(doc['paragraphs']):
+            _, seg_para = _nltk_tokenize(para)
+            doc['segmented_paragraphs'].append(seg_para)
+    if 'answers' in input_js:
+        input_js['segmented_answers'] = []
+        for answer_id, answer in enumerate(input_js['answers']):
+            _, seg_answer = _nltk_tokenize(answer)
+            input_js['segmented_answers'].append(seg_answer)
+if __name__ == '__main__':
+    if len(sys.argv) != 2:
+        print('Usage: marco_tokenize_data.py <input_path>')
+        exit()
+    nltk.download('punkt')
+    for line in open(sys.argv[1]):
+        dureader_js = json.loads(line.strip())
+        segment(dureader_js)
+        print(json.dumps(dureader_js))
--- a/fluid/machine_reading_comprehension/utils/marcov1_to_dureader.py
+++ b/fluid/machine_reading_comprehension/utils/marcov1_to_dureader.py
+#coding=utf8
+import sys
+import json
+import pandas as pd
+def trans(input_js):
+    output_js = {}
+    output_js['question'] = input_js['query']
+    output_js['question_type'] = input_js['query_type']
+    output_js['question_id'] = input_js['query_id']
+    output_js['fact_or_opinion'] = ""
+    output_js['documents'] = []
+    for para_id, para in enumerate(input_js['passages']):
+        doc = {}
+        doc['title'] = ""
+        if 'is_selected' in para:
+            doc['is_selected'] = para['is_selected'] != 0
+        doc['paragraphs'] = [para['passage_text']]
+        output_js['documents'].append(doc)
+    if 'answers' in input_js:
+        output_js['answers'] = input_js['answers']
+    return output_js
+if __name__ == '__main__':
+    if len(sys.argv) != 2:
+        print('Usage: marcov1_to_dureader.py <input_path>')
+        exit()
+    df = pd.read_json(sys.argv[1])
+    for row in df.iterrows():
+        marco_js = json.loads(row[1].to_json())
+        dureader_js = trans(marco_js)
+        print(json.dumps(dureader_js))
--- a/fluid/machine_reading_comprehension/utils/marcov2_to_v1_tojsonl.py
+++ b/fluid/machine_reading_comprehension/utils/marcov2_to_v1_tojsonl.py
+import sys
+import json
+import pandas as pd
+if __name__ == '__main__':
+    if len(sys.argv) != 3:
+        print('Usage: marcov2_to_v1_tojsonl.py <input_path> <output_path>')
+        exit()
+    infile = sys.argv[1]
+    outfile = sys.argv[2]
+    df = pd.read_json(infile)
+    with open(outfile, 'w') as f:
+        for row in df.iterrows():
+            f.write(row[1].to_json() + '\n')
--- a/fluid/machine_reading_comprehension/utils/preprocess.py
+++ b/fluid/machine_reading_comprehension/utils/preprocess.py
+###############################################################################
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""
+This module finds the most related paragraph of each document according to recall.
+"""
+import sys
+if sys.version_info[0] == 2:
+    reload(sys)
+    sys.setdefaultencoding("utf-8")
+import json
+from collections import Counter
+def precision_recall_f1(prediction, ground_truth):
+    """
+    This function calculates and returns the precision, recall and f1-score
+    Args:
+        prediction: prediction string or list to be matched
+        ground_truth: golden string or list reference
+    Returns:
+        floats of (p, r, f1)
+    Raises:
+        None
+    """
+    if not isinstance(prediction, list):
+        prediction_tokens = prediction.split()
+    else:
+        prediction_tokens = prediction
+    if not isinstance(ground_truth, list):
+        ground_truth_tokens = ground_truth.split()
+    else:
+        ground_truth_tokens = ground_truth
+    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
+    num_same = sum(common.values())
+    if num_same == 0:
+        return 0, 0, 0
+    p = 1.0 * num_same / len(prediction_tokens)
+    r = 1.0 * num_same / len(ground_truth_tokens)
+    f1 = (2 * p * r) / (p + r)
+    return p, r, f1
+def recall(prediction, ground_truth):
+    """
+    This function calculates and returns the recall
+    Args:
+        prediction: prediction string or list to be matched
+        ground_truth: golden string or list reference
+    Returns:
+        floats of recall
+    Raises:
+        None
+    """
+    return precision_recall_f1(prediction, ground_truth)[1]
+def f1_score(prediction, ground_truth):
+    """
+    This function calculates and returns the f1-score
+    Args:
+        prediction: prediction string or list to be matched
+        ground_truth: golden string or list reference
+    Returns:
+        floats of f1
+    Raises:
+        None
+    """
+    return precision_recall_f1(prediction, ground_truth)[2]
+def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
+    """
+    This function scores the prediction against every ground truth
+    and returns the maximum score
+    Args:
+        metric_fn: metric function which calculates a score for a (prediction, ground_truth) pair.
+        prediction: prediction string or list to be matched
+        ground_truths: list of golden strings or token lists used as references
+    Returns:
+        the maximum score over all ground truths
+    Raises:
+        ValueError: if ground_truths is empty
+    """
+    scores_for_ground_truths = []
+    for ground_truth in ground_truths:
+        score = metric_fn(prediction, ground_truth)
+        scores_for_ground_truths.append(score)
+    return max(scores_for_ground_truths)
+def find_best_question_match(doc, question, with_score=False):
+    """
+    For each document, find the paragraph that best matches the question.
+    Args:
+        doc: The document object.
+        question: The question tokens.
+        with_score: If True then the match score will be returned,
+            otherwise False.
+    Returns:
+        The index of the best match paragraph, if with_score=False,
+        otherwise returns a tuple of the index of the best match paragraph
+        and the match score of that paragraph.
+    """
+    most_related_para = -1
+    max_related_score = 0
+    most_related_para_len = 0
+    for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
+        if len(question) > 0:
+            related_score = metric_max_over_ground_truths(recall, para_tokens,
+                                                          question)
+        else:
+            related_score = 0
+        if related_score > max_related_score \
+                or (related_score == max_related_score \
+                and len(para_tokens) < most_related_para_len):
+            most_related_para = p_idx
+            max_related_score = related_score
+            most_related_para_len = len(para_tokens)
+    if most_related_para == -1:
+        most_related_para = 0
+    if with_score:
+        return most_related_para, max_related_score
+    return most_related_para
+def find_fake_answer(sample):
+    """
+    For each document, finds the most related paragraph based on recall,
+    then finds a span that maximizes the f1_score compared with the gold answers
+    and uses this span as a fake answer span
+    Args:
+        sample: a sample in the dataset
+    Returns:
+        None (the sample is updated in place)
+    Raises:
+        None
+    """
+    for doc in sample['documents']:
+        most_related_para = -1
+        most_related_para_len = 999999
+        max_related_score = 0
+        for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
+            if len(sample['segmented_answers']) > 0:
+                related_score = metric_max_over_ground_truths(
+                    recall, para_tokens, sample['segmented_answers'])
+            else:
+                continue
+            if related_score > max_related_score \
+                    or (related_score == max_related_score
+                        and len(para_tokens) < most_related_para_len):
+                most_related_para = p_idx
+                most_related_para_len = len(para_tokens)
+                max_related_score = related_score
+        doc['most_related_para'] = most_related_para
+    sample['answer_docs'] = []
+    sample['answer_spans'] = []
+    sample['fake_answers'] = []
+    sample['match_scores'] = []
+    best_match_score = 0
+    best_match_d_idx, best_match_span = -1, [-1, -1]
+    best_fake_answer = None
+    answer_tokens = set()
+    for segmented_answer in sample['segmented_answers']:
+        answer_tokens = answer_tokens | set(
+            [token for token in segmented_answer])
+    for d_idx, doc in enumerate(sample['documents']):
+        if not doc['is_selected']:
+            continue
+        if doc['most_related_para'] == -1:
+            doc['most_related_para'] = 0
+        most_related_para_tokens = doc['segmented_paragraphs'][doc[
+            'most_related_para']][:1000]
+        for start_tidx in range(len(most_related_para_tokens)):
+            if most_related_para_tokens[start_tidx] not in answer_tokens:
+                continue
+            for end_tidx in range(
+                    len(most_related_para_tokens) - 1, start_tidx - 1, -1):
+                span_tokens = most_related_para_tokens[start_tidx:end_tidx + 1]
+                if len(sample['segmented_answers']) > 0:
+                    match_score = metric_max_over_ground_truths(
+                        f1_score, span_tokens, sample['segmented_answers'])
+                else:
+                    match_score = 0
+                if match_score == 0:
+                    break
+                if match_score > best_match_score:
+                    best_match_d_idx = d_idx
+                    best_match_span = [start_tidx, end_tidx]
+                    best_match_score = match_score
+                    best_fake_answer = ''.join(span_tokens)
+    if best_match_score > 0:
+        sample['answer_docs'].append(best_match_d_idx)
+        sample['answer_spans'].append(best_match_span)
+        sample['fake_answers'].append(best_fake_answer)
+        sample['match_scores'].append(best_match_score)
+if __name__ == '__main__':
+    for line in sys.stdin:
+        sample = json.loads(line)
+        find_fake_answer(sample)
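+        if sys.version[0] == '2':
+            print(json.dumps(sample, encoding='utf8', ensure_ascii=False))
+        else:
+            # json.dumps() has no `encoding` argument in Python 3
+            print(json.dumps(sample, ensure_ascii=False))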
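For readers skimming `preprocess.py`, the token-overlap metric that drives both paragraph selection (recall) and fake-answer span search (F1) can be sketched on its own. This is a minimal standalone illustration, not the module itself; `token_prf` and the toy token lists are hypothetical names and data chosen for the example.

```python
from collections import Counter


def token_prf(prediction_tokens, ground_truth_tokens):
    """Multiset-overlap precision, recall and F1 between two token lists."""
    # Counter & Counter keeps each token with the minimum of its two counts.
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0, 0.0, 0.0
    p = num_same / len(prediction_tokens)
    r = num_same / len(ground_truth_tokens)
    return p, r, 2 * p * r / (p + r)


# A paragraph plays the role of the "prediction", an answer the "ground truth".
paragraph = ["the", "cat", "sat", "on", "the", "mat"]
answer = ["the", "cat", "sat"]
p, r, f1 = token_prf(paragraph, answer)
print(p, r, f1)
```

Recall is 1.0 here because every answer token appears in the paragraph, which is why recall suits paragraph selection, while F1 penalizes the extra paragraph tokens and so favors tight spans.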
--- a/fluid/machine_reading_comprehension/utils/run_marco2dureader_preprocess.sh
+++ b/fluid/machine_reading_comprehension/utils/run_marco2dureader_preprocess.sh
+#!/bin/bash
+input_file=$1
+output_file=$2
+# convert the data from MARCO V2 (json) format to MARCO V1 (jsonl) format.
+# the script was forked from the MARCO repo.
+# the MARCO V1 format is much easier to explore.
+python3 marcov2_to_v1_tojsonl.py $input_file $input_file.marcov1
+# convert the data from MARCO V1 format to DuReader format. 
+python3 marcov1_to_dureader.py $input_file.marcov1 >$input_file.dureader_raw
+# tokenize the data. 
+python3 marco_tokenize_data.py $input_file.dureader_raw >$input_file.segmented
+# find fake answers (indicating the start and end positions of answers in the document) for train and dev sets.
+# note that this should not be applied to the test set, since the test set has no ground truth.
+# preprocess.py reads its input from stdin.
+python preprocess.py <$input_file.segmented >$output_file
+# remove the temporary data files.
+rm -rf $input_file.dureader_raw $input_file.segmented
--- a/fluid/machine_reading_comprehension/vocab.py
+++ b/fluid/machine_reading_comprehension/vocab.py
+# -*- coding:utf8 -*-
+# ==============================================================================
+# Copyright 2017 Baidu.com, Inc. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""
+This module implements the Vocab class for converting string to id and back
+"""
+import numpy as np
+class Vocab(object):
+    """
+    Implements a vocabulary to store the tokens in the data, with their corresponding embeddings.
+    """
+    def __init__(self, filename=None, initial_tokens=None, lower=False):
+        self.id2token = {}
+        self.token2id = {}
+        self.token_cnt = {}
+        self.lower = lower
+        self.embed_dim = None
+        self.embeddings = None
+        self.pad_token = '<blank>'
+        self.unk_token = '<unk>'
+        self.initial_tokens = initial_tokens if initial_tokens is not None else []
+        self.initial_tokens.extend([self.pad_token, self.unk_token])
+        for token in self.initial_tokens:
+            self.add(token)
+        if filename is not None:
+            self.load_from_file(filename)
+    def size(self):
+        """
+        get the size of vocabulary
+        Returns:
+            an integer indicating the size
+        """
+        return len(self.id2token)
+    def load_from_file(self, file_path):
+        """
+        loads the vocab from file_path
+        Args:
+            file_path: a file with a word in each line
+        """
+        with open(file_path, 'r') as fin:
+            for line in fin:
+                token = line.rstrip('\n')
+                self.add(token)
+    def get_id(self, token):
+        """
+        gets the id of a token, returns the id of unk token if token is not in vocab
+        Args:
+            token: a string indicating the word
+        Returns:
+            an integer
+        """
+        token = token.lower() if self.lower else token
+        try:
+            return self.token2id[token]
+        except KeyError:
+            return self.token2id[self.unk_token]
+    def get_token(self, idx):
+        """
+        gets the token corresponding to idx, returns unk token if idx is not in vocab
+        Args:
+            idx: an integer
+        Returns:
+            a token string
+        """
+        try:
+            return self.id2token[idx]
+        except KeyError:
+            return self.unk_token
+    def add(self, token, cnt=1):
+        """
+        adds the token to vocab
+        Args:
+            token: a string
+            cnt: a num indicating the count of the token to add, default is 1
+        """
+        token = token.lower() if self.lower else token
+        if token in self.token2id:
+            idx = self.token2id[token]
+        else:
+            idx = len(self.id2token)
+            self.id2token[idx] = token
+            self.token2id[token] = idx
+        if cnt > 0:
+            if token in self.token_cnt:
+                self.token_cnt[token] += cnt
+            else:
+                self.token_cnt[token] = cnt
+        return idx
+    def filter_tokens_by_cnt(self, min_cnt):
+        """
+        filter the tokens in vocab by their count
+        Args:
+            min_cnt: tokens with frequency less than min_cnt are filtered out
+        """
+        filtered_tokens = [
+            token for token in self.token2id if self.token_cnt[token] >= min_cnt
+        ]
+        # rebuild the token x id map
+        self.token2id = {}
+        self.id2token = {}
+        for token in self.initial_tokens:
+            self.add(token, cnt=0)
+        for token in filtered_tokens:
+            self.add(token, cnt=0)
+    def randomly_init_embeddings(self, embed_dim):
+        """
+        randomly initializes the embeddings for each token
+        Args:
+            embed_dim: the size of the embedding for each token
+        """
+        self.embed_dim = embed_dim
+        self.embeddings = np.random.rand(self.size(), embed_dim)
+        for token in [self.pad_token, self.unk_token]:
+            self.embeddings[self.get_id(token)] = np.zeros([self.embed_dim])
+    def load_pretrained_embeddings(self, embedding_path):
+        """
+        loads the pretrained embeddings from embedding_path,
+        tokens not in pretrained embeddings will be filtered
+        Args:
+            embedding_path: the path of the pretrained embedding file
+        """
+        trained_embeddings = {}
+        with open(embedding_path, 'r') as fin:
+            for line in fin:
+                contents = line.strip().split()
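+                token = contents[0]
+                if isinstance(token, bytes):
+                    # only Python 2 yields byte strings here; Python 3 str has no decode()
+                    token = token.decode('utf8')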
+                if token not in self.token2id:
+                    continue
+                trained_embeddings[token] = list(map(float, contents[1:]))
+                if self.embed_dim is None:
+                    self.embed_dim = len(contents) - 1
+        filtered_tokens = trained_embeddings.keys()
+        # rebuild the token x id map
+        self.token2id = {}
+        self.id2token = {}
+        for token in self.initial_tokens:
+            self.add(token, cnt=0)
+        for token in filtered_tokens:
+            self.add(token, cnt=0)
+        # load embeddings
+        self.embeddings = np.zeros([self.size(), self.embed_dim])
+        for token in self.token2id.keys():
+            if token in trained_embeddings:
+                self.embeddings[self.get_id(token)] = trained_embeddings[token]
+    def convert_to_ids(self, tokens):
+        """
+        Convert a list of tokens to ids, use unk_token if the token is not in vocab.
+        Args:
+            tokens: a list of token
+        Returns:
+            a list of ids
+        """
+        vec = [self.get_id(label) for label in tokens]
+        return vec
+    def recover_from_ids(self, ids, stop_id=None):
+        """
+        Convert a list of ids to tokens, stop converting if the stop_id is encountered
+        Args:
+            ids: a list of ids to convert
+            stop_id: the stop id, default is None
+        Returns:
+            a list of tokens
+        """
+        tokens = []
+        for i in ids:
+            tokens += [self.get_token(i)]
+            if stop_id is not None and i == stop_id:
+                break
+        return tokens
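The behaviour of `Vocab.add`, `filter_tokens_by_cnt` and `convert_to_ids` can be summarized with a small standalone sketch. It does not import `vocab.py`; `build_vocab` and `convert_to_ids` below are hypothetical helpers that only mirror the idea of reserving the special tokens first, dropping rare tokens, and mapping unknown tokens to `<unk>`.

```python
def build_vocab(token_lists, min_cnt=1, specials=("<blank>", "<unk>")):
    """Count tokens, keep those seen at least min_cnt times, assign dense ids."""
    counts = {}
    for tokens in token_lists:
        for token in tokens:
            counts[token] = counts.get(token, 0) + 1
    token2id = {}
    # Special tokens always come first so their ids stay stable after filtering.
    for token in specials:
        token2id[token] = len(token2id)
    for token, cnt in counts.items():
        if cnt >= min_cnt and token not in token2id:
            token2id[token] = len(token2id)
    return token2id


def convert_to_ids(token2id, tokens, unk="<unk>"):
    """Map tokens to ids, falling back to the id of the unknown token."""
    return [token2id.get(token, token2id[unk]) for token in tokens]


token2id = build_vocab([["a", "b", "a"], ["a", "c"]], min_cnt=2)
ids = convert_to_ids(token2id, ["a", "b"])
print(token2id, ids)
```

Only `a` survives the `min_cnt=2` filter, so `b` is mapped to the `<unk>` id.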
--- a/fluid/metric_learning/losses/datareader.py
+++ b/fluid/metric_learning/losses/datareader.py
 import os
 import math
 import random
-import cPickle
 import functools
 import numpy as np
-#import paddle.v2 as paddle
 import paddle
 from PIL import Image, ImageEnhance
@@ -45,9 +43,9 @@ for i, item in enumerate(test_list):
        test_data[label] = []
    test_data[label].append(path)
-print "train_data size:", len(train_data)
+print("train_data size:", len(train_data))
-print "test_data size:", len(test_data)
+print("test_data size:", len(test_data))
-print "test_data image number:", len(test_image_list)
+print("test_data image number:", len(test_image_list))
 random.shuffle(test_image_list)
@@ -214,11 +212,11 @@ def eml_iterator(data,
                 color_jitter=False,
                 rotate=False):
    def reader():
-        labs = data.keys()
+        labs = list(data.keys())
        lab_num = len(labs)
-        ind = range(0, lab_num)
+        ind = list(range(0, lab_num))
        assert batch_size % samples_each_class == 0, "batch_size % samples_each_class != 0"
-        num_class = batch_size/samples_each_class
+        num_class = batch_size // samples_each_class
        for i in range(iter_size):
            random.shuffle(ind)
            for n in range(num_class):
@@ -245,9 +243,9 @@ def quadruplet_iterator(data,
                        color_jitter=False,
                        rotate=False):
    def reader():
-        labs = data.keys()
+        labs = list(data.keys())
        lab_num = len(labs)
-        ind = range(0, lab_num)
+        ind = list(range(0, lab_num))
        for i in range(iter_size):
            random.shuffle(ind)
            ind_sample = ind[:class_num]
@@ -255,7 +253,7 @@ def quadruplet_iterator(data,
            for ind_i in ind_sample:
                lab = labs[ind_i]
                data_list = data[lab]
-                data_ind = range(0, len(data_list))
+                data_ind = list(range(0, len(data_list)))
                random.shuffle(data_ind)
                anchor_ind = data_ind[:samples_each_class]
@@ -277,15 +275,15 @@ def triplet_iterator(data,
                     color_jitter=False,
                     rotate=False):
    def reader():
-        labs = data.keys()
+        labs = list(data.keys())
        lab_num = len(labs)
-        ind = range(0, lab_num)
+        ind = list(range(0, lab_num))
        for i in range(iter_size):
            random.shuffle(ind)
            ind_pos, ind_neg = ind[:2]
            lab_pos = labs[ind_pos]
            pos_data_list = data[lab_pos]
-            data_ind = range(0, len(pos_data_list))
+            data_ind = list(range(0, len(pos_data_list)))
            random.shuffle(data_ind)
            anchor_ind, pos_ind = data_ind[:2]
@@ -346,7 +344,7 @@ def quadruplet_train(class_num, samples_each_class):
 def triplet_train(batch_size):
    assert(batch_size % 3 == 0)
-    return triplet_iterator(train_data, 'train', batch_size, iter_size = batch_size/3 * 100, \
+    return triplet_iterator(train_data, 'train', batch_size, iter_size = batch_size//3 * 100, \
                           shuffle=True, color_jitter=False, rotate=False)
 def test():

--- a/fluid/metric_learning/losses/emlloss.py
+++ b/fluid/metric_learning/losses/emlloss.py
-import datareader as reader
 import math
 import numpy as np
 import paddle.fluid as fluid
-from metrics import calculate_order_dist_matrix
+from . import datareader as reader
-from metrics import get_gpu_num
+from .metrics import calculate_order_dist_matrix
+from .metrics import get_gpu_num
 class emlloss():
    def __init__(self, train_batch_size = 40, samples_each_class=2):
@@ -11,9 +11,9 @@ class emlloss():
        self.samples_each_class = samples_each_class
        self.train_batch_size = train_batch_size
        assert(train_batch_size % num_gpus == 0)
-        self.cal_loss_batch_size = train_batch_size / num_gpus
+        self.cal_loss_batch_size = train_batch_size // num_gpus
        assert(self.cal_loss_batch_size % samples_each_class == 0)
-        class_num = train_batch_size / samples_each_class
+        class_num = train_batch_size // samples_each_class
        self.train_reader = reader.eml_train(train_batch_size, samples_each_class)
        self.test_reader = reader.test()

--- a/fluid/metric_learning/losses/metrics.py
+++ b/fluid/metric_learning/losses/metrics.py
@@ -20,12 +20,14 @@ def recall_topk(fea, lab, k = 1):
 import subprocess
 import os
 def get_gpu_num():
    visibledevice = os.getenv('CUDA_VISIBLE_DEVICES')
    if visibledevice:
        devicenum = len(visibledevice.split(','))
    else:
-        devicenum = subprocess.check_output(['nvidia-smi', '-L']).count('\n')
+        devicenum = subprocess.check_output(
+            [str.encode('nvidia-smi'), str.encode('-L')]).decode('utf-8').count('\n')
    return devicenum
 import paddle as paddle

--- a/fluid/metric_learning/losses/quadrupletloss.py
+++ b/fluid/metric_learning/losses/quadrupletloss.py
 import numpy as np
-import datareader as reader
 import paddle.fluid as fluid
-from metrics import calculate_order_dist_matrix
+from . import datareader as reader
-from metrics import get_gpu_num
+from .metrics import calculate_order_dist_matrix
+from .metrics import get_gpu_num
 class quadrupletloss():
    def __init__(self, 
@@ -14,9 +14,9 @@ class quadrupletloss():
        self.samples_each_class = samples_each_class
        self.train_batch_size = train_batch_size
        assert(train_batch_size % num_gpus == 0)
-        self.cal_loss_batch_size = train_batch_size / num_gpus
+        self.cal_loss_batch_size = train_batch_size // num_gpus
        assert(self.cal_loss_batch_size % samples_each_class == 0)
-        class_num = train_batch_size / samples_each_class
+        class_num = train_batch_size // samples_each_class
        self.train_reader = reader.quadruplet_train(class_num, samples_each_class)
        self.test_reader = reader.test()

--- a/fluid/metric_learning/losses/tripletloss.py
+++ b/fluid/metric_learning/losses/tripletloss.py
-import datareader as reader
+from . import datareader as reader
 import paddle.fluid as fluid
 class tripletloss():

--- a/fluid/metric_learning/models/resnet.py
+++ b/fluid/metric_learning/models/resnet.py
@@ -75,7 +75,7 @@ class ResNet():
            num_filters=num_filters,
            filter_size=filter_size,
            stride=stride,
-            padding=(filter_size - 1) / 2,
+            padding=(filter_size - 1) // 2,
            groups=groups,
            act=None,
            bias_attr=False)

--- a/fluid/metric_learning/models/se_resnext.py
+++ b/fluid/metric_learning/models/se_resnext.py
@@ -127,7 +127,7 @@ class SE_ResNeXt():
            num_filters=num_filters,
            filter_size=filter_size,
            stride=stride,
-            padding=(filter_size - 1) / 2,
+            padding=(filter_size - 1) // 2,
            groups=groups,
            act=None,
            bias_attr=False)
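
The `/` to `//` changes in these hunks matter because `/` is true division under Python 3 (and under `from __future__ import division`), so values such as padding, batch sizes and class counts would silently become floats. A small illustration, where `same_padding` is a hypothetical helper name that does not exist in the repository:

```python
def same_padding(filter_size):
    """Padding that preserves spatial size for an odd filter with stride 1."""
    # Floor division keeps the result an int on both Python 2 and Python 3.
    return (filter_size - 1) // 2


for k in (1, 3, 7):
    print(k, same_padding(k), type((k - 1) / 2).__name__)
```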

--- a/fluid/metric_learning/train.py
+++ b/fluid/metric_learning/train.py
@@ -93,7 +93,7 @@ def train(args):
    elif loss_name == "emlloss":
        metricloss = emlloss(
                train_batch_size = args.train_batch_size, 
-                samples_each_class=2
+                samples_each_class = args.samples_each_class
        )
        cost_metric = metricloss.loss(out[0])
        avg_cost_metric = fluid.layers.mean(x=cost_metric)

--- a/fluid/metric_learning/utility.py
+++ b/fluid/metric_learning/utility.py
@@ -17,6 +17,7 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 import distutils.util
+import six
 import numpy as np
 from paddle.fluid import core
@@ -37,7 +38,7 @@ def print_arguments(args):
    :type args: argparse.Namespace
    """
    print("-----------  Configuration Arguments -----------")
-    for arg, value in sorted(vars(args).iteritems()):
+    for arg, value in sorted(six.iteritems(vars(args))):
        print("%s: %s" % (arg, value))
    print("------------------------------------------------")

--- a/fluid/neural_machine_translation/rnn_search/README.md
+++ b/fluid/neural_machine_translation/rnn_search/README.md
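+Running the example models in this directory requires PaddlePaddle Fluid 1.0. If your installed PaddlePaddle version is older than this, please update it by following the instructions in the [installation guide](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html).
+# Machine Translation: RNN Search
+The directory structure of this example model is briefly described below: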
+```text
+.
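+├── README.md              # documentation (this file)
+├── args.py                # training, inference, and model arguments
+├── train.py               # main training program
+├── infer.py               # main inference program
+├── attention_model.py     # translation model configuration with attention
+└── no_attention_model.py  # translation model configuration without attention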
+```
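+## Introduction
+Machine translation (MT) is the technology of using computers to translate between languages. The language being translated is usually called the source language, and the language of the result is called the target language. Machine translation is the process of converting the source language into the target language, and it is one of the major research areas of natural language processing.
+In recent years, advances in deep learning have kept bringing new breakthroughs to machine translation. End-to-end neural machine translation (End-to-End NMT) models, which map the source language to the target language directly with a neural network, have gradually become mainstream; such models are usually called NMT models for short.
+This directory contains a Paddle Fluid implementation of [RNN Search](https://arxiv.org/pdf/1409.0473.pdf), a classic machine translation model. RNN Search is in fact a fairly traditional NMT model, and its performance has by now been surpassed by many newer models such as the [Transformer](https://arxiv.org/abs/1706.03762). Beyond machine translation, however, it is the foundation of many sequence-to-sequence (Seq2Seq) models, and many models for other NLP problems build on it; it is therefore important in NLP and widely used as a baseline.
+The implementation in this directory is meant to show how to use Paddle Fluid to build an RNN model with an attention mechanism for Seq2Seq problems, and how to use a decoder with the beam search algorithm. If you only need a model with good translation quality, we recommend the [Paddle Fluid implementation of the Transformer](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer).
+## Model Overview
+The RNN Search model uses the classic encoder-decoder framework to solve Seq2Seq problems. An encoder first encodes the source sequence into a vector, and a decoder then decodes that vector into the target sequence. This mimics how a human translates: first parse the source language and understand its meaning, then write a sentence in the target language that expresses that meaning. Both the encoder and the decoder are usually implemented with RNNs. For the detailed principles and mathematical formulation of this approach, see [Deep Learning 101](http://www.paddlepaddle.org/documentation/docs/zh/0.15.0/beginners_guide/basics/machine_translation/index.html).
+In this model, the encoder is a bidirectional recurrent neural network (Bi-directional Recurrent Neural Network); the decoder is an RNN decoder with an attention mechanism, and a decoder without attention is also provided for comparison; at inference time, the beam search algorithm is used to generate the target sentence. These methods are introduced in turn below.
+### Bidirectional Recurrent Neural Network
+This section introduces the bidirectional recurrent network structure proposed by Bengio's group in \[[2](#references),[4](#references)\]. Its purpose is to take a sequence as input and produce a feature representation for every time step, that is, each output time step is a fixed-length vector representing the contextual semantics up to that time step.
+Specifically, the bidirectional recurrent neural network processes the input sequence along the time dimension in both forward and backward order, and concatenates the RNN outputs at each time step to form the final output layer. As a result, the output node at each time step contains the complete past and future context of the current time step in the input sequence. The figure below shows a bidirectional recurrent neural network unrolled over time steps. The network consists of a forward RNN and a backward RNN with six weight matrices: the weight matrices from the input to the forward and backward hidden layers ($W_1, W_3$), the weight matrices from each hidden layer to itself ($W_2,W_5$), and the weight matrices from the forward and backward hidden layers to the output layer ($W_4, W_6$). Note that there are no connections between the forward and backward hidden layers.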
+<p align="center">
+<img src="images/bi_rnn.png" width=450><br/>
+Figure 1. A bidirectional recurrent neural network unrolled over time steps
+</p>
+<p align="center">
+<img src="images/encoder_attention.png" width=500><br/>
+Figure 2. Encoder using a bidirectional LSTM
+</p>
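+### Attention Mechanism
+If the output of the encoding stage is a fixed-dimension vector, two problems arise: 1) whether the source sequence is 5 words or 50 words long, encoding all of its semantic and syntactic information into a vector of fixed dimension is a very demanding requirement for the model, especially for long sentences; 2) intuitively, when a human translates a sentence, they pay more attention to the source fragments most relevant to the part currently being translated, and the focus shifts as translation proceeds. A fixed-dimension vector, by contrast, amounts to paying equal attention to all source information at every step, which is unreasonable. Bahdanau et al. \[[4](#references)\] therefore introduced the attention mechanism, which decodes from the encoded context fragments and thereby addresses feature learning for long sentences. The decoder structure under the attention mechanism is described below.
+Unlike the simple decoder, the decoder hidden state is computed here as:
+$$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$
+That is, the encoding of the source sentence is represented by the context fragment $c_i$ for the $i$-th word, so every target-language word $u_i$ has its own $c_i$. $c_i$ is computed as follows:
+$$c_i=\sum _{j=1}^{T}a_{ij}h_j, a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$
+The formula shows that the attention mechanism is realized as a weighted average of the encoder RNN states $h_j$ at each time step. The weight $a_{ij}$ expresses how much attention the $i$-th target word pays to the $j$-th source word, and is computed as follows:
+$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$
+$$e_{ij} = {align(z_i, h_j)}$$
+Here $align$ can be regarded as an alignment model that measures how well the $i$-th target word matches the $j$-th source word. Concretely, the score is computed from the $i$-th hidden state $z_i$ of the decoder RNN and the $j$-th context fragment $h_j$ of the source sentence. In traditional alignment models, each target word is explicitly aligned to one or more source words (hard alignment); the attention model uses soft alignment instead, meaning that any target word and any source word are related to some degree, and the strength of that relation is a real number computed by the model. It can therefore be integrated into the overall NMT framework and trained with backpropagation.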
+<p align="center">
+<img src="images/decoder_attention.png" width=500><br/>
+Figure 3. Decoder with the attention mechanism
+</p>
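
The attention formulas above (scores $e_{ij}$, softmax weights $a_{ij}$, context $c_i$) can be sketched in a few lines of plain Python. This is an illustration only: `attention_context` and `dot_align` are hypothetical names, and the dot-product scorer is an assumption standing in for the learned alignment model, which `attention_model.py` builds from projected encoder and decoder states followed by an `fc` layer.

```python
import math


def attention_context(z_i, h, align):
    """Return (weights a_i, context c_i) for decoder state z_i and encoder states h."""
    scores = [align(z_i, h_j) for h_j in h]           # e_ij = align(z_i, h_j)
    top = max(scores)                                 # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]               # a_ij = softmax over j
    dim = len(h[0])
    context = [sum(w * h_j[d] for w, h_j in zip(weights, h)) for d in range(dim)]
    return weights, context                           # c_i = sum_j a_ij * h_j


def dot_align(z, h_j):
    """Hypothetical scorer: plain dot product instead of a trained alignment model."""
    return sum(a * b for a, b in zip(z, h_j))


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states, dot_align)
print(weights, context)
```

The weights sum to one, and encoder states that score higher against the decoder state contribute more to the context vector.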
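+### Beam Search Algorithm
+[Beam search](http://en.wikipedia.org/wiki/Beam_search) is a heuristic graph search algorithm that searches a graph or tree for the best nodes to expand within a limited set. It is typically used in systems with a very large solution space (such as machine translation and speech recognition), where memory cannot hold every expanded solution of the graph or tree. For example, to translate "`<s>你好<e>`" in a machine translation task, even if the target dictionary contains only 3 words (`<s>`, `<e>`, `hello`), infinitely many sentences can be generated (the number of times `hello` repeats is unbounded). To find the better translations among them, we can use beam search.
+Beam search builds the search tree breadth-first. At each level of the tree, it sorts the nodes by heuristic cost (in this tutorial, the sum of the log probabilities of the generated words) and keeps only a predetermined number of them (called the beam width, beam size, etc. in the literature). Only these nodes are expanded at the next level and the rest are pruned; in other words, higher-quality nodes are kept and lower-quality nodes are cut. This greatly reduces the space and time used by the search, at the cost of no longer guaranteeing the optimal solution.
+In the decoding stage with beam search, the goal is to maximize the probability of the generated sequence. The procedure is:
+1. At each time step, compute the next hidden state $z_{i+1}$ from the encoding $c$ of the source sentence, the $i$-th generated target word $u_i$, and the RNN hidden state $z_i$ at time step $i$.
+2. Normalize $z_{i+1}$ with `softmax` to obtain the probability distribution $p_{i+1}$ of the $(i+1)$-th target word.
+3. Sample the word $u_{i+1}$ according to $p_{i+1}$.
+4. Repeat steps 1 to 3 until the end-of-sentence token `<e>` is produced or the maximum generation length is exceeded.
+Note: $z_{i+1}$ and $p_{i+1}$ are computed with the same formulas as in the decoder. Because each generation step is greedy, a globally optimal solution is not guaranteed.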
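
The pruning idea behind beam search can be sketched independently of Paddle. The code below is a toy under stated assumptions: `beam_search` and `toy_step` are hypothetical names, and `toy_step` is a hand-written next-word distribution over the three-word dictionary from the example above, standing in for the decoder's softmax output.

```python
import math


def beam_search(step_fn, start, end, beam_size=3, max_length=10):
    """Keep the beam_size best partial sequences by summed log probability."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_length):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        # Sort by score and prune everything outside the beam.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_length
    return max(finished, key=lambda c: c[1])


def toy_step(tokens):
    """Hypothetical next-word distribution; a real model would run the decoder here."""
    if tokens[-1] == "<s>":
        return {"hello": 0.9, "<e>": 0.1}
    return {"hello": 0.2, "<e>": 0.8}


best_tokens, best_score = beam_search(toy_step, "<s>", "<e>", beam_size=2, max_length=5)
print(best_tokens, best_score)
```

Unlike the sampling described in step 3, this sketch expands every candidate word and keeps the top `beam_size` sequences at each step, which is the pruning behaviour described in the paragraph on heuristic cost.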
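+## Data
+This tutorial uses the [bitexts (after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz) from the [WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/) dataset as the training set and the [dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz) as the test set and the generation set.
+### Data Preprocessing
+Preprocessing consists of two steps:
+- Merge each source-to-target parallel corpus into a single file:
+  - Merge each pair of `XXX.src` and `XXX.trg` files into `XXX`.
+  - Line $i$ of `XXX` is line $i$ of `XXX.src` joined with line $i$ of `XXX.trg`, separated by '\t'.
+- Create the "source dictionary" and "target dictionary" for the training data. Each dictionary has **DICTSIZE** words: the (DICTSIZE - 3) most frequent words in the corpus plus 3 special tokens, `<s>` (start of sequence), `<e>` (end of sequence) and `<unk>` (out-of-vocabulary word).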
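
The two preprocessing steps can be sketched as follows. This is not the script used to build the WMT-14 data; `merge_parallel` and `build_dict` are hypothetical helpers, and placing the three special tokens at ids 0 to 2 is an assumption made for the example.

```python
from collections import Counter


def merge_parallel(src_lines, trg_lines):
    """Join line i of the source file with line i of the target file using a tab."""
    return [s.rstrip("\n") + "\t" + t.rstrip("\n")
            for s, t in zip(src_lines, trg_lines)]


def build_dict(sentences, dict_size):
    """Dictionary of dict_size words: 3 special tokens plus the most frequent words."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    words = ["<s>", "<e>", "<unk>"]
    words += [word for word, _ in counts.most_common(dict_size - 3)]
    return {word: idx for idx, word in enumerate(words)}


merged = merge_parallel(["a b\n", "c\n"], ["x y\n", "z\n"])
src_dict = build_dict(["a b a", "a b c"], dict_size=5)
print(merged, src_dict)
```

Words outside the kept set, such as `c` above, would be mapped to `<unk>` when sentences are converted to ids.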
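+### Sample Data
+Because the full dataset is large, the PaddlePaddle API paddle.dataset.wmt14 provides a preprocessed [smaller dataset](http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz) by default so that the training pipeline can be verified.
+This dataset has 193319 training samples and 6003 test samples, with a dictionary size of 30000. Because of its limited size, the quality of models trained on it is not guaranteed.
+## Training the Model
+`train.py` contains the main function of the training program. To start training with the default arguments, simply run: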
+```sh
+python train.py
+```
+You can set training parameters with command-line arguments. To list all available command-line arguments, run:
+```sh
+python train.py -h
+```
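+This prints a description of every command-line argument along with its default value. The default model uses attention. You can also try the model without attention: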
+```sh
+python train.py --no_attention
+```
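+Trained models are saved under `./models` by default. Use the `--save_dir` command-line argument to choose a different location. By default, a model is saved at the end of every pass.
+## Generating Predictions
+After the model has been trained, use `infer.py` to generate predictions. Again, with the default arguments, simply run: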
+```sh
+python infer.py
+```
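+The arguments can likewise be set on the command line. Note that the settings used for inference must match those used for training exactly, otherwise loading the model will fail. Use `--pass_num` to choose which pass's saved model to load, and `--beam_size` to set the beam search width.
+## References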
+1. Koehn P. [Statistical machine translation](https://books.google.com.hk/books?id=4v_Cx1wIMLkC&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false)[M]. Cambridge University Press, 2009.
+2. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://www.aclweb.org/anthology/D/D14/D14-1179.pdf)[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1724-1734.
+3. Chung J, Gulcehre C, Cho K H, et al. [Empirical evaluation of gated recurrent neural networks on sequence modeling](https://arxiv.org/abs/1412.3555)[J]. arXiv preprint arXiv:1412.3555, 2014.
+4. Bahdanau D, Cho K, Bengio Y. [Neural machine translation by jointly learning to align and translate](https://arxiv.org/abs/1409.0473)[C]//Proceedings of ICLR 2015, 2015.
+5. Papineni K, Roukos S, Ward T, et al. [BLEU: a method for automatic evaluation of machine translation](http://dl.acm.org/citation.cfm?id=1073135)[C]//Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002: 311-318.
+<br/>
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
\ No newline at end of file
--- a/fluid/neural_machine_translation/rnn_search/_ce.py
+++ b/fluid/neural_machine_translation/rnn_search/_ce.py
@@ -7,9 +7,9 @@ from kpi import CostKpi, DurationKpi, AccKpi
 #### NOTE kpi.py should shared in models in some way!!!!
-train_cost_kpi = CostKpi('train_cost', 0.02, 0, actived=True)
+train_cost_kpi = CostKpi('train_cost', 0.02, 0, actived=False)
-test_cost_kpi = CostKpi('test_cost', 0.005, 0, actived=True)
+test_cost_kpi = CostKpi('test_cost', 0.005, 0, actived=False)
-train_duration_kpi = DurationKpi('train_duration', 0.06, 0, actived=True)
+train_duration_kpi = DurationKpi('train_duration', 0.06, 0, actived=False)
 tracking_kpis = [
    train_cost_kpi,

--- a/fluid/neural_machine_translation/rnn_search/args.py
+++ b/fluid/neural_machine_translation/rnn_search/args.py
@@ -52,7 +52,8 @@ def parse_args():
        "--pass_num",
        type=int,
        default=5,
-        help="The pass number to train. (default: %(default)d)")
+        help="The pass number to train. In inference mode, load the saved model"
+        " at the end of the given pass. (default: %(default)d)")
    parser.add_argument(
        "--learning_rate",
        type=float,
@@ -66,17 +67,17 @@ def parse_args():
        "--beam_size",
        type=int,
        default=3,
-        help="The width for beam searching. (default: %(default)d)")
+        help="The width for beam search. (default: %(default)d)")
    parser.add_argument(
        "--use_gpu",
        type=distutils.util.strtobool,
        default=True,
-        help="Whether to use gpu. (default: %(default)d)")
+        help="Whether to use gpu or not. (default: %(default)d)")
    parser.add_argument(
        "--max_length",
        type=int,
        default=50,
-        help="The maximum length of sequence when doing generation. "
+        help="The maximum sequence length for translation result. "
        "(default: %(default)d)")
    parser.add_argument(
        "--save_dir",

--- a/fluid/neural_machine_translation/rnn_search/attention_model.py
+++ b/fluid/neural_machine_translation/rnn_search/attention_model.py
@@ -122,9 +122,8 @@ def seq_to_seq_net(embedding_dim, encoder_size, decoder_size, source_dict_dim,
        decoder_state_expand = fluid.layers.sequence_expand(
            x=decoder_state_proj, y=encoder_proj)
        # concated lod should inherit from encoder_proj
-        concated = fluid.layers.concat(
+        mixed_state = encoder_proj + decoder_state_expand
-            input=[encoder_proj, decoder_state_expand], axis=1)
+        attention_weights = fluid.layers.fc(input=mixed_state,
-        attention_weights = fluid.layers.fc(input=concated,
                                            size=1,
                                            bias_attr=False)
        attention_weights = fluid.layers.sequence_softmax(
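
The hunk above replaces the concatenation of `encoder_proj` and `decoder_state_expand` with an element-wise sum before the size-1 `fc` layer. A minimal pure-Python sketch of what this means for the attention scores, assuming the `fc` layer is a plain linear map (the hunk shows `bias_attr=False` and no activation). The helper names `score_sum` and `score_concat` and the toy numbers are illustrative and not part of the repository:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_sum(encoder_proj, decoder_state, w):
    """New form: add the decoder state to every source position, then project."""
    return [
        sum((e + d) * wi for e, d, wi in zip(row, decoder_state, w))
        for row in encoder_proj
    ]


def score_concat(encoder_proj, decoder_state, w_cat):
    """Old form: concatenate along the feature axis, then project."""
    return [
        sum(x * wi for x, wi in zip(list(row) + list(decoder_state), w_cat))
        for row in encoder_proj
    ]


# Three source positions with two features each (toy values).
encoder_proj = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
decoder_state = [1.0, -1.0]
w = [0.7, -0.2]

weights_sum = softmax(score_sum(encoder_proj, decoder_state, w))
# Concatenation with both halves of the weight vector tied to w.
weights_cat = softmax(score_concat(encoder_proj, decoder_state, w + w))
```

Under these assumptions the summed form equals the concatenated form with the two halves of the projection weights tied, so the scoring layer needs half as many parameters.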

--- a/fluid/neural_machine_translation/rnn_search/images/bi_rnn.png
+++ b/fluid/neural_machine_translation/rnn_search/images/bi_rnn.png
--- a/fluid/neural_machine_translation/rnn_search/images/decoder_attention.png
+++ b/fluid/neural_machine_translation/rnn_search/images/decoder_attention.png
--- a/fluid/neural_machine_translation/rnn_search/images/encoder_attention.png
+++ b/fluid/neural_machine_translation/rnn_search/images/encoder_attention.png
--- a/fluid/neural_machine_translation/transformer/README_cn.md
+++ b/fluid/neural_machine_translation/transformer/README_cn.md
@@ -16,7 +16,7 @@
 ├── reader.py            # data reading interface
 ├── README.md            # documentation
 ├── train.py             # training script
-└── util.py              # wordpiece data decoding utility
+└── gen_data.sh          # data generation script
 ```
 ### Introduction
@@ -59,36 +59,29 @@ The Decoder has a structure similar to the Encoder, except that, compared with the la
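 ### Data preparation
-The WMT datasets are the widely recognized mainstream datasets in machine translation; the WMT English-German and English-French datasets are also the ones used in the Transformer paper, where the English-German data is BPE (byte-pair encoding) [4] encoded and the English-French data uses wordpiece [5]. We also use the WMT English-German and English-French translation data here and, consistent with the paper, use the BPE and wordpiece data; the usage is given below. For other custom data, follow or convert to a similar data format as described below.
+The WMT datasets are the widely recognized mainstream datasets in machine translation. The [WMT'16 EN-DE dataset](http://www.statmt.org/wmt16/translation-task.html) is a medium-sized one among them and is also used in the Transformer paper. It serves as the example here: run the `gen_data.sh` script to download and preprocess the WMT'16 EN-DE dataset. Preprocessing mainly consists of tokenization and BPE (byte-pair encoding); BPE-encoded data handles the out-of-vocabulary (OOV) problem well [4] and is also used in the Transformer paper. After the script succeeds, a `gen_data` folder is generated with the following directory structure (which can be changed in `gen_data.sh`):
-#### WMT English-German translation data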
+```text
+.
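-The [WMT'16 EN-DE dataset](http://www.statmt.org/wmt16/translation-task.html) is a medium-sized dataset. Following the paper, we use BPE-encoded data for English-German, which handles the out-of-vocabulary (OOV) problem better [4]. The BPE data can be downloaded by following the instructions [here](https://github.com/google/seq2seq/blob/master/docs/data.md) (to apply BPE to custom data, see [here](https://github.com/rsennrich/subword-nmt) for preprocessing). After downloading and extracting, `train.tok.clean.bpe.32000.en` and `train.tok.clean.bpe.32000.de` are the BPE training data (a parallel corpus in English and German respectively, tokenized and BPE-encoded), `newstest2016.tok.bpe.32000.en`, `newstest2016.tok.bpe.32000.de` and the like are test data (`newstest2016.tok.en`, `newstest2016.tok.de` and the like are the corresponding test data without BPE), and `vocab.bpe.32000` is the corresponding vocabulary file (shared by the source and target languages).
+├── wmt16_ende_data              # WMT16 English-German translation data
+├── wmt16_ende_data_bpe          # BPE-encoded WMT16 English-German translation data
-The data reading script `reader.py` in this example expects, by default, samples formatted as source and target sentence pairs separated by `\t` (with the words in a sentence separated by spaces by default). The source and target parallel corpus files therefore need to be merged into one file, which can be done with the following command:
+├── mosesdecoder                 # Moses machine translation toolkit, including scripts for tokenization, BLEU evaluation, etc.
-```sh
+└── subword-nmt                  # code for BPE encoding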
-paste -d '\t' train.tok.clean.bpe.32000.en train.tok.clean.bpe.32000.de > train.tok.clean.bpe.32000.en-de
-```
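-In addition, the downloaded vocabulary file `vocab.bpe.32000` does not contain the special symbols for sequence start, sequence end and unknown words. The following command adds `<s>`, `<e>` and `<unk>` to the vocabulary as these three special symbols (representing the data with BPE already avoids the OOV problem effectively; they are added here only for generality).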
-```sh
-sed -i '1i\<s>\n<e>\n<unk>' vocab.bpe.32000
 ```
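-#### WMT English-French translation data
+`gen_data/wmt16_ende_data_bpe` contains the English-German translation data we finally use: `train.tok.clean.bpe.32000.en-de` is the training data, `newstest2016.tok.bpe.32000.en-de` and the like are the validation and test data, and `vocab_all.bpe.32000` is the corresponding vocabulary file (with the three special symbols `<s>`, `<e>` and `<unk>` already added; shared by the source and target languages).
-The [WMT'14 EN-FR dataset](http://www.statmt.org/wmt14/translation-task.html) is a larger dataset. Following the paper, we use wordpiece data for English-French; like BPE, wordpiece uses sub-word units to address the OOV problem [5]. We provide preprocessed wordpiece data for download [here](http://transformer-data.bj.bcebos.com/wmt14_enfr.tar), where `train.wordpiece.en-fr` is the wordpiece training data, `newstest2014.wordpiece.en-fr` is the test data (`newstest2014.tok.en` and `newstest2014.tok.fr` are the corresponding test data without wordpiece processing, tokenized with this [script](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)), and `vocab.wordpiece.en-fr` is the corresponding vocabulary file (shared by the source and target languages).
-The provided English-French data needs no further processing and can be used directly. Note that in this wordpiece data the tokens within a sentence are separated by `\x01` rather than spaces (because some tokens contain spaces), which must be specified during training.
+For other custom data, convert it to a format like that of `train.tok.clean.bpe.32000.en-de` (source and target sentence pairs separated by `\t`, with the tokens in a sentence separated by spaces). If BPE encoding is needed, the data can be processed with `gen_data.sh` in the same way as the WMT data.
 ### Model training
 `train.py` is the model training script. Taking the English-German translation data as an example, run the following command to train the model: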
 ```sh
 python -u train.py \
-  --src_vocab_fpath data/vocab.bpe.32000 \
+  --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
-  --trg_vocab_fpath data/vocab.bpe.32000 \
+  --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
  --special_token '<s>' '<e>' '<unk>' \
-  --train_file_pattern data/train.tok.clean.bpe.32000.en-de \
+  --train_file_pattern gen_data/wmt16_ende_data_bpe/train.tok.clean.bpe.32000.en-de \
  --token_delimiter ' ' \
  --use_token_batch True \
  --batch_size 4096 \
@@ -104,10 +97,10 @@ python train.py --help
 ```sh
 python -u train.py \
-  --src_vocab_fpath data/vocab.bpe.32000 \
+  --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
-  --trg_vocab_fpath data/vocab.bpe.32000 \
+  --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
  --special_token '<s>' '<e>' '<unk>' \
-  --train_file_pattern data/train.tok.clean.bpe.32000.en-de \
+  --train_file_pattern gen_data/wmt16_ende_data_bpe/train.tok.clean.bpe.32000.en-de \
  --token_delimiter ' ' \
  --use_token_batch True \
  --batch_size 3200 \
@@ -120,7 +113,7 @@ python -u train.py \
  n_head 16 \
  prepostprocess_dropout 0.3
 ```
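-For more details on these parameters, see the comments in `config.py`. For the English-French translation data, training is similar to English-German training: change the vocabulary and data files in the command to the paths of the corresponding English-French files. Also note that because tokens in the English-French data are not separated by spaces, the `token_delimiter` parameter must be set to `--token_delimiter '\x01'`.
+For more details on these parameters, see the comments in `config.py`.
 By default all GPUs are used for training; the `CUDA_VISIBLE_DEVICES` environment variable can be set to control how many GPUs are used. Training on CPU only is also possible (set with the `--device CPU` parameter), though it is relatively slow. During training, the model is saved to the directory given by the `model_dir` parameter every fixed number of iterations (set by the `save_freq` parameter, 10000 by default), a checkpoint is also saved to the directory given by `ckpt_dir` at the end of each epoch, and each iteration prints a log like the following to standard output: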
 ```txt
@@ -141,10 +134,10 @@ step_idx: 9, epoch: 0, batch: 9, avg loss: 10.993434, normalized loss: 9.616467,
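 `infer.py` is the model inference script. Taking the English-German translation data as an example, after training is complete, run the following command to translate the text in the specified file: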
 ```sh
 python -u infer.py \
-  --src_vocab_fpath data/vocab.bpe.32000 \
+  --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
-  --trg_vocab_fpath data/vocab.bpe.32000 \
+  --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
  --special_token '<s>' '<e>' '<unk>' \
-  --test_file_pattern data/newstest2016.tok.bpe.32000.en-de \
+  --test_file_pattern gen_data/wmt16_ende_data_bpe/newstest2016.tok.bpe.32000.en-de \
  --use_wordpiece False \
  --token_delimiter ' ' \
  --batch_size 32 \
@@ -159,14 +152,9 @@ python -u infer.py \
 sed -r 's/(@@ )|(@@ ?$)//g' predict.txt > predict.tok.txt
 ```
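-For the English-French wordpiece data, inference is similar to English-German inference: change the vocabulary and data files in the command to the paths of the corresponding English-French files, and set the `token_delimiter` parameter to `--token_delimiter '\x01'`. Also set the `use_wordpiece` parameter to `--use_wordpiece True`, which restores the translated wordpiece output to the original text at inference time. To evaluate on tokenized data, the translation results also need to be tokenized; [Moses](https://github.com/moses-smt/mosesdecoder) provides a set of scripts for machine translation. After cloning the mosesdecoder repository with `git clone https://github.com/moses-smt/mosesdecoder.git`, use its `tokenizer.perl` script to tokenize the translations in `predict.txt` and write them to `predict.tok.txt`, as follows:
+The translations can then be evaluated with the BLEU metric against the reference translations. Taking the English-German `newstest2016.tok.de` data as an example, run the following command: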
-```sh
-perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr < predict.txt > predict.tok.txt
-```
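-The translations can then be evaluated with the BLEU metric against the reference translations. The script for computing BLEU is also included in Moses. Taking the English-German `newstest2016.tok.de` data as an example, run the following command: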
 ```sh
-perl mosesdecoder/scripts/generic/multi-bleu.perl data/newstest2016.tok.de < predict.tok.txt
+perl gen_data/mosesdecoder/scripts/generic/multi-bleu.perl gen_data/wmt16_ende_data/newstest2016.tok.de < predict.tok.txt
 ```
 A result similar to the following is shown (the predictions of a model trained for 200K iterations on a single machine with two GPUs).
 ```
@@ -174,11 +162,10 @@ BLEU = 33.08, 64.2/39.2/26.4/18.5 (BP=0.994, ratio=0.994, hyp_len=61971, ref_len
 ```
 Currently, without model average, the test BLEU scores of the English-German base model after 100K iterations of training on eight GPUs are as follows:
-| Test set | newstest2013 | newstest2014 | newstest2015 | newstest2016 |
+| Test set | newstest2014 | newstest2015 | newstest2016 |
-|-|-|-|-|-|
+|-|-|-|-|
-| BLEU | 25.27 | 26.05 | 28.75 | 33.27 |
+| BLEU | 26.05 | 28.75 | 33.27 |
-The English-French base model, after 100K iterations of training on eight GPUs, reaches a test BLEU of 36. on `newstest2014`.
 ### Distributed training
@@ -260,4 +247,3 @@ export PADDLE_PORT=6177
 2. He K, Zhang X, Ren S, et al. [Deep residual learning for image recognition](http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf)[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
 3. Ba J L, Kiros J R, Hinton G E. [Layer normalization](https://arxiv.org/pdf/1607.06450.pdf)[J]. arXiv preprint arXiv:1607.06450, 2016.
 4. Sennrich R, Haddow B, Birch A. [Neural machine translation of rare words with subword units](https://arxiv.org/pdf/1508.07909)[J]. arXiv preprint arXiv:1508.07909, 2015.
-5. Wu Y, Schuster M, Chen Z, et al. [Google's neural machine translation system: Bridging the gap between human and machine translation](https://arxiv.org/pdf/1609.08144.pdf)[J]. arXiv preprint arXiv:1609.08144, 2016.
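
The README hunks above restore BPE output with `sed -r 's/(@@ )|(@@ ?$)//g'` before BLEU evaluation. A minimal Python sketch of the same substitution, assuming one sentence per line and the `@@ ` continuation marker that this sed pattern targets. The function name `restore_bpe` and the sample sentence are illustrative only:

```python
import re

# Same pattern as the sed expression: drop "@@ " between sub-word pieces
# and a dangling "@@" (optionally followed by a space) at the end of the line.
_BPE_JOIN = re.compile(r"(@@ )|(@@ ?$)")


def restore_bpe(line):
    """Merge BPE sub-word units in one line back into whole tokens."""
    return _BPE_JOIN.sub("", line.rstrip("\n"))


restored = restore_bpe("new@@ st@@ est is a go@@ od ex@@ ample\n")
print(restored)
```

Applying this to every line of `predict.txt` should give the same text as the sed command, which can then be passed to `multi-bleu.perl`.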
--- a/fluid/neural_machine_translation/transformer/gen_data.sh
+++ b/fluid/neural_machine_translation/transformer/gen_data.sh
+#! /usr/bin/env bash
+set -e
+OUTPUT_DIR=$PWD/gen_data
+###############################################################################
+# change these variables for other WMT data
+###############################################################################
+OUTPUT_DIR_DATA="${OUTPUT_DIR}/wmt16_ende_data"
+OUTPUT_DIR_BPE_DATA="${OUTPUT_DIR}/wmt16_ende_data_bpe"
+LANG1="en"
+LANG2="de"
+# each of TRAIN_DATA: data_url data_file_lang1 data_file_lang2
+TRAIN_DATA=(
+'http://www.statmt.org/europarl/v7/de-en.tgz'
+'europarl-v7.de-en.en' 'europarl-v7.de-en.de'
+'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz'
+'commoncrawl.de-en.en' 'commoncrawl.de-en.de'
+'http://data.statmt.org/wmt16/translation-task/training-parallel-nc-v11.tgz'
+'news-commentary-v11.de-en.en' 'news-commentary-v11.de-en.de'
+)
+# each of DEV_TEST_DATA: data_url data_file_lang1 data_file_lang2
+DEV_TEST_DATA=(
+'http://data.statmt.org/wmt16/translation-task/dev.tgz'
+'newstest201[45]-deen-ref.en.sgm' 'newstest201[45]-deen-src.de.sgm'
+'http://data.statmt.org/wmt16/translation-task/test.tgz'
+'newstest2016-deen-ref.en.sgm' 'newstest2016-deen-src.de.sgm'
+)
+###############################################################################
+###############################################################################
+# change these variables for other WMT data
+###############################################################################
+# OUTPUT_DIR_DATA="${OUTPUT_DIR}/wmt14_enfr_data"
+# OUTPUT_DIR_BPE_DATA="${OUTPUT_DIR}/wmt14_enfr_data_bpe"
+# LANG1="en"
+# LANG2="fr"
+# # each of TRAIN_DATA: data_url data_file_lang1 data_file_lang2
+# TRAIN_DATA=(
+# 'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz'
+# 'commoncrawl.fr-en.en' 'commoncrawl.fr-en.fr'
+# 'http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz'
+# 'training/europarl-v7.fr-en.en' 'training/europarl-v7.fr-en.fr'
+# 'http://www.statmt.org/wmt14/training-parallel-nc-v9.tgz'
+# 'training/news-commentary-v9.fr-en.en' 'training/news-commentary-v9.fr-en.fr'
+# 'http://www.statmt.org/wmt10/training-giga-fren.tar'
+# 'giga-fren.release2.fixed.en.*' 'giga-fren.release2.fixed.fr.*'
+# 'http://www.statmt.org/wmt13/training-parallel-un.tgz'
+# 'un/undoc.2000.fr-en.en' 'un/undoc.2000.fr-en.fr'
+# )
+# # each of DEV_TEST_DATA: data_url data_file_lang1 data_file_lang2
+# DEV_TEST_DATA=(
+# 'http://data.statmt.org/wmt16/translation-task/dev.tgz'
+# '.*/newstest201[45]-fren-ref.en.sgm' '.*/newstest201[45]-fren-src.fr.sgm'
+# 'http://data.statmt.org/wmt16/translation-task/test.tgz'
+# '.*/newstest2016-fren-ref.en.sgm' '.*/newstest2016-fren-src.fr.sgm'
+# )
+###############################################################################
+mkdir -p $OUTPUT_DIR_DATA $OUTPUT_DIR_BPE_DATA
+# Extract training data
+for ((i=0;i<${#TRAIN_DATA[@]};i+=3)); do
+  data_url=${TRAIN_DATA[i]}
+  data_tgz=${data_url##*/}  # training-parallel-commoncrawl.tgz
+  data=${data_tgz%.*}  # training-parallel-commoncrawl
+  data_lang1=${TRAIN_DATA[i+1]}
+  data_lang2=${TRAIN_DATA[i+2]}
+  if [ ! -e ${OUTPUT_DIR_DATA}/${data_tgz} ]; then
+    echo "Download "${data_url}
+    wget -O ${OUTPUT_DIR_DATA}/${data_tgz} ${data_url}
+  fi
+  if [ ! -d ${OUTPUT_DIR_DATA}/${data} ]; then
+    echo "Extract "${data_tgz}
+    mkdir -p ${OUTPUT_DIR_DATA}/${data}
+    tar_type=${data_tgz:0-3}
+    if [ ${tar_type} == "tar" ]; then
+      tar -xvf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
+    else
+      tar -xvzf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
+    fi
+  fi
+  # concatenate all training data
+  for data_lang in $data_lang1 $data_lang2; do
+    for f in `find ${OUTPUT_DIR_DATA}/${data} -regex ".*/${data_lang}"`; do
+      data_dir=`dirname $f`
+      data_file=`basename $f`
+      f_base=${f%.*}
+      f_ext=${f##*.}
+      if [ $f_ext == "gz" ]; then
+        gunzip $f
+        l=${f_base##*.}
+        f_base=${f_base%.*}
+      else
+        l=${f_ext}
+      fi
+      if [ $i -eq 0 ]; then
+        cat ${f_base}.$l > ${OUTPUT_DIR_DATA}/train.$l
+      else
+        cat ${f_base}.$l >> ${OUTPUT_DIR_DATA}/train.$l
+      fi
+    done
+  done
+done
+# Clone mosesdecoder
+if [ ! -d ${OUTPUT_DIR}/mosesdecoder ]; then
+  echo "Cloning moses for data processing"
+  git clone https://github.com/moses-smt/mosesdecoder.git ${OUTPUT_DIR}/mosesdecoder
+fi
+# Extract develop and test data
+dev_test_data=""
+for ((i=0;i<${#DEV_TEST_DATA[@]};i+=3)); do
+  data_url=${DEV_TEST_DATA[i]}
+  data_tgz=${data_url##*/}  # dev.tgz
+  data=${data_tgz%.*}  # dev
+  data_lang1=${DEV_TEST_DATA[i+1]}
+  data_lang2=${DEV_TEST_DATA[i+2]}
+  if [ ! -e ${OUTPUT_DIR_DATA}/${data_tgz} ]; then
+    echo "Download "${data_url}
+    wget -O ${OUTPUT_DIR_DATA}/${data_tgz} ${data_url}
+  fi
+  if [ ! -d ${OUTPUT_DIR_DATA}/${data} ]; then
+    echo "Extract "${data_tgz}
+    mkdir -p ${OUTPUT_DIR_DATA}/${data}
+    tar_type=${data_tgz:0-3}
+    if [ ${tar_type} == "tar" ]; then
+      tar -xvf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
+    else
+      tar -xvzf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
+    fi
+  fi
+  for data_lang in $data_lang1 $data_lang2; do
+    for f in `find ${OUTPUT_DIR_DATA}/${data} -regex ".*/${data_lang}"`; do
+      data_dir=`dirname $f`
+      data_file=`basename $f`
+      data_out=`echo ${data_file} | cut -d '-' -f 1`  # newstest2016
+      l=`echo ${data_file} | cut -d '.' -f 2`  # en
+      dev_test_data="${dev_test_data}\|${data_out}"  # to make regexp
+      if [ ! -e ${OUTPUT_DIR_DATA}/${data_out}.$l ]; then
+        ${OUTPUT_DIR}/mosesdecoder/scripts/ems/support/input-from-sgm.perl \
+          < $f > ${OUTPUT_DIR_DATA}/${data_out}.$l
+      fi
+    done
+  done
+done
+# Tokenize data
+for l in ${LANG1} ${LANG2}; do
+  for f in `ls ${OUTPUT_DIR_DATA}/*.$l | grep "\(train${dev_test_data}\)\.$l$"`; do
+    f_base=${f%.*}  # dir/train dir/newstest2016
+    f_out=$f_base.tok.$l
+    if [ ! -e $f_out ]; then
+      echo "Tokenize "$f
+      ${OUTPUT_DIR}/mosesdecoder/scripts/tokenizer/tokenizer.perl -q -l $l -threads 8 < $f > $f_out
+    fi
+  done
+done
+# Clean data
+for f in ${OUTPUT_DIR_DATA}/train.${LANG1} ${OUTPUT_DIR_DATA}/train.tok.${LANG1}; do
+  f_base=${f%.*}  # dir/train dir/train.tok
+  f_out=${f_base}.clean
+  if [ ! -e $f_out.${LANG1} ] && [ ! -e $f_out.${LANG2} ]; then
+    echo "Clean "${f_base}
+    ${OUTPUT_DIR}/mosesdecoder/scripts/training/clean-corpus-n.perl $f_base ${LANG1} ${LANG2} ${f_out} 1 80
+  fi
+done
+# Clone subword-nmt and generate BPE data
+if [ ! -d ${OUTPUT_DIR}/subword-nmt ]; then
+  git clone https://github.com/rsennrich/subword-nmt.git ${OUTPUT_DIR}/subword-nmt
+fi
+# Generate BPE data and vocabulary
+for num_operations in 32000; do
+  if [ ! -e ${OUTPUT_DIR_BPE_DATA}/bpe.${num_operations} ]; then
+    echo "Learn BPE with ${num_operations} merge operations"
+    cat ${OUTPUT_DIR_DATA}/train.tok.clean.${LANG1} ${OUTPUT_DIR_DATA}/train.tok.clean.${LANG2} | \
+      ${OUTPUT_DIR}/subword-nmt/learn_bpe.py -s $num_operations > ${OUTPUT_DIR_BPE_DATA}/bpe.${num_operations}
+  fi
+  for l in ${LANG1} ${LANG2}; do
+    for f in `ls ${OUTPUT_DIR_DATA}/*.$l | grep "\(train${dev_test_data}\)\.tok\(\.clean\)\?\.$l$"`; do
+      f_base=${f%.*}  # dir/train.tok dir/train.tok.clean dir/newstest2016.tok
+      f_base=${f_base##*/}  # train.tok train.tok.clean newstest2016.tok
+      f_out=${OUTPUT_DIR_BPE_DATA}/${f_base}.bpe.${num_operations}.$l
+      if [ ! -e $f_out ]; then
+        echo "Apply BPE to "$f
+        ${OUTPUT_DIR}/subword-nmt/apply_bpe.py -c ${OUTPUT_DIR_BPE_DATA}/bpe.${num_operations} < $f > $f_out
+      fi
+    done
+  done
+  if [ ! -e ${OUTPUT_DIR_BPE_DATA}/vocab.bpe.${num_operations} ]; then
+    echo "Create vocabulary for BPE data"
+    cat ${OUTPUT_DIR_BPE_DATA}/train.tok.clean.bpe.${num_operations}.${LANG1} ${OUTPUT_DIR_BPE_DATA}/train.tok.clean.bpe.${num_operations}.${LANG2} | \
+      ${OUTPUT_DIR}/subword-nmt/get_vocab.py | cut -f1 -d ' ' > ${OUTPUT_DIR_BPE_DATA}/vocab.bpe.${num_operations}
+  fi
+done
+# Adapt to the reader
+for f in ${OUTPUT_DIR_BPE_DATA}/*.bpe.${num_operations}.${LANG1}; do
+  f_base=${f%.*}  # dir/train.tok.clean.bpe.32000 dir/newstest2016.tok.bpe.32000
+  f_out=${f_base}.${LANG1}-${LANG2}
+  if [ ! -e $f_out ]; then
+    paste -d '\t' $f_base.${LANG1} $f_base.${LANG2} > $f_out
+  fi
+done
+if [ ! -e ${OUTPUT_DIR_BPE_DATA}/vocab_all.bpe.${num_operations} ]; then
+  sed '1i\<s>\n<e>\n<unk>' ${OUTPUT_DIR_BPE_DATA}/vocab.bpe.${num_operations} > ${OUTPUT_DIR_BPE_DATA}/vocab_all.bpe.${num_operations}
+fi
+echo "All done."
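
The last two steps of `gen_data.sh` ("Adapt to the reader" and the `vocab_all` creation) merge the parallel files with `paste -d '\t'` and prepend `<s>`, `<e>` and `<unk>` to the vocabulary. A minimal in-memory Python sketch of those two steps follows. The function names are illustrative, and raising on a line-count mismatch is a choice made for this sketch (`paste` itself does not fail in that case):

```python
def merge_parallel(src_lines, trg_lines):
    """Join line i of the source file with line i of the target file by a tab."""
    if len(src_lines) != len(trg_lines):
        raise ValueError("parallel files must have the same number of lines")
    return [
        src.rstrip("\n") + "\t" + trg.rstrip("\n")
        for src, trg in zip(src_lines, trg_lines)
    ]


def add_special_tokens(vocab_lines, special_tokens=("<s>", "<e>", "<unk>")):
    """Put the special tokens at the top of the vocabulary, as the sed call does."""
    return list(special_tokens) + [w.rstrip("\n") for w in vocab_lines]


def split_pair(line):
    """Split one merged line back into source and target token lists."""
    src, trg = line.split("\t")
    return src.split(" "), trg.split(" ")


pairs = merge_parallel(["a b@@ c\n"], ["x y\n"])
vocab_all = add_special_tokens(["a\n", "b@@\n"])
```

Because the special tokens come first, their ids are 0, 1 and 2 when the vocabulary is indexed by line number.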
--- a/fluid/neural_machine_translation/transformer/infer.py
+++ b/fluid/neural_machine_translation/transformer/infer.py
@@ -13,7 +13,6 @@ from model import fast_decode as fast_decoder
 from config import *
 from train import pad_batch_data
 import reader
-import util
 def parse_args():
@@ -49,21 +48,12 @@ def parse_args():
        default=["<s>", "<e>", "<unk>"],
        nargs=3,
        help="The <bos>, <eos> and <unk> tokens in the dictionary.")
-    parser.add_argument(
-        "--use_wordpiece",
-        type=ast.literal_eval,
-        default=False,
-        help="The flag indicating if the data in wordpiece. The EN-FR data "
-        "we provided is wordpiece data. For wordpiece data, converting ids to "
-        "original words is a little different and some special codes are "
-        "provided in util.py to do this.")
    parser.add_argument(
        "--token_delimiter",
        type=lambda x: str(x.encode().decode("unicode-escape")),
        default=" ",
        help="The delimiter used to split tokens in source or target sentences. "
-        "For EN-DE BPE data we provided, use spaces as token delimiter.; "
+        "For EN-DE BPE data we provided, use spaces as token delimiter. ")
-        "For EN-FR wordpiece data we provided, use '\x01' as token delimiter.")
    parser.add_argument(
        'opts',
        help='See config.py for all options',
@@ -144,7 +134,7 @@ def prepare_batch_input(insts, data_input_names, src_pad_idx, bos_idx, n_head,
    return input_dict
-def fast_infer(test_data, trg_idx2word, use_wordpiece):
+def fast_infer(test_data, trg_idx2word):
    """
    Inference by beam search decoder based solely on Fluid operators.
    """
@@ -202,9 +192,7 @@ def fast_infer(test_data, trg_idx2word, use_wordpiece):
                    trg_idx2word[idx]
                    for idx in post_process_seq(
                        np.array(seq_ids)[sub_start:sub_end])
-                ]) if not use_wordpiece else util.subtoken_ids_to_str(
+                ]))
-                    post_process_seq(np.array(seq_ids)[sub_start:sub_end]),
-                    trg_idx2word))
                scores[i].append(np.array(seq_scores)[sub_end - 1])
                print(hyps[i][-1])
                if len(hyps[i]) >= InferTaskConfig.n_best:
@@ -232,7 +220,7 @@ def infer(args, inferencer=fast_infer):
        clip_last_batch=False)
    trg_idx2word = test_data.load_dict(
        dict_path=args.trg_vocab_fpath, reverse=True)
-    inferencer(test_data, trg_idx2word, args.use_wordpiece)
+    inferencer(test_data, trg_idx2word)
 if __name__ == "__main__":
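
The `--token_delimiter` option above uses `type=lambda x: str(x.encode().decode("unicode-escape"))`, so an escape sequence typed on the command line becomes the real character. A small sketch of that behaviour under Python 3, using a standalone parser built only for this illustration (the name `unescape` is not in the repository):

```python
import argparse


def unescape(value):
    # Same conversion as the `type=` lambda in the hunk above:
    # the four characters \x01 become the single character U+0001.
    return str(value.encode().decode("unicode-escape"))


parser = argparse.ArgumentParser()
parser.add_argument("--token_delimiter", type=unescape, default=" ")

args_default = parser.parse_args([])
args_escaped = parser.parse_args(["--token_delimiter", "\\x01"])

print(repr(args_default.token_delimiter))
print(repr(args_escaped.token_delimiter))
```

With the wordpiece data removed in this change, the default space delimiter is the only one the README still uses, but the conversion keeps other delimiters usable for custom data.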

--- a/fluid/neural_machine_translation/transformer/model.py
+++ b/fluid/neural_machine_translation/transformer/model.py
@@ -219,6 +219,7 @@ def prepare_encoder_decoder(src_word,
        size=[src_max_len, src_emb_dim],
        param_attr=fluid.ParamAttr(
            name=pos_enc_param_name, trainable=False))
+    src_pos_enc.stop_gradient = True
    enc_input = src_word_emb + src_pos_enc
    return layers.dropout(
        enc_input,
@@ -458,7 +459,7 @@ def transformer(src_vocab_size,
                use_py_reader=False,
                is_test=False):
    if weight_sharing:
-        assert src_vocab_size == src_vocab_size, (
+        assert src_vocab_size == trg_vocab_size, (
            "Vocabularies in source and target should be same for weight sharing."
        )
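
The `model.py` hunk above fixes an assertion that compared `src_vocab_size` with itself and therefore could never fail. A minimal sketch of the difference, using hypothetical stand-alone helpers rather than the real `transformer()` signature:

```python
def old_check(src_vocab_size, trg_vocab_size, weight_sharing=True):
    # Pre-fix behaviour: a value always equals itself, so this never fires.
    if weight_sharing:
        assert src_vocab_size == src_vocab_size


def new_check(src_vocab_size, trg_vocab_size, weight_sharing=True):
    # Post-fix behaviour: shared embeddings need identical vocabulary sizes.
    if weight_sharing:
        assert src_vocab_size == trg_vocab_size, (
            "Vocabularies in source and target should be same for weight sharing."
        )


def raises_assertion(fn, *args):
    """Return True if calling fn(*args) raises AssertionError."""
    try:
        fn(*args)
    except AssertionError:
        return True
    return False


old_detects = raises_assertion(old_check, 32000, 30000)
new_detects = raises_assertion(new_check, 32000, 30000)
```

Only the corrected check reports the mismatch. With the shared `vocab_all.bpe.32000` file used in the README, both sizes are equal and the assertion passes.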

--- a/fluid/neural_machine_translation/transformer/reader.py
+++ b/fluid/neural_machine_translation/transformer/reader.py
 import glob
+import six
 import os
 import tarfile
@@ -262,8 +263,10 @@ class DataReader(object):
                if not os.path.isfile(fpath):
                    raise IOError("Invalid file: %s" % fpath)
-                with open(fpath, "r") as f:
+                with open(fpath, "rb") as f:
                    for line in f:
+                        if six.PY3:
+                            line = line.decode()
                        fields = line.strip("\n").split(self._field_delimiter)
                        if (not self._only_src and len(fields) == 2) or (
                                self._only_src and len(fields) == 1):
@@ -272,8 +275,10 @@ class DataReader(object):
    @staticmethod
    def load_dict(dict_path, reverse=False):
        word_dict = {}
-        with open(dict_path, "r") as fdict:
+        with open(dict_path, "rb") as fdict:
            for idx, line in enumerate(fdict):
+                if six.PY3:
+                    line = line.decode()
                if reverse:
                    word_dict[idx] = line.strip("\n")
                else:
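
The `reader.py` hunks above open files in binary mode and decode each line under Python 3. A minimal sketch of `load_dict` in that style, with two labelled assumptions: `isinstance(line, bytes)` stands in for the `six.PY3` check to keep the example dependency-free, and the forward mapping in the `else` branch (cut off in the hunk) is assumed to be word to index:

```python
import io


def load_dict(fdict, reverse=False):
    """Build a vocabulary mapping from a file-like object opened in binary mode."""
    word_dict = {}
    for idx, line in enumerate(fdict):
        if isinstance(line, bytes):
            # Binary reads yield bytes under Python 3; decode to text.
            line = line.decode("utf-8")
        word = line.strip("\n")
        if reverse:
            word_dict[idx] = word
        else:
            # Assumed: the truncated else branch maps word -> index.
            word_dict[word] = idx
    return word_dict


# io.BytesIO plays the role of open(dict_path, "rb").
vocab_bytes = "<s>\n<e>\n<unk>\nsch@@\n\u00f6@@\n".encode("utf-8")
idx2word = load_dict(io.BytesIO(vocab_bytes), reverse=True)
word2idx = load_dict(io.BytesIO(vocab_bytes))
```

Decoding explicitly keeps the result independent of the platform's default text encoding, which matters for vocabularies with non-ASCII sub-word units.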

--- a/fluid/neural_machine_translation/transformer/train.py
+++ b/fluid/neural_machine_translation/transformer/train.py
@@ -19,6 +19,7 @@ import logging
 import sys
 import copy
 def parse_args():
    parser = argparse.ArgumentParser("Training for Transformer.")
    parser.add_argument(
@@ -86,8 +87,7 @@ def parse_args():
        type=lambda x: str(x.encode().decode("unicode-escape")),
        default=" ",
        help="The delimiter used to split tokens in source or target sentences. "
-        "For EN-DE BPE data we provided, use spaces as token delimiter. "
+        "For EN-DE BPE data we provided, use spaces as token delimiter. ")
-        "For EN-FR wordpiece data we provided, use '\x01' as token delimiter.")
    parser.add_argument(
        'opts',
        help='See config.py for all options',
@@ -128,15 +128,11 @@ def parse_args():
        default=True,
        help="The flag indicating whether to use py_reader.")
    parser.add_argument(
-        "--fetch_steps",
+        "--fetch_steps", type=int, default=100, help="Fetch outputs steps.")
-        type=int,
-        default=100,
-        help="Fetch outputs steps.")
    #parser.add_argument(
    #    '--profile', action='store_true', help='If set, profile a few steps.')
    args = parser.parse_args()
    # Append args related to dict
    src_dict = reader.DataReader.load_dict(args.src_vocab_fpath)
@@ -151,16 +147,14 @@ def parse_args():
                        [TrainTaskConfig, ModelHyperParams])
    return args
 def append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint):
-    assert(trainer_id >= 0 and
+    assert (trainer_id >= 0 and len(worker_endpoints) > 1 and
-           len(worker_endpoints) > 1 and
+            current_endpoint in worker_endpoints)
-           current_endpoint in worker_endpoints)
    eps = copy.deepcopy(worker_endpoints)
    eps.remove(current_endpoint)
    nccl_id_var = fluid.default_startup_program().global_block().create_var(
-        name="NCCLID",
+        name="NCCLID", persistable=True, type=fluid.core.VarDesc.VarType.RAW)
-        persistable=True,
-        type=fluid.core.VarDesc.VarType.RAW)
    fluid.default_startup_program().global_block().append_op(
        type="gen_nccl_id",
        inputs={},
@@ -172,6 +166,7 @@ def append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint):
        })
    return nccl_id_var
 def pad_batch_data(insts,
                   pad_idx,
                   n_head,
@@ -385,8 +380,11 @@ def py_reader_provider_wrapper(data_reader):
 def test_context(exe, train_exe, dev_count):
    # Context to do validation.
-    startup_prog = fluid.Program()
    test_prog = fluid.Program()
+    startup_prog = fluid.Program()
+    if args.enable_ce:
+        test_prog.random_seed = 1000
+        startup_prog.random_seed = 1000
    with fluid.program_guard(test_prog, startup_prog):
        with fluid.unique_name.guard():
            sum_cost, avg_cost, predict, token_num, pyreader = transformer(
@@ -448,8 +446,17 @@ def test_context(exe, train_exe, dev_count):
    return test
-def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
+def train_loop(exe,
-               token_num, predict, pyreader, nccl2_num_trainers=1, nccl2_trainer_id=0):
+               train_prog,
+               startup_prog,
+               dev_count,
+               sum_cost,
+               avg_cost,
+               token_num,
+               predict,
+               pyreader,
+               nccl2_num_trainers=1,
+               nccl2_trainer_id=0):
    # Initialize the parameters.
    if TrainTaskConfig.ckpt_path:
        fluid.io.load_persistables(exe, TrainTaskConfig.ckpt_path)
@@ -483,7 +490,8 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
        main_program=train_prog,
        build_strategy=build_strategy,
        exec_strategy=exec_strategy,
-        num_trainers=nccl2_num_trainers, trainer_id=nccl2_trainer_id)
+        num_trainers=nccl2_num_trainers,
+        trainer_id=nccl2_trainer_id)
    if args.val_file_pattern is not None:
        test = test_context(exe, train_exe, dev_count)
@@ -509,7 +517,7 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
            data_generator = train_data()
        batch_id = 0
-        avg_batch_time=time.time()
+        avg_batch_time = time.time()
        while True:
            try:
                feed_dict_list = prepare_feed_dict_list(data_generator,
@@ -522,28 +530,33 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
                elif TrainTaskConfig.profile and batch_id == 10:
                    logging.info("end profiler")
                    #logging.info("profiling total time: ", time.time() - start_time)
-                    profiler.stop_profiler("total", "./transformer_local_profile_{}_pass{}".format(batch_id, pass_id))
+                    profiler.stop_profiler(
+                        "total", "./transformer_local_profile_{}_pass{}".format(
+                            batch_id, pass_id))
                    sys.exit(0)
                logging.info("batch_id:{}".format(batch_id))
                outs = train_exe.run(
-                    fetch_list=[sum_cost.name, token_num.name] if (batch_id % args.fetch_steps == 0 or TrainTaskConfig.profile) else[], 
+                    fetch_list=[sum_cost.name, token_num.name]
-                        feed=feed_dict_list)
+                    if (batch_id % args.fetch_steps == 0 or
+                        TrainTaskConfig.profile) else [],
+                    feed=feed_dict_list)
                if (batch_id % args.fetch_steps == 0 and batch_id > 0):
-                    sum_cost_val, token_num_val = np.array(outs[0]), np.array(outs[
+                    sum_cost_val, token_num_val = np.array(outs[0]), np.array(
-                        1])
+                        outs[1])
                    # sum the cost from multi-devices
                    total_sum_cost = sum_cost_val.sum()
                    total_token_num = token_num_val.sum()
                    total_avg_cost = total_sum_cost / total_token_num
-                    logging.info("step_idx: %d, epoch: %d, batch: %d, avg loss: %f, "
+                    logging.info(
-                                 "normalized loss: %f, ppl: %f, speed: %.2f step/s" %
+                        "step_idx: %d, epoch: %d, batch: %d, avg loss: %f, "
-                          (step_idx, pass_id, batch_id, total_avg_cost,
+                        "normalized loss: %f, ppl: %f, speed: %.2f step/s" %
-                           total_avg_cost - loss_normalizer,
+                        (step_idx, pass_id, batch_id, total_avg_cost,
-                           np.exp([min(total_avg_cost, 100)]), 
+                         total_avg_cost - loss_normalizer,
-                           args.fetch_steps / (time.time() - avg_batch_time)))
+                         np.exp([min(total_avg_cost, 100)]),
+                         args.fetch_steps / (time.time() - avg_batch_time)))
                if step_idx % int(TrainTaskConfig.
                                  save_freq) == TrainTaskConfig.save_freq - 1:
@@ -557,7 +570,7 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
                                     "iter_" + str(step_idx) + ".infer.model"),
                        train_prog)
                if batch_id % args.fetch_steps == 0 and batch_id > 0:
-                    avg_batch_time=time.time()
+                    avg_batch_time = time.time()
                init_flag = False
                batch_id += 1
                step_idx += 1
@@ -640,7 +653,7 @@ def train(args):
                use_py_reader=args.use_py_reader,
                is_test=False)
-            optimizer=None
+            optimizer = None
            if args.sync:
                lr_decay = fluid.layers.learning_rate_scheduler.noam_decay(
                    ModelHyperParams.d_model, TrainTaskConfig.warmup_steps)
@@ -682,8 +695,10 @@ def train(args):
            print("worker_endpoints:", worker_endpoints)
            print("current_endpoint:", current_endpoint)
            append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint)
-            train_loop(exe, fluid.default_main_program(), dev_count, sum_cost, avg_cost,
+            train_loop(exe,
-                       lr_scheduler, token_num, predict, trainers_num, trainer_id)
+                       fluid.default_main_program(), dev_count, sum_cost,
+                       avg_cost, lr_scheduler, token_num, predict, trainers_num,
+                       trainer_id)
            return
        port = os.getenv("PADDLE_PORT", "6174")
@@ -732,7 +747,6 @@ def train(args):
        elif training_role == "TRAINER":
            logging.info("distributed: trainer started")
            trainer_prog = t.get_trainer_program()
            '''
            print("trainer start:")
            program_to_code(pserver_startup)
@@ -744,13 +758,15 @@ def train(args):
            train_loop(exe, train_prog, startup_prog, dev_count, sum_cost,
                       avg_cost, token_num, predict, pyreader)
        else:
-            logging.critical("environment var TRAINER_ROLE should be TRAINER os PSERVER")
+            logging.critical(
+                "environment var TRAINER_ROLE should be TRAINER or PSERVER")
            exit(1)
 if __name__ == "__main__":
    LOG_FORMAT = "[%(asctime)s %(levelname)s %(filename)s:%(lineno)d] %(message)s"
-    logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format=LOG_FORMAT)
+    logging.basicConfig(
+        stream=sys.stdout, level=logging.DEBUG, format=LOG_FORMAT)
    args = parse_args()
    train(args)
--- a/fluid/neural_machine_translation/transformer/util.py
+++ b/fluid/neural_machine_translation/transformer/util.py
-import sys
-import re
-import six
-import unicodedata
-# Regular expression for unescaping token strings.
-# '\u' is converted to '_'
-# '\\' is converted to '\'
-# '\213;' is converted to unichr(213)
-# Inverse of escaping.
-_UNESCAPE_REGEX = re.compile(r"\\u|\\\\|\\([0-9]+);")
-# This set contains all letter and number characters.
-_ALPHANUMERIC_CHAR_SET = set(
-    six.unichr(i) for i in range(sys.maxunicode)
-    if (unicodedata.category(six.unichr(i)).startswith("L") or
-        unicodedata.category(six.unichr(i)).startswith("N")))
-# Unicode utility functions that work with Python 2 and 3
-def native_to_unicode(s):
-    return s if is_unicode(s) else to_unicode(s)
-def unicode_to_native(s):
-    if six.PY2:
-        return s.encode("utf-8") if is_unicode(s) else s
-    else:
-        return s
-def is_unicode(s):
-    if six.PY2:
-        if isinstance(s, unicode):
-            return True
-    else:
-        if isinstance(s, str):
-            return True
-    return False
-def to_unicode(s, ignore_errors=False):
-    if is_unicode(s):
-        return s
-    error_mode = "ignore" if ignore_errors else "strict"
-    return s.decode("utf-8", errors=error_mode)
-def unescape_token(escaped_token):
-    """
-    Inverse of encoding escaping.
-    """
-    def match(m):
-        if m.group(1) is None:
-            return u"_" if m.group(0) == u"\\u" else u"\\"
-        try:
-            return six.unichr(int(m.group(1)))
-        except (ValueError, OverflowError) as _:
-            return u"\u3013"  # Unicode for undefined character.
-    trimmed = escaped_token[:-1] if escaped_token.endswith(
-        "_") else escaped_token
-    return _UNESCAPE_REGEX.sub(match, trimmed)
-def subtoken_ids_to_str(subtoken_ids, vocabs):
-    """
-    Convert a list of subtoken(word piece) ids to a native string.
-    Refer to SubwordTextEncoder in Tensor2Tensor. 
-    """
-    subtokens = [vocabs.get(subtoken_id, u"") for subtoken_id in subtoken_ids]
-    # Convert a list of subtokens to a list of tokens.
-    concatenated = "".join([native_to_unicode(t) for t in subtokens])
-    split = concatenated.split("_")
-    tokens = []
-    for t in split:
-        if t:
-            unescaped = unescape_token(t + "_")
-            if unescaped:
-                tokens.append(unescaped)
-    # Convert a list of tokens to a unicode string (by inserting spaces bewteen
-    # word tokens).
-    token_is_alnum = [t[0] in _ALPHANUMERIC_CHAR_SET for t in tokens]
-    ret = []
-    for i, token in enumerate(tokens):
-        if i > 0 and token_is_alnum[i - 1] and token_is_alnum[i]:
-            ret.append(u" ")
-        ret.append(token)
-    seq = "".join(ret)
-    return unicode_to_native(seq)
--- a/fluid/object_detection/.gitignore
+++ b/fluid/object_detection/.gitignore
@@ -20,3 +20,4 @@ data/pascalvoc/trainval.txt
 log*
 *.log
+ssd_mobilenet_v1_pascalvoc*
--- a/fluid/object_detection/train.py
+++ b/fluid/object_detection/train.py
@@ -38,7 +38,8 @@ train_parameters = {
        "batch_size": 64,
        "lr": 0.001,
        "lr_epochs": [40, 60, 80, 100],
-        "lr_decay": [1, 0.5, 0.25, 0.1, 0.01]
+        "lr_decay": [1, 0.5, 0.25, 0.1, 0.01],
+        "ap_version": '11point',
    },
    "coco2014": {
        "train_images": 82783,
@@ -47,7 +48,8 @@ train_parameters = {
        "batch_size": 64,
        "lr": 0.001,
        "lr_epochs": [12, 19],
-        "lr_decay": [1, 0.5, 0.25]
+        "lr_decay": [1, 0.5, 0.25],
+        "ap_version": 'integral', # should use eval_coco_map.py to test model
    },
    "coco2017": {
        "train_images": 118287,
@@ -56,7 +58,8 @@ train_parameters = {
        "batch_size": 64,
        "lr": 0.001,
        "lr_epochs": [12, 19],
-        "lr_decay": [1, 0.5, 0.25]
+        "lr_decay": [1, 0.5, 0.25],
+        "ap_version": 'integral', # should use eval_coco_map.py to test model
    }
 }
@@ -77,6 +80,7 @@ def optimizer_setting(train_params):
 def build_program(main_prog, startup_prog, train_params, is_train):
    image_shape = train_params['image_shape']
    class_num = train_params['class_num']
+    ap_version = train_params['ap_version']
    with fluid.program_guard(main_prog, startup_prog):
        py_reader = fluid.layers.py_reader(
            capacity=64,
@@ -88,16 +92,16 @@ def build_program(main_prog, startup_prog, train_params, is_train):
            image, gt_box, gt_label, difficult = fluid.layers.read_file(py_reader)
            locs, confs, box, box_var = mobile_net(class_num, image, image_shape)
            if is_train:
-                loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
+                with fluid.unique_name.guard("train"):
-                    box_var)
+                    loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
-                loss = fluid.layers.reduce_sum(loss)
+                        box_var)
-                optimizer = optimizer_setting(train_params)
+                    loss = fluid.layers.reduce_sum(loss)
-                optimizer.minimize(loss)
+                    optimizer = optimizer_setting(train_params)
+                    optimizer.minimize(loss)
            else:
+                with fluid.unique_name.guard("inference"):
-                nmsed_out = fluid.layers.detection_output(
+                    nmsed_out = fluid.layers.detection_output(
-                    locs, confs, box, box_var, nms_threshold=0.45)
+                        locs, confs, box, box_var, nms_threshold=0.45)
-                with fluid.program_guard(main_prog):
                    loss = fluid.evaluator.DetectionMAP(
                        nmsed_out,
                        gt_label,
@@ -106,7 +110,7 @@ def build_program(main_prog, startup_prog, train_params, is_train):
                        class_num,
                        overlap_threshold=0.5,
                        evaluate_difficult=False,
-                        ap_version=args.ap_version)
+                        ap_version=ap_version)
    return py_reader, loss
@@ -230,7 +234,7 @@ def train(args,
                loss_v = np.mean(np.array(loss_v))
                every_epoc_loss.append(loss_v)
                if batch_id % 20 == 0:
-                    print("Epoc {0}, batch {1}, loss {2}, time {3}".format(
+                    print("Epoc {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format(
                        epoc_id, batch_id, loss_v, start_time - prev_start_time))
            end_time = time.time()
            total_time += end_time - start_time
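The `ap_version` values added to `train_parameters` above (`'11point'` for Pascal VOC, `'integral'` for COCO) name two ways of summarizing a precision-recall curve as a single average precision. The sketch below is a simplified standalone illustration of those two conventions, not PaddlePaddle's `DetectionMAP` implementation; the function names and the toy curve are hypothetical.

```python
def ap_11point(recalls, precisions):
    """Average the best precision reachable at recall >= 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for i in range(11):
        threshold = i / 10.0
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


def ap_integral(recalls, precisions):
    """Sum precision weighted by each increase in recall (rectangle rule)."""
    total = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        total += (r - prev_recall) * p
        prev_recall = r
    return total


# Toy curve (hypothetical): points sorted by increasing recall.
recalls = [0.5, 1.0]
precisions = [1.0, 0.5]
ap11 = ap_11point(recalls, precisions)
ap_int = ap_integral(recalls, precisions)
```

On the same curve the two summaries generally differ, which is why a model should be evaluated with the version that matches its dataset's convention.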

--- a/fluid/ocr_recognition/attention_model.py
+++ b/fluid/ocr_recognition/attention_model.py
@@ -2,6 +2,7 @@ from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 import paddle.fluid as fluid
+import six
 decoder_size = 128
 word_vector_dim = 128
@@ -22,7 +23,7 @@ def conv_bn_pool(input,
                 pool=True,
                 use_cudnn=True):
    tmp = input
-    for i in xrange(group):
+    for i in six.moves.xrange(group):
        filter_size = 3
        conv_std = (2.0 / (filter_size**2 * tmp.shape[1]))**0.5
        conv_param = fluid.ParamAttr(

--- a/fluid/ocr_recognition/eval.py
+++ b/fluid/ocr_recognition/eval.py
-import paddle.v2 as paddle
 import paddle.fluid as fluid
 from utility import add_arguments, print_arguments, to_lodtensor, get_ctc_feeder_data, get_attention_feeder_data
 from attention_model import attention_eval

--- a/fluid/ocr_recognition/infer.py
+++ b/fluid/ocr_recognition/infer.py
 from __future__ import print_function
-import paddle.v2 as paddle
 import paddle.fluid as fluid
 from utility import add_arguments, print_arguments, to_lodtensor, get_ctc_feeder_data, get_attention_feeder_for_infer
 import paddle.fluid.profiler as profiler

--- a/fluid/policy_gradient/brain.py
+++ b/fluid/policy_gradient/brain.py
 import numpy as np
-import paddle.v2 as paddle
 import paddle.fluid as fluid
 # reproducible
 np.random.seed(1)

--- a/fluid/recommendation/gru4rec/README.md
+++ b/fluid/recommendation/gru4rec/README.md
+# GRU4REC
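+The directory layout of this example is as follows:
+```text
+.
+├── README.md            # documentation
+├── train.py             # training script
+├── infer.py             # inference script
+├── utils                # shared helper functions
+├── convert_format.py    # data format conversion
+├── small_train.txt      # small sample training set
+└── small_test.txt       # small sample test set
+```
+## Introduction
+The GRU4REC model is described in the paper [Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939). This example implements that model.
+## Downloading and preprocessing the RSC15 data
+Run the following commands to download the official RSC15 dataset: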
+```
+curl -Lo yoochoose-data.7z https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z
+7z x yoochoose-data.7z
+```
+To filter the data the way GRU4REC does, download the script [https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py](https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py)
+and change the file paths in it:
+line 12: `PATH_TO_ORIGINAL_DATA = './'`
+line 13: `PATH_TO_PROCESSED_DATA = './'`
+Run the script with Python 3:
+```
+python preprocess.py
+```
+The generated data has the following format:
+```
+SessionId    ItemId    Time
+1    214536502    1396839069.277
+1    214536500    1396839249.868
+1    214536506    1396839286.998
+1    214577561    1396839420.306
+2    214662742    1396850197.614
+2    214662742    1396850239.373
+2    214825110    1396850317.446
+2    214757390    1396850390.71
+2    214757407    1396850438.247
+```
+The data must then be converted to the format the model reads. Run:
+```
+python convert_format.py
+```
+The training and test data look as follows; each line is one user's item sequence in chronological order:
+```
+214536502 214536500 214536506 214577561
+214662742 214662742 214825110 214757390 214757407 214551617
+214716935 214774687 214832672
+214836765 214706482
+214701242 214826623
+214826835 214826715
+214838855 214838855
+214576500 214576500 214576500
+214821275 214821275 214821371 214821371 214821371 214717089 214563337 214706462 214717436 214743335 214826837 214819762
+214717867 214717867
+```
+## Training
+`--use_cuda 1` runs on GPU (the default is CPU); `--parallel 1` uses multiple cards (the default is a single card).
+GPU environment
+Run `CUDA_VISIBLE_DEVICES=0 python train.py train_file test_file --use_cuda 1` to start training:
+```
+CUDA_VISIBLE_DEVICES=0 python train.py small_train.txt small_test.txt --use_cuda 1
+```
+CPU environment
+Run `python train.py train_file test_file` to start training:
+```
+python train.py small_train.txt small_test.txt
+```
+The currently supported parameters are set in the `train_net` function of [train.py](./train.py):
+```python
+    batch_size = 50                 # batch size; 500 is recommended
+    args = parse_args()
+    vocab, train_reader, test_reader = utils.prepare_data(
+        train_file, test_file,batch_size=batch_size * get_cards(args),\
+        buffer_size=1000, word_freq_threshold=0)        # buffer_size: window for local sorting by sequence length
+    train(
+        train_reader=train_reader,  
+        vocab=vocab,
+        network=network,
+        hid_size=100,               # embedding and hidden size
+        base_lr=0.01,               # base learning rate
+        batch_size=batch_size,
+        pass_num=10,                # the number of passes for training
+        use_cuda=use_cuda,          # whether to use GPU card
+        parallel=parallel,          # whether to be parallel
+        model_dir="model_recall20", # directory to save model
+        init_low_bound=-0.1,        # uniform parameter initialization lower bound
+        init_high_bound=0.1)        # uniform parameter initialization upper bound
+```
+## Customizing the network
+The network can be adjusted in the `network` function of [train.py](./train.py). The current structure is:
+```python
+emb = fluid.layers.embedding(
+    input=src,
+    size=[vocab_size, hid_size],
+    param_attr=fluid.ParamAttr(
+        initializer=fluid.initializer.Uniform(
+            low=init_low_bound, high=init_high_bound),
+        learning_rate=emb_lr_x),
+    is_sparse=True)
+fc0 = fluid.layers.fc(input=emb,
+                      size=hid_size * 3,
+                      param_attr=fluid.ParamAttr(
+                          initializer=fluid.initializer.Uniform(
+                              low=init_low_bound, high=init_high_bound),
+                          learning_rate=gru_lr_x))
+gru_h0 = fluid.layers.dynamic_gru(
+    input=fc0,
+    size=hid_size,
+    param_attr=fluid.ParamAttr(
+        initializer=fluid.initializer.Uniform(
+            low=init_low_bound, high=init_high_bound),
+        learning_rate=gru_lr_x))
+fc = fluid.layers.fc(input=gru_h0,
+                     size=vocab_size,
+                     act='softmax',
+                     param_attr=fluid.ParamAttr(
+                         initializer=fluid.initializer.Uniform(
+                             low=init_low_bound, high=init_high_bound),
+                         learning_rate=fc_lr_x))
+cost = fluid.layers.cross_entropy(input=fc, label=dst)
+acc = fluid.layers.accuracy(input=fc, label=dst, k=20)
+```
+## Sample training output
+The log below comes from training on a single Tesla K40m GPU:
+```text
+epoch_1 start
+step:100 ppl:441.468
+step:200 ppl:311.043
+step:300 ppl:218.952
+step:400 ppl:186.172
+step:500 ppl:188.600
+step:600 ppl:131.213
+step:700 ppl:165.770
+step:800 ppl:164.414
+step:900 ppl:156.470
+step:1000 ppl:174.201
+step:1100 ppl:118.619
+step:1200 ppl:122.635
+step:1300 ppl:118.220
+step:1400 ppl:90.372
+step:1500 ppl:135.018
+step:1600 ppl:114.327
+step:1700 ppl:141.806
+step:1800 ppl:93.416
+step:1900 ppl:92.897
+step:2000 ppl:121.703
+step:2100 ppl:96.288
+step:2200 ppl:88.355
+step:2300 ppl:101.737
+step:2400 ppl:95.934
+step:2500 ppl:86.158
+step:2600 ppl:80.925
+step:2700 ppl:202.219
+step:2800 ppl:106.828
+step:2900 ppl:91.458
+step:3000 ppl:105.988
+step:3100 ppl:87.067
+step:3200 ppl:92.651
+step:3300 ppl:101.145
+step:3400 ppl:91.247
+step:3500 ppl:107.656
+step:3600 ppl:89.410
+...
+...
+step:15700 ppl:76.819
+step:15800 ppl:62.257
+step:15900 ppl:81.735
+epoch:1 num_steps:15907 time_cost(s):4154.096032
+model saved in model_recall20/epoch_1
+...
+```
+## Inference
+Run `CUDA_VISIBLE_DEVICES=0 python infer.py model_dir start_epoch last_epoch(inclusive) train_file test_file` to start inference, where `start_epoch` is the first epoch to evaluate and `last_epoch` is the last one (inclusive). For example:
+```
+CUDA_VISIBLE_DEVICES=0 python infer.py model 1 10 small_train.txt small_test.txt
+```
+## Sample inference output
+```text
+model:model_r@20/epoch_1 recall@20:0.613 time_cost(s):12.23
+model:model_r@20/epoch_2 recall@20:0.647 time_cost(s):12.33
+model:model_r@20/epoch_3 recall@20:0.662 time_cost(s):12.38
+model:model_r@20/epoch_4 recall@20:0.669 time_cost(s):12.21
+model:model_r@20/epoch_5 recall@20:0.673 time_cost(s):12.17
+model:model_r@20/epoch_6 recall@20:0.675 time_cost(s):12.26
+model:model_r@20/epoch_7 recall@20:0.677 time_cost(s):12.25
+model:model_r@20/epoch_8 recall@20:0.679 time_cost(s):12.37
+model:model_r@20/epoch_9 recall@20:0.680 time_cost(s):12.22
+model:model_r@20/epoch_10 recall@20:0.681 time_cost(s):12.2
+```
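The `ppl` values in the training log are perplexities; `train.py` computes them as `np.exp` of the fetched average cross-entropy cost. A minimal sketch of that relationship follows, using only the standard library. The probabilities are made-up numbers and the function name is hypothetical.

```python
import math


def perplexity(true_item_probs):
    """exp of the mean negative log-probability assigned to the true next item."""
    costs = [-math.log(p) for p in true_item_probs]
    avg_cost = sum(costs) / len(costs)
    return math.exp(avg_cost)


# If the model gives every true next item probability 0.25, it is as uncertain
# as a uniform choice among 4 items, so the perplexity is about 4.
ppl = perplexity([0.25, 0.25, 0.25])
```

A falling `ppl` therefore means the model is assigning higher probability to the item each session actually visits next.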
--- a/fluid/recommendation/gru4rec/convert_format.py
+++ b/fluid/recommendation/gru4rec/convert_format.py
+import sys
+def convert_format(input, output):
+    with open(input) as rf:
+        with open(output, "w") as wf:
+            last_sess = -1
+            sign = 1
+            i = 0
+            for l in rf:
+                i = i + 1
+                if i == 1:
+                    continue
+                if (i % 1000000 == 1):
+                    print(i)
+                tokens = l.strip().split()
+                if (int(tokens[0]) != last_sess):
+                    if (sign):
+                        sign = 0
+                        wf.write(tokens[1] + " ")
+                    else:
+                        wf.write("\n" + tokens[1] + " ")
+                    last_sess = int(tokens[0])
+                else:
+                    wf.write(tokens[1] + " ")
+input = "rsc15_train_tr.txt"
+output = "rsc15_train_tr_paddle.txt"
+input2 = "rsc15_test.txt"
+output2 = "rsc15_test_paddle.txt"
+convert_format(input, output)
+convert_format(input2, output2)
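The conversion above walks the preprocessed file row by row and starts a new output line whenever the session id changes. The same grouping can be sketched on in-memory rows with `itertools.groupby`, which also groups consecutive rows that share a key. This is an illustrative alternative, not the script's code: `sessions_to_lines` and the sample rows are hypothetical, and unlike the script it does not leave a trailing space on each line.

```python
from itertools import groupby


def sessions_to_lines(rows):
    """Join the item ids of consecutive rows that share a session id."""
    lines = []
    for _, group in groupby(rows, key=lambda row: row[0]):
        lines.append(" ".join(item for _, item in group))
    return lines


# Rows as (SessionId, ItemId) after splitting each line and skipping the header.
rows = [
    ("1", "214536502"),
    ("1", "214536500"),
    ("2", "214662742"),
]
converted = sessions_to_lines(rows)
```

Like the original loop, this relies on the input already being ordered so that all rows of a session are adjacent.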
--- a/fluid/recommendation/gru4rec/infer.py
+++ b/fluid/recommendation/gru4rec/infer.py
+import sys
+import time
+import math
+import unittest
+import contextlib
+import numpy as np
+import six
+import paddle.fluid as fluid
+import paddle
+import utils
+def infer(test_reader, use_cuda, model_path):
+    """ inference function """
+    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    with fluid.scope_guard(fluid.core.Scope()):
+        infer_program, feed_target_names, fetch_vars = fluid.io.load_inference_model(
+            model_path, exe)
+        accum_num_recall = 0.0
+        accum_num_sum = 0.0
+        t0 = time.time()
+        step_id = 0
+        for data in test_reader():
+            step_id += 1
+            src_wordseq = utils.to_lodtensor([dat[0] for dat in data], place)
+            label_data = [dat[1] for dat in data]
+            dst_wordseq = utils.to_lodtensor(label_data, place)
+            para = exe.run(
+                infer_program,
+                feed={"src_wordseq": src_wordseq,
+                      "dst_wordseq": dst_wordseq},
+                fetch_list=fetch_vars,
+                return_numpy=False)
+            acc_ = para[1]._get_float_element(0)
+            data_length = len(
+                np.concatenate(
+                    label_data, axis=0).astype("int64"))
+            accum_num_sum += (data_length)
+            accum_num_recall += (data_length * acc_)
+            if step_id % 100 == 0:
+                print("step:%d  " % (step_id), accum_num_recall / accum_num_sum)
+        t1 = time.time()
+        print("model:%s recall@20:%.3f time_cost(s):%.2f" %
+              (model_path, accum_num_recall / accum_num_sum, t1 - t0))
+if __name__ == "__main__":
+    if len(sys.argv) != 6:
+        print(
+            "Usage: %s model_dir start_epoch last_epoch(inclusive) train_file test_file"
+            % sys.argv[0])
+        exit(0)
+    train_file = ""
+    test_file = ""
+    model_dir = sys.argv[1]
+    try:
+        start_index = int(sys.argv[2])
+        last_index = int(sys.argv[3])
+        train_file = sys.argv[4]
+        test_file = sys.argv[5]
+    except ValueError:
+        print(
+            "Usage: %s model_dir start_epoch last_epoch(inclusive) train_file test_file"
+            % sys.argv[0])
+        exit(-1)
+    vocab, train_reader, test_reader = utils.prepare_data(
+        train_file,
+        test_file,
+        batch_size=5,
+        buffer_size=1000,
+        word_freq_threshold=0)
+    for epoch in six.moves.xrange(start_index, last_index + 1):
+        epoch_path = model_dir + "/epoch_" + str(epoch)
+        infer(test_reader=test_reader, use_cuda=True, model_path=epoch_path)
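In `infer` above, each batch's accuracy is multiplied by the number of labels in that batch before being accumulated, so the final recall@20 is an average over individual predictions rather than over batches. A small sketch of that weighting, with a hypothetical function name and made-up batch values:

```python
def weighted_recall(batches):
    """batches: iterable of (num_labels, batch_accuracy) pairs."""
    hits = 0.0
    total = 0.0
    for num_labels, batch_acc in batches:
        total += num_labels
        hits += num_labels * batch_acc
    return hits / total if total else 0.0


# A batch with 10 labels at 0.5 and one with 30 labels at 0.7:
# (10 * 0.5 + 30 * 0.7) / 40, which differs from the unweighted mean of 0.6.
overall = weighted_recall([(10, 0.5), (30, 0.7)])
```

Weighting matters here because sessions vary in length, so batches do not all contain the same number of predictions.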
--- a/fluid/recommendation/gru4rec/small_test.txt
+++ b/fluid/recommendation/gru4rec/small_test.txt
+214586805 214509260 
+214857547 214857268 214857260 
+214859848 214857787 
+214687963 214531502 214687963 
+214696532 214859034 214858850 
+214857570 214857810 214857568 214857787 214857182 
+214857562 214857570 214857562 214857568 
+214859132 214545928 214859132 214551913 
+214858843 214859859 214858912 214858691 214859900 
+214561888 214561888 
+214688430 214688435 214688430 
+214536302 214531376 214531659 214531440 214531466 214513382 214550996 
+214854930 214854930 
+214858856 214690775 214859306 
+214859872 214858912 214858689 
+214859310 214859338 214859338 214859942 214859293 214859889 214859338 214859889 214859075 214859338 214859338 214859889 
+214574906 214574906 
+214859342 214859342 214858777 214851155 214851152 214572433 
+214537127 214857257 
+214857570 214857570 214857568 214857562 214857015 
+214854352 214854352 214854354 
+214738466 214855010 214857605 214856552 214574906 214857765 214849299 
+214858365 214859900 214859126 214858689 214859126 214859126 214857759 214858850 214859895 214859300 
+214857260 214561481 214848995 214849052 214865212 
+214857596 214819412 214819412 
+214849342 214849342 
+214859902 214854845 214854845 214854825 
+214859306 214859126 214859126 
+214644962 214644960 214644958 
+214696432 214696434 
+214708372 214508287 214684093 
+214857015 214857015 214858847 214690130 
+214858787 214859855 
+214858847 214696532 214859304 214854845 
+214586805 214586805 
+214857568 214857570 
+214696532 214858850 214859034 214569238 214568120 214854165 214684785 214854262 214567327 
+214602729 214857568 214857596 
+214859122 214858687 214859122 214859872 
+214555607 214836225 214836225 214836223 
+214849299 214829724 214855010 214829801 214574906 214586722 214684307 214857570 
+214859872 214695525 
+214845947 214586722 214829801 
+214829312 214546123 
+214849055 214849052 
+214509260 214587932 214596435 214644960 214696432 214696434 214545928 214857030 214636329 214832604 214574906 
+214586805 214586805 
+214587932 214587932 
+214857568 214857549 214854894 
+214836819 214836819 214595855 214595855 
+214858787 214858787 
+214854860 214857701 
+214848750 214643908 
+214858847 214859872 214859038 214859855 214690130 
+214847780 214696817 214717305 
+214509260 214509260 
+214853122 214853122 214853122 214853323 
+214858847 214858631 214858691 
+214859859 214819807 214853072 214853072 214819730 
+214820450 214705115 214586805 
+214858787 214859036 
+214829842 214864967 
+214846033 214850949 
+214587932 214586805 214509260 214696432 214855110 214545928 
+214858856 214859081 214859306 214858854 
+214690839 214690839 214711277 214839607 214582942 214582942 
+214857030 214832604 
+214857570 214855046 214859870 214577475 214858687 214656380 
+214854845 214854845 214854684 214859893 214854845 214854778 
+214850630 214848159 214848159 214848159 214848159 214848159 214848159 214848159 
+214856248 214856248 
+214858365 214858905 214858905 
+214712274 214855046 
+214845947 214845947 214831946 214717511 214846014 214854729 
+214561462 214561462 214561481 214561481 
+214836819 214853250 
+214858854 214859915 214859306 214854300 
+214857660 214857787 214539307 214855010 214855046 214849299 214856981 214849055 
+214855046 214854877 214568102 214539523 214579762 214539347 214641127 214600995 214833733 214600995 214684633 214645121 214658040 214712276 214857660 214687895 214854313 214857517 
+214845962 214853165 214846119 
+214854146 214859034 
+214819412 214819412 214819412 214819412 
+214849747 214578350 214561991 
+214854341 214854341 
+214644855 214644857 214531153 
+214644960 214862167 
+214640490 214600918 214600922 
+214854710 214857759 214859306 
+214858843 214859297 214858631 214859117 214858689 214858912 214859902 214690127 
+214586805 214586805 
+214859306 214859306 214859126 
+214859034 214696532 214858850 214859126 214859859 214859034 214859859 214858850 
+214857782 214849048 214857787 
+214854148 214857787 214854877 
+214858631 214858631 214690127 214859034 214858850 214859117 214858631 214859300 214858843 214859859 214859859 
+214646036 214646036 
+214858847 214858631 214690127 214859297 
+214861603 214700002 214700000 214835117 214700000 214857830 214700000 214712235 214700000 214700002 214510700 214835713 214712235 214853321 
+214854855 214854815 214854815 
+214857185 214854637 214829765 214848384 214829765 214856546 214848596 214835167 214563335 214553837 214536185 214855982 214845515 214550844 214712006 
--- a/fluid/recommendation/gru4rec/small_train.txt
+++ b/fluid/recommendation/gru4rec/small_train.txt
+214536502 214536500 214536506 214577561 
+214662742 214662742 214825110 214757390 214757407 214551617 
+214716935 214774687 214832672 
+214836765 214706482 
+214701242 214826623 
+214826835 214826715 
+214838855 214838855 
+214576500 214576500 214576500 
+214821275 214821275 214821371 214821371 214821371 214717089 214563337 214706462 214717436 214743335 214826837 214819762 
+214717867 214717867 
+214836761 214684513 214836761 
+214577732 214587013 214577732 
+214826897 214820441 
+214684093 214684093 214684093 
+214561790 214561790 214611457 214611457 
+214577732 214577732 
+214838503 214838503 214838503 214838503 214838503 214548744 
+214718203 214718203 214718203 214718203 
+214837485 214837485 214837485 214837487 214837487 214821315 214586711 214821305 214821307 214844357 214821341 214821309 214551617 214551617 214612920 214837487 
+214613743 214613743 214539110 214539110 
+214827028 214827017 214537796 214840762 214707930 214707930 214585652 214536197 214536195 214646169 
+214579288 214714790 214676070 214601407 
+214532036 214700432 
+214836789 214836789 214710804 
+214537967 214537967 
+214718246 214826835 
+214835257 214835265 
+214834865 214571188 214571188 214571188 214820225 214820225 214820225 214820225 214820225 214820225 214820225 214820225 214706441 214706441 214706441 214706441 
+214652878 214716737 214652878 
+214684721 214680356 
+214551594 214586970 
+214826769 214537967 
+214819745 214819745 
+214691587 214587915 
+214821277 214821277 214821277 214821277 214821277 
+214716932 214716932 214716932 214716932 214716932 214716932 
+214712235 214581489 214602605 
+214820441 214826897 214826702 214684513 214838100 214544357 214551626 214691484 
+214545935 214819438 214839907 214835917 214836210 
+214698491 214523692 
+214695307 214695305 214538317 214677448 
+214819468 214716977 214716977 214716977 214716977 214716939 
+214544355 214601212 214601212 214601212 
+214716982 214716984 
+214844248 214844248 
+214515834 214515830 
+214717318 214717318 
+214832557 214559660 214559660 214819520 214586540 
+214587797 214835775 214844109 
+214714794 214601407 214826619 214746427 214821300 214717562 214826927 214748334 214826908 214800262 
+214709645 214709645 214709645 214709645 214709645 
+214532072 214532070 
+214827022 214840419 
+214716984 214832657 
+214662975 214537779 214840762 
+214821277 214821277 214821277 
+214748300 214748293 
+214826955 214826606 214687642 
+214832559 214832559 214832559 214821017 214821017 214572234 214826715 214826715 
+214509135 214536853 214509133 214509135 214509135 214509135 214717877 214826615 214716982 
+214819472 214687685 
+214821285 214821285 214826801 214826801 
+214826705 214826705 
+214668590 214826872 
+214652220 214840483 214840483 214717286 214558807 214821300 214826908 214826908 214826908 214554637 214819430 214819430 214826837 214826837 214820392 214820392 214586694 214819376 214553844 214601229 214555500 214695127 214819760 214717850 214718385 214743369 214743369 
+214648475 214648340 214648438 214648455 214712936 214712887 214696149 214717097 214534352 214534352 214717097 
+214560099 214560099 214560099 214832750 214560099 
+214685621 214684093 214546097 214685623 
+214819685 214839907 214839905 214811752 
+214717007 214717003 214716928 
+214820842 214819490 
+214555869 214537185 
+214840599 214835735 
+214838100 214706216 
+214829737 214821315 
+214748293 214748293 
+214712272 214820450 
+214821380 214821380 
+214826799 214827005 214718390 214718396 214826627 
+214841060 214841060 
+214687768 214706445 
+214811752 214811754 
+214594678 214594680 214594680 
+214821369 214821369 214697771 214697512 214697413 214697409 214652409 214537127 214537127 214820237 214820237 214709645 214699213 214820237 214820237 214820237 214709645 214537127 
+214554358 214716950 
+214821275 214829741 
+214829741 214820842 214821279 214703790 
+214716954 214838366 
+214821022 214820814 
+214684721 214821369 214826833 214819472 
+214821315 214821305 
+214826702 214821275 
+214717847 214819719 214748336 
+214536440 214536437 
+214512416 214512416 
+214839313 214839313 214839313 
+214826705 214826705 
+214510044 214510044 214510044 214582387 214537535 214584812 214537535 214584810 
+214600989 214704180 
+214705693 214696824 214705682 214696817 214705691 214705693 214711710 214705691 214705691 214687539 214705687 214744796 214681648 214717307 214577750 214650382 214744796 214696817 214705682 214711710 
--- a/fluid/recommendation/gru4rec/train.py
+++ b/fluid/recommendation/gru4rec/train.py
+import os
+import sys
+import time
+import six
+import numpy as np
+import math
+import argparse
+import paddle.fluid as fluid
+import paddle
+import utils
+SEED = 102
+def parse_args():
+    parser = argparse.ArgumentParser("gru4rec benchmark.")
+    parser.add_argument('train_file')
+    parser.add_argument('test_file')
+    parser.add_argument('--use_cuda', help='whether use gpu')
+    parser.add_argument('--parallel', help='whether parallel')
+    parser.add_argument(
+        '--enable_ce',
+        action='store_true',
+        help='If set, run the task with continuous evaluation logs.')
+    parser.add_argument(
+        '--num_devices', type=int, default=1, help='Number of GPU devices')
+    args = parser.parse_args()
+    return args
+def network(src, dst, vocab_size, hid_size, init_low_bound, init_high_bound):
+    """ network definition """
+    emb_lr_x = 10.0
+    gru_lr_x = 1.0
+    fc_lr_x = 1.0
+    emb = fluid.layers.embedding(
+        input=src,
+        size=[vocab_size, hid_size],
+        param_attr=fluid.ParamAttr(
+            initializer=fluid.initializer.Uniform(
+                low=init_low_bound, high=init_high_bound),
+            learning_rate=emb_lr_x),
+        is_sparse=True)
+    fc0 = fluid.layers.fc(input=emb,
+                          size=hid_size * 3,
+                          param_attr=fluid.ParamAttr(
+                              initializer=fluid.initializer.Uniform(
+                                  low=init_low_bound, high=init_high_bound),
+                              learning_rate=gru_lr_x))
+    gru_h0 = fluid.layers.dynamic_gru(
+        input=fc0,
+        size=hid_size,
+        param_attr=fluid.ParamAttr(
+            initializer=fluid.initializer.Uniform(
+                low=init_low_bound, high=init_high_bound),
+            learning_rate=gru_lr_x))
+    fc = fluid.layers.fc(input=gru_h0,
+                         size=vocab_size,
+                         act='softmax',
+                         param_attr=fluid.ParamAttr(
+                             initializer=fluid.initializer.Uniform(
+                                 low=init_low_bound, high=init_high_bound),
+                             learning_rate=fc_lr_x))
+    cost = fluid.layers.cross_entropy(input=fc, label=dst)
+    acc = fluid.layers.accuracy(input=fc, label=dst, k=20)
+    return cost, acc
+def train(train_reader,
+          vocab,
+          network,
+          hid_size,
+          base_lr,
+          batch_size,
+          pass_num,
+          use_cuda,
+          parallel,
+          model_dir,
+          init_low_bound=-0.04,
+          init_high_bound=0.04):
+    """ train network """
+    args = parse_args()
+    if args.enable_ce:
+        # random seed must set before configuring the network.
+        fluid.default_startup_program().random_seed = SEED
+    vocab_size = len(vocab)
+    # Input data
+    src_wordseq = fluid.layers.data(
+        name="src_wordseq", shape=[1], dtype="int64", lod_level=1)
+    dst_wordseq = fluid.layers.data(
+        name="dst_wordseq", shape=[1], dtype="int64", lod_level=1)
+    # Train program
+    avg_cost = None
+    cost, acc = network(src_wordseq, dst_wordseq, vocab_size, hid_size,
+                        init_low_bound, init_high_bound)
+    avg_cost = fluid.layers.mean(x=cost)
+    # Optimization to minimize loss
+    sgd_optimizer = fluid.optimizer.Adagrad(learning_rate=base_lr)
+    sgd_optimizer.minimize(avg_cost)
+    # Initialize executor
+    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    exe.run(fluid.default_startup_program())
+    if parallel:
+        train_exe = fluid.ParallelExecutor(
+            use_cuda=use_cuda, loss_name=avg_cost.name)
+    else:
+        train_exe = exe
+    total_time = 0.0
+    fetch_list = [avg_cost.name]
+    for pass_idx in six.moves.xrange(pass_num):
+        epoch_idx = pass_idx + 1
+        print("epoch_%d start" % epoch_idx)
+        t0 = time.time()
+        i = 0
+        newest_ppl = 0
+        for data in train_reader():
+            i += 1
+            lod_src_wordseq = utils.to_lodtensor([dat[0] for dat in data],
+                                                 place)
+            lod_dst_wordseq = utils.to_lodtensor([dat[1] for dat in data],
+                                                 place)
+            ret_avg_cost = train_exe.run(feed={
+                "src_wordseq": lod_src_wordseq,
+                "dst_wordseq": lod_dst_wordseq
+            },
+                                         fetch_list=fetch_list)
+            avg_ppl = np.exp(ret_avg_cost[0])
+            newest_ppl = np.mean(avg_ppl)
+            if i % 10 == 0:
+                print("step:%d ppl:%.3f" % (i, newest_ppl))
+        t1 = time.time()
+        total_time += t1 - t0
+        print("epoch:%d num_steps:%d time_cost(s):%f" %
+              (epoch_idx, i, total_time / epoch_idx))
+        if pass_idx == pass_num - 1 and args.enable_ce:
+            #Note: The following logs are special for CE monitoring.
+            #Other situations do not need to care about these logs.
+            # get_cards() expects the parsed args object, not the bool flag.
+            gpu_num = get_cards(args)
+            if gpu_num == 1:
+                print("kpis    rsc15_pass_duration    %s" %
+                      (total_time / epoch_idx))
+                print("kpis    rsc15_avg_ppl    %s" % newest_ppl)
+            else:
+                print("kpis    rsc15_pass_duration_card%s    %s" % \
+                      (gpu_num, total_time / epoch_idx))
+                print("kpis    rsc15_avg_ppl_card%s    %s" %
+                      (gpu_num, newest_ppl))
+        save_dir = "%s/epoch_%d" % (model_dir, epoch_idx)
+        feed_var_names = ["src_wordseq", "dst_wordseq"]
+        fetch_vars = [avg_cost, acc]
+        fluid.io.save_inference_model(save_dir, feed_var_names, fetch_vars, exe)
+        print("model saved in %s" % save_dir)
+    print("finish training")
+def get_cards(args):
+    if args.enable_ce:
+        cards = os.environ.get('CUDA_VISIBLE_DEVICES')
+        num = len(cards.split(","))
+        return num
+    else:
+        return args.num_devices
+def train_net():
+    """ do training """
+    args = parse_args()
+    train_file = args.train_file
+    test_file = args.test_file
+    use_cuda = True if args.use_cuda else False
+    parallel = True if args.parallel else False
+    print("use_cuda:", use_cuda, "parallel:", parallel)
+    batch_size = 50
+    vocab, train_reader, test_reader = utils.prepare_data(
+        train_file, test_file,batch_size=batch_size * get_cards(args),\
+        buffer_size=1000, word_freq_threshold=0)
+    train(
+        train_reader=train_reader,
+        vocab=vocab,
+        network=network,
+        hid_size=100,
+        base_lr=0.01,
+        batch_size=batch_size,
+        pass_num=10,
+        use_cuda=use_cuda,
+        parallel=parallel,
+        model_dir="model_recall20",
+        init_low_bound=-0.1,
+        init_high_bound=0.1)
+if __name__ == "__main__":
+    train_net()
--- a/fluid/recommendation/gru4rec/utils.py
+++ b/fluid/recommendation/gru4rec/utils.py
+import sys
+import collections
+import six
+import time
+import numpy as np
+import paddle.fluid as fluid
+import paddle
+def to_lodtensor(data, place):
+    """ convert to LODtensor """
+    seq_lens = [len(seq) for seq in data]
+    cur_len = 0
+    lod = [cur_len]
+    for l in seq_lens:
+        cur_len += l
+        lod.append(cur_len)
+    flattened_data = np.concatenate(data, axis=0).astype("int64")
+    flattened_data = flattened_data.reshape([len(flattened_data), 1])
+    res = fluid.LoDTensor()
+    res.set(flattened_data, place)
+    res.set_lod([lod])
+    return res
+def prepare_data(train_filename,
+                 test_filename,
+                 batch_size,
+                 buffer_size=1000,
+                 word_freq_threshold=0,
+                 enable_ce=False):
+    """ prepare the English Penn Treebank (PTB) data """
+    print("start construct word dict")
+    vocab = build_dict(word_freq_threshold, train_filename, test_filename)
+    print("construct word dict done\n")
+    if enable_ce:
+        train_reader = paddle.batch(
+            train(
+                train_filename, vocab, buffer_size, data_type=DataType.SEQ),
+            batch_size)
+    else:
+        train_reader = sort_batch(
+            paddle.reader.shuffle(
+                train(
+                    train_filename, vocab, buffer_size, data_type=DataType.SEQ),
+                buf_size=buffer_size),
+            batch_size,
+            batch_size * 20)
+    test_reader = sort_batch(
+        test(
+            test_filename, vocab, buffer_size, data_type=DataType.SEQ),
+        batch_size,
+        batch_size * 20)
+    return vocab, train_reader, test_reader
+def sort_batch(reader, batch_size, sort_group_size, drop_last=False):
+    """
+    Create a batched reader.
+    :param reader: the data reader to read from.
+    :type reader: callable
+    :param batch_size: size of each mini-batch
+    :type batch_size: int
+    :param sort_group_size: size of partial sorted batch
+    :type sort_group_size: int
+    :param drop_last: drop the last batch, if the size of last batch is not equal to batch_size.
+    :type drop_last: bool
+    :return: the batched reader.
+    :rtype: callable
+    """
+    def batch_reader():
+        r = reader()
+        b = []
+        for instance in r:
+            b.append(instance)
+            if len(b) == sort_group_size:
+                sortl = sorted(b, key=lambda x: len(x[0]), reverse=True)
+                b = []
+                c = []
+                for sort_i in sortl:
+                    c.append(sort_i)
+                    if (len(c) == batch_size):
+                        yield c
+                        c = []
+        if drop_last == False and len(b) != 0:
+            sortl = sorted(b, key=lambda x: len(x[0]), reverse=True)
+            c = []
+            for sort_i in sortl:
+                c.append(sort_i)
+                # Emit full batches from the trailing, partially filled sort group.
+                if (len(c) == batch_size):
+                    yield c
+                    c = []
+    # Batch size check
+    batch_size = int(batch_size)
+    if batch_size <= 0:
+        raise ValueError("batch_size should be a positive integral value, "
+                         "but got batch_size={}".format(batch_size))
+    return batch_reader
+class DataType(object):
+    SEQ = 2
+def word_count(input_file, word_freq=None):
+    """
+    compute word count from corpus
+    """
+    if word_freq is None:
+        word_freq = collections.defaultdict(int)
+    for l in input_file:
+        for w in l.strip().split():
+            word_freq[w] += 1
+    return word_freq
+def build_dict(min_word_freq=50, train_filename="", test_filename=""):
+    """
+    Build a word dictionary from the corpus,  Keys of the dictionary are words,
+    and values are zero-based IDs of these words.
+    """
+    with open(train_filename) as trainf:
+        with open(test_filename) as testf:
+            word_freq = word_count(testf, word_count(trainf))
+    word_freq = [x for x in six.iteritems(word_freq) if x[1] > min_word_freq]
+    word_freq_sorted = sorted(word_freq, key=lambda x: (-x[1], x[0]))
+    words, _ = list(zip(*word_freq_sorted))
+    word_idx = dict(list(zip(words, six.moves.range(len(words)))))
+    return word_idx
+def reader_creator(filename, word_idx, n, data_type):
+    def reader():
+        with open(filename) as f:
+            for l in f:
+                if DataType.SEQ == data_type:
+                    l = l.strip().split()
+                    l = [word_idx.get(w) for w in l]
+                    src_seq = l[:len(l) - 1]
+                    trg_seq = l[1:]
+                    if n > 0 and len(src_seq) > n: continue
+                    yield src_seq, trg_seq
+                else:
+                    assert False, 'error data type'
+    return reader
+def train(filename, word_idx, n, data_type=DataType.SEQ):
+    return reader_creator(filename, word_idx, n, data_type)
+def test(filename, word_idx, n, data_type=DataType.SEQ):
+    return reader_creator(filename, word_idx, n, data_type)
--- a/fluid/text_classification/train.py
+++ b/fluid/text_classification/train.py
@@ -89,7 +89,7 @@ def train(train_reader,
 def train_net():
    word_dict, train_reader, test_reader = utils.prepare_data(
-        "imdb", self_dict=False, batch_size=128, buf_size=50000)
+        "imdb", self_dict=False, batch_size=4, buf_size=50000)
    if sys.argv[1] == "bow":
        train(
@@ -101,7 +101,7 @@ def train_net():
            save_dirname="bow_model",
            lr=0.002,
            pass_num=30,
-            batch_size=128)
+            batch_size=4)
    elif sys.argv[1] == "cnn":
        train(
            train_reader,
@@ -134,7 +134,7 @@ def train_net():
            save_dirname="gru_model",
            lr=0.05,
            pass_num=30,
-            batch_size=128)
+            batch_size=4)
    else:
        print("network name cannot be found!")
        sys.exit(1)

--- a/fluid/text_matching_on_quora/README.md
+++ b/fluid/text_matching_on_quora/README.md
+# Text matching on Quora question-answer pair dataset
+## contents
+* [Introduction](#introduction)
+  * [a brief review of the Quora Question Pair (QQP) Task](#a-brief-review-of-the-quora-question-pair-qqp-task)
+  * [Our Work](#our-work)
+* [Environment Preparation](#environment-preparation)
+  * [Install Fluid release 1.0](#install-fluid-release-10)
+    * [cpu version](#cpu-version)
+    * [gpu version](#gpu-version)
+    * [Have I installed Fluid successfully?](#have-i-installed-fluid-successfully)
+* [Prepare Data](#prepare-data)
+* [Train and evaluate](#train-and-evaluate)
+* [Models](#models)
+* [Results](#results)
+## Introduction
+### a brief review of the Quora Question Pair (QQP) Task
+The [Quora Question Pair](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) dataset contains more than 400,000 question pairs from [Quora](https://www.quora.com/), where people ask and answer questions related to specific areas. Each sample in the dataset consists of two questions (both in English) and a label that indicates whether the questions are duplicates. The labels were annotated by humans.
+Below are two samples from the dataset. The last column indicates whether the two questions are duplicate (1) or not (0).
+|id | qid1 | qid2| question1| question2| is_duplicate|
+|:---:|:---:|:---:|:---:|:---:|:---:|
+|0 |1 |2 |What is the step by step guide to invest in share market in india? |What is the step by step guide to invest in share market? |0|
+|1 |3 |4 |What is the story of Kohinoor (Koh-i-Noor) Diamond? | What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back? |0|
+A [Kaggle competition](https://www.kaggle.com/c/quora-question-pairs#description) based on this dataset was held in 2017. Participants were given a labeled training set and asked to make predictions on an unlabeled test set. The predictions were evaluated by log loss (the negative log-likelihood) on the test data.
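As a rough illustration of that metric, the sketch below computes a binary log loss in plain Python. The function name `binary_log_loss` and the clipping constant `eps` are assumptions made for this example and are not taken from the competition's scoring code.

```python
import math


def binary_log_loss(labels, probs, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip the probability so that log(0) never occurs (eps is an assumed value).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


labels = [0, 0, 1]             # 1 = duplicate, 0 = not duplicate
confident = [0.1, 0.2, 0.9]    # predicted probability of "duplicate"
uninformed = [0.5, 0.5, 0.5]   # a model that always answers 0.5

print("confident: ", binary_log_loss(labels, confident))
print("uninformed:", binary_log_loss(labels, uninformed))
```

A model that always predicts 0.5 scores ln 2 (about 0.693), so any useful model should land below that value.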
+The Kaggle competition has inspired much effective work. However, most of these models are rule-based and difficult to transfer to new tasks. Researchers are seeking more general models that work well on this task and on other natural language processing (NLP) tasks.
+[Wang _et al._](https://arxiv.org/abs/1702.03814) proposed a bilateral multi-perspective matching (BiMPM) model and evaluated it on the Quora Question Pair dataset. They split the original dataset into [3 parts](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing): _train.tsv_ (384,348 samples), _dev.tsv_ (10,000 samples) and _test.tsv_ (10,000 samples). The class distribution of _train.tsv_ is unbalanced (37% positive and 63% negative), while those of _dev.tsv_ and _test.tsv_ are balanced (50% positive and 50% negative). We used the same split in our experiments.
+### Our Work
+Based on the Quora Question Pair dataset, we implemented some classic models in the area of natural language understanding (NLU). Prediction accuracy is evaluated on the _test.tsv_ split from [Wang _et al._](https://arxiv.org/abs/1702.03814).
+## Environment Preparation
+### Install Fluid release 1.0
+Please follow the [official document in English](http://www.paddlepaddle.org/documentation/docs/en/1.0/build_and_install/pip_install_en.html) or [official document in Chinese](http://www.paddlepaddle.org/documentation/docs/zh/1.0/beginners_guide/install/Start.html) to install the Fluid deep learning framework. 
+#### Have I installed Fluid successfully?
+Run the following script from your command line:
+```shell
+python -c "import paddle"
+```
+If Fluid is installed successfully you should see no error message. Feel free to open issues under the [PaddlePaddle repository](https://github.com/PaddlePaddle/Paddle/issues) for support.
+## Prepare Data
+Please download the Quora dataset from [Google drive](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing) and unzip it to $HOME/.cache/paddle/dataset.
+Then run _data/prepare_quora_data.sh_ to download the pre-trained GloVe word embedding file, _glove.840B.300d.zip_:
+```shell
+sh data/prepare_quora_data.sh   
+```
+At this point the dataset directory ($HOME/.cache/paddle/dataset) structure should be:
+```shell
+$HOME/.cache/paddle/dataset
+    |- Quora_question_pair_partition
+        |- train.tsv
+        |- test.tsv
+        |- dev.tsv
+        |- readme.txt
+        |- wordvec.txt
+    |- glove.840B.300d.txt
+```
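Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch that only checks for the files listed in the tree; the helper name `missing_dataset_files` is hypothetical and is not part of this repository.

```python
import os

# Relative paths taken from the directory listing above.
EXPECTED_FILES = [
    os.path.join("Quora_question_pair_partition", "train.tsv"),
    os.path.join("Quora_question_pair_partition", "test.tsv"),
    os.path.join("Quora_question_pair_partition", "dev.tsv"),
    os.path.join("Quora_question_pair_partition", "readme.txt"),
    os.path.join("Quora_question_pair_partition", "wordvec.txt"),
    "glove.840B.300d.txt",
]


def missing_dataset_files(dataset_dir):
    """Return the expected files that are not present under dataset_dir."""
    return [
        rel for rel in EXPECTED_FILES
        if not os.path.isfile(os.path.join(dataset_dir, rel))
    ]


if __name__ == "__main__":
    dataset_dir = os.path.join(
        os.path.expanduser("~"), ".cache", "paddle", "dataset")
    missing = missing_dataset_files(dataset_dir)
    if missing:
        print("Missing files under %s:" % dataset_dir)
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset directory looks complete.")
```

The check only tests for file existence; it does not validate file contents or sizes.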
+## Train and evaluate
+We provide multiple models and configurations. Details are shown in `models` and `configs` directories. For a quick start, please run the _cdssmNet_ model with the corresponding configuration:
+```shell
+python train_and_evaluate.py  \
+    --model_name=cdssmNet  \
+    --config=cdssm_base
+```
+Logs will be output to the console. If everything works well, the logging information will have the same format as the content of _cdssm_base.log_.
+All configurations used in our experiments are as follows:
+|Model|Config|Command|
+|:----:|:----:|:----:|
+|cdssmNet|cdssm_base|python train_and_evaluate.py --model_name=cdssmNet --config=cdssm_base|
+|DecAttNet|decatt_glove|python train_and_evaluate.py --model_name=DecAttNet --config=decatt_glove|
+|InferSentNet|infer_sent_v1|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v1|
+|InferSentNet|infer_sent_v2|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v2|
+|SSENet|sse_base|python train_and_evaluate.py --model_name=SSENet --config=sse_base|
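To run every configuration in turn, the commands in the table can be generated from a list of (model, config) pairs. This is only a sketch: `EXPERIMENTS` and `build_command` are names introduced here for illustration, and the script prints the commands instead of launching them.

```python
# (model_name, config) pairs copied from the table above.
EXPERIMENTS = [
    ("cdssmNet", "cdssm_base"),
    ("DecAttNet", "decatt_glove"),
    ("InferSentNet", "infer_sent_v1"),
    ("InferSentNet", "infer_sent_v2"),
    ("SSENet", "sse_base"),
]


def build_command(model_name, config):
    """Build the argument list for one train_and_evaluate.py run."""
    return [
        "python",
        "train_and_evaluate.py",
        "--model_name=%s" % model_name,
        "--config=%s" % config,
    ]


for model_name, config in EXPERIMENTS:
    # Print only; to launch a run, pass the list to subprocess.check_call
    # (after `import subprocess`).
    print(" ".join(build_command(model_name, config)))
```

Building the command as a list rather than a single string avoids shell quoting issues if the list is later handed to `subprocess`.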
+## Models
+We have implemented 4 models so far: the convolutional deep-structured semantic model (CDSSM, CNN-based), the InferSent model (RNN-based), the shortcut-stacked encoder (SSE, RNN-based), and the decomposed attention model (DecAtt, attention-based).
+|Model|Features|Context Encoder|Match Layer|Classification Layer|
+|:----:|:----:|:----:|:----:|:----:|
+|CDSSM|word|1 layer conv1d|concatenation|MLP|
+|DecAtt|word|Attention|concatenation|MLP|
+|InferSent|word|1 layer Bi-LSTM|concatenation/element-wise product/<br>absolute element-wise difference|MLP|
+|SSE|word|3 layer Bi-LSTM|concatenation/element-wise product/<br>absolute element-wise difference|MLP|
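The match layer listed for InferSent and SSE combines the two encoded sentences by concatenation, element-wise product, and absolute element-wise difference. The sketch below shows that combination on plain Python lists. The function name `match_features` and the order of the four parts are assumptions for illustration; the repository's models build this with Fluid layers instead.

```python
def match_features(u, v):
    """Combine two equal-length sentence vectors into one match vector.

    The result is [u, v, |u - v|, u * v]; this ordering is an assumption.
    """
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same length")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    product = [a * b for a, b in zip(u, v)]
    return list(u) + list(v) + abs_diff + product


u = [1.0, -2.0, 0.5]   # toy encoding of question 1
v = [0.5, 1.0, 0.5]    # toy encoding of question 2
features = match_features(u, v)
print(len(features), features)
```

For sentence vectors of dimension d, the match vector has dimension 4d, which is what the MLP classification layer would then consume.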
+### CDSSM
+```
+@inproceedings{shen2014learning,
+  title={Learning semantic representations using convolutional neural networks for web search},
+  author={Shen, Yelong and He, Xiaodong and Gao, Jianfeng and Deng, Li and Mesnil, Gr{\'e}goire},
+  booktitle={Proceedings of the 23rd International Conference on World Wide Web},
+  pages={373--374},
+  year={2014},
+  organization={ACM}
+}
+```
+### InferSent
+```
+@article{conneau2017supervised,
+  title={Supervised learning of universal sentence representations from natural language inference data},
+  author={Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Loic and Bordes, Antoine},
+  journal={arXiv preprint arXiv:1705.02364},
+  year={2017}
+}
+```
+### SSE
+```
+@article{nie2017shortcut,
+  title={Shortcut-stacked sentence encoders for multi-domain inference},
+  author={Nie, Yixin and Bansal, Mohit},
+  journal={arXiv preprint arXiv:1708.02312},
+  year={2017}
+}
+```
+### DecAtt
+```
+@article{tomar2017neural,
+  title={Neural paraphrase identification of questions with noisy pretraining},
+  author={Tomar, Gaurav Singh and Duque, Thyago and T{\"a}ckstr{\"o}m, Oscar and Uszkoreit, Jakob and Das, Dipanjan},
+  journal={arXiv preprint arXiv:1704.04565},
+  year={2017}
+}
+```
+## Results
+|Model|Config|dev accuracy|test accuracy|
+|:----:|:----:|:----:|:----:|
+|cdssmNet|cdssm_base|83.56%|82.83%|
+|DecAttNet|decatt_glove|86.31%|86.22%|
+|InferSentNet|infer_sent_v1|87.15%|86.62%|
+|InferSentNet|infer_sent_v2|88.55%|88.43%|
+|SSENet|sse_base|88.35%|88.25%|
+In our experiments, the LSTM-based models (InferSent and SSE) outperformed the convolution-based CDSSM model. The DecAtt model has fewer parameters than the LSTM-based models, but is sensitive to hyper-parameters.
+<p align="center"> 
+ <img src="imgs/models_test_acc.png" width = "500" alt="test_acc"/> 
+</p>
--- a/fluid/text_matching_on_quora/cdssm_base.log
+++ b/fluid/text_matching_on_quora/cdssm_base.log
+net_name:  cdssmNet
+config {'save_dirname': 'model_dir', 'optimizer_type': 'adam', 'duplicate_data': False, 'train_samples_num': 384348, 'droprate_fc': 0.1, 'fc_dim': 128, 'kernel_count': 300, 'mlp_hid_dim': [128, 128], 'OOV_fill': 'uniform', 'class_dim': 2, 'epoch_num': 50, 'lr_decay': 1, 'learning_rate': 0.001, 'batch_size': 128, 'use_lod_tensor': True, 'metric_type': ['accuracy'], 'embedding_norm': False, 'emb_dim': 300, 'droprate_conv': 0.1, 'use_pretrained_word_embedding': True, 'kernel_size': 5, 'dict_dim': 40000}
+Generating word dict...
+('Vocab size: ', 36057)
+loading word2vec from  /home/dongdaxiang/.cache/paddle/dataset/glove.840B.300d.txt
+preparing pretrained word embedding ...
+pretrained_word_embedding to be load: [[-0.086864    0.19161     0.10915    ... -0.01516     0.11108
+   0.2065    ]
+ [ 0.27204    -0.06203    -0.1884     ...  0.13015    -0.18317
+   0.1323    ]
+ [-0.20628     0.36716    -0.071933   ...  0.14271     0.50059
+   0.038025  ]
+ ...
+ [ 0.03847164  0.01711482  0.01181574 ...  0.03926358 -0.04032813
+  -0.02135365]
+ [ 0.04201478 -0.02560226 -0.02281064 ...  0.00920258  0.04321
+   0.0227482 ]
+ [-0.04984529 -0.00176931  0.03022346 ...  0.0298265   0.02384543
+   0.00974313]]
+param name: emb.w; param shape: (40000L, 300L)
+param name: conv1d.w; param shape: (1500L, 300L)
+param name: fc1.w; param shape: (300L, 128L)
+param name: fc1.b; param shape: (128L,)
+param name: fc_2.w_0; param shape: (256L, 128L)
+param name: fc_2.b_0; param shape: (128L,)
+param name: fc_3.w_0; param shape: (128L, 128L)
+param name: fc_3.b_0; param shape: (128L,)
+param name: fc_4.w_0; param shape: (128L, 2L)
+param name: fc_4.b_0; param shape: (2L,)
+loading pretrained word embedding to param
+[Wed Oct 10 16:33:18 2018] epoch_id: -1, dev_cost: 0.693804, accuracy: 0.5109
+[Wed Oct 10 16:33:18 2018] epoch_id: -1, test_cost: 0.693670, accuracy: 0.5096
+[Wed Oct 10 16:33:18 2018] Start Training
+[Wed Oct 10 16:33:27 2018] epoch_id: 0, batch_id: 0, cost: 0.699992, acc: 0.515625
+[Wed Oct 10 16:33:30 2018] epoch_id: 0, batch_id: 100, cost: 0.557354, acc: 0.695312
+[Wed Oct 10 16:33:33 2018] epoch_id: 0, batch_id: 200, cost: 0.548301, acc: 0.742188
+[Wed Oct 10 16:33:35 2018] epoch_id: 0, batch_id: 300, cost: 0.528907, acc: 0.742188
+[Wed Oct 10 16:33:39 2018] epoch_id: 0, batch_id: 400, cost: 0.482460, acc: 0.781250
+[Wed Oct 10 16:33:41 2018] epoch_id: 0, batch_id: 500, cost: 0.494885, acc: 0.718750
+[Wed Oct 10 16:33:44 2018] epoch_id: 0, batch_id: 600, cost: 0.600175, acc: 0.695312
+[Wed Oct 10 16:33:46 2018] epoch_id: 0, batch_id: 700, cost: 0.477964, acc: 0.757812
+[Wed Oct 10 16:33:49 2018] epoch_id: 0, batch_id: 800, cost: 0.468172, acc: 0.750000
+[Wed Oct 10 16:33:51 2018] epoch_id: 0, batch_id: 900, cost: 0.394047, acc: 0.835938
+[Wed Oct 10 16:33:54 2018] epoch_id: 0, batch_id: 1000, cost: 0.520142, acc: 0.734375
+[Wed Oct 10 16:33:56 2018] epoch_id: 0, batch_id: 1100, cost: 0.471779, acc: 0.757812
+[Wed Oct 10 16:33:59 2018] epoch_id: 0, batch_id: 1200, cost: 0.407287, acc: 0.789062
+[Wed Oct 10 16:34:01 2018] epoch_id: 0, batch_id: 1300, cost: 0.430800, acc: 0.812500
+[Wed Oct 10 16:34:03 2018] epoch_id: 0, batch_id: 1400, cost: 0.421967, acc: 0.796875
+[Wed Oct 10 16:34:06 2018] epoch_id: 0, batch_id: 1500, cost: 0.388925, acc: 0.835938
+[Wed Oct 10 16:34:08 2018] epoch_id: 0, batch_id: 1600, cost: 0.445022, acc: 0.796875
+[Wed Oct 10 16:34:10 2018] epoch_id: 0, batch_id: 1700, cost: 0.439095, acc: 0.796875
+[Wed Oct 10 16:34:13 2018] epoch_id: 0, batch_id: 1800, cost: 0.448246, acc: 0.765625
+[Wed Oct 10 16:34:15 2018] epoch_id: 0, batch_id: 1900, cost: 0.377162, acc: 0.789062
+[Wed Oct 10 16:34:17 2018] epoch_id: 0, batch_id: 2000, cost: 0.460397, acc: 0.820312
+[Wed Oct 10 16:34:20 2018] epoch_id: 0, batch_id: 2100, cost: 0.416145, acc: 0.812500
+[Wed Oct 10 16:34:22 2018] epoch_id: 0, batch_id: 2200, cost: 0.509166, acc: 0.710938
+[Wed Oct 10 16:34:24 2018] epoch_id: 0, batch_id: 2300, cost: 0.450925, acc: 0.765625
+[Wed Oct 10 16:34:26 2018] epoch_id: 0, batch_id: 2400, cost: 0.457177, acc: 0.796875
+[Wed Oct 10 16:34:29 2018] epoch_id: 0, batch_id: 2500, cost: 0.454368, acc: 0.851562
+[Wed Oct 10 16:34:31 2018] epoch_id: 0, batch_id: 2600, cost: 0.478799, acc: 0.750000
+[Wed Oct 10 16:34:34 2018] epoch_id: 0, batch_id: 2700, cost: 0.521526, acc: 0.757812
+[Wed Oct 10 16:34:36 2018] epoch_id: 0, batch_id: 2800, cost: 0.476336, acc: 0.789062
+[Wed Oct 10 16:34:38 2018] epoch_id: 0, batch_id: 2900, cost: 0.407489, acc: 0.812500
+[Wed Oct 10 16:34:41 2018] epoch_id: 0, batch_id: 3000, cost: 0.404804, acc: 0.820312
+[Wed Oct 10 16:34:42 2018] epoch_id: 0, train_avg_cost: 0.456508, train_avg_acc: 0.779733
+[Wed Oct 10 16:34:43 2018] epoch_id: 0, dev_cost: 0.469818, accuracy: 0.7691
+[Wed Oct 10 16:34:44 2018] epoch_id: 0, test_cost: 0.462696, accuracy: 0.7734
+[Wed Oct 10 16:34:53 2018] epoch_id: 1, batch_id: 0, cost: 0.381106, acc: 0.820312
+[Wed Oct 10 16:34:56 2018] epoch_id: 1, batch_id: 100, cost: 0.325008, acc: 0.859375
+[Wed Oct 10 16:34:58 2018] epoch_id: 1, batch_id: 200, cost: 0.318922, acc: 0.843750
+[Wed Oct 10 16:35:00 2018] epoch_id: 1, batch_id: 300, cost: 0.359727, acc: 0.804688
+[Wed Oct 10 16:35:03 2018] epoch_id: 1, batch_id: 400, cost: 0.308632, acc: 0.875000
+[Wed Oct 10 16:35:05 2018] epoch_id: 1, batch_id: 500, cost: 0.326841, acc: 0.851562
+[Wed Oct 10 16:35:09 2018] epoch_id: 1, batch_id: 600, cost: 0.398975, acc: 0.796875
+[Wed Oct 10 16:35:12 2018] epoch_id: 1, batch_id: 700, cost: 0.296837, acc: 0.867188
+[Wed Oct 10 16:35:14 2018] epoch_id: 1, batch_id: 800, cost: 0.289739, acc: 0.867188
+[Wed Oct 10 16:35:17 2018] epoch_id: 1, batch_id: 900, cost: 0.315425, acc: 0.835938
+[Wed Oct 10 16:35:19 2018] epoch_id: 1, batch_id: 1000, cost: 0.340806, acc: 0.828125
+[Wed Oct 10 16:35:22 2018] epoch_id: 1, batch_id: 1100, cost: 0.383585, acc: 0.828125
+[Wed Oct 10 16:35:24 2018] epoch_id: 1, batch_id: 1200, cost: 0.317520, acc: 0.843750
+[Wed Oct 10 16:35:26 2018] epoch_id: 1, batch_id: 1300, cost: 0.308717, acc: 0.875000
+[Wed Oct 10 16:35:29 2018] epoch_id: 1, batch_id: 1400, cost: 0.320688, acc: 0.828125
+[Wed Oct 10 16:35:31 2018] epoch_id: 1, batch_id: 1500, cost: 0.353638, acc: 0.812500
+[Wed Oct 10 16:35:34 2018] epoch_id: 1, batch_id: 1600, cost: 0.379113, acc: 0.804688
+[Wed Oct 10 16:35:36 2018] epoch_id: 1, batch_id: 1700, cost: 0.309887, acc: 0.859375
+[Wed Oct 10 16:35:38 2018] epoch_id: 1, batch_id: 1800, cost: 0.316372, acc: 0.859375
+[Wed Oct 10 16:35:41 2018] epoch_id: 1, batch_id: 1900, cost: 0.405585, acc: 0.804688
+[Wed Oct 10 16:35:43 2018] epoch_id: 1, batch_id: 2000, cost: 0.336917, acc: 0.851562
+[Wed Oct 10 16:35:45 2018] epoch_id: 1, batch_id: 2100, cost: 0.347034, acc: 0.835938
+[Wed Oct 10 16:35:48 2018] epoch_id: 1, batch_id: 2200, cost: 0.379728, acc: 0.835938
+[Wed Oct 10 16:35:50 2018] epoch_id: 1, batch_id: 2300, cost: 0.395257, acc: 0.820312
+[Wed Oct 10 16:35:53 2018] epoch_id: 1, batch_id: 2400, cost: 0.398583, acc: 0.812500
+[Wed Oct 10 16:35:55 2018] epoch_id: 1, batch_id: 2500, cost: 0.356259, acc: 0.859375
+[Wed Oct 10 16:35:57 2018] epoch_id: 1, batch_id: 2600, cost: 0.297765, acc: 0.835938
+[Wed Oct 10 16:35:59 2018] epoch_id: 1, batch_id: 2700, cost: 0.353899, acc: 0.835938
+[Wed Oct 10 16:36:02 2018] epoch_id: 1, batch_id: 2800, cost: 0.377699, acc: 0.820312
+[Wed Oct 10 16:36:04 2018] epoch_id: 1, batch_id: 2900, cost: 0.388959, acc: 0.804688
+[Wed Oct 10 16:36:06 2018] epoch_id: 1, batch_id: 3000, cost: 0.344840, acc: 0.835938
+[Wed Oct 10 16:36:07 2018] epoch_id: 1, train_avg_cost: 0.346376, train_avg_acc: 0.842572
+[Wed Oct 10 16:36:08 2018] epoch_id: 1, dev_cost: 0.402576, accuracy: 0.8094
+[Wed Oct 10 16:36:09 2018] epoch_id: 1, test_cost: 0.397121, accuracy: 0.8185
+[Wed Oct 10 16:36:18 2018] epoch_id: 2, batch_id: 0, cost: 0.280530, acc: 0.890625
+[Wed Oct 10 16:36:20 2018] epoch_id: 2, batch_id: 100, cost: 0.233576, acc: 0.906250
+[Wed Oct 10 16:36:22 2018] epoch_id: 2, batch_id: 200, cost: 0.245128, acc: 0.898438
+[Wed Oct 10 16:36:25 2018] epoch_id: 2, batch_id: 300, cost: 0.183943, acc: 0.906250
+[Wed Oct 10 16:36:27 2018] epoch_id: 2, batch_id: 400, cost: 0.270915, acc: 0.882812
+[Wed Oct 10 16:36:30 2018] epoch_id: 2, batch_id: 500, cost: 0.248726, acc: 0.906250
+[Wed Oct 10 16:36:32 2018] epoch_id: 2, batch_id: 600, cost: 0.243351, acc: 0.921875
+[Wed Oct 10 16:36:35 2018] epoch_id: 2, batch_id: 700, cost: 0.314026, acc: 0.812500
+[Wed Oct 10 16:36:38 2018] epoch_id: 2, batch_id: 800, cost: 0.336282, acc: 0.867188
+[Wed Oct 10 16:36:41 2018] epoch_id: 2, batch_id: 900, cost: 0.290222, acc: 0.875000
+[Wed Oct 10 16:36:43 2018] epoch_id: 2, batch_id: 1000, cost: 0.287339, acc: 0.859375
+[Wed Oct 10 16:36:45 2018] epoch_id: 2, batch_id: 1100, cost: 0.225436, acc: 0.890625
+[Wed Oct 10 16:36:48 2018] epoch_id: 2, batch_id: 1200, cost: 0.346974, acc: 0.859375
+[Wed Oct 10 16:36:50 2018] epoch_id: 2, batch_id: 1300, cost: 0.283542, acc: 0.843750
+[Wed Oct 10 16:36:53 2018] epoch_id: 2, batch_id: 1400, cost: 0.203151, acc: 0.921875
+[Wed Oct 10 16:36:55 2018] epoch_id: 2, batch_id: 1500, cost: 0.255483, acc: 0.906250
+[Wed Oct 10 16:36:58 2018] epoch_id: 2, batch_id: 1600, cost: 0.275010, acc: 0.898438
+[Wed Oct 10 16:37:00 2018] epoch_id: 2, batch_id: 1700, cost: 0.264693, acc: 0.867188
+[Wed Oct 10 16:37:03 2018] epoch_id: 2, batch_id: 1800, cost: 0.257360, acc: 0.890625
+[Wed Oct 10 16:37:05 2018] epoch_id: 2, batch_id: 1900, cost: 0.150528, acc: 0.921875
+[Wed Oct 10 16:37:08 2018] epoch_id: 2, batch_id: 2000, cost: 0.229797, acc: 0.906250
+[Wed Oct 10 16:37:11 2018] epoch_id: 2, batch_id: 2100, cost: 0.261790, acc: 0.867188
+[Wed Oct 10 16:37:14 2018] epoch_id: 2, batch_id: 2200, cost: 0.201237, acc: 0.914062
+[Wed Oct 10 16:37:16 2018] epoch_id: 2, batch_id: 2300, cost: 0.296701, acc: 0.875000
+[Wed Oct 10 16:37:19 2018] epoch_id: 2, batch_id: 2400, cost: 0.315291, acc: 0.875000
+[Wed Oct 10 16:37:21 2018] epoch_id: 2, batch_id: 2500, cost: 0.282715, acc: 0.843750
+[Wed Oct 10 16:37:24 2018] epoch_id: 2, batch_id: 2600, cost: 0.296843, acc: 0.843750
+[Wed Oct 10 16:37:26 2018] epoch_id: 2, batch_id: 2700, cost: 0.363040, acc: 0.843750
+[Wed Oct 10 16:37:29 2018] epoch_id: 2, batch_id: 2800, cost: 0.262465, acc: 0.867188
+[Wed Oct 10 16:37:31 2018] epoch_id: 2, batch_id: 2900, cost: 0.208009, acc: 0.906250
+[Wed Oct 10 16:37:34 2018] epoch_id: 2, batch_id: 3000, cost: 0.247068, acc: 0.867188
+[Wed Oct 10 16:37:34 2018] epoch_id: 2, train_avg_cost: 0.267260, train_avg_acc: 0.884560
+[Wed Oct 10 16:37:36 2018] epoch_id: 2, dev_cost: 0.434485, accuracy: 0.8153
+[Wed Oct 10 16:37:37 2018] epoch_id: 2, test_cost: 0.425083, accuracy: 0.8243
+[Wed Oct 10 16:37:46 2018] epoch_id: 3, batch_id: 0, cost: 0.130899, acc: 0.945312
+[Wed Oct 10 16:37:49 2018] epoch_id: 3, batch_id: 100, cost: 0.174115, acc: 0.914062
+[Wed Oct 10 16:37:52 2018] epoch_id: 3, batch_id: 200, cost: 0.162655, acc: 0.929688
+[Wed Oct 10 16:37:54 2018] epoch_id: 3, batch_id: 300, cost: 0.156763, acc: 0.937500
+[Wed Oct 10 16:37:56 2018] epoch_id: 3, batch_id: 400, cost: 0.171531, acc: 0.929688
+[Wed Oct 10 16:37:59 2018] epoch_id: 3, batch_id: 500, cost: 0.124120, acc: 0.937500
+[Wed Oct 10 16:38:02 2018] epoch_id: 3, batch_id: 600, cost: 0.172306, acc: 0.929688
+[Wed Oct 10 16:38:04 2018] epoch_id: 3, batch_id: 700, cost: 0.352722, acc: 0.867188
+[Wed Oct 10 16:38:06 2018] epoch_id: 3, batch_id: 800, cost: 0.179998, acc: 0.929688
+[Wed Oct 10 16:38:09 2018] epoch_id: 3, batch_id: 900, cost: 0.197941, acc: 0.921875
+[Wed Oct 10 16:38:11 2018] epoch_id: 3, batch_id: 1000, cost: 0.163592, acc: 0.937500
+[Wed Oct 10 16:38:14 2018] epoch_id: 3, batch_id: 1100, cost: 0.196162, acc: 0.882812
+[Wed Oct 10 16:38:16 2018] epoch_id: 3, batch_id: 1200, cost: 0.201064, acc: 0.929688
+[Wed Oct 10 16:38:19 2018] epoch_id: 3, batch_id: 1300, cost: 0.162742, acc: 0.921875
+[Wed Oct 10 16:38:21 2018] epoch_id: 3, batch_id: 1400, cost: 0.192062, acc: 0.890625
+[Wed Oct 10 16:38:23 2018] epoch_id: 3, batch_id: 1500, cost: 0.215189, acc: 0.914062
+[Wed Oct 10 16:38:26 2018] epoch_id: 3, batch_id: 1600, cost: 0.148390, acc: 0.945312
+[Wed Oct 10 16:38:28 2018] epoch_id: 3, batch_id: 1700, cost: 0.148536, acc: 0.937500
+[Wed Oct 10 16:38:32 2018] epoch_id: 3, batch_id: 1800, cost: 0.122290, acc: 0.960938
+[Wed Oct 10 16:38:34 2018] epoch_id: 3, batch_id: 1900, cost: 0.152864, acc: 0.945312
+[Wed Oct 10 16:38:37 2018] epoch_id: 3, batch_id: 2000, cost: 0.250165, acc: 0.914062
+[Wed Oct 10 16:38:39 2018] epoch_id: 3, batch_id: 2100, cost: 0.197931, acc: 0.929688
+[Wed Oct 10 16:38:42 2018] epoch_id: 3, batch_id: 2200, cost: 0.167291, acc: 0.937500
+[Wed Oct 10 16:38:44 2018] epoch_id: 3, batch_id: 2300, cost: 0.243269, acc: 0.898438
+[Wed Oct 10 16:38:47 2018] epoch_id: 3, batch_id: 2400, cost: 0.170633, acc: 0.921875
+[Wed Oct 10 16:38:49 2018] epoch_id: 3, batch_id: 2500, cost: 0.182344, acc: 0.921875
+[Wed Oct 10 16:38:52 2018] epoch_id: 3, batch_id: 2600, cost: 0.267497, acc: 0.921875
+[Wed Oct 10 16:38:54 2018] epoch_id: 3, batch_id: 2700, cost: 0.170150, acc: 0.929688
+[Wed Oct 10 16:38:56 2018] epoch_id: 3, batch_id: 2800, cost: 0.198175, acc: 0.890625
+[Wed Oct 10 16:38:59 2018] epoch_id: 3, batch_id: 2900, cost: 0.231687, acc: 0.898438
+[Wed Oct 10 16:39:01 2018] epoch_id: 3, batch_id: 3000, cost: 0.280869, acc: 0.882812
+[Wed Oct 10 16:39:02 2018] epoch_id: 3, train_avg_cost: 0.203352, train_avg_acc: 0.915808
+[Wed Oct 10 16:39:03 2018] epoch_id: 3, dev_cost: 0.413912, accuracy: 0.8304
+[Wed Oct 10 16:39:04 2018] epoch_id: 3, test_cost: 0.409365, accuracy: 0.8341
+[Wed Oct 10 16:39:13 2018] epoch_id: 4, batch_id: 0, cost: 0.208998, acc: 0.945312
+[Wed Oct 10 16:39:16 2018] epoch_id: 4, batch_id: 100, cost: 0.148128, acc: 0.929688
+[Wed Oct 10 16:39:18 2018] epoch_id: 4, batch_id: 200, cost: 0.079264, acc: 0.976562
+[Wed Oct 10 16:39:21 2018] epoch_id: 4, batch_id: 300, cost: 0.125277, acc: 0.937500
+[Wed Oct 10 16:39:23 2018] epoch_id: 4, batch_id: 400, cost: 0.105227, acc: 0.968750
+[Wed Oct 10 16:39:25 2018] epoch_id: 4, batch_id: 500, cost: 0.063737, acc: 0.984375
+[Wed Oct 10 16:39:28 2018] epoch_id: 4, batch_id: 600, cost: 0.148419, acc: 0.937500
+[Wed Oct 10 16:39:30 2018] epoch_id: 4, batch_id: 700, cost: 0.118386, acc: 0.937500
+[Wed Oct 10 16:39:33 2018] epoch_id: 4, batch_id: 800, cost: 0.236417, acc: 0.898438
+[Wed Oct 10 16:39:35 2018] epoch_id: 4, batch_id: 900, cost: 0.131614, acc: 0.945312
+[Wed Oct 10 16:39:38 2018] epoch_id: 4, batch_id: 1000, cost: 0.134897, acc: 0.953125
+[Wed Oct 10 16:39:40 2018] epoch_id: 4, batch_id: 1100, cost: 0.152974, acc: 0.945312
+[Wed Oct 10 16:39:43 2018] epoch_id: 4, batch_id: 1200, cost: 0.173617, acc: 0.937500
+[Wed Oct 10 16:39:45 2018] epoch_id: 4, batch_id: 1300, cost: 0.128535, acc: 0.937500
+[Wed Oct 10 16:39:48 2018] epoch_id: 4, batch_id: 1400, cost: 0.156204, acc: 0.945312
+[Wed Oct 10 16:39:50 2018] epoch_id: 4, batch_id: 1500, cost: 0.130960, acc: 0.937500
+[Wed Oct 10 16:39:53 2018] epoch_id: 4, batch_id: 1600, cost: 0.185379, acc: 0.914062
+[Wed Oct 10 16:39:55 2018] epoch_id: 4, batch_id: 1700, cost: 0.092890, acc: 0.960938
+[Wed Oct 10 16:39:58 2018] epoch_id: 4, batch_id: 1800, cost: 0.147196, acc: 0.929688
+[Wed Oct 10 16:40:00 2018] epoch_id: 4, batch_id: 1900, cost: 0.153621, acc: 0.953125
+[Wed Oct 10 16:40:03 2018] epoch_id: 4, batch_id: 2000, cost: 0.153048, acc: 0.921875
+[Wed Oct 10 16:40:05 2018] epoch_id: 4, batch_id: 2100, cost: 0.205303, acc: 0.898438
+[Wed Oct 10 16:40:07 2018] epoch_id: 4, batch_id: 2200, cost: 0.139906, acc: 0.960938
+[Wed Oct 10 16:40:10 2018] epoch_id: 4, batch_id: 2300, cost: 0.254768, acc: 0.890625
+[Wed Oct 10 16:40:12 2018] epoch_id: 4, batch_id: 2400, cost: 0.076761, acc: 0.968750
+[Wed Oct 10 16:40:14 2018] epoch_id: 4, batch_id: 2500, cost: 0.199733, acc: 0.906250
+[Wed Oct 10 16:40:16 2018] epoch_id: 4, batch_id: 2600, cost: 0.310914, acc: 0.882812
+[Wed Oct 10 16:40:19 2018] epoch_id: 4, batch_id: 2700, cost: 0.148558, acc: 0.921875
+[Wed Oct 10 16:40:21 2018] epoch_id: 4, batch_id: 2800, cost: 0.164562, acc: 0.921875
+[Wed Oct 10 16:40:23 2018] epoch_id: 4, batch_id: 2900, cost: 0.177139, acc: 0.921875
+[Wed Oct 10 16:40:26 2018] epoch_id: 4, batch_id: 3000, cost: 0.112299, acc: 0.968750
+[Wed Oct 10 16:40:27 2018] epoch_id: 4, train_avg_cost: 0.156220, train_avg_acc: 0.937780
+[Wed Oct 10 16:40:28 2018] epoch_id: 4, dev_cost: 0.468851, accuracy: 0.8348
+[Wed Oct 10 16:40:29 2018] epoch_id: 4, test_cost: 0.468213, accuracy: 0.8368
+[Wed Oct 10 16:40:38 2018] epoch_id: 5, batch_id: 0, cost: 0.084071, acc: 0.976562
+[Wed Oct 10 16:40:41 2018] epoch_id: 5, batch_id: 100, cost: 0.052093, acc: 0.968750
+[Wed Oct 10 16:40:43 2018] epoch_id: 5, batch_id: 200, cost: 0.193576, acc: 0.929688
+[Wed Oct 10 16:40:46 2018] epoch_id: 5, batch_id: 300, cost: 0.075502, acc: 0.968750
+[Wed Oct 10 16:40:48 2018] epoch_id: 5, batch_id: 400, cost: 0.079619, acc: 0.976562
+[Wed Oct 10 16:40:51 2018] epoch_id: 5, batch_id: 500, cost: 0.124719, acc: 0.945312
+[Wed Oct 10 16:40:53 2018] epoch_id: 5, batch_id: 600, cost: 0.157322, acc: 0.929688
+[Wed Oct 10 16:40:56 2018] epoch_id: 5, batch_id: 700, cost: 0.100680, acc: 0.945312
+[Wed Oct 10 16:40:58 2018] epoch_id: 5, batch_id: 800, cost: 0.164627, acc: 0.937500
+[Wed Oct 10 16:41:00 2018] epoch_id: 5, batch_id: 900, cost: 0.113826, acc: 0.960938
+[Wed Oct 10 16:41:03 2018] epoch_id: 5, batch_id: 1000, cost: 0.122406, acc: 0.953125
+[Wed Oct 10 16:41:05 2018] epoch_id: 5, batch_id: 1100, cost: 0.098428, acc: 0.960938
+[Wed Oct 10 16:41:08 2018] epoch_id: 5, batch_id: 1200, cost: 0.175987, acc: 0.914062
+[Wed Oct 10 16:41:10 2018] epoch_id: 5, batch_id: 1300, cost: 0.161037, acc: 0.929688
+[Wed Oct 10 16:41:12 2018] epoch_id: 5, batch_id: 1400, cost: 0.058083, acc: 0.976562
+[Wed Oct 10 16:41:14 2018] epoch_id: 5, batch_id: 1500, cost: 0.099512, acc: 0.953125
+[Wed Oct 10 16:41:17 2018] epoch_id: 5, batch_id: 1600, cost: 0.155458, acc: 0.929688
+[Wed Oct 10 16:41:19 2018] epoch_id: 5, batch_id: 1700, cost: 0.149099, acc: 0.953125
+[Wed Oct 10 16:41:21 2018] epoch_id: 5, batch_id: 1800, cost: 0.184663, acc: 0.945312
+[Wed Oct 10 16:41:24 2018] epoch_id: 5, batch_id: 1900, cost: 0.153789, acc: 0.945312
+[Wed Oct 10 16:41:26 2018] epoch_id: 5, batch_id: 2000, cost: 0.135054, acc: 0.945312
+[Wed Oct 10 16:41:28 2018] epoch_id: 5, batch_id: 2100, cost: 0.091075, acc: 0.960938
+[Wed Oct 10 16:41:30 2018] epoch_id: 5, batch_id: 2200, cost: 0.175665, acc: 0.937500
+[Wed Oct 10 16:41:33 2018] epoch_id: 5, batch_id: 2300, cost: 0.092569, acc: 0.976562
+[Wed Oct 10 16:41:35 2018] epoch_id: 5, batch_id: 2400, cost: 0.171366, acc: 0.929688
+[Wed Oct 10 16:41:37 2018] epoch_id: 5, batch_id: 2500, cost: 0.077127, acc: 0.984375
+[Wed Oct 10 16:41:39 2018] epoch_id: 5, batch_id: 2600, cost: 0.133260, acc: 0.960938
+[Wed Oct 10 16:41:43 2018] epoch_id: 5, batch_id: 2700, cost: 0.130742, acc: 0.953125
+[Wed Oct 10 16:41:45 2018] epoch_id: 5, batch_id: 2800, cost: 0.165412, acc: 0.945312
+[Wed Oct 10 16:41:48 2018] epoch_id: 5, batch_id: 2900, cost: 0.099631, acc: 0.953125
+[Wed Oct 10 16:41:50 2018] epoch_id: 5, batch_id: 3000, cost: 0.191953, acc: 0.929688
+[Wed Oct 10 16:41:51 2018] epoch_id: 5, train_avg_cost: 0.122534, train_avg_acc: 0.952647
+[Wed Oct 10 16:41:52 2018] epoch_id: 5, dev_cost: 0.517809, accuracy: 0.8338
+[Wed Oct 10 16:41:53 2018] epoch_id: 5, test_cost: 0.516574, accuracy: 0.8379
+[Wed Oct 10 16:42:02 2018] epoch_id: 6, batch_id: 0, cost: 0.108672, acc: 0.953125
+[Wed Oct 10 16:42:04 2018] epoch_id: 6, batch_id: 100, cost: 0.055064, acc: 0.984375
+[Wed Oct 10 16:42:07 2018] epoch_id: 6, batch_id: 200, cost: 0.070521, acc: 0.976562
+[Wed Oct 10 16:42:09 2018] epoch_id: 6, batch_id: 300, cost: 0.044554, acc: 0.992188
+[Wed Oct 10 16:42:12 2018] epoch_id: 6, batch_id: 400, cost: 0.140199, acc: 0.968750
+[Wed Oct 10 16:42:14 2018] epoch_id: 6, batch_id: 500, cost: 0.074043, acc: 0.984375
+[Wed Oct 10 16:42:17 2018] epoch_id: 6, batch_id: 600, cost: 0.072380, acc: 0.960938
+[Wed Oct 10 16:42:19 2018] epoch_id: 6, batch_id: 700, cost: 0.089520, acc: 0.968750
+[Wed Oct 10 16:42:21 2018] epoch_id: 6, batch_id: 800, cost: 0.154753, acc: 0.937500
+[Wed Oct 10 16:42:24 2018] epoch_id: 6, batch_id: 900, cost: 0.137237, acc: 0.945312
+[Wed Oct 10 16:42:26 2018] epoch_id: 6, batch_id: 1000, cost: 0.155418, acc: 0.953125
+[Wed Oct 10 16:42:28 2018] epoch_id: 6, batch_id: 1100, cost: 0.102754, acc: 0.968750
+[Wed Oct 10 16:42:31 2018] epoch_id: 6, batch_id: 1200, cost: 0.171521, acc: 0.929688
+[Wed Oct 10 16:42:33 2018] epoch_id: 6, batch_id: 1300, cost: 0.089853, acc: 0.984375
+[Wed Oct 10 16:42:36 2018] epoch_id: 6, batch_id: 1400, cost: 0.117480, acc: 0.953125
+[Wed Oct 10 16:42:38 2018] epoch_id: 6, batch_id: 1500, cost: 0.144428, acc: 0.953125
+[Wed Oct 10 16:42:40 2018] epoch_id: 6, batch_id: 1600, cost: 0.100815, acc: 0.945312
+[Wed Oct 10 16:42:43 2018] epoch_id: 6, batch_id: 1700, cost: 0.096131, acc: 0.960938
+[Wed Oct 10 16:42:45 2018] epoch_id: 6, batch_id: 1800, cost: 0.083034, acc: 0.968750
+[Wed Oct 10 16:42:47 2018] epoch_id: 6, batch_id: 1900, cost: 0.144603, acc: 0.937500
+[Wed Oct 10 16:42:50 2018] epoch_id: 6, batch_id: 2000, cost: 0.125068, acc: 0.960938
+[Wed Oct 10 16:42:52 2018] epoch_id: 6, batch_id: 2100, cost: 0.096932, acc: 0.945312
+[Wed Oct 10 16:42:54 2018] epoch_id: 6, batch_id: 2200, cost: 0.187626, acc: 0.906250
+[Wed Oct 10 16:42:58 2018] epoch_id: 6, batch_id: 2300, cost: 0.086040, acc: 0.953125
+[Wed Oct 10 16:43:00 2018] epoch_id: 6, batch_id: 2400, cost: 0.112231, acc: 0.960938
+[Wed Oct 10 16:43:03 2018] epoch_id: 6, batch_id: 2500, cost: 0.086397, acc: 0.976562
+[Wed Oct 10 16:43:05 2018] epoch_id: 6, batch_id: 2600, cost: 0.093871, acc: 0.960938
+[Wed Oct 10 16:43:07 2018] epoch_id: 6, batch_id: 2700, cost: 0.143658, acc: 0.953125
+[Wed Oct 10 16:43:10 2018] epoch_id: 6, batch_id: 2800, cost: 0.144744, acc: 0.945312
+[Wed Oct 10 16:43:12 2018] epoch_id: 6, batch_id: 2900, cost: 0.127995, acc: 0.945312
+[Wed Oct 10 16:43:14 2018] epoch_id: 6, batch_id: 3000, cost: 0.201635, acc: 0.929688
+[Wed Oct 10 16:43:15 2018] epoch_id: 6, train_avg_cost: 0.100383, train_avg_acc: 0.961683
+[Wed Oct 10 16:43:16 2018] epoch_id: 6, dev_cost: 0.622004, accuracy: 0.833
+[Wed Oct 10 16:43:17 2018] epoch_id: 6, test_cost: 0.604546, accuracy: 0.836
+[Wed Oct 10 16:43:25 2018] epoch_id: 7, batch_id: 0, cost: 0.092909, acc: 0.968750
+[Wed Oct 10 16:43:28 2018] epoch_id: 7, batch_id: 100, cost: 0.048849, acc: 0.976562
+[Wed Oct 10 16:43:31 2018] epoch_id: 7, batch_id: 200, cost: 0.123149, acc: 0.960938
+[Wed Oct 10 16:43:33 2018] epoch_id: 7, batch_id: 300, cost: 0.043434, acc: 0.992188
+[Wed Oct 10 16:43:35 2018] epoch_id: 7, batch_id: 400, cost: 0.057082, acc: 0.976562
+[Wed Oct 10 16:43:38 2018] epoch_id: 7, batch_id: 500, cost: 0.043290, acc: 0.976562
+[Wed Oct 10 16:43:40 2018] epoch_id: 7, batch_id: 600, cost: 0.061600, acc: 0.976562
+[Wed Oct 10 16:43:42 2018] epoch_id: 7, batch_id: 700, cost: 0.077328, acc: 0.968750
+[Wed Oct 10 16:43:45 2018] epoch_id: 7, batch_id: 800, cost: 0.139978, acc: 0.953125
+[Wed Oct 10 16:43:48 2018] epoch_id: 7, batch_id: 900, cost: 0.099730, acc: 0.960938
+[Wed Oct 10 16:43:51 2018] epoch_id: 7, batch_id: 1000, cost: 0.072699, acc: 0.976562
+[Wed Oct 10 16:43:53 2018] epoch_id: 7, batch_id: 1100, cost: 0.031092, acc: 0.992188
+[Wed Oct 10 16:43:55 2018] epoch_id: 7, batch_id: 1200, cost: 0.118547, acc: 0.960938
+[Wed Oct 10 16:43:57 2018] epoch_id: 7, batch_id: 1300, cost: 0.061420, acc: 0.976562
+[Wed Oct 10 16:44:00 2018] epoch_id: 7, batch_id: 1400, cost: 0.096040, acc: 0.968750
+[Wed Oct 10 16:44:02 2018] epoch_id: 7, batch_id: 1500, cost: 0.052711, acc: 0.992188
+[Wed Oct 10 16:44:04 2018] epoch_id: 7, batch_id: 1600, cost: 0.150460, acc: 0.929688
+[Wed Oct 10 16:44:07 2018] epoch_id: 7, batch_id: 1700, cost: 0.097628, acc: 0.976562
+[Wed Oct 10 16:44:09 2018] epoch_id: 7, batch_id: 1800, cost: 0.081382, acc: 0.976562
+[Wed Oct 10 16:44:11 2018] epoch_id: 7, batch_id: 1900, cost: 0.089064, acc: 0.953125
+[Wed Oct 10 16:44:14 2018] epoch_id: 7, batch_id: 2000, cost: 0.084270, acc: 0.968750
+[Wed Oct 10 16:44:16 2018] epoch_id: 7, batch_id: 2100, cost: 0.097173, acc: 0.968750
+[Wed Oct 10 16:44:18 2018] epoch_id: 7, batch_id: 2200, cost: 0.112953, acc: 0.960938
+[Wed Oct 10 16:44:20 2018] epoch_id: 7, batch_id: 2300, cost: 0.116143, acc: 0.953125
+[Wed Oct 10 16:44:23 2018] epoch_id: 7, batch_id: 2400, cost: 0.098675, acc: 0.968750
+[Wed Oct 10 16:44:25 2018] epoch_id: 7, batch_id: 2500, cost: 0.150993, acc: 0.945312
+[Wed Oct 10 16:44:27 2018] epoch_id: 7, batch_id: 2600, cost: 0.076421, acc: 0.968750
+[Wed Oct 10 16:44:29 2018] epoch_id: 7, batch_id: 2700, cost: 0.088665, acc: 0.968750
+[Wed Oct 10 16:44:32 2018] epoch_id: 7, batch_id: 2800, cost: 0.142891, acc: 0.937500
+[Wed Oct 10 16:44:34 2018] epoch_id: 7, batch_id: 2900, cost: 0.088820, acc: 0.968750
+[Wed Oct 10 16:44:36 2018] epoch_id: 7, batch_id: 3000, cost: 0.100579, acc: 0.968750
+[Wed Oct 10 16:44:37 2018] epoch_id: 7, train_avg_cost: 0.084162, train_avg_acc: 0.968487
+[Wed Oct 10 16:44:38 2018] epoch_id: 7, dev_cost: 0.655423, accuracy: 0.8369
+[Wed Oct 10 16:44:39 2018] epoch_id: 7, test_cost: 0.663061, accuracy: 0.8352
+[Wed Oct 10 16:44:47 2018] epoch_id: 8, batch_id: 0, cost: 0.037309, acc: 0.992188
+[Wed Oct 10 16:44:50 2018] epoch_id: 8, batch_id: 100, cost: 0.043888, acc: 0.976562
+[Wed Oct 10 16:44:52 2018] epoch_id: 8, batch_id: 200, cost: 0.099702, acc: 0.960938
+[Wed Oct 10 16:44:54 2018] epoch_id: 8, batch_id: 300, cost: 0.080207, acc: 0.976562
+[Wed Oct 10 16:44:56 2018] epoch_id: 8, batch_id: 400, cost: 0.049319, acc: 0.976562
+[Wed Oct 10 16:44:59 2018] epoch_id: 8, batch_id: 500, cost: 0.041202, acc: 0.976562
+[Wed Oct 10 16:45:01 2018] epoch_id: 8, batch_id: 600, cost: 0.061663, acc: 0.968750
+[Wed Oct 10 16:45:03 2018] epoch_id: 8, batch_id: 700, cost: 0.065126, acc: 0.984375
+[Wed Oct 10 16:45:05 2018] epoch_id: 8, batch_id: 800, cost: 0.057770, acc: 0.976562
+[Wed Oct 10 16:45:07 2018] epoch_id: 8, batch_id: 900, cost: 0.136513, acc: 0.929688
+[Wed Oct 10 16:45:10 2018] epoch_id: 8, batch_id: 1000, cost: 0.054884, acc: 0.968750
+[Wed Oct 10 16:45:12 2018] epoch_id: 8, batch_id: 1100, cost: 0.046854, acc: 0.992188
+[Wed Oct 10 16:45:14 2018] epoch_id: 8, batch_id: 1200, cost: 0.031739, acc: 1.000000
+[Wed Oct 10 16:45:17 2018] epoch_id: 8, batch_id: 1300, cost: 0.127405, acc: 0.953125
+[Wed Oct 10 16:45:19 2018] epoch_id: 8, batch_id: 1400, cost: 0.052842, acc: 0.976562
+[Wed Oct 10 16:45:21 2018] epoch_id: 8, batch_id: 1500, cost: 0.117588, acc: 0.960938
+[Wed Oct 10 16:45:23 2018] epoch_id: 8, batch_id: 1600, cost: 0.078688, acc: 0.968750
+[Wed Oct 10 16:45:26 2018] epoch_id: 8, batch_id: 1700, cost: 0.069420, acc: 0.976562
+[Wed Oct 10 16:45:28 2018] epoch_id: 8, batch_id: 1800, cost: 0.055502, acc: 0.976562
+[Wed Oct 10 16:45:31 2018] epoch_id: 8, batch_id: 1900, cost: 0.161759, acc: 0.945312
+[Wed Oct 10 16:45:34 2018] epoch_id: 8, batch_id: 2000, cost: 0.063610, acc: 0.984375
+[Wed Oct 10 16:45:36 2018] epoch_id: 8, batch_id: 2100, cost: 0.103227, acc: 0.937500
+[Wed Oct 10 16:45:38 2018] epoch_id: 8, batch_id: 2200, cost: 0.065949, acc: 0.976562
+[Wed Oct 10 16:45:40 2018] epoch_id: 8, batch_id: 2300, cost: 0.060299, acc: 0.968750
+[Wed Oct 10 16:45:43 2018] epoch_id: 8, batch_id: 2400, cost: 0.089557, acc: 0.976562
+[Wed Oct 10 16:45:45 2018] epoch_id: 8, batch_id: 2500, cost: 0.095753, acc: 0.968750
+[Wed Oct 10 16:45:47 2018] epoch_id: 8, batch_id: 2600, cost: 0.111113, acc: 0.968750
+[Wed Oct 10 16:45:49 2018] epoch_id: 8, batch_id: 2700, cost: 0.074921, acc: 0.960938
+[Wed Oct 10 16:45:52 2018] epoch_id: 8, batch_id: 2800, cost: 0.105058, acc: 0.945312
+[Wed Oct 10 16:45:54 2018] epoch_id: 8, batch_id: 2900, cost: 0.173304, acc: 0.921875
+[Wed Oct 10 16:45:56 2018] epoch_id: 8, batch_id: 3000, cost: 0.077586, acc: 0.984375
+[Wed Oct 10 16:45:56 2018] epoch_id: 8, train_avg_cost: 0.072280, train_avg_acc: 0.973521
+[Wed Oct 10 16:45:57 2018] epoch_id: 8, dev_cost: 0.629243, accuracy: 0.8373
+[Wed Oct 10 16:45:58 2018] epoch_id: 8, test_cost: 0.661630, accuracy: 0.8352
+[Wed Oct 10 16:46:07 2018] epoch_id: 9, batch_id: 0, cost: 0.044024, acc: 0.984375
+[Wed Oct 10 16:46:09 2018] epoch_id: 9, batch_id: 100, cost: 0.033798, acc: 0.992188
+[Wed Oct 10 16:46:11 2018] epoch_id: 9, batch_id: 200, cost: 0.077856, acc: 0.976562
+[Wed Oct 10 16:46:14 2018] epoch_id: 9, batch_id: 300, cost: 0.119995, acc: 0.953125
+[Wed Oct 10 16:46:16 2018] epoch_id: 9, batch_id: 400, cost: 0.006741, acc: 1.000000
+[Wed Oct 10 16:46:18 2018] epoch_id: 9, batch_id: 500, cost: 0.097501, acc: 0.953125
+[Wed Oct 10 16:46:20 2018] epoch_id: 9, batch_id: 600, cost: 0.097540, acc: 0.960938
+[Wed Oct 10 16:46:22 2018] epoch_id: 9, batch_id: 700, cost: 0.085677, acc: 0.976562
+[Wed Oct 10 16:46:25 2018] epoch_id: 9, batch_id: 800, cost: 0.131135, acc: 0.960938
+[Wed Oct 10 16:46:27 2018] epoch_id: 9, batch_id: 900, cost: 0.058706, acc: 0.960938
+[Wed Oct 10 16:46:29 2018] epoch_id: 9, batch_id: 1000, cost: 0.081857, acc: 0.968750
+[Wed Oct 10 16:46:31 2018] epoch_id: 9, batch_id: 1100, cost: 0.035656, acc: 0.992188
+[Wed Oct 10 16:46:34 2018] epoch_id: 9, batch_id: 1200, cost: 0.023980, acc: 0.992188
+[Wed Oct 10 16:46:36 2018] epoch_id: 9, batch_id: 1300, cost: 0.104535, acc: 0.976562
+[Wed Oct 10 16:46:38 2018] epoch_id: 9, batch_id: 1400, cost: 0.052738, acc: 0.960938
+[Wed Oct 10 16:46:40 2018] epoch_id: 9, batch_id: 1500, cost: 0.049284, acc: 0.984375
+[Wed Oct 10 16:46:43 2018] epoch_id: 9, batch_id: 1600, cost: 0.040960, acc: 0.976562
+[Wed Oct 10 16:46:45 2018] epoch_id: 9, batch_id: 1700, cost: 0.054090, acc: 0.976562
+[Wed Oct 10 16:46:47 2018] epoch_id: 9, batch_id: 1800, cost: 0.030307, acc: 0.992188
+[Wed Oct 10 16:46:49 2018] epoch_id: 9, batch_id: 1900, cost: 0.152908, acc: 0.960938
+[Wed Oct 10 16:46:52 2018] epoch_id: 9, batch_id: 2000, cost: 0.133532, acc: 0.945312
+[Wed Oct 10 16:46:54 2018] epoch_id: 9, batch_id: 2100, cost: 0.162579, acc: 0.929688
+[Wed Oct 10 16:46:56 2018] epoch_id: 9, batch_id: 2200, cost: 0.037171, acc: 0.984375
+[Wed Oct 10 16:46:58 2018] epoch_id: 9, batch_id: 2300, cost: 0.036093, acc: 0.992188
+[Wed Oct 10 16:47:00 2018] epoch_id: 9, batch_id: 2400, cost: 0.066371, acc: 0.976562
+[Wed Oct 10 16:47:02 2018] epoch_id: 9, batch_id: 2500, cost: 0.047459, acc: 0.984375
+[Wed Oct 10 16:47:04 2018] epoch_id: 9, batch_id: 2600, cost: 0.031411, acc: 0.992188
+[Wed Oct 10 16:47:06 2018] epoch_id: 9, batch_id: 2700, cost: 0.107300, acc: 0.953125
+[Wed Oct 10 16:47:09 2018] epoch_id: 9, batch_id: 2800, cost: 0.041434, acc: 0.984375
+[Wed Oct 10 16:47:11 2018] epoch_id: 9, batch_id: 2900, cost: 0.081185, acc: 0.960938
+[Wed Oct 10 16:47:13 2018] epoch_id: 9, batch_id: 3000, cost: 0.096274, acc: 0.960938
+[Wed Oct 10 16:47:13 2018] epoch_id: 9, train_avg_cost: 0.063124, train_avg_acc: 0.976961
+[Wed Oct 10 16:47:14 2018] epoch_id: 9, dev_cost: 0.678009, accuracy: 0.8403
+[Wed Oct 10 16:47:15 2018] epoch_id: 9, test_cost: 0.707977, accuracy: 0.8359
+[Wed Oct 10 16:47:24 2018] epoch_id: 10, batch_id: 0, cost: 0.053481, acc: 0.968750
+[Wed Oct 10 16:47:26 2018] epoch_id: 10, batch_id: 100, cost: 0.024990, acc: 0.984375
+[Wed Oct 10 16:47:29 2018] epoch_id: 10, batch_id: 200, cost: 0.025989, acc: 0.992188
+[Wed Oct 10 16:47:31 2018] epoch_id: 10, batch_id: 300, cost: 0.016467, acc: 0.992188
+[Wed Oct 10 16:47:33 2018] epoch_id: 10, batch_id: 400, cost: 0.013582, acc: 1.000000
+[Wed Oct 10 16:47:35 2018] epoch_id: 10, batch_id: 500, cost: 0.062821, acc: 0.984375
+[Wed Oct 10 16:47:38 2018] epoch_id: 10, batch_id: 600, cost: 0.018919, acc: 0.992188
+[Wed Oct 10 16:47:40 2018] epoch_id: 10, batch_id: 700, cost: 0.113543, acc: 0.937500
+[Wed Oct 10 16:47:43 2018] epoch_id: 10, batch_id: 800, cost: 0.042273, acc: 0.984375
+[Wed Oct 10 16:47:45 2018] epoch_id: 10, batch_id: 900, cost: 0.040787, acc: 0.976562
+[Wed Oct 10 16:47:47 2018] epoch_id: 10, batch_id: 1000, cost: 0.013215, acc: 1.000000
+[Wed Oct 10 16:47:50 2018] epoch_id: 10, batch_id: 1100, cost: 0.056862, acc: 0.984375
+[Wed Oct 10 16:47:52 2018] epoch_id: 10, batch_id: 1200, cost: 0.114343, acc: 0.960938
+[Wed Oct 10 16:47:54 2018] epoch_id: 10, batch_id: 1300, cost: 0.068139, acc: 0.968750
+[Wed Oct 10 16:47:57 2018] epoch_id: 10, batch_id: 1400, cost: 0.036262, acc: 0.984375
+[Wed Oct 10 16:47:59 2018] epoch_id: 10, batch_id: 1500, cost: 0.031832, acc: 0.984375
+[Wed Oct 10 16:48:01 2018] epoch_id: 10, batch_id: 1600, cost: 0.098699, acc: 0.953125
+[Wed Oct 10 16:48:03 2018] epoch_id: 10, batch_id: 1700, cost: 0.073122, acc: 0.976562
+[Wed Oct 10 16:48:06 2018] epoch_id: 10, batch_id: 1800, cost: 0.035890, acc: 0.984375
+[Wed Oct 10 16:48:08 2018] epoch_id: 10, batch_id: 1900, cost: 0.036370, acc: 0.968750
+[Wed Oct 10 16:48:10 2018] epoch_id: 10, batch_id: 2000, cost: 0.073071, acc: 0.976562
+[Wed Oct 10 16:48:12 2018] epoch_id: 10, batch_id: 2100, cost: 0.017344, acc: 1.000000
+[Wed Oct 10 16:48:15 2018] epoch_id: 10, batch_id: 2200, cost: 0.146855, acc: 0.953125
+[Wed Oct 10 16:48:17 2018] epoch_id: 10, batch_id: 2300, cost: 0.068342, acc: 0.968750
+[Wed Oct 10 16:48:19 2018] epoch_id: 10, batch_id: 2400, cost: 0.026733, acc: 0.992188
+[Wed Oct 10 16:48:21 2018] epoch_id: 10, batch_id: 2500, cost: 0.085184, acc: 0.976562
+[Wed Oct 10 16:48:23 2018] epoch_id: 10, batch_id: 2600, cost: 0.065530, acc: 0.984375
+[Wed Oct 10 16:48:26 2018] epoch_id: 10, batch_id: 2700, cost: 0.111871, acc: 0.968750
+[Wed Oct 10 16:48:29 2018] epoch_id: 10, batch_id: 2800, cost: 0.063721, acc: 0.968750
+[Wed Oct 10 16:48:31 2018] epoch_id: 10, batch_id: 2900, cost: 0.026759, acc: 0.992188
+[Wed Oct 10 16:48:34 2018] epoch_id: 10, batch_id: 3000, cost: 0.031338, acc: 0.992188
+[Wed Oct 10 16:48:34 2018] epoch_id: 10, train_avg_cost: 0.055555, train_avg_acc: 0.979852
+[Wed Oct 10 16:48:35 2018] epoch_id: 10, dev_cost: 0.782007, accuracy: 0.8366
+[Wed Oct 10 16:48:36 2018] epoch_id: 10, test_cost: 0.795087, accuracy: 0.8369
+[Wed Oct 10 16:48:44 2018] epoch_id: 11, batch_id: 0, cost: 0.032797, acc: 0.992188
+[Wed Oct 10 16:48:47 2018] epoch_id: 11, batch_id: 100, cost: 0.011773, acc: 0.992188
+[Wed Oct 10 16:48:49 2018] epoch_id: 11, batch_id: 200, cost: 0.012297, acc: 1.000000
+[Wed Oct 10 16:48:51 2018] epoch_id: 11, batch_id: 300, cost: 0.032454, acc: 0.992188
+[Wed Oct 10 16:48:53 2018] epoch_id: 11, batch_id: 400, cost: 0.100247, acc: 0.976562
+[Wed Oct 10 16:48:55 2018] epoch_id: 11, batch_id: 500, cost: 0.035470, acc: 0.992188
+[Wed Oct 10 16:48:58 2018] epoch_id: 11, batch_id: 600, cost: 0.032553, acc: 0.984375
+[Wed Oct 10 16:49:00 2018] epoch_id: 11, batch_id: 700, cost: 0.035226, acc: 0.984375
+[Wed Oct 10 16:49:02 2018] epoch_id: 11, batch_id: 800, cost: 0.010961, acc: 1.000000
+[Wed Oct 10 16:49:04 2018] epoch_id: 11, batch_id: 900, cost: 0.033747, acc: 0.984375
+[Wed Oct 10 16:49:07 2018] epoch_id: 11, batch_id: 1000, cost: 0.052710, acc: 0.976562
+[Wed Oct 10 16:49:09 2018] epoch_id: 11, batch_id: 1100, cost: 0.021664, acc: 0.992188
+[Wed Oct 10 16:49:11 2018] epoch_id: 11, batch_id: 1200, cost: 0.056635, acc: 0.984375
+[Wed Oct 10 16:49:14 2018] epoch_id: 11, batch_id: 1300, cost: 0.007764, acc: 1.000000
+[Wed Oct 10 16:49:16 2018] epoch_id: 11, batch_id: 1400, cost: 0.042336, acc: 0.976562
+[Wed Oct 10 16:49:18 2018] epoch_id: 11, batch_id: 1500, cost: 0.077117, acc: 0.976562
+[Wed Oct 10 16:49:20 2018] epoch_id: 11, batch_id: 1600, cost: 0.082522, acc: 0.976562
+[Wed Oct 10 16:49:22 2018] epoch_id: 11, batch_id: 1700, cost: 0.022290, acc: 1.000000
+[Wed Oct 10 16:49:25 2018] epoch_id: 11, batch_id: 1800, cost: 0.033992, acc: 0.984375
+[Wed Oct 10 16:49:27 2018] epoch_id: 11, batch_id: 1900, cost: 0.027460, acc: 0.992188
+[Wed Oct 10 16:49:29 2018] epoch_id: 11, batch_id: 2000, cost: 0.032003, acc: 0.992188
+[Wed Oct 10 16:49:31 2018] epoch_id: 11, batch_id: 2100, cost: 0.070170, acc: 0.976562
+[Wed Oct 10 16:49:33 2018] epoch_id: 11, batch_id: 2200, cost: 0.017124, acc: 0.992188
+[Wed Oct 10 16:49:36 2018] epoch_id: 11, batch_id: 2300, cost: 0.037207, acc: 0.984375
+[Wed Oct 10 16:49:39 2018] epoch_id: 11, batch_id: 2400, cost: 0.018202, acc: 1.000000
+[Wed Oct 10 16:49:41 2018] epoch_id: 11, batch_id: 2500, cost: 0.059570, acc: 0.976562
+[Wed Oct 10 16:49:43 2018] epoch_id: 11, batch_id: 2600, cost: 0.009950, acc: 1.000000
+[Wed Oct 10 16:49:46 2018] epoch_id: 11, batch_id: 2700, cost: 0.015869, acc: 1.000000
+[Wed Oct 10 16:49:48 2018] epoch_id: 11, batch_id: 2800, cost: 0.049429, acc: 0.984375
+[Wed Oct 10 16:49:50 2018] epoch_id: 11, batch_id: 2900, cost: 0.061248, acc: 0.976562
+[Wed Oct 10 16:49:52 2018] epoch_id: 11, batch_id: 3000, cost: 0.007281, acc: 1.000000
+[Wed Oct 10 16:49:53 2018] epoch_id: 11, train_avg_cost: 0.049100, train_avg_acc: 0.982414
+[Wed Oct 10 16:49:54 2018] epoch_id: 11, dev_cost: 0.919803, accuracy: 0.8392
+[Wed Oct 10 16:49:55 2018] epoch_id: 11, test_cost: 0.963836, accuracy: 0.8354
+[Wed Oct 10 16:50:03 2018] epoch_id: 12, batch_id: 0, cost: 0.021594, acc: 0.992188
+[Wed Oct 10 16:50:05 2018] epoch_id: 12, batch_id: 100, cost: 0.003167, acc: 1.000000
+[Wed Oct 10 16:50:08 2018] epoch_id: 12, batch_id: 200, cost: 0.034331, acc: 0.984375
+[Wed Oct 10 16:50:10 2018] epoch_id: 12, batch_id: 300, cost: 0.044300, acc: 0.984375
+[Wed Oct 10 16:50:12 2018] epoch_id: 12, batch_id: 400, cost: 0.010300, acc: 1.000000
+[Wed Oct 10 16:50:15 2018] epoch_id: 12, batch_id: 500, cost: 0.071121, acc: 0.968750
+[Wed Oct 10 16:50:17 2018] epoch_id: 12, batch_id: 600, cost: 0.027463, acc: 0.984375
+[Wed Oct 10 16:50:19 2018] epoch_id: 12, batch_id: 700, cost: 0.023278, acc: 0.992188
+[Wed Oct 10 16:50:22 2018] epoch_id: 12, batch_id: 800, cost: 0.024731, acc: 0.992188
+[Wed Oct 10 16:50:25 2018] epoch_id: 12, batch_id: 900, cost: 0.033520, acc: 0.992188
+[Wed Oct 10 16:50:27 2018] epoch_id: 12, batch_id: 1000, cost: 0.066168, acc: 0.984375
+[Wed Oct 10 16:50:29 2018] epoch_id: 12, batch_id: 1100, cost: 0.086032, acc: 0.976562
+[Wed Oct 10 16:50:32 2018] epoch_id: 12, batch_id: 1200, cost: 0.041718, acc: 0.968750
+[Wed Oct 10 16:50:34 2018] epoch_id: 12, batch_id: 1300, cost: 0.085903, acc: 0.968750
+[Wed Oct 10 16:50:36 2018] epoch_id: 12, batch_id: 1400, cost: 0.022963, acc: 0.992188
+[Wed Oct 10 16:50:38 2018] epoch_id: 12, batch_id: 1500, cost: 0.008185, acc: 1.000000
+[Wed Oct 10 16:50:41 2018] epoch_id: 12, batch_id: 1600, cost: 0.057872, acc: 0.968750
+[Wed Oct 10 16:50:43 2018] epoch_id: 12, batch_id: 1700, cost: 0.011306, acc: 1.000000
+[Wed Oct 10 16:50:45 2018] epoch_id: 12, batch_id: 1800, cost: 0.030697, acc: 0.984375
+[Wed Oct 10 16:50:47 2018] epoch_id: 12, batch_id: 1900, cost: 0.049713, acc: 0.984375
+[Wed Oct 10 16:50:50 2018] epoch_id: 12, batch_id: 2000, cost: 0.050341, acc: 0.976562
+[Wed Oct 10 16:50:52 2018] epoch_id: 12, batch_id: 2100, cost: 0.024994, acc: 0.992188
+[Wed Oct 10 16:50:54 2018] epoch_id: 12, batch_id: 2200, cost: 0.046852, acc: 0.968750
+[Wed Oct 10 16:50:56 2018] epoch_id: 12, batch_id: 2300, cost: 0.055520, acc: 0.976562
+[Wed Oct 10 16:50:59 2018] epoch_id: 12, batch_id: 2400, cost: 0.085991, acc: 0.968750
+[Wed Oct 10 16:51:01 2018] epoch_id: 12, batch_id: 2500, cost: 0.044263, acc: 0.984375
+[Wed Oct 10 16:51:03 2018] epoch_id: 12, batch_id: 2600, cost: 0.071548, acc: 0.976562
+[Wed Oct 10 16:51:05 2018] epoch_id: 12, batch_id: 2700, cost: 0.039594, acc: 0.976562
+[Wed Oct 10 16:51:08 2018] epoch_id: 12, batch_id: 2800, cost: 0.058939, acc: 0.984375
+[Wed Oct 10 16:51:10 2018] epoch_id: 12, batch_id: 2900, cost: 0.070956, acc: 0.968750
+[Wed Oct 10 16:51:12 2018] epoch_id: 12, batch_id: 3000, cost: 0.059941, acc: 0.960938
+[Wed Oct 10 16:51:13 2018] epoch_id: 12, train_avg_cost: 0.044984, train_avg_acc: 0.983741
+[Wed Oct 10 16:51:14 2018] epoch_id: 12, dev_cost: 0.742705, accuracy: 0.8364
+[Wed Oct 10 16:51:14 2018] epoch_id: 12, test_cost: 0.765290, accuracy: 0.8355
+[Wed Oct 10 16:51:23 2018] epoch_id: 13, batch_id: 0, cost: 0.054822, acc: 0.968750
+[Wed Oct 10 16:51:25 2018] epoch_id: 13, batch_id: 100, cost: 0.066483, acc: 0.976562
+[Wed Oct 10 16:51:28 2018] epoch_id: 13, batch_id: 200, cost: 0.007064, acc: 1.000000
+[Wed Oct 10 16:51:30 2018] epoch_id: 13, batch_id: 300, cost: 0.050190, acc: 0.984375
+[Wed Oct 10 16:51:32 2018] epoch_id: 13, batch_id: 400, cost: 0.044636, acc: 0.984375
+[Wed Oct 10 16:51:34 2018] epoch_id: 13, batch_id: 500, cost: 0.040963, acc: 0.984375
+[Wed Oct 10 16:51:37 2018] epoch_id: 13, batch_id: 600, cost: 0.029529, acc: 0.992188
+[Wed Oct 10 16:51:39 2018] epoch_id: 13, batch_id: 700, cost: 0.011587, acc: 1.000000
+[Wed Oct 10 16:51:41 2018] epoch_id: 13, batch_id: 800, cost: 0.039673, acc: 0.984375
+[Wed Oct 10 16:51:43 2018] epoch_id: 13, batch_id: 900, cost: 0.028793, acc: 0.984375
+[Wed Oct 10 16:51:46 2018] epoch_id: 13, batch_id: 1000, cost: 0.055973, acc: 0.968750
+[Wed Oct 10 16:51:48 2018] epoch_id: 13, batch_id: 1100, cost: 0.016087, acc: 0.992188
+[Wed Oct 10 16:51:50 2018] epoch_id: 13, batch_id: 1200, cost: 0.096423, acc: 0.960938
+[Wed Oct 10 16:51:52 2018] epoch_id: 13, batch_id: 1300, cost: 0.019652, acc: 0.992188
+[Wed Oct 10 16:51:55 2018] epoch_id: 13, batch_id: 1400, cost: 0.018604, acc: 0.992188
+[Wed Oct 10 16:51:57 2018] epoch_id: 13, batch_id: 1500, cost: 0.060169, acc: 0.960938
+[Wed Oct 10 16:51:59 2018] epoch_id: 13, batch_id: 1600, cost: 0.014124, acc: 0.992188
+[Wed Oct 10 16:52:01 2018] epoch_id: 13, batch_id: 1700, cost: 0.029843, acc: 0.984375
+[Wed Oct 10 16:52:05 2018] epoch_id: 13, batch_id: 1800, cost: 0.063125, acc: 0.976562
+[Wed Oct 10 16:52:07 2018] epoch_id: 13, batch_id: 1900, cost: 0.070910, acc: 0.953125
+[Wed Oct 10 16:52:09 2018] epoch_id: 13, batch_id: 2000, cost: 0.042864, acc: 0.984375
+[Wed Oct 10 16:52:11 2018] epoch_id: 13, batch_id: 2100, cost: 0.014658, acc: 0.992188
+[Wed Oct 10 16:52:14 2018] epoch_id: 13, batch_id: 2200, cost: 0.075003, acc: 0.968750
+[Wed Oct 10 16:52:16 2018] epoch_id: 13, batch_id: 2300, cost: 0.034856, acc: 0.976562
+[Wed Oct 10 16:52:18 2018] epoch_id: 13, batch_id: 2400, cost: 0.040518, acc: 0.976562
+[Wed Oct 10 16:52:20 2018] epoch_id: 13, batch_id: 2500, cost: 0.040826, acc: 0.976562
+[Wed Oct 10 16:52:23 2018] epoch_id: 13, batch_id: 2600, cost: 0.043420, acc: 0.968750
+[Wed Oct 10 16:52:25 2018] epoch_id: 13, batch_id: 2700, cost: 0.027364, acc: 0.984375
+[Wed Oct 10 16:52:27 2018] epoch_id: 13, batch_id: 2800, cost: 0.030051, acc: 0.984375
+[Wed Oct 10 16:52:30 2018] epoch_id: 13, batch_id: 2900, cost: 0.040024, acc: 0.984375
+[Wed Oct 10 16:52:32 2018] epoch_id: 13, batch_id: 3000, cost: 0.054583, acc: 0.968750
+[Wed Oct 10 16:52:32 2018] epoch_id: 13, train_avg_cost: 0.041237, train_avg_acc: 0.985349
+[Wed Oct 10 16:52:33 2018] epoch_id: 13, dev_cost: 1.078762, accuracy: 0.8411
+[Wed Oct 10 16:52:34 2018] epoch_id: 13, test_cost: 1.111191, accuracy: 0.8358
+[Wed Oct 10 16:52:43 2018] epoch_id: 14, batch_id: 0, cost: 0.003011, acc: 1.000000
+[Wed Oct 10 16:52:45 2018] epoch_id: 14, batch_id: 100, cost: 0.006236, acc: 1.000000
+[Wed Oct 10 16:52:48 2018] epoch_id: 14, batch_id: 200, cost: 0.017501, acc: 0.992188
+[Wed Oct 10 16:52:50 2018] epoch_id: 14, batch_id: 300, cost: 0.062686, acc: 0.976562
+[Wed Oct 10 16:52:52 2018] epoch_id: 14, batch_id: 400, cost: 0.008696, acc: 1.000000
+[Wed Oct 10 16:52:54 2018] epoch_id: 14, batch_id: 500, cost: 0.033238, acc: 0.984375
+[Wed Oct 10 16:52:57 2018] epoch_id: 14, batch_id: 600, cost: 0.086478, acc: 0.976562
+[Wed Oct 10 16:52:59 2018] epoch_id: 14, batch_id: 700, cost: 0.009820, acc: 0.992188
+[Wed Oct 10 16:53:01 2018] epoch_id: 14, batch_id: 800, cost: 0.066287, acc: 0.992188
+[Wed Oct 10 16:53:03 2018] epoch_id: 14, batch_id: 900, cost: 0.004043, acc: 1.000000
+[Wed Oct 10 16:53:05 2018] epoch_id: 14, batch_id: 1000, cost: 0.007859, acc: 1.000000
+[Wed Oct 10 16:53:08 2018] epoch_id: 14, batch_id: 1100, cost: 0.040856, acc: 0.976562
+[Wed Oct 10 16:53:10 2018] epoch_id: 14, batch_id: 1200, cost: 0.038995, acc: 0.984375
+[Wed Oct 10 16:53:12 2018] epoch_id: 14, batch_id: 1300, cost: 0.026738, acc: 0.992188
+[Wed Oct 10 16:53:14 2018] epoch_id: 14, batch_id: 1400, cost: 0.048141, acc: 0.968750
+[Wed Oct 10 16:53:16 2018] epoch_id: 14, batch_id: 1500, cost: 0.081051, acc: 0.976562
+[Wed Oct 10 16:53:19 2018] epoch_id: 14, batch_id: 1600, cost: 0.017602, acc: 0.992188
+[Wed Oct 10 16:53:21 2018] epoch_id: 14, batch_id: 1700, cost: 0.018175, acc: 0.992188
+[Wed Oct 10 16:53:23 2018] epoch_id: 14, batch_id: 1800, cost: 0.076890, acc: 0.968750
+[Wed Oct 10 16:53:25 2018] epoch_id: 14, batch_id: 1900, cost: 0.060768, acc: 0.976562
+[Wed Oct 10 16:53:28 2018] epoch_id: 14, batch_id: 2000, cost: 0.020131, acc: 0.984375
+[Wed Oct 10 16:53:30 2018] epoch_id: 14, batch_id: 2100, cost: 0.077612, acc: 0.976562
+[Wed Oct 10 16:53:32 2018] epoch_id: 14, batch_id: 2200, cost: 0.101997, acc: 0.960938
+[Wed Oct 10 16:53:34 2018] epoch_id: 14, batch_id: 2300, cost: 0.061213, acc: 0.976562
+[Wed Oct 10 16:53:37 2018] epoch_id: 14, batch_id: 2400, cost: 0.048987, acc: 0.976562
+[Wed Oct 10 16:53:39 2018] epoch_id: 14, batch_id: 2500, cost: 0.037741, acc: 0.984375
+[Wed Oct 10 16:53:41 2018] epoch_id: 14, batch_id: 2600, cost: 0.011101, acc: 1.000000
+[Wed Oct 10 16:53:43 2018] epoch_id: 14, batch_id: 2700, cost: 0.019846, acc: 0.992188
+[Wed Oct 10 16:53:45 2018] epoch_id: 14, batch_id: 2800, cost: 0.026633, acc: 1.000000
+[Wed Oct 10 16:53:48 2018] epoch_id: 14, batch_id: 2900, cost: 0.048637, acc: 0.976562
+[Wed Oct 10 16:53:50 2018] epoch_id: 14, batch_id: 3000, cost: 0.056658, acc: 0.992188
+[Wed Oct 10 16:53:50 2018] epoch_id: 14, train_avg_cost: 0.037520, train_avg_acc: 0.986595
+[Wed Oct 10 16:53:51 2018] epoch_id: 14, dev_cost: 0.958707, accuracy: 0.8367
+[Wed Oct 10 16:53:52 2018] epoch_id: 14, test_cost: 0.974553, accuracy: 0.8382
+[Wed Oct 10 16:54:01 2018] epoch_id: 15, batch_id: 0, cost: 0.015232, acc: 1.000000
+[Wed Oct 10 16:54:04 2018] epoch_id: 15, batch_id: 100, cost: 0.007195, acc: 1.000000
+[Wed Oct 10 16:54:06 2018] epoch_id: 15, batch_id: 200, cost: 0.017140, acc: 0.992188
+[Wed Oct 10 16:54:08 2018] epoch_id: 15, batch_id: 300, cost: 0.003196, acc: 1.000000
+[Wed Oct 10 16:54:10 2018] epoch_id: 15, batch_id: 400, cost: 0.046839, acc: 0.976562
+[Wed Oct 10 16:54:13 2018] epoch_id: 15, batch_id: 500, cost: 0.038533, acc: 0.992188
+[Wed Oct 10 16:54:15 2018] epoch_id: 15, batch_id: 600, cost: 0.016502, acc: 0.992188
+[Wed Oct 10 16:54:17 2018] epoch_id: 15, batch_id: 700, cost: 0.041825, acc: 0.976562
+[Wed Oct 10 16:54:20 2018] epoch_id: 15, batch_id: 800, cost: 0.083583, acc: 0.968750
+[Wed Oct 10 16:54:22 2018] epoch_id: 15, batch_id: 900, cost: 0.013552, acc: 0.992188
+[Wed Oct 10 16:54:24 2018] epoch_id: 15, batch_id: 1000, cost: 0.015114, acc: 1.000000
+[Wed Oct 10 16:54:26 2018] epoch_id: 15, batch_id: 1100, cost: 0.020185, acc: 0.992188
+[Wed Oct 10 16:54:29 2018] epoch_id: 15, batch_id: 1200, cost: 0.023274, acc: 0.984375
+[Wed Oct 10 16:54:31 2018] epoch_id: 15, batch_id: 1300, cost: 0.013836, acc: 1.000000
+[Wed Oct 10 16:54:33 2018] epoch_id: 15, batch_id: 1400, cost: 0.091024, acc: 0.984375
+[Wed Oct 10 16:54:36 2018] epoch_id: 15, batch_id: 1500, cost: 0.047340, acc: 0.976562
+[Wed Oct 10 16:54:38 2018] epoch_id: 15, batch_id: 1600, cost: 0.030423, acc: 0.992188
+[Wed Oct 10 16:54:40 2018] epoch_id: 15, batch_id: 1700, cost: 0.014750, acc: 0.992188
+[Wed Oct 10 16:54:42 2018] epoch_id: 15, batch_id: 1800, cost: 0.090613, acc: 0.968750
+[Wed Oct 10 16:54:45 2018] epoch_id: 15, batch_id: 1900, cost: 0.030791, acc: 0.984375
+[Wed Oct 10 16:54:47 2018] epoch_id: 15, batch_id: 2000, cost: 0.046719, acc: 0.976562
+[Wed Oct 10 16:54:49 2018] epoch_id: 15, batch_id: 2100, cost: 0.043871, acc: 0.984375
+[Wed Oct 10 16:54:51 2018] epoch_id: 15, batch_id: 2200, cost: 0.078455, acc: 0.968750
+[Wed Oct 10 16:54:53 2018] epoch_id: 15, batch_id: 2300, cost: 0.029536, acc: 0.976562
+[Wed Oct 10 16:54:56 2018] epoch_id: 15, batch_id: 2400, cost: 0.028696, acc: 0.984375
+[Wed Oct 10 16:54:58 2018] epoch_id: 15, batch_id: 2500, cost: 0.007129, acc: 0.992188
+[Wed Oct 10 16:55:00 2018] epoch_id: 15, batch_id: 2600, cost: 0.049990, acc: 0.976562
+[Wed Oct 10 16:55:03 2018] epoch_id: 15, batch_id: 2700, cost: 0.040309, acc: 0.984375
+[Wed Oct 10 16:55:06 2018] epoch_id: 15, batch_id: 2800, cost: 0.098748, acc: 0.976562
+[Wed Oct 10 16:55:08 2018] epoch_id: 15, batch_id: 2900, cost: 0.005371, acc: 1.000000
+[Wed Oct 10 16:55:10 2018] epoch_id: 15, batch_id: 3000, cost: 0.060264, acc: 0.960938
+[Wed Oct 10 16:55:11 2018] epoch_id: 15, train_avg_cost: 0.034637, train_avg_acc: 0.987582
+[Wed Oct 10 16:55:12 2018] epoch_id: 15, dev_cost: 0.858216, accuracy: 0.8365
+[Wed Oct 10 16:55:13 2018] epoch_id: 15, test_cost: 0.874420, accuracy: 0.8411
+[Wed Oct 10 16:55:21 2018] epoch_id: 16, batch_id: 0, cost: 0.013283, acc: 1.000000
+[Wed Oct 10 16:55:23 2018] epoch_id: 16, batch_id: 100, cost: 0.038128, acc: 0.984375
+[Wed Oct 10 16:55:25 2018] epoch_id: 16, batch_id: 200, cost: 0.031110, acc: 0.976562
+[Wed Oct 10 16:55:28 2018] epoch_id: 16, batch_id: 300, cost: 0.005346, acc: 1.000000
+[Wed Oct 10 16:55:30 2018] epoch_id: 16, batch_id: 400, cost: 0.027634, acc: 0.984375
+[Wed Oct 10 16:55:32 2018] epoch_id: 16, batch_id: 500, cost: 0.065929, acc: 0.976562
+[Wed Oct 10 16:55:35 2018] epoch_id: 16, batch_id: 600, cost: 0.012638, acc: 0.992188
+[Wed Oct 10 16:55:37 2018] epoch_id: 16, batch_id: 700, cost: 0.057962, acc: 0.984375
+[Wed Oct 10 16:55:39 2018] epoch_id: 16, batch_id: 800, cost: 0.064390, acc: 0.976562
+[Wed Oct 10 16:55:42 2018] epoch_id: 16, batch_id: 900, cost: 0.018866, acc: 0.992188
+[Wed Oct 10 16:55:44 2018] epoch_id: 16, batch_id: 1000, cost: 0.004791, acc: 1.000000
+[Wed Oct 10 16:55:46 2018] epoch_id: 16, batch_id: 1100, cost: 0.012691, acc: 0.992188
+[Wed Oct 10 16:55:49 2018] epoch_id: 16, batch_id: 1200, cost: 0.033199, acc: 0.992188
+[Wed Oct 10 16:55:51 2018] epoch_id: 16, batch_id: 1300, cost: 0.007757, acc: 1.000000
+[Wed Oct 10 16:55:53 2018] epoch_id: 16, batch_id: 1400, cost: 0.016653, acc: 0.992188
+[Wed Oct 10 16:55:55 2018] epoch_id: 16, batch_id: 1500, cost: 0.034653, acc: 0.968750
+[Wed Oct 10 16:55:58 2018] epoch_id: 16, batch_id: 1600, cost: 0.051049, acc: 0.976562
+[Wed Oct 10 16:56:00 2018] epoch_id: 16, batch_id: 1700, cost: 0.001466, acc: 1.000000
+[Wed Oct 10 16:56:02 2018] epoch_id: 16, batch_id: 1800, cost: 0.035508, acc: 0.992188
+[Wed Oct 10 16:56:05 2018] epoch_id: 16, batch_id: 1900, cost: 0.022919, acc: 0.984375
+[Wed Oct 10 16:56:07 2018] epoch_id: 16, batch_id: 2000, cost: 0.102175, acc: 0.976562
+[Wed Oct 10 16:56:09 2018] epoch_id: 16, batch_id: 2100, cost: 0.012663, acc: 1.000000
+[Wed Oct 10 16:56:11 2018] epoch_id: 16, batch_id: 2200, cost: 0.026142, acc: 0.984375
+[Wed Oct 10 16:56:15 2018] epoch_id: 16, batch_id: 2300, cost: 0.007566, acc: 1.000000
+[Wed Oct 10 16:56:17 2018] epoch_id: 16, batch_id: 2400, cost: 0.043235, acc: 0.976562
+[Wed Oct 10 16:56:20 2018] epoch_id: 16, batch_id: 2500, cost: 0.039383, acc: 0.984375
+[Wed Oct 10 16:56:22 2018] epoch_id: 16, batch_id: 2600, cost: 0.009917, acc: 1.000000
+[Wed Oct 10 16:56:24 2018] epoch_id: 16, batch_id: 2700, cost: 0.036917, acc: 0.984375
+[Wed Oct 10 16:56:26 2018] epoch_id: 16, batch_id: 2800, cost: 0.012813, acc: 1.000000
+[Wed Oct 10 16:56:29 2018] epoch_id: 16, batch_id: 2900, cost: 0.033933, acc: 0.984375
+[Wed Oct 10 16:56:31 2018] epoch_id: 16, batch_id: 3000, cost: 0.007463, acc: 1.000000
+[Wed Oct 10 16:56:32 2018] epoch_id: 16, train_avg_cost: 0.031971, train_avg_acc: 0.988555
+[Wed Oct 10 16:56:33 2018] epoch_id: 16, dev_cost: 0.955907, accuracy: 0.8389
+[Wed Oct 10 16:56:34 2018] epoch_id: 16, test_cost: 0.953062, accuracy: 0.8389
+[Wed Oct 10 16:56:42 2018] epoch_id: 17, batch_id: 0, cost: 0.031323, acc: 0.992188
+[Wed Oct 10 16:56:44 2018] epoch_id: 17, batch_id: 100, cost: 0.010965, acc: 1.000000
+[Wed Oct 10 16:56:46 2018] epoch_id: 17, batch_id: 200, cost: 0.056771, acc: 0.976562
+[Wed Oct 10 16:56:49 2018] epoch_id: 17, batch_id: 300, cost: 0.026509, acc: 0.992188
+[Wed Oct 10 16:56:51 2018] epoch_id: 17, batch_id: 400, cost: 0.039409, acc: 0.992188
+[Wed Oct 10 16:56:53 2018] epoch_id: 17, batch_id: 500, cost: 0.063554, acc: 0.976562
+[Wed Oct 10 16:56:55 2018] epoch_id: 17, batch_id: 600, cost: 0.035896, acc: 0.976562
+[Wed Oct 10 16:56:58 2018] epoch_id: 17, batch_id: 700, cost: 0.022053, acc: 0.992188
+[Wed Oct 10 16:57:00 2018] epoch_id: 17, batch_id: 800, cost: 0.024150, acc: 0.976562
+[Wed Oct 10 16:57:03 2018] epoch_id: 17, batch_id: 900, cost: 0.009064, acc: 0.992188
+[Wed Oct 10 16:57:05 2018] epoch_id: 17, batch_id: 1000, cost: 0.037311, acc: 0.976562
+[Wed Oct 10 16:57:08 2018] epoch_id: 17, batch_id: 1100, cost: 0.036577, acc: 0.976562
+[Wed Oct 10 16:57:10 2018] epoch_id: 17, batch_id: 1200, cost: 0.020783, acc: 0.992188
+[Wed Oct 10 16:57:12 2018] epoch_id: 17, batch_id: 1300, cost: 0.017610, acc: 0.992188
+[Wed Oct 10 16:57:14 2018] epoch_id: 17, batch_id: 1400, cost: 0.027604, acc: 0.976562
+[Wed Oct 10 16:57:17 2018] epoch_id: 17, batch_id: 1500, cost: 0.040730, acc: 0.992188
+[Wed Oct 10 16:57:19 2018] epoch_id: 17, batch_id: 1600, cost: 0.077946, acc: 0.984375
+[Wed Oct 10 16:57:21 2018] epoch_id: 17, batch_id: 1700, cost: 0.021349, acc: 0.984375
+[Wed Oct 10 16:57:24 2018] epoch_id: 17, batch_id: 1800, cost: 0.016132, acc: 0.992188
+[Wed Oct 10 16:57:26 2018] epoch_id: 17, batch_id: 1900, cost: 0.018797, acc: 0.984375
+[Wed Oct 10 16:57:28 2018] epoch_id: 17, batch_id: 2000, cost: 0.009052, acc: 1.000000
+[Wed Oct 10 16:57:30 2018] epoch_id: 17, batch_id: 2100, cost: 0.028399, acc: 0.992188
+[Wed Oct 10 16:57:33 2018] epoch_id: 17, batch_id: 2200, cost: 0.009593, acc: 1.000000
+[Wed Oct 10 16:57:35 2018] epoch_id: 17, batch_id: 2300, cost: 0.018474, acc: 0.992188
+[Wed Oct 10 16:57:37 2018] epoch_id: 17, batch_id: 2400, cost: 0.007873, acc: 1.000000
+[Wed Oct 10 16:57:40 2018] epoch_id: 17, batch_id: 2500, cost: 0.054923, acc: 0.976562
+[Wed Oct 10 16:57:42 2018] epoch_id: 17, batch_id: 2600, cost: 0.019036, acc: 0.992188
+[Wed Oct 10 16:57:44 2018] epoch_id: 17, batch_id: 2700, cost: 0.017081, acc: 1.000000
+[Wed Oct 10 16:57:46 2018] epoch_id: 17, batch_id: 2800, cost: 0.045522, acc: 0.976562
+[Wed Oct 10 16:57:49 2018] epoch_id: 17, batch_id: 2900, cost: 0.034922, acc: 0.984375
+[Wed Oct 10 16:57:51 2018] epoch_id: 17, batch_id: 3000, cost: 0.039566, acc: 0.984375
+[Wed Oct 10 16:57:51 2018] epoch_id: 17, train_avg_cost: 0.030061, train_avg_acc: 0.989478
+[Wed Oct 10 16:57:52 2018] epoch_id: 17, dev_cost: 1.184997, accuracy: 0.8406
+[Wed Oct 10 16:57:53 2018] epoch_id: 17, test_cost: 1.175792, accuracy: 0.8372
+[Wed Oct 10 16:58:02 2018] epoch_id: 18, batch_id: 0, cost: 0.015059, acc: 0.992188
+[Wed Oct 10 16:58:04 2018] epoch_id: 18, batch_id: 100, cost: 0.023421, acc: 0.992188
+[Wed Oct 10 16:58:06 2018] epoch_id: 18, batch_id: 200, cost: 0.007234, acc: 1.000000
+[Wed Oct 10 16:58:08 2018] epoch_id: 18, batch_id: 300, cost: 0.007139, acc: 1.000000
+[Wed Oct 10 16:58:10 2018] epoch_id: 18, batch_id: 400, cost: 0.007934, acc: 1.000000
+[Wed Oct 10 16:58:13 2018] epoch_id: 18, batch_id: 500, cost: 0.004312, acc: 1.000000
+[Wed Oct 10 16:58:15 2018] epoch_id: 18, batch_id: 600, cost: 0.001806, acc: 1.000000
+[Wed Oct 10 16:58:17 2018] epoch_id: 18, batch_id: 700, cost: 0.004790, acc: 1.000000
+[Wed Oct 10 16:58:20 2018] epoch_id: 18, batch_id: 800, cost: 0.048477, acc: 0.992188
+[Wed Oct 10 16:58:22 2018] epoch_id: 18, batch_id: 900, cost: 0.066390, acc: 0.992188
+[Wed Oct 10 16:58:24 2018] epoch_id: 18, batch_id: 1000, cost: 0.014440, acc: 0.992188
+[Wed Oct 10 16:58:26 2018] epoch_id: 18, batch_id: 1100, cost: 0.020435, acc: 0.992188
+[Wed Oct 10 16:58:29 2018] epoch_id: 18, batch_id: 1200, cost: 0.007474, acc: 0.992188
+[Wed Oct 10 16:58:31 2018] epoch_id: 18, batch_id: 1300, cost: 0.036209, acc: 0.984375
+[Wed Oct 10 16:58:33 2018] epoch_id: 18, batch_id: 1400, cost: 0.026540, acc: 0.984375
+[Wed Oct 10 16:58:35 2018] epoch_id: 18, batch_id: 1500, cost: 0.019448, acc: 0.992188
+[Wed Oct 10 16:58:38 2018] epoch_id: 18, batch_id: 1600, cost: 0.052421, acc: 0.968750
+[Wed Oct 10 16:58:40 2018] epoch_id: 18, batch_id: 1700, cost: 0.022365, acc: 0.992188
+[Wed Oct 10 16:58:42 2018] epoch_id: 18, batch_id: 1800, cost: 0.135754, acc: 0.984375
+[Wed Oct 10 16:58:45 2018] epoch_id: 18, batch_id: 1900, cost: 0.037197, acc: 0.992188
+[Wed Oct 10 16:58:48 2018] epoch_id: 18, batch_id: 2000, cost: 0.010672, acc: 0.992188
+[Wed Oct 10 16:58:50 2018] epoch_id: 18, batch_id: 2100, cost: 0.012909, acc: 1.000000
+[Wed Oct 10 16:58:52 2018] epoch_id: 18, batch_id: 2200, cost: 0.061615, acc: 0.976562
+[Wed Oct 10 16:58:55 2018] epoch_id: 18, batch_id: 2300, cost: 0.081252, acc: 0.960938
+[Wed Oct 10 16:58:57 2018] epoch_id: 18, batch_id: 2400, cost: 0.009792, acc: 1.000000
+[Wed Oct 10 16:58:59 2018] epoch_id: 18, batch_id: 2500, cost: 0.039835, acc: 0.984375
+[Wed Oct 10 16:59:02 2018] epoch_id: 18, batch_id: 2600, cost: 0.002643, acc: 1.000000
+[Wed Oct 10 16:59:04 2018] epoch_id: 18, batch_id: 2700, cost: 0.017633, acc: 0.992188
+[Wed Oct 10 16:59:06 2018] epoch_id: 18, batch_id: 2800, cost: 0.050407, acc: 0.976562
+[Wed Oct 10 16:59:08 2018] epoch_id: 18, batch_id: 2900, cost: 0.066672, acc: 0.960938
+[Wed Oct 10 16:59:11 2018] epoch_id: 18, batch_id: 3000, cost: 0.023438, acc: 0.984375
+[Wed Oct 10 16:59:11 2018] epoch_id: 18, train_avg_cost: 0.028777, train_avg_acc: 0.989884
+[Wed Oct 10 16:59:12 2018] epoch_id: 18, dev_cost: 1.191979, accuracy: 0.8346
+[Wed Oct 10 16:59:13 2018] epoch_id: 18, test_cost: 1.159855, accuracy: 0.8344
+[Wed Oct 10 16:59:22 2018] epoch_id: 19, batch_id: 0, cost: 0.023233, acc: 0.992188
+[Wed Oct 10 16:59:24 2018] epoch_id: 19, batch_id: 100, cost: 0.006624, acc: 1.000000
+[Wed Oct 10 16:59:26 2018] epoch_id: 19, batch_id: 200, cost: 0.018784, acc: 0.992188
+[Wed Oct 10 16:59:28 2018] epoch_id: 19, batch_id: 300, cost: 0.012745, acc: 0.992188
+[Wed Oct 10 16:59:31 2018] epoch_id: 19, batch_id: 400, cost: 0.010857, acc: 1.000000
+[Wed Oct 10 16:59:33 2018] epoch_id: 19, batch_id: 500, cost: 0.006066, acc: 1.000000
+[Wed Oct 10 16:59:35 2018] epoch_id: 19, batch_id: 600, cost: 0.014349, acc: 0.992188
+[Wed Oct 10 16:59:38 2018] epoch_id: 19, batch_id: 700, cost: 0.016725, acc: 0.992188
+[Wed Oct 10 16:59:40 2018] epoch_id: 19, batch_id: 800, cost: 0.069121, acc: 0.984375
+[Wed Oct 10 16:59:42 2018] epoch_id: 19, batch_id: 900, cost: 0.018849, acc: 0.984375
+[Wed Oct 10 16:59:44 2018] epoch_id: 19, batch_id: 1000, cost: 0.031679, acc: 0.984375
+[Wed Oct 10 16:59:47 2018] epoch_id: 19, batch_id: 1100, cost: 0.010815, acc: 0.992188
+[Wed Oct 10 16:59:49 2018] epoch_id: 19, batch_id: 1200, cost: 0.015778, acc: 0.992188
+[Wed Oct 10 16:59:51 2018] epoch_id: 19, batch_id: 1300, cost: 0.055160, acc: 0.984375
+[Wed Oct 10 16:59:53 2018] epoch_id: 19, batch_id: 1400, cost: 0.009311, acc: 0.992188
+[Wed Oct 10 16:59:55 2018] epoch_id: 19, batch_id: 1500, cost: 0.014874, acc: 0.992188
+[Wed Oct 10 16:59:58 2018] epoch_id: 19, batch_id: 1600, cost: 0.038188, acc: 0.992188
+[Wed Oct 10 17:00:00 2018] epoch_id: 19, batch_id: 1700, cost: 0.001565, acc: 1.000000
+[Wed Oct 10 17:00:02 2018] epoch_id: 19, batch_id: 1800, cost: 0.013963, acc: 0.992188
+[Wed Oct 10 17:00:04 2018] epoch_id: 19, batch_id: 1900, cost: 0.028362, acc: 0.992188
+[Wed Oct 10 17:00:06 2018] epoch_id: 19, batch_id: 2000, cost: 0.006552, acc: 1.000000
+[Wed Oct 10 17:00:09 2018] epoch_id: 19, batch_id: 2100, cost: 0.045230, acc: 0.992188
+[Wed Oct 10 17:00:11 2018] epoch_id: 19, batch_id: 2200, cost: 0.029525, acc: 0.984375
+[Wed Oct 10 17:00:13 2018] epoch_id: 19, batch_id: 2300, cost: 0.009774, acc: 0.992188
+[Wed Oct 10 17:00:15 2018] epoch_id: 19, batch_id: 2400, cost: 0.003385, acc: 1.000000
+[Wed Oct 10 17:00:18 2018] epoch_id: 19, batch_id: 2500, cost: 0.030629, acc: 0.984375
+[Wed Oct 10 17:00:20 2018] epoch_id: 19, batch_id: 2600, cost: 0.039615, acc: 0.992188
+[Wed Oct 10 17:00:22 2018] epoch_id: 19, batch_id: 2700, cost: 0.016678, acc: 0.992188
+[Wed Oct 10 17:00:24 2018] epoch_id: 19, batch_id: 2800, cost: 0.004723, acc: 1.000000
+[Wed Oct 10 17:00:26 2018] epoch_id: 19, batch_id: 2900, cost: 0.018062, acc: 0.992188
+[Wed Oct 10 17:00:29 2018] epoch_id: 19, batch_id: 3000, cost: 0.032904, acc: 0.984375
+[Wed Oct 10 17:00:29 2018] epoch_id: 19, train_avg_cost: 0.026175, train_avg_acc: 0.991055
+[Wed Oct 10 17:00:30 2018] epoch_id: 19, dev_cost: 1.013367, accuracy: 0.8388
+[Wed Oct 10 17:00:31 2018] epoch_id: 19, test_cost: 1.016906, accuracy: 0.8335
+[Wed Oct 10 17:00:40 2018] epoch_id: 20, batch_id: 0, cost: 0.019038, acc: 0.992188
+[Wed Oct 10 17:00:42 2018] epoch_id: 20, batch_id: 100, cost: 0.001216, acc: 1.000000
+[Wed Oct 10 17:00:44 2018] epoch_id: 20, batch_id: 200, cost: 0.006635, acc: 1.000000
+[Wed Oct 10 17:00:47 2018] epoch_id: 20, batch_id: 300, cost: 0.051503, acc: 0.984375
+[Wed Oct 10 17:00:49 2018] epoch_id: 20, batch_id: 400, cost: 0.044815, acc: 0.992188
+[Wed Oct 10 17:00:51 2018] epoch_id: 20, batch_id: 500, cost: 0.041529, acc: 0.992188
+[Wed Oct 10 17:00:53 2018] epoch_id: 20, batch_id: 600, cost: 0.010035, acc: 1.000000
+[Wed Oct 10 17:00:56 2018] epoch_id: 20, batch_id: 700, cost: 0.019799, acc: 0.992188
+[Wed Oct 10 17:00:58 2018] epoch_id: 20, batch_id: 800, cost: 0.062296, acc: 0.984375
+[Wed Oct 10 17:01:00 2018] epoch_id: 20, batch_id: 900, cost: 0.015680, acc: 0.992188
+[Wed Oct 10 17:01:03 2018] epoch_id: 20, batch_id: 1000, cost: 0.051963, acc: 0.984375
+[Wed Oct 10 17:01:05 2018] epoch_id: 20, batch_id: 1100, cost: 0.023968, acc: 0.984375
+[Wed Oct 10 17:01:07 2018] epoch_id: 20, batch_id: 1200, cost: 0.079527, acc: 0.984375
+[Wed Oct 10 17:01:09 2018] epoch_id: 20, batch_id: 1300, cost: 0.039612, acc: 0.992188
+[Wed Oct 10 17:01:12 2018] epoch_id: 20, batch_id: 1400, cost: 0.010211, acc: 1.000000
+[Wed Oct 10 17:01:14 2018] epoch_id: 20, batch_id: 1500, cost: 0.012661, acc: 0.992188
+[Wed Oct 10 17:01:16 2018] epoch_id: 20, batch_id: 1600, cost: 0.051475, acc: 0.984375
+[Wed Oct 10 17:01:18 2018] epoch_id: 20, batch_id: 1700, cost: 0.013513, acc: 1.000000
+[Wed Oct 10 17:01:21 2018] epoch_id: 20, batch_id: 1800, cost: 0.006646, acc: 1.000000
+[Wed Oct 10 17:01:23 2018] epoch_id: 20, batch_id: 1900, cost: 0.013369, acc: 0.992188
+[Wed Oct 10 17:01:25 2018] epoch_id: 20, batch_id: 2000, cost: 0.030614, acc: 0.984375
+[Wed Oct 10 17:01:27 2018] epoch_id: 20, batch_id: 2100, cost: 0.003242, acc: 1.000000
+[Wed Oct 10 17:01:30 2018] epoch_id: 20, batch_id: 2200, cost: 0.051409, acc: 0.984375
+[Wed Oct 10 17:01:32 2018] epoch_id: 20, batch_id: 2300, cost: 0.005996, acc: 1.000000
+[Wed Oct 10 17:01:34 2018] epoch_id: 20, batch_id: 2400, cost: 0.049493, acc: 0.976562
+[Wed Oct 10 17:01:36 2018] epoch_id: 20, batch_id: 2500, cost: 0.013635, acc: 0.992188
+[Wed Oct 10 17:01:38 2018] epoch_id: 20, batch_id: 2600, cost: 0.019265, acc: 1.000000
+[Wed Oct 10 17:01:41 2018] epoch_id: 20, batch_id: 2700, cost: 0.040467, acc: 0.976562
+[Wed Oct 10 17:01:44 2018] epoch_id: 20, batch_id: 2800, cost: 0.029407, acc: 0.992188
+[Wed Oct 10 17:01:46 2018] epoch_id: 20, batch_id: 2900, cost: 0.036886, acc: 0.976562
+[Wed Oct 10 17:01:49 2018] epoch_id: 20, batch_id: 3000, cost: 0.018317, acc: 0.992188
+[Wed Oct 10 17:01:49 2018] epoch_id: 20, train_avg_cost: 0.025258, train_avg_acc: 0.991367
+[Wed Oct 10 17:01:50 2018] epoch_id: 20, dev_cost: 1.125290, accuracy: 0.8358
+[Wed Oct 10 17:01:51 2018] epoch_id: 20, test_cost: 1.148761, accuracy: 0.832
+[Wed Oct 10 17:01:59 2018] epoch_id: 21, batch_id: 0, cost: 0.020581, acc: 0.992188
+[Wed Oct 10 17:02:02 2018] epoch_id: 21, batch_id: 100, cost: 0.021132, acc: 0.992188
+[Wed Oct 10 17:02:04 2018] epoch_id: 21, batch_id: 200, cost: 0.040257, acc: 0.976562
+[Wed Oct 10 17:02:06 2018] epoch_id: 21, batch_id: 300, cost: 0.013450, acc: 1.000000
+[Wed Oct 10 17:02:08 2018] epoch_id: 21, batch_id: 400, cost: 0.027469, acc: 0.992188
+[Wed Oct 10 17:02:11 2018] epoch_id: 21, batch_id: 500, cost: 0.007088, acc: 0.992188
+[Wed Oct 10 17:02:13 2018] epoch_id: 21, batch_id: 600, cost: 0.028169, acc: 0.992188
+[Wed Oct 10 17:02:15 2018] epoch_id: 21, batch_id: 700, cost: 0.067799, acc: 0.984375
+[Wed Oct 10 17:02:17 2018] epoch_id: 21, batch_id: 800, cost: 0.003184, acc: 1.000000
+[Wed Oct 10 17:02:20 2018] epoch_id: 21, batch_id: 900, cost: 0.011056, acc: 0.992188
+[Wed Oct 10 17:02:22 2018] epoch_id: 21, batch_id: 1000, cost: 0.012187, acc: 1.000000
+[Wed Oct 10 17:02:24 2018] epoch_id: 21, batch_id: 1100, cost: 0.009409, acc: 0.992188
+[Wed Oct 10 17:02:26 2018] epoch_id: 21, batch_id: 1200, cost: 0.000739, acc: 1.000000
+[Wed Oct 10 17:02:29 2018] epoch_id: 21, batch_id: 1300, cost: 0.002971, acc: 1.000000
+[Wed Oct 10 17:02:31 2018] epoch_id: 21, batch_id: 1400, cost: 0.031287, acc: 0.984375
+[Wed Oct 10 17:02:33 2018] epoch_id: 21, batch_id: 1500, cost: 0.023455, acc: 0.992188
+[Wed Oct 10 17:02:36 2018] epoch_id: 21, batch_id: 1600, cost: 0.007438, acc: 1.000000
+[Wed Oct 10 17:02:38 2018] epoch_id: 21, batch_id: 1700, cost: 0.035499, acc: 0.968750
+[Wed Oct 10 17:02:40 2018] epoch_id: 21, batch_id: 1800, cost: 0.012515, acc: 1.000000
+[Wed Oct 10 17:02:42 2018] epoch_id: 21, batch_id: 1900, cost: 0.008550, acc: 1.000000
+[Wed Oct 10 17:02:45 2018] epoch_id: 21, batch_id: 2000, cost: 0.051551, acc: 0.992188
+[Wed Oct 10 17:02:47 2018] epoch_id: 21, batch_id: 2100, cost: 0.004980, acc: 1.000000
+[Wed Oct 10 17:02:49 2018] epoch_id: 21, batch_id: 2200, cost: 0.006854, acc: 1.000000
+[Wed Oct 10 17:02:51 2018] epoch_id: 21, batch_id: 2300, cost: 0.071025, acc: 0.968750
+[Wed Oct 10 17:02:55 2018] epoch_id: 21, batch_id: 2400, cost: 0.013599, acc: 1.000000
+[Wed Oct 10 17:02:57 2018] epoch_id: 21, batch_id: 2500, cost: 0.025085, acc: 0.992188
+[Wed Oct 10 17:02:59 2018] epoch_id: 21, batch_id: 2600, cost: 0.018276, acc: 0.984375
+[Wed Oct 10 17:03:01 2018] epoch_id: 21, batch_id: 2700, cost: 0.040565, acc: 0.984375
+[Wed Oct 10 17:03:04 2018] epoch_id: 21, batch_id: 2800, cost: 0.099454, acc: 0.968750
+[Wed Oct 10 17:03:06 2018] epoch_id: 21, batch_id: 2900, cost: 0.017812, acc: 0.992188
+[Wed Oct 10 17:03:08 2018] epoch_id: 21, batch_id: 3000, cost: 0.019825, acc: 0.992188
+[Wed Oct 10 17:03:09 2018] epoch_id: 21, train_avg_cost: 0.024180, train_avg_acc: 0.991505
+[Wed Oct 10 17:03:10 2018] epoch_id: 21, dev_cost: 1.413867, accuracy: 0.836
+[Wed Oct 10 17:03:11 2018] epoch_id: 21, test_cost: 1.380237, accuracy: 0.8353
+[Wed Oct 10 17:03:19 2018] epoch_id: 22, batch_id: 0, cost: 0.001493, acc: 1.000000
+[Wed Oct 10 17:03:21 2018] epoch_id: 22, batch_id: 100, cost: 0.017211, acc: 0.984375
+[Wed Oct 10 17:03:23 2018] epoch_id: 22, batch_id: 200, cost: 0.015626, acc: 0.992188
+[Wed Oct 10 17:03:25 2018] epoch_id: 22, batch_id: 300, cost: 0.002411, acc: 1.000000
+[Wed Oct 10 17:03:28 2018] epoch_id: 22, batch_id: 400, cost: 0.098118, acc: 0.984375
+[Wed Oct 10 17:03:30 2018] epoch_id: 22, batch_id: 500, cost: 0.031192, acc: 0.992188
+[Wed Oct 10 17:03:32 2018] epoch_id: 22, batch_id: 600, cost: 0.002122, acc: 1.000000
+[Wed Oct 10 17:03:34 2018] epoch_id: 22, batch_id: 700, cost: 0.006148, acc: 1.000000
+[Wed Oct 10 17:03:38 2018] epoch_id: 22, batch_id: 800, cost: 0.007830, acc: 1.000000
+[Wed Oct 10 17:03:40 2018] epoch_id: 22, batch_id: 900, cost: 0.009371, acc: 1.000000
+[Wed Oct 10 17:03:43 2018] epoch_id: 22, batch_id: 1000, cost: 0.024280, acc: 0.984375
+[Wed Oct 10 17:03:45 2018] epoch_id: 22, batch_id: 1100, cost: 0.067847, acc: 0.984375
+[Wed Oct 10 17:03:47 2018] epoch_id: 22, batch_id: 1200, cost: 0.024875, acc: 0.984375
+[Wed Oct 10 17:03:50 2018] epoch_id: 22, batch_id: 1300, cost: 0.004252, acc: 1.000000
+[Wed Oct 10 17:03:52 2018] epoch_id: 22, batch_id: 1400, cost: 0.014934, acc: 0.992188
+[Wed Oct 10 17:03:54 2018] epoch_id: 22, batch_id: 1500, cost: 0.008299, acc: 1.000000
+[Wed Oct 10 17:03:56 2018] epoch_id: 22, batch_id: 1600, cost: 0.007932, acc: 1.000000
+[Wed Oct 10 17:03:59 2018] epoch_id: 22, batch_id: 1700, cost: 0.007008, acc: 1.000000
+[Wed Oct 10 17:04:01 2018] epoch_id: 22, batch_id: 1800, cost: 0.028636, acc: 0.984375
+[Wed Oct 10 17:04:03 2018] epoch_id: 22, batch_id: 1900, cost: 0.012712, acc: 0.992188
+[Wed Oct 10 17:04:05 2018] epoch_id: 22, batch_id: 2000, cost: 0.027561, acc: 0.992188
+[Wed Oct 10 17:04:08 2018] epoch_id: 22, batch_id: 2100, cost: 0.017589, acc: 0.992188
+[Wed Oct 10 17:04:10 2018] epoch_id: 22, batch_id: 2200, cost: 0.016391, acc: 0.992188
+[Wed Oct 10 17:04:12 2018] epoch_id: 22, batch_id: 2300, cost: 0.042172, acc: 0.984375
+[Wed Oct 10 17:04:14 2018] epoch_id: 22, batch_id: 2400, cost: 0.024060, acc: 0.984375
+[Wed Oct 10 17:04:17 2018] epoch_id: 22, batch_id: 2500, cost: 0.014206, acc: 1.000000
+[Wed Oct 10 17:04:19 2018] epoch_id: 22, batch_id: 2600, cost: 0.028562, acc: 0.992188
+[Wed Oct 10 17:04:21 2018] epoch_id: 22, batch_id: 2700, cost: 0.013936, acc: 0.992188
+[Wed Oct 10 17:04:23 2018] epoch_id: 22, batch_id: 2800, cost: 0.023205, acc: 0.984375
+[Wed Oct 10 17:04:26 2018] epoch_id: 22, batch_id: 2900, cost: 0.031024, acc: 0.984375
+[Wed Oct 10 17:04:28 2018] epoch_id: 22, batch_id: 3000, cost: 0.004115, acc: 1.000000
+[Wed Oct 10 17:04:29 2018] epoch_id: 22, train_avg_cost: 0.022458, train_avg_acc: 0.992184
+[Wed Oct 10 17:04:30 2018] epoch_id: 22, dev_cost: 1.388674, accuracy: 0.8329
+[Wed Oct 10 17:04:31 2018] epoch_id: 22, test_cost: 1.366122, accuracy: 0.8359
+[Wed Oct 10 17:04:39 2018] epoch_id: 23, batch_id: 0, cost: 0.012273, acc: 0.992188
+[Wed Oct 10 17:04:41 2018] epoch_id: 23, batch_id: 100, cost: 0.010904, acc: 0.992188
+[Wed Oct 10 17:04:44 2018] epoch_id: 23, batch_id: 200, cost: 0.001967, acc: 1.000000
+[Wed Oct 10 17:04:46 2018] epoch_id: 23, batch_id: 300, cost: 0.006554, acc: 1.000000
+[Wed Oct 10 17:04:48 2018] epoch_id: 23, batch_id: 400, cost: 0.005179, acc: 1.000000
+[Wed Oct 10 17:04:50 2018] epoch_id: 23, batch_id: 500, cost: 0.014761, acc: 0.992188
+[Wed Oct 10 17:04:53 2018] epoch_id: 23, batch_id: 600, cost: 0.015971, acc: 0.992188
+[Wed Oct 10 17:04:55 2018] epoch_id: 23, batch_id: 700, cost: 0.058416, acc: 0.984375
+[Wed Oct 10 17:04:57 2018] epoch_id: 23, batch_id: 800, cost: 0.005064, acc: 1.000000
+[Wed Oct 10 17:04:59 2018] epoch_id: 23, batch_id: 900, cost: 0.003761, acc: 1.000000
+[Wed Oct 10 17:05:02 2018] epoch_id: 23, batch_id: 1000, cost: 0.002844, acc: 1.000000
+[Wed Oct 10 17:05:04 2018] epoch_id: 23, batch_id: 1100, cost: 0.010259, acc: 1.000000
+[Wed Oct 10 17:05:06 2018] epoch_id: 23, batch_id: 1200, cost: 0.005445, acc: 1.000000
+[Wed Oct 10 17:05:09 2018] epoch_id: 23, batch_id: 1300, cost: 0.018197, acc: 0.992188
+[Wed Oct 10 17:05:11 2018] epoch_id: 23, batch_id: 1400, cost: 0.016600, acc: 0.992188
+[Wed Oct 10 17:05:13 2018] epoch_id: 23, batch_id: 1500, cost: 0.047691, acc: 0.992188
+[Wed Oct 10 17:05:15 2018] epoch_id: 23, batch_id: 1600, cost: 0.084442, acc: 0.984375
+[Wed Oct 10 17:05:18 2018] epoch_id: 23, batch_id: 1700, cost: 0.044283, acc: 0.992188
+[Wed Oct 10 17:05:21 2018] epoch_id: 23, batch_id: 1800, cost: 0.120200, acc: 0.984375
+[Wed Oct 10 17:05:23 2018] epoch_id: 23, batch_id: 1900, cost: 0.013874, acc: 0.992188
+[Wed Oct 10 17:05:26 2018] epoch_id: 23, batch_id: 2000, cost: 0.027709, acc: 0.984375
+[Wed Oct 10 17:05:28 2018] epoch_id: 23, batch_id: 2100, cost: 0.017088, acc: 0.992188
+[Wed Oct 10 17:05:30 2018] epoch_id: 23, batch_id: 2200, cost: 0.049081, acc: 0.976562
+[Wed Oct 10 17:05:32 2018] epoch_id: 23, batch_id: 2300, cost: 0.013016, acc: 0.992188
+[Wed Oct 10 17:05:35 2018] epoch_id: 23, batch_id: 2400, cost: 0.015467, acc: 0.992188
+[Wed Oct 10 17:05:37 2018] epoch_id: 23, batch_id: 2500, cost: 0.002745, acc: 1.000000
+[Wed Oct 10 17:05:39 2018] epoch_id: 23, batch_id: 2600, cost: 0.002618, acc: 1.000000
+[Wed Oct 10 17:05:42 2018] epoch_id: 23, batch_id: 2700, cost: 0.010789, acc: 1.000000
+[Wed Oct 10 17:05:44 2018] epoch_id: 23, batch_id: 2800, cost: 0.026513, acc: 0.984375
+[Wed Oct 10 17:05:46 2018] epoch_id: 23, batch_id: 2900, cost: 0.056513, acc: 0.984375
+[Wed Oct 10 17:05:49 2018] epoch_id: 23, batch_id: 3000, cost: 0.007607, acc: 1.000000
+[Wed Oct 10 17:05:49 2018] epoch_id: 23, train_avg_cost: 0.021786, train_avg_acc: 0.992707
+[Wed Oct 10 17:05:50 2018] epoch_id: 23, dev_cost: 1.181561, accuracy: 0.8368
+[Wed Oct 10 17:05:51 2018] epoch_id: 23, test_cost: 1.209735, accuracy: 0.8339
+[Wed Oct 10 17:06:00 2018] epoch_id: 24, batch_id: 0, cost: 0.005431, acc: 1.000000
+[Wed Oct 10 17:06:02 2018] epoch_id: 24, batch_id: 100, cost: 0.017588, acc: 0.984375
+[Wed Oct 10 17:06:04 2018] epoch_id: 24, batch_id: 200, cost: 0.078571, acc: 0.976562
+[Wed Oct 10 17:06:06 2018] epoch_id: 24, batch_id: 300, cost: 0.003192, acc: 1.000000
+[Wed Oct 10 17:06:09 2018] epoch_id: 24, batch_id: 400, cost: 0.008610, acc: 1.000000
+[Wed Oct 10 17:06:11 2018] epoch_id: 24, batch_id: 500, cost: 0.010603, acc: 0.992188
+[Wed Oct 10 17:06:13 2018] epoch_id: 24, batch_id: 600, cost: 0.068159, acc: 0.984375
+[Wed Oct 10 17:06:15 2018] epoch_id: 24, batch_id: 700, cost: 0.031611, acc: 0.992188
+[Wed Oct 10 17:06:18 2018] epoch_id: 24, batch_id: 800, cost: 0.005276, acc: 1.000000
+[Wed Oct 10 17:06:20 2018] epoch_id: 24, batch_id: 900, cost: 0.019978, acc: 0.992188
+[Wed Oct 10 17:06:22 2018] epoch_id: 24, batch_id: 1000, cost: 0.061957, acc: 0.992188
+[Wed Oct 10 17:06:25 2018] epoch_id: 24, batch_id: 1100, cost: 0.015165, acc: 0.992188
+[Wed Oct 10 17:06:27 2018] epoch_id: 24, batch_id: 1200, cost: 0.052448, acc: 0.976562
+[Wed Oct 10 17:06:29 2018] epoch_id: 24, batch_id: 1300, cost: 0.003287, acc: 1.000000
+[Wed Oct 10 17:06:31 2018] epoch_id: 24, batch_id: 1400, cost: 0.027564, acc: 0.992188
+[Wed Oct 10 17:06:34 2018] epoch_id: 24, batch_id: 1500, cost: 0.002861, acc: 1.000000
+[Wed Oct 10 17:06:36 2018] epoch_id: 24, batch_id: 1600, cost: 0.022500, acc: 0.992188
+[Wed Oct 10 17:06:38 2018] epoch_id: 24, batch_id: 1700, cost: 0.041690, acc: 0.984375
+[Wed Oct 10 17:06:40 2018] epoch_id: 24, batch_id: 1800, cost: 0.016889, acc: 0.992188
+[Wed Oct 10 17:06:43 2018] epoch_id: 24, batch_id: 1900, cost: 0.026357, acc: 0.992188
+[Wed Oct 10 17:06:45 2018] epoch_id: 24, batch_id: 2000, cost: 0.035357, acc: 0.984375
+[Wed Oct 10 17:06:47 2018] epoch_id: 24, batch_id: 2100, cost: 0.070517, acc: 0.960938
+[Wed Oct 10 17:06:49 2018] epoch_id: 24, batch_id: 2200, cost: 0.021093, acc: 0.984375
+[Wed Oct 10 17:06:52 2018] epoch_id: 24, batch_id: 2300, cost: 0.003296, acc: 1.000000
+[Wed Oct 10 17:06:54 2018] epoch_id: 24, batch_id: 2400, cost: 0.002669, acc: 1.000000
+[Wed Oct 10 17:06:56 2018] epoch_id: 24, batch_id: 2500, cost: 0.047008, acc: 0.976562
+[Wed Oct 10 17:06:58 2018] epoch_id: 24, batch_id: 2600, cost: 0.015561, acc: 0.992188
+[Wed Oct 10 17:07:00 2018] epoch_id: 24, batch_id: 2700, cost: 0.074711, acc: 0.984375
+[Wed Oct 10 17:07:03 2018] epoch_id: 24, batch_id: 2800, cost: 0.021376, acc: 0.992188
+[Wed Oct 10 17:07:05 2018] epoch_id: 24, batch_id: 2900, cost: 0.013928, acc: 1.000000
+[Wed Oct 10 17:07:07 2018] epoch_id: 24, batch_id: 3000, cost: 0.019474, acc: 0.992188
+[Wed Oct 10 17:07:07 2018] epoch_id: 24, train_avg_cost: 0.020611, train_avg_acc: 0.992913
+[Wed Oct 10 17:07:08 2018] epoch_id: 24, dev_cost: 1.249092, accuracy: 0.8329
+[Wed Oct 10 17:07:09 2018] epoch_id: 24, test_cost: 1.206091, accuracy: 0.8348
+[Wed Oct 10 17:07:18 2018] epoch_id: 25, batch_id: 0, cost: 0.009832, acc: 1.000000
+[Wed Oct 10 17:07:21 2018] epoch_id: 25, batch_id: 100, cost: 0.007028, acc: 1.000000
+[Wed Oct 10 17:07:23 2018] epoch_id: 25, batch_id: 200, cost: 0.029548, acc: 0.984375
+[Wed Oct 10 17:07:25 2018] epoch_id: 25, batch_id: 300, cost: 0.001753, acc: 1.000000
+[Wed Oct 10 17:07:28 2018] epoch_id: 25, batch_id: 400, cost: 0.001457, acc: 1.000000
+[Wed Oct 10 17:07:30 2018] epoch_id: 25, batch_id: 500, cost: 0.004209, acc: 1.000000
+[Wed Oct 10 17:07:32 2018] epoch_id: 25, batch_id: 600, cost: 0.002758, acc: 1.000000
+[Wed Oct 10 17:07:35 2018] epoch_id: 25, batch_id: 700, cost: 0.039204, acc: 0.984375
+[Wed Oct 10 17:07:37 2018] epoch_id: 25, batch_id: 800, cost: 0.004454, acc: 1.000000
+[Wed Oct 10 17:07:39 2018] epoch_id: 25, batch_id: 900, cost: 0.005273, acc: 1.000000
+[Wed Oct 10 17:07:41 2018] epoch_id: 25, batch_id: 1000, cost: 0.008021, acc: 0.992188
+[Wed Oct 10 17:07:44 2018] epoch_id: 25, batch_id: 1100, cost: 0.037441, acc: 0.976562
+[Wed Oct 10 17:07:46 2018] epoch_id: 25, batch_id: 1200, cost: 0.011153, acc: 1.000000
+[Wed Oct 10 17:07:48 2018] epoch_id: 25, batch_id: 1300, cost: 0.064342, acc: 0.992188
+[Wed Oct 10 17:07:50 2018] epoch_id: 25, batch_id: 1400, cost: 0.036600, acc: 0.992188
+[Wed Oct 10 17:07:53 2018] epoch_id: 25, batch_id: 1500, cost: 0.046661, acc: 0.992188
+[Wed Oct 10 17:07:55 2018] epoch_id: 25, batch_id: 1600, cost: 0.015580, acc: 1.000000
+[Wed Oct 10 17:07:57 2018] epoch_id: 25, batch_id: 1700, cost: 0.008311, acc: 1.000000
+[Wed Oct 10 17:07:59 2018] epoch_id: 25, batch_id: 1800, cost: 0.004560, acc: 1.000000
+[Wed Oct 10 17:08:02 2018] epoch_id: 25, batch_id: 1900, cost: 0.012200, acc: 1.000000
+[Wed Oct 10 17:08:04 2018] epoch_id: 25, batch_id: 2000, cost: 0.006555, acc: 1.000000
+[Wed Oct 10 17:08:06 2018] epoch_id: 25, batch_id: 2100, cost: 0.028259, acc: 0.992188
+[Wed Oct 10 17:08:08 2018] epoch_id: 25, batch_id: 2200, cost: 0.003801, acc: 1.000000
+[Wed Oct 10 17:08:11 2018] epoch_id: 25, batch_id: 2300, cost: 0.004532, acc: 1.000000
+[Wed Oct 10 17:08:13 2018] epoch_id: 25, batch_id: 2400, cost: 0.008551, acc: 1.000000
+[Wed Oct 10 17:08:15 2018] epoch_id: 25, batch_id: 2500, cost: 0.013781, acc: 0.992188
+[Wed Oct 10 17:08:17 2018] epoch_id: 25, batch_id: 2600, cost: 0.024098, acc: 0.992188
+[Wed Oct 10 17:08:21 2018] epoch_id: 25, batch_id: 2700, cost: 0.009117, acc: 0.992188
+[Wed Oct 10 17:08:23 2018] epoch_id: 25, batch_id: 2800, cost: 0.032231, acc: 0.984375
+[Wed Oct 10 17:08:25 2018] epoch_id: 25, batch_id: 2900, cost: 0.004502, acc: 1.000000
+[Wed Oct 10 17:08:28 2018] epoch_id: 25, batch_id: 3000, cost: 0.006727, acc: 1.000000
+[Wed Oct 10 17:08:28 2018] epoch_id: 25, train_avg_cost: 0.020529, train_avg_acc: 0.993019
+[Wed Oct 10 17:08:29 2018] epoch_id: 25, dev_cost: 1.238637, accuracy: 0.8323
+[Wed Oct 10 17:08:30 2018] epoch_id: 25, test_cost: 1.213099, accuracy: 0.8345
+[Wed Oct 10 17:08:38 2018] epoch_id: 26, batch_id: 0, cost: 0.040923, acc: 0.992188
+[Wed Oct 10 17:08:40 2018] epoch_id: 26, batch_id: 100, cost: 0.003892, acc: 1.000000
+[Wed Oct 10 17:08:43 2018] epoch_id: 26, batch_id: 200, cost: 0.005719, acc: 1.000000
+[Wed Oct 10 17:08:45 2018] epoch_id: 26, batch_id: 300, cost: 0.011791, acc: 1.000000
+[Wed Oct 10 17:08:47 2018] epoch_id: 26, batch_id: 400, cost: 0.015297, acc: 0.992188
+[Wed Oct 10 17:08:49 2018] epoch_id: 26, batch_id: 500, cost: 0.067796, acc: 0.984375
+[Wed Oct 10 17:08:52 2018] epoch_id: 26, batch_id: 600, cost: 0.041215, acc: 0.992188
+[Wed Oct 10 17:08:54 2018] epoch_id: 26, batch_id: 700, cost: 0.017786, acc: 0.984375
+[Wed Oct 10 17:08:56 2018] epoch_id: 26, batch_id: 800, cost: 0.033173, acc: 0.992188
+[Wed Oct 10 17:08:59 2018] epoch_id: 26, batch_id: 900, cost: 0.007282, acc: 0.992188
+[Wed Oct 10 17:09:01 2018] epoch_id: 26, batch_id: 1000, cost: 0.028577, acc: 0.992188
+[Wed Oct 10 17:09:03 2018] epoch_id: 26, batch_id: 1100, cost: 0.017994, acc: 0.992188
+[Wed Oct 10 17:09:05 2018] epoch_id: 26, batch_id: 1200, cost: 0.005319, acc: 1.000000
+[Wed Oct 10 17:09:08 2018] epoch_id: 26, batch_id: 1300, cost: 0.030209, acc: 0.992188
+[Wed Oct 10 17:09:10 2018] epoch_id: 26, batch_id: 1400, cost: 0.012992, acc: 0.992188
+[Wed Oct 10 17:09:12 2018] epoch_id: 26, batch_id: 1500, cost: 0.014228, acc: 0.992188
+[Wed Oct 10 17:09:15 2018] epoch_id: 26, batch_id: 1600, cost: 0.008148, acc: 1.000000
+[Wed Oct 10 17:09:17 2018] epoch_id: 26, batch_id: 1700, cost: 0.003299, acc: 1.000000
+[Wed Oct 10 17:09:19 2018] epoch_id: 26, batch_id: 1800, cost: 0.026134, acc: 0.992188
+[Wed Oct 10 17:09:22 2018] epoch_id: 26, batch_id: 1900, cost: 0.016610, acc: 1.000000
+[Wed Oct 10 17:09:24 2018] epoch_id: 26, batch_id: 2000, cost: 0.019105, acc: 0.992188
+[Wed Oct 10 17:09:26 2018] epoch_id: 26, batch_id: 2100, cost: 0.004593, acc: 1.000000
+[Wed Oct 10 17:09:28 2018] epoch_id: 26, batch_id: 2200, cost: 0.036595, acc: 0.992188
+[Wed Oct 10 17:09:32 2018] epoch_id: 26, batch_id: 2300, cost: 0.003857, acc: 1.000000
+[Wed Oct 10 17:09:34 2018] epoch_id: 26, batch_id: 2400, cost: 0.002700, acc: 1.000000
+[Wed Oct 10 17:09:36 2018] epoch_id: 26, batch_id: 2500, cost: 0.002269, acc: 1.000000
+[Wed Oct 10 17:09:38 2018] epoch_id: 26, batch_id: 2600, cost: 0.022186, acc: 0.992188
+[Wed Oct 10 17:09:41 2018] epoch_id: 26, batch_id: 2700, cost: 0.035991, acc: 0.976562
+[Wed Oct 10 17:09:43 2018] epoch_id: 26, batch_id: 2800, cost: 0.005430, acc: 1.000000
+[Wed Oct 10 17:09:45 2018] epoch_id: 26, batch_id: 2900, cost: 0.017578, acc: 0.992188
+[Wed Oct 10 17:09:47 2018] epoch_id: 26, batch_id: 3000, cost: 0.030596, acc: 0.984375
+[Wed Oct 10 17:09:48 2018] epoch_id: 26, train_avg_cost: 0.019528, train_avg_acc: 0.993425
+[Wed Oct 10 17:09:49 2018] epoch_id: 26, dev_cost: 1.452644, accuracy: 0.8334
+[Wed Oct 10 17:09:50 2018] epoch_id: 26, test_cost: 1.449995, accuracy: 0.8329
+[Wed Oct 10 17:09:58 2018] epoch_id: 27, batch_id: 0, cost: 0.006640, acc: 1.000000
+[Wed Oct 10 17:10:00 2018] epoch_id: 27, batch_id: 100, cost: 0.001101, acc: 1.000000
+[Wed Oct 10 17:10:02 2018] epoch_id: 27, batch_id: 200, cost: 0.019329, acc: 0.992188
+[Wed Oct 10 17:10:05 2018] epoch_id: 27, batch_id: 300, cost: 0.002996, acc: 1.000000
+[Wed Oct 10 17:10:07 2018] epoch_id: 27, batch_id: 400, cost: 0.002077, acc: 1.000000
+[Wed Oct 10 17:10:09 2018] epoch_id: 27, batch_id: 500, cost: 0.007058, acc: 1.000000
+[Wed Oct 10 17:10:11 2018] epoch_id: 27, batch_id: 600, cost: 0.002119, acc: 1.000000
+[Wed Oct 10 17:10:14 2018] epoch_id: 27, batch_id: 700, cost: 0.039876, acc: 0.984375
+[Wed Oct 10 17:10:16 2018] epoch_id: 27, batch_id: 800, cost: 0.010680, acc: 1.000000
+[Wed Oct 10 17:10:19 2018] epoch_id: 27, batch_id: 900, cost: 0.004508, acc: 1.000000
+[Wed Oct 10 17:10:21 2018] epoch_id: 27, batch_id: 1000, cost: 0.029683, acc: 0.984375
+[Wed Oct 10 17:10:24 2018] epoch_id: 27, batch_id: 1100, cost: 0.011985, acc: 1.000000
+[Wed Oct 10 17:10:26 2018] epoch_id: 27, batch_id: 1200, cost: 0.004091, acc: 1.000000
+[Wed Oct 10 17:10:28 2018] epoch_id: 27, batch_id: 1300, cost: 0.028585, acc: 0.984375
+[Wed Oct 10 17:10:30 2018] epoch_id: 27, batch_id: 1400, cost: 0.001462, acc: 1.000000
+[Wed Oct 10 17:10:33 2018] epoch_id: 27, batch_id: 1500, cost: 0.033079, acc: 0.992188
+[Wed Oct 10 17:10:35 2018] epoch_id: 27, batch_id: 1600, cost: 0.017679, acc: 0.992188
+[Wed Oct 10 17:10:37 2018] epoch_id: 27, batch_id: 1700, cost: 0.000921, acc: 1.000000
+[Wed Oct 10 17:10:39 2018] epoch_id: 27, batch_id: 1800, cost: 0.029850, acc: 0.984375
+[Wed Oct 10 17:10:42 2018] epoch_id: 27, batch_id: 1900, cost: 0.005679, acc: 1.000000
+[Wed Oct 10 17:10:44 2018] epoch_id: 27, batch_id: 2000, cost: 0.007635, acc: 0.992188
+[Wed Oct 10 17:10:46 2018] epoch_id: 27, batch_id: 2100, cost: 0.056935, acc: 0.984375
+[Wed Oct 10 17:10:48 2018] epoch_id: 27, batch_id: 2200, cost: 0.014361, acc: 1.000000
+[Wed Oct 10 17:10:51 2018] epoch_id: 27, batch_id: 2300, cost: 0.040282, acc: 0.984375
+[Wed Oct 10 17:10:53 2018] epoch_id: 27, batch_id: 2400, cost: 0.004073, acc: 1.000000
+[Wed Oct 10 17:10:55 2018] epoch_id: 27, batch_id: 2500, cost: 0.013922, acc: 0.984375
+[Wed Oct 10 17:10:57 2018] epoch_id: 27, batch_id: 2600, cost: 0.018309, acc: 0.992188
+[Wed Oct 10 17:10:59 2018] epoch_id: 27, batch_id: 2700, cost: 0.011584, acc: 0.992188
+[Wed Oct 10 17:11:02 2018] epoch_id: 27, batch_id: 2800, cost: 0.018637, acc: 0.992188
+[Wed Oct 10 17:11:04 2018] epoch_id: 27, batch_id: 2900, cost: 0.013617, acc: 0.992188
+[Wed Oct 10 17:11:06 2018] epoch_id: 27, batch_id: 3000, cost: 0.079333, acc: 0.976562
+[Wed Oct 10 17:11:07 2018] epoch_id: 27, train_avg_cost: 0.018039, train_avg_acc: 0.993701
+[Wed Oct 10 17:11:08 2018] epoch_id: 27, dev_cost: 1.463991, accuracy: 0.8333
+[Wed Oct 10 17:11:09 2018] epoch_id: 27, test_cost: 1.450415, accuracy: 0.8334
+[Wed Oct 10 17:11:17 2018] epoch_id: 28, batch_id: 0, cost: 0.023539, acc: 0.984375
+[Wed Oct 10 17:11:20 2018] epoch_id: 28, batch_id: 100, cost: 0.005577, acc: 1.000000
+[Wed Oct 10 17:11:22 2018] epoch_id: 28, batch_id: 200, cost: 0.001478, acc: 1.000000
+[Wed Oct 10 17:11:24 2018] epoch_id: 28, batch_id: 300, cost: 0.005870, acc: 1.000000
+[Wed Oct 10 17:11:26 2018] epoch_id: 28, batch_id: 400, cost: 0.021292, acc: 0.992188
+[Wed Oct 10 17:11:29 2018] epoch_id: 28, batch_id: 500, cost: 0.032081, acc: 0.984375
+[Wed Oct 10 17:11:31 2018] epoch_id: 28, batch_id: 600, cost: 0.004568, acc: 1.000000
+[Wed Oct 10 17:11:33 2018] epoch_id: 28, batch_id: 700, cost: 0.006552, acc: 1.000000
+[Wed Oct 10 17:11:35 2018] epoch_id: 28, batch_id: 800, cost: 0.012579, acc: 0.992188
+[Wed Oct 10 17:11:38 2018] epoch_id: 28, batch_id: 900, cost: 0.004214, acc: 1.000000
+[Wed Oct 10 17:11:40 2018] epoch_id: 28, batch_id: 1000, cost: 0.023843, acc: 0.984375
+[Wed Oct 10 17:11:42 2018] epoch_id: 28, batch_id: 1100, cost: 0.017869, acc: 0.992188
+[Wed Oct 10 17:11:44 2018] epoch_id: 28, batch_id: 1200, cost: 0.045617, acc: 0.984375
+[Wed Oct 10 17:11:46 2018] epoch_id: 28, batch_id: 1300, cost: 0.012739, acc: 0.992188
+[Wed Oct 10 17:11:49 2018] epoch_id: 28, batch_id: 1400, cost: 0.020053, acc: 0.992188
+[Wed Oct 10 17:11:51 2018] epoch_id: 28, batch_id: 1500, cost: 0.006956, acc: 1.000000
+[Wed Oct 10 17:11:53 2018] epoch_id: 28, batch_id: 1600, cost: 0.022830, acc: 0.984375
+[Wed Oct 10 17:11:55 2018] epoch_id: 28, batch_id: 1700, cost: 0.008924, acc: 1.000000
+[Wed Oct 10 17:11:58 2018] epoch_id: 28, batch_id: 1800, cost: 0.013902, acc: 0.992188
+[Wed Oct 10 17:12:01 2018] epoch_id: 28, batch_id: 1900, cost: 0.026418, acc: 0.984375
+[Wed Oct 10 17:12:03 2018] epoch_id: 28, batch_id: 2000, cost: 0.006809, acc: 1.000000
+[Wed Oct 10 17:12:05 2018] epoch_id: 28, batch_id: 2100, cost: 0.041039, acc: 0.984375
+[Wed Oct 10 17:12:08 2018] epoch_id: 28, batch_id: 2200, cost: 0.023235, acc: 0.992188
+[Wed Oct 10 17:12:10 2018] epoch_id: 28, batch_id: 2300, cost: 0.057685, acc: 0.976562
+[Wed Oct 10 17:12:12 2018] epoch_id: 28, batch_id: 2400, cost: 0.012688, acc: 1.000000
+[Wed Oct 10 17:12:14 2018] epoch_id: 28, batch_id: 2500, cost: 0.010697, acc: 0.992188
+[Wed Oct 10 17:12:16 2018] epoch_id: 28, batch_id: 2600, cost: 0.025213, acc: 0.992188
+[Wed Oct 10 17:12:19 2018] epoch_id: 28, batch_id: 2700, cost: 0.011269, acc: 0.992188
+[Wed Oct 10 17:12:21 2018] epoch_id: 28, batch_id: 2800, cost: 0.001141, acc: 1.000000
+[Wed Oct 10 17:12:23 2018] epoch_id: 28, batch_id: 2900, cost: 0.049410, acc: 0.984375
+[Wed Oct 10 17:12:25 2018] epoch_id: 28, batch_id: 3000, cost: 0.019739, acc: 0.992188
+[Wed Oct 10 17:12:26 2018] epoch_id: 28, train_avg_cost: 0.018105, train_avg_acc: 0.993756
+[Wed Oct 10 17:12:27 2018] epoch_id: 28, dev_cost: 1.200318, accuracy: 0.8345
+[Wed Oct 10 17:12:28 2018] epoch_id: 28, test_cost: 1.228304, accuracy: 0.8308
+[Wed Oct 10 17:12:36 2018] epoch_id: 29, batch_id: 0, cost: 0.004694, acc: 1.000000
+[Wed Oct 10 17:12:39 2018] epoch_id: 29, batch_id: 100, cost: 0.008528, acc: 0.992188
+[Wed Oct 10 17:12:41 2018] epoch_id: 29, batch_id: 200, cost: 0.006778, acc: 0.992188
+[Wed Oct 10 17:12:43 2018] epoch_id: 29, batch_id: 300, cost: 0.026610, acc: 0.992188
+[Wed Oct 10 17:12:45 2018] epoch_id: 29, batch_id: 400, cost: 0.008479, acc: 1.000000
+[Wed Oct 10 17:12:47 2018] epoch_id: 29, batch_id: 500, cost: 0.021705, acc: 0.984375
+[Wed Oct 10 17:12:50 2018] epoch_id: 29, batch_id: 600, cost: 0.010583, acc: 0.992188
+[Wed Oct 10 17:12:52 2018] epoch_id: 29, batch_id: 700, cost: 0.056105, acc: 0.992188
+[Wed Oct 10 17:12:54 2018] epoch_id: 29, batch_id: 800, cost: 0.000675, acc: 1.000000
+[Wed Oct 10 17:12:56 2018] epoch_id: 29, batch_id: 900, cost: 0.011277, acc: 1.000000
+[Wed Oct 10 17:12:58 2018] epoch_id: 29, batch_id: 1000, cost: 0.006004, acc: 1.000000
+[Wed Oct 10 17:13:01 2018] epoch_id: 29, batch_id: 1100, cost: 0.000914, acc: 1.000000
+[Wed Oct 10 17:13:03 2018] epoch_id: 29, batch_id: 1200, cost: 0.001097, acc: 1.000000
+[Wed Oct 10 17:13:05 2018] epoch_id: 29, batch_id: 1300, cost: 0.002556, acc: 1.000000
+[Wed Oct 10 17:13:07 2018] epoch_id: 29, batch_id: 1400, cost: 0.005061, acc: 1.000000
+[Wed Oct 10 17:13:10 2018] epoch_id: 29, batch_id: 1500, cost: 0.002417, acc: 1.000000
+[Wed Oct 10 17:13:12 2018] epoch_id: 29, batch_id: 1600, cost: 0.001037, acc: 1.000000
+[Wed Oct 10 17:13:14 2018] epoch_id: 29, batch_id: 1700, cost: 0.003415, acc: 1.000000
+[Wed Oct 10 17:13:16 2018] epoch_id: 29, batch_id: 1800, cost: 0.033230, acc: 0.984375
+[Wed Oct 10 17:13:19 2018] epoch_id: 29, batch_id: 1900, cost: 0.002914, acc: 1.000000
+[Wed Oct 10 17:13:21 2018] epoch_id: 29, batch_id: 2000, cost: 0.036463, acc: 0.984375
+[Wed Oct 10 17:13:23 2018] epoch_id: 29, batch_id: 2100, cost: 0.067978, acc: 0.976562
+[Wed Oct 10 17:13:25 2018] epoch_id: 29, batch_id: 2200, cost: 0.028088, acc: 0.992188
+[Wed Oct 10 17:13:28 2018] epoch_id: 29, batch_id: 2300, cost: 0.013688, acc: 0.992188
+[Wed Oct 10 17:13:30 2018] epoch_id: 29, batch_id: 2400, cost: 0.000238, acc: 1.000000
+[Wed Oct 10 17:13:32 2018] epoch_id: 29, batch_id: 2500, cost: 0.006287, acc: 1.000000
+[Wed Oct 10 17:13:35 2018] epoch_id: 29, batch_id: 2600, cost: 0.058838, acc: 0.992188
+[Wed Oct 10 17:13:37 2018] epoch_id: 29, batch_id: 2700, cost: 0.013440, acc: 0.992188
+[Wed Oct 10 17:13:39 2018] epoch_id: 29, batch_id: 2800, cost: 0.002577, acc: 1.000000
+[Wed Oct 10 17:13:41 2018] epoch_id: 29, batch_id: 2900, cost: 0.020076, acc: 0.992188
+[Wed Oct 10 17:13:43 2018] epoch_id: 29, batch_id: 3000, cost: 0.025126, acc: 0.992188
+[Wed Oct 10 17:13:44 2018] epoch_id: 29, train_avg_cost: 0.017397, train_avg_acc: 0.994107
+[Wed Oct 10 17:13:45 2018] epoch_id: 29, dev_cost: 1.314838, accuracy: 0.8304
+[Wed Oct 10 17:13:46 2018] epoch_id: 29, test_cost: 1.349980, accuracy: 0.8298
+[Wed Oct 10 17:13:55 2018] epoch_id: 30, batch_id: 0, cost: 0.063661, acc: 0.984375
+[Wed Oct 10 17:13:57 2018] epoch_id: 30, batch_id: 100, cost: 0.005445, acc: 1.000000
+[Wed Oct 10 17:13:59 2018] epoch_id: 30, batch_id: 200, cost: 0.025451, acc: 0.984375
+[Wed Oct 10 17:14:01 2018] epoch_id: 30, batch_id: 300, cost: 0.019455, acc: 0.992188
+[Wed Oct 10 17:14:04 2018] epoch_id: 30, batch_id: 400, cost: 0.000182, acc: 1.000000
+[Wed Oct 10 17:14:06 2018] epoch_id: 30, batch_id: 500, cost: 0.036089, acc: 0.984375
+[Wed Oct 10 17:14:08 2018] epoch_id: 30, batch_id: 600, cost: 0.003895, acc: 1.000000
+[Wed Oct 10 17:14:10 2018] epoch_id: 30, batch_id: 700, cost: 0.012125, acc: 0.992188
+[Wed Oct 10 17:14:13 2018] epoch_id: 30, batch_id: 800, cost: 0.007463, acc: 1.000000
+[Wed Oct 10 17:14:15 2018] epoch_id: 30, batch_id: 900, cost: 0.043093, acc: 0.992188
+[Wed Oct 10 17:14:17 2018] epoch_id: 30, batch_id: 1000, cost: 0.023025, acc: 0.992188
+[Wed Oct 10 17:14:20 2018] epoch_id: 30, batch_id: 1100, cost: 0.008640, acc: 0.992188
+[Wed Oct 10 17:14:22 2018] epoch_id: 30, batch_id: 1200, cost: 0.023361, acc: 0.984375
+[Wed Oct 10 17:14:24 2018] epoch_id: 30, batch_id: 1300, cost: 0.003226, acc: 1.000000
+[Wed Oct 10 17:14:27 2018] epoch_id: 30, batch_id: 1400, cost: 0.010225, acc: 0.992188
+[Wed Oct 10 17:14:29 2018] epoch_id: 30, batch_id: 1500, cost: 0.009733, acc: 1.000000
+[Wed Oct 10 17:14:31 2018] epoch_id: 30, batch_id: 1600, cost: 0.014048, acc: 0.992188
+[Wed Oct 10 17:14:34 2018] epoch_id: 30, batch_id: 1700, cost: 0.008200, acc: 1.000000
+[Wed Oct 10 17:14:36 2018] epoch_id: 30, batch_id: 1800, cost: 0.035217, acc: 0.992188
+[Wed Oct 10 17:14:38 2018] epoch_id: 30, batch_id: 1900, cost: 0.002707, acc: 1.000000
+[Wed Oct 10 17:14:40 2018] epoch_id: 30, batch_id: 2000, cost: 0.028292, acc: 0.984375
+[Wed Oct 10 17:14:43 2018] epoch_id: 30, batch_id: 2100, cost: 0.003164, acc: 1.000000
+[Wed Oct 10 17:14:45 2018] epoch_id: 30, batch_id: 2200, cost: 0.014421, acc: 0.992188
+[Wed Oct 10 17:14:47 2018] epoch_id: 30, batch_id: 2300, cost: 0.001986, acc: 1.000000
+[Wed Oct 10 17:14:49 2018] epoch_id: 30, batch_id: 2400, cost: 0.038462, acc: 0.992188
+[Wed Oct 10 17:14:52 2018] epoch_id: 30, batch_id: 2500, cost: 0.003580, acc: 1.000000
+[Wed Oct 10 17:14:54 2018] epoch_id: 30, batch_id: 2600, cost: 0.061259, acc: 0.984375
+[Wed Oct 10 17:14:56 2018] epoch_id: 30, batch_id: 2700, cost: 0.042758, acc: 0.992188
+[Wed Oct 10 17:14:59 2018] epoch_id: 30, batch_id: 2800, cost: 0.012991, acc: 0.992188
+[Wed Oct 10 17:15:02 2018] epoch_id: 30, batch_id: 2900, cost: 0.021263, acc: 0.992188
+[Wed Oct 10 17:15:04 2018] epoch_id: 30, batch_id: 3000, cost: 0.046058, acc: 0.992188
+[Wed Oct 10 17:15:05 2018] epoch_id: 30, train_avg_cost: 0.016908, train_avg_acc: 0.994391
+[Wed Oct 10 17:15:06 2018] epoch_id: 30, dev_cost: 1.214737, accuracy: 0.8343
+[Wed Oct 10 17:15:07 2018] epoch_id: 30, test_cost: 1.247275, accuracy: 0.828
+[Wed Oct 10 17:15:15 2018] epoch_id: 31, batch_id: 0, cost: 0.019613, acc: 0.992188
+[Wed Oct 10 17:15:17 2018] epoch_id: 31, batch_id: 100, cost: 0.048000, acc: 0.984375
+[Wed Oct 10 17:15:19 2018] epoch_id: 31, batch_id: 200, cost: 0.038604, acc: 0.992188
+[Wed Oct 10 17:15:21 2018] epoch_id: 31, batch_id: 300, cost: 0.003548, acc: 1.000000
+[Wed Oct 10 17:15:24 2018] epoch_id: 31, batch_id: 400, cost: 0.001539, acc: 1.000000
+[Wed Oct 10 17:15:26 2018] epoch_id: 31, batch_id: 500, cost: 0.034219, acc: 0.992188
+[Wed Oct 10 17:15:28 2018] epoch_id: 31, batch_id: 600, cost: 0.005696, acc: 1.000000
+[Wed Oct 10 17:15:31 2018] epoch_id: 31, batch_id: 700, cost: 0.012590, acc: 0.992188
+[Wed Oct 10 17:15:33 2018] epoch_id: 31, batch_id: 800, cost: 0.010021, acc: 0.992188
+[Wed Oct 10 17:15:35 2018] epoch_id: 31, batch_id: 900, cost: 0.004838, acc: 1.000000
+[Wed Oct 10 17:15:38 2018] epoch_id: 31, batch_id: 1000, cost: 0.006327, acc: 1.000000
+[Wed Oct 10 17:15:40 2018] epoch_id: 31, batch_id: 1100, cost: 0.019881, acc: 0.992188
+[Wed Oct 10 17:15:42 2018] epoch_id: 31, batch_id: 1200, cost: 0.006641, acc: 1.000000
+[Wed Oct 10 17:15:44 2018] epoch_id: 31, batch_id: 1300, cost: 0.014323, acc: 0.992188
+[Wed Oct 10 17:15:47 2018] epoch_id: 31, batch_id: 1400, cost: 0.008565, acc: 1.000000
+[Wed Oct 10 17:15:49 2018] epoch_id: 31, batch_id: 1500, cost: 0.003106, acc: 1.000000
+[Wed Oct 10 17:15:51 2018] epoch_id: 31, batch_id: 1600, cost: 0.023656, acc: 0.992188
+[Wed Oct 10 17:15:53 2018] epoch_id: 31, batch_id: 1700, cost: 0.014398, acc: 1.000000
+[Wed Oct 10 17:15:56 2018] epoch_id: 31, batch_id: 1800, cost: 0.005019, acc: 1.000000
+[Wed Oct 10 17:15:58 2018] epoch_id: 31, batch_id: 1900, cost: 0.042051, acc: 0.984375
+[Wed Oct 10 17:16:00 2018] epoch_id: 31, batch_id: 2000, cost: 0.005070, acc: 1.000000
+[Wed Oct 10 17:16:03 2018] epoch_id: 31, batch_id: 2100, cost: 0.071147, acc: 0.984375
+[Wed Oct 10 17:16:05 2018] epoch_id: 31, batch_id: 2200, cost: 0.004077, acc: 1.000000
+[Wed Oct 10 17:16:07 2018] epoch_id: 31, batch_id: 2300, cost: 0.000753, acc: 1.000000
+[Wed Oct 10 17:16:11 2018] epoch_id: 31, batch_id: 2400, cost: 0.007293, acc: 1.000000
+[Wed Oct 10 17:16:13 2018] epoch_id: 31, batch_id: 2500, cost: 0.020403, acc: 0.992188
+[Wed Oct 10 17:16:15 2018] epoch_id: 31, batch_id: 2600, cost: 0.002491, acc: 1.000000
+[Wed Oct 10 17:16:17 2018] epoch_id: 31, batch_id: 2700, cost: 0.001376, acc: 1.000000
+[Wed Oct 10 17:16:20 2018] epoch_id: 31, batch_id: 2800, cost: 0.006589, acc: 1.000000
+[Wed Oct 10 17:16:22 2018] epoch_id: 31, batch_id: 2900, cost: 0.009986, acc: 1.000000
+[Wed Oct 10 17:16:24 2018] epoch_id: 31, batch_id: 3000, cost: 0.004628, acc: 1.000000
+[Wed Oct 10 17:16:25 2018] epoch_id: 31, train_avg_cost: 0.016863, train_avg_acc: 0.994502
+[Wed Oct 10 17:16:26 2018] epoch_id: 31, dev_cost: 1.237226, accuracy: 0.8348
+[Wed Oct 10 17:16:27 2018] epoch_id: 31, test_cost: 1.256692, accuracy: 0.8327
+[Wed Oct 10 17:16:35 2018] epoch_id: 32, batch_id: 0, cost: 0.001936, acc: 1.000000
+[Wed Oct 10 17:16:37 2018] epoch_id: 32, batch_id: 100, cost: 0.002628, acc: 1.000000
+[Wed Oct 10 17:16:40 2018] epoch_id: 32, batch_id: 200, cost: 0.006948, acc: 1.000000
+[Wed Oct 10 17:16:42 2018] epoch_id: 32, batch_id: 300, cost: 0.001289, acc: 1.000000
+[Wed Oct 10 17:16:44 2018] epoch_id: 32, batch_id: 400, cost: 0.016850, acc: 1.000000
+[Wed Oct 10 17:16:46 2018] epoch_id: 32, batch_id: 500, cost: 0.001709, acc: 1.000000
+[Wed Oct 10 17:16:49 2018] epoch_id: 32, batch_id: 600, cost: 0.000500, acc: 1.000000
+[Wed Oct 10 17:16:51 2018] epoch_id: 32, batch_id: 700, cost: 0.026876, acc: 0.992188
+[Wed Oct 10 17:16:54 2018] epoch_id: 32, batch_id: 800, cost: 0.032499, acc: 0.992188
+[Wed Oct 10 17:16:56 2018] epoch_id: 32, batch_id: 900, cost: 0.008563, acc: 1.000000
+[Wed Oct 10 17:16:59 2018] epoch_id: 32, batch_id: 1000, cost: 0.033638, acc: 0.992188
+[Wed Oct 10 17:17:01 2018] epoch_id: 32, batch_id: 1100, cost: 0.021626, acc: 0.992188
+[Wed Oct 10 17:17:03 2018] epoch_id: 32, batch_id: 1200, cost: 0.035490, acc: 0.984375
+[Wed Oct 10 17:17:05 2018] epoch_id: 32, batch_id: 1300, cost: 0.064303, acc: 0.992188
+[Wed Oct 10 17:17:08 2018] epoch_id: 32, batch_id: 1400, cost: 0.000839, acc: 1.000000
+[Wed Oct 10 17:17:10 2018] epoch_id: 32, batch_id: 1500, cost: 0.014770, acc: 0.992188
+[Wed Oct 10 17:17:12 2018] epoch_id: 32, batch_id: 1600, cost: 0.067803, acc: 0.992188
+[Wed Oct 10 17:17:14 2018] epoch_id: 32, batch_id: 1700, cost: 0.001507, acc: 1.000000
+[Wed Oct 10 17:17:17 2018] epoch_id: 32, batch_id: 1800, cost: 0.039594, acc: 0.984375
+[Wed Oct 10 17:17:19 2018] epoch_id: 32, batch_id: 1900, cost: 0.016198, acc: 0.992188
+[Wed Oct 10 17:17:21 2018] epoch_id: 32, batch_id: 2000, cost: 0.027783, acc: 0.984375
+[Wed Oct 10 17:17:24 2018] epoch_id: 32, batch_id: 2100, cost: 0.010040, acc: 0.992188
+[Wed Oct 10 17:17:26 2018] epoch_id: 32, batch_id: 2200, cost: 0.043833, acc: 0.992188
+[Wed Oct 10 17:17:28 2018] epoch_id: 32, batch_id: 2300, cost: 0.012850, acc: 0.992188
+[Wed Oct 10 17:17:31 2018] epoch_id: 32, batch_id: 2400, cost: 0.010643, acc: 1.000000
+[Wed Oct 10 17:17:33 2018] epoch_id: 32, batch_id: 2500, cost: 0.013513, acc: 0.992188
+[Wed Oct 10 17:17:35 2018] epoch_id: 32, batch_id: 2600, cost: 0.021498, acc: 0.984375
+[Wed Oct 10 17:17:38 2018] epoch_id: 32, batch_id: 2700, cost: 0.048091, acc: 0.984375
+[Wed Oct 10 17:17:40 2018] epoch_id: 32, batch_id: 2800, cost: 0.054710, acc: 0.984375
+[Wed Oct 10 17:17:42 2018] epoch_id: 32, batch_id: 2900, cost: 0.028200, acc: 0.992188
+[Wed Oct 10 17:17:44 2018] epoch_id: 32, batch_id: 3000, cost: 0.052160, acc: 0.992188
+[Wed Oct 10 17:17:45 2018] epoch_id: 32, train_avg_cost: 0.016115, train_avg_acc: 0.994599
+[Wed Oct 10 17:17:46 2018] epoch_id: 32, dev_cost: 1.182178, accuracy: 0.8359
+[Wed Oct 10 17:17:47 2018] epoch_id: 32, test_cost: 1.183695, accuracy: 0.8297
+[Wed Oct 10 17:17:55 2018] epoch_id: 33, batch_id: 0, cost: 0.002170, acc: 1.000000
+[Wed Oct 10 17:17:58 2018] epoch_id: 33, batch_id: 100, cost: 0.000724, acc: 1.000000
+[Wed Oct 10 17:18:00 2018] epoch_id: 33, batch_id: 200, cost: 0.102036, acc: 0.968750
+[Wed Oct 10 17:18:02 2018] epoch_id: 33, batch_id: 300, cost: 0.006967, acc: 1.000000
+[Wed Oct 10 17:18:04 2018] epoch_id: 33, batch_id: 400, cost: 0.004401, acc: 1.000000
+[Wed Oct 10 17:18:07 2018] epoch_id: 33, batch_id: 500, cost: 0.006693, acc: 1.000000
+[Wed Oct 10 17:18:09 2018] epoch_id: 33, batch_id: 600, cost: 0.002759, acc: 1.000000
+[Wed Oct 10 17:18:11 2018] epoch_id: 33, batch_id: 700, cost: 0.000587, acc: 1.000000
+[Wed Oct 10 17:18:13 2018] epoch_id: 33, batch_id: 800, cost: 0.006432, acc: 1.000000
+[Wed Oct 10 17:18:16 2018] epoch_id: 33, batch_id: 900, cost: 0.043751, acc: 0.984375
+[Wed Oct 10 17:18:18 2018] epoch_id: 33, batch_id: 1000, cost: 0.006652, acc: 1.000000
+[Wed Oct 10 17:18:20 2018] epoch_id: 33, batch_id: 1100, cost: 0.008419, acc: 1.000000
+[Wed Oct 10 17:18:23 2018] epoch_id: 33, batch_id: 1200, cost: 0.012309, acc: 0.992188
+[Wed Oct 10 17:18:25 2018] epoch_id: 33, batch_id: 1300, cost: 0.023884, acc: 0.984375
+[Wed Oct 10 17:18:27 2018] epoch_id: 33, batch_id: 1400, cost: 0.011711, acc: 0.992188
+[Wed Oct 10 17:18:29 2018] epoch_id: 33, batch_id: 1500, cost: 0.005948, acc: 1.000000
+[Wed Oct 10 17:18:32 2018] epoch_id: 33, batch_id: 1600, cost: 0.014363, acc: 0.992188
+[Wed Oct 10 17:18:34 2018] epoch_id: 33, batch_id: 1700, cost: 0.000291, acc: 1.000000
+[Wed Oct 10 17:18:37 2018] epoch_id: 33, batch_id: 1800, cost: 0.005694, acc: 1.000000
+[Wed Oct 10 17:18:40 2018] epoch_id: 33, batch_id: 1900, cost: 0.170195, acc: 0.984375
+[Wed Oct 10 17:18:42 2018] epoch_id: 33, batch_id: 2000, cost: 0.001044, acc: 1.000000
+[Wed Oct 10 17:18:44 2018] epoch_id: 33, batch_id: 2100, cost: 0.004921, acc: 1.000000
+[Wed Oct 10 17:18:46 2018] epoch_id: 33, batch_id: 2200, cost: 0.006203, acc: 1.000000
+[Wed Oct 10 17:18:48 2018] epoch_id: 33, batch_id: 2300, cost: 0.038624, acc: 0.984375
+[Wed Oct 10 17:18:51 2018] epoch_id: 33, batch_id: 2400, cost: 0.067313, acc: 0.976562
+[Wed Oct 10 17:18:53 2018] epoch_id: 33, batch_id: 2500, cost: 0.040853, acc: 0.992188
+[Wed Oct 10 17:18:55 2018] epoch_id: 33, batch_id: 2600, cost: 0.039087, acc: 0.984375
+[Wed Oct 10 17:18:57 2018] epoch_id: 33, batch_id: 2700, cost: 0.004672, acc: 1.000000
+[Wed Oct 10 17:19:00 2018] epoch_id: 33, batch_id: 2800, cost: 0.021997, acc: 0.984375
+[Wed Oct 10 17:19:02 2018] epoch_id: 33, batch_id: 2900, cost: 0.013635, acc: 1.000000
+[Wed Oct 10 17:19:04 2018] epoch_id: 33, batch_id: 3000, cost: 0.009055, acc: 0.992188
+[Wed Oct 10 17:19:05 2018] epoch_id: 33, train_avg_cost: 0.014972, train_avg_acc: 0.995145
+[Wed Oct 10 17:19:06 2018] epoch_id: 33, dev_cost: 1.819085, accuracy: 0.8352
+[Wed Oct 10 17:19:07 2018] epoch_id: 33, test_cost: 1.859041, accuracy: 0.8314
+[Wed Oct 10 17:19:15 2018] epoch_id: 34, batch_id: 0, cost: 0.026821, acc: 0.992188
+[Wed Oct 10 17:19:17 2018] epoch_id: 34, batch_id: 100, cost: 0.001463, acc: 1.000000
+[Wed Oct 10 17:19:20 2018] epoch_id: 34, batch_id: 200, cost: 0.000579, acc: 1.000000
+[Wed Oct 10 17:19:22 2018] epoch_id: 34, batch_id: 300, cost: 0.000492, acc: 1.000000
+[Wed Oct 10 17:19:24 2018] epoch_id: 34, batch_id: 400, cost: 0.000671, acc: 1.000000
+[Wed Oct 10 17:19:26 2018] epoch_id: 34, batch_id: 500, cost: 0.007763, acc: 1.000000
+[Wed Oct 10 17:19:29 2018] epoch_id: 34, batch_id: 600, cost: 0.018827, acc: 0.992188
+[Wed Oct 10 17:19:31 2018] epoch_id: 34, batch_id: 700, cost: 0.004606, acc: 1.000000
+[Wed Oct 10 17:19:33 2018] epoch_id: 34, batch_id: 800, cost: 0.004697, acc: 1.000000
+[Wed Oct 10 17:19:35 2018] epoch_id: 34, batch_id: 900, cost: 0.003752, acc: 1.000000
+[Wed Oct 10 17:19:38 2018] epoch_id: 34, batch_id: 1000, cost: 0.003546, acc: 1.000000
+[Wed Oct 10 17:19:40 2018] epoch_id: 34, batch_id: 1100, cost: 0.003848, acc: 1.000000
+[Wed Oct 10 17:19:42 2018] epoch_id: 34, batch_id: 1200, cost: 0.010363, acc: 1.000000
+[Wed Oct 10 17:19:44 2018] epoch_id: 34, batch_id: 1300, cost: 0.013875, acc: 0.992188
+[Wed Oct 10 17:19:47 2018] epoch_id: 34, batch_id: 1400, cost: 0.009212, acc: 0.992188
+[Wed Oct 10 17:19:49 2018] epoch_id: 34, batch_id: 1500, cost: 0.047909, acc: 0.992188
+[Wed Oct 10 17:19:51 2018] epoch_id: 34, batch_id: 1600, cost: 0.012809, acc: 0.992188
+[Wed Oct 10 17:19:53 2018] epoch_id: 34, batch_id: 1700, cost: 0.009717, acc: 1.000000
+[Wed Oct 10 17:19:56 2018] epoch_id: 34, batch_id: 1800, cost: 0.026330, acc: 0.984375
+[Wed Oct 10 17:19:58 2018] epoch_id: 34, batch_id: 1900, cost: 0.016982, acc: 0.992188
+[Wed Oct 10 17:20:00 2018] epoch_id: 34, batch_id: 2000, cost: 0.021416, acc: 0.992188
+[Wed Oct 10 17:20:03 2018] epoch_id: 34, batch_id: 2100, cost: 0.001120, acc: 1.000000
+[Wed Oct 10 17:20:05 2018] epoch_id: 34, batch_id: 2200, cost: 0.011436, acc: 1.000000
+[Wed Oct 10 17:20:07 2018] epoch_id: 34, batch_id: 2300, cost: 0.007605, acc: 0.992188
+[Wed Oct 10 17:20:10 2018] epoch_id: 34, batch_id: 2400, cost: 0.026308, acc: 0.992188
+[Wed Oct 10 17:20:12 2018] epoch_id: 34, batch_id: 2500, cost: 0.006798, acc: 1.000000
+[Wed Oct 10 17:20:14 2018] epoch_id: 34, batch_id: 2600, cost: 0.017334, acc: 0.992188
+[Wed Oct 10 17:20:16 2018] epoch_id: 34, batch_id: 2700, cost: 0.030094, acc: 0.992188
+[Wed Oct 10 17:20:18 2018] epoch_id: 34, batch_id: 2800, cost: 0.053259, acc: 0.992188
+[Wed Oct 10 17:20:21 2018] epoch_id: 34, batch_id: 2900, cost: 0.061547, acc: 0.968750
+[Wed Oct 10 17:20:23 2018] epoch_id: 34, batch_id: 3000, cost: 0.002864, acc: 1.000000
+[Wed Oct 10 17:20:24 2018] epoch_id: 34, train_avg_cost: 0.014813, train_avg_acc: 0.995064
+[Wed Oct 10 17:20:25 2018] epoch_id: 34, dev_cost: 1.697732, accuracy: 0.8346
+[Wed Oct 10 17:20:26 2018] epoch_id: 34, test_cost: 1.721137, accuracy: 0.8341
+[Wed Oct 10 17:20:34 2018] epoch_id: 35, batch_id: 0, cost: 0.000268, acc: 1.000000
+[Wed Oct 10 17:20:37 2018] epoch_id: 35, batch_id: 100, cost: 0.001389, acc: 1.000000
+[Wed Oct 10 17:20:39 2018] epoch_id: 35, batch_id: 200, cost: 0.003275, acc: 1.000000
+[Wed Oct 10 17:20:41 2018] epoch_id: 35, batch_id: 300, cost: 0.006535, acc: 1.000000
+[Wed Oct 10 17:20:43 2018] epoch_id: 35, batch_id: 400, cost: 0.005316, acc: 1.000000
+[Wed Oct 10 17:20:45 2018] epoch_id: 35, batch_id: 500, cost: 0.017976, acc: 0.992188
+[Wed Oct 10 17:20:48 2018] epoch_id: 35, batch_id: 600, cost: 0.060320, acc: 0.992188
+[Wed Oct 10 17:20:50 2018] epoch_id: 35, batch_id: 700, cost: 0.004358, acc: 1.000000
+[Wed Oct 10 17:20:52 2018] epoch_id: 35, batch_id: 800, cost: 0.003560, acc: 1.000000
+[Wed Oct 10 17:20:55 2018] epoch_id: 35, batch_id: 900, cost: 0.017978, acc: 0.992188
+[Wed Oct 10 17:20:57 2018] epoch_id: 35, batch_id: 1000, cost: 0.007025, acc: 1.000000
+[Wed Oct 10 17:20:59 2018] epoch_id: 35, batch_id: 1100, cost: 0.008777, acc: 0.992188
+[Wed Oct 10 17:21:01 2018] epoch_id: 35, batch_id: 1200, cost: 0.006591, acc: 1.000000
+[Wed Oct 10 17:21:04 2018] epoch_id: 35, batch_id: 1300, cost: 0.008911, acc: 0.992188
+[Wed Oct 10 17:21:06 2018] epoch_id: 35, batch_id: 1400, cost: 0.038343, acc: 0.984375
+[Wed Oct 10 17:21:08 2018] epoch_id: 35, batch_id: 1500, cost: 0.001654, acc: 1.000000
+[Wed Oct 10 17:21:10 2018] epoch_id: 35, batch_id: 1600, cost: 0.002577, acc: 1.000000
+[Wed Oct 10 17:21:13 2018] epoch_id: 35, batch_id: 1700, cost: 0.026908, acc: 0.992188
+[Wed Oct 10 17:21:15 2018] epoch_id: 35, batch_id: 1800, cost: 0.024004, acc: 0.992188
+[Wed Oct 10 17:21:17 2018] epoch_id: 35, batch_id: 1900, cost: 0.013134, acc: 0.992188
+[Wed Oct 10 17:21:19 2018] epoch_id: 35, batch_id: 2000, cost: 0.003633, acc: 1.000000
+[Wed Oct 10 17:21:21 2018] epoch_id: 35, batch_id: 2100, cost: 0.011727, acc: 0.992188
+[Wed Oct 10 17:21:24 2018] epoch_id: 35, batch_id: 2200, cost: 0.019991, acc: 0.992188
+[Wed Oct 10 17:21:26 2018] epoch_id: 35, batch_id: 2300, cost: 0.004771, acc: 1.000000
+[Wed Oct 10 17:21:28 2018] epoch_id: 35, batch_id: 2400, cost: 0.013732, acc: 0.992188
+[Wed Oct 10 17:21:30 2018] epoch_id: 35, batch_id: 2500, cost: 0.096741, acc: 0.984375
+[Wed Oct 10 17:21:33 2018] epoch_id: 35, batch_id: 2600, cost: 0.006102, acc: 1.000000
+[Wed Oct 10 17:21:36 2018] epoch_id: 35, batch_id: 2700, cost: 0.007046, acc: 0.992188
+[Wed Oct 10 17:21:38 2018] epoch_id: 35, batch_id: 2800, cost: 0.028777, acc: 0.984375
+[Wed Oct 10 17:21:41 2018] epoch_id: 35, batch_id: 2900, cost: 0.116960, acc: 0.976562
+[Wed Oct 10 17:21:43 2018] epoch_id: 35, batch_id: 3000, cost: 0.039752, acc: 0.968750
+[Wed Oct 10 17:21:44 2018] epoch_id: 35, train_avg_cost: 0.014921, train_avg_acc: 0.995075
+[Wed Oct 10 17:21:45 2018] epoch_id: 35, dev_cost: 1.203598, accuracy: 0.8348
+[Wed Oct 10 17:21:45 2018] epoch_id: 35, test_cost: 1.205202, accuracy: 0.8347
+[Wed Oct 10 17:21:54 2018] epoch_id: 36, batch_id: 0, cost: 0.009331, acc: 1.000000
+[Wed Oct 10 17:21:56 2018] epoch_id: 36, batch_id: 100, cost: 0.004473, acc: 1.000000
+[Wed Oct 10 17:21:58 2018] epoch_id: 36, batch_id: 200, cost: 0.001097, acc: 1.000000
+[Wed Oct 10 17:22:00 2018] epoch_id: 36, batch_id: 300, cost: 0.001914, acc: 1.000000
+[Wed Oct 10 17:22:03 2018] epoch_id: 36, batch_id: 400, cost: 0.003967, acc: 1.000000
+[Wed Oct 10 17:22:05 2018] epoch_id: 36, batch_id: 500, cost: 0.008101, acc: 1.000000
+[Wed Oct 10 17:22:07 2018] epoch_id: 36, batch_id: 600, cost: 0.037581, acc: 0.976562
+[Wed Oct 10 17:22:09 2018] epoch_id: 36, batch_id: 700, cost: 0.031872, acc: 0.992188
+[Wed Oct 10 17:22:11 2018] epoch_id: 36, batch_id: 800, cost: 0.002586, acc: 1.000000
+[Wed Oct 10 17:22:14 2018] epoch_id: 36, batch_id: 900, cost: 0.025838, acc: 0.984375
+[Wed Oct 10 17:22:16 2018] epoch_id: 36, batch_id: 1000, cost: 0.012382, acc: 0.992188
+[Wed Oct 10 17:22:18 2018] epoch_id: 36, batch_id: 1100, cost: 0.006482, acc: 1.000000
+[Wed Oct 10 17:22:20 2018] epoch_id: 36, batch_id: 1200, cost: 0.006437, acc: 1.000000
+[Wed Oct 10 17:22:23 2018] epoch_id: 36, batch_id: 1300, cost: 0.026039, acc: 0.992188
+[Wed Oct 10 17:22:25 2018] epoch_id: 36, batch_id: 1400, cost: 0.017908, acc: 0.992188
+[Wed Oct 10 17:22:27 2018] epoch_id: 36, batch_id: 1500, cost: 0.025722, acc: 0.984375
+[Wed Oct 10 17:22:29 2018] epoch_id: 36, batch_id: 1600, cost: 0.031398, acc: 0.992188
+[Wed Oct 10 17:22:32 2018] epoch_id: 36, batch_id: 1700, cost: 0.034194, acc: 0.984375
+[Wed Oct 10 17:22:34 2018] epoch_id: 36, batch_id: 1800, cost: 0.001353, acc: 1.000000
+[Wed Oct 10 17:22:36 2018] epoch_id: 36, batch_id: 1900, cost: 0.000942, acc: 1.000000
+[Wed Oct 10 17:22:38 2018] epoch_id: 36, batch_id: 2000, cost: 0.004051, acc: 1.000000
+[Wed Oct 10 17:22:40 2018] epoch_id: 36, batch_id: 2100, cost: 0.016359, acc: 0.992188
+[Wed Oct 10 17:22:43 2018] epoch_id: 36, batch_id: 2200, cost: 0.010324, acc: 1.000000
+[Wed Oct 10 17:22:46 2018] epoch_id: 36, batch_id: 2300, cost: 0.015250, acc: 1.000000
+[Wed Oct 10 17:22:48 2018] epoch_id: 36, batch_id: 2400, cost: 0.053711, acc: 0.976562
+[Wed Oct 10 17:22:51 2018] epoch_id: 36, batch_id: 2500, cost: 0.059409, acc: 0.984375
+[Wed Oct 10 17:22:53 2018] epoch_id: 36, batch_id: 2600, cost: 0.009707, acc: 1.000000
+[Wed Oct 10 17:22:55 2018] epoch_id: 36, batch_id: 2700, cost: 0.003367, acc: 1.000000
+[Wed Oct 10 17:22:58 2018] epoch_id: 36, batch_id: 2800, cost: 0.001207, acc: 1.000000
+[Wed Oct 10 17:23:00 2018] epoch_id: 36, batch_id: 2900, cost: 0.009538, acc: 0.992188
+[Wed Oct 10 17:23:02 2018] epoch_id: 36, batch_id: 3000, cost: 0.013745, acc: 0.992188
+[Wed Oct 10 17:23:03 2018] epoch_id: 36, train_avg_cost: 0.014009, train_avg_acc: 0.995522
+[Wed Oct 10 17:23:04 2018] epoch_id: 36, dev_cost: 1.647745, accuracy: 0.8324
+[Wed Oct 10 17:23:05 2018] epoch_id: 36, test_cost: 1.662931, accuracy: 0.8368
+[Wed Oct 10 17:23:13 2018] epoch_id: 37, batch_id: 0, cost: 0.009128, acc: 1.000000
+[Wed Oct 10 17:23:15 2018] epoch_id: 37, batch_id: 100, cost: 0.000989, acc: 1.000000
+[Wed Oct 10 17:23:17 2018] epoch_id: 37, batch_id: 200, cost: 0.031867, acc: 0.992188
+[Wed Oct 10 17:23:20 2018] epoch_id: 37, batch_id: 300, cost: 0.016197, acc: 0.984375
+[Wed Oct 10 17:23:22 2018] epoch_id: 37, batch_id: 400, cost: 0.004157, acc: 1.000000
+[Wed Oct 10 17:23:24 2018] epoch_id: 37, batch_id: 500, cost: 0.004215, acc: 1.000000
+[Wed Oct 10 17:23:26 2018] epoch_id: 37, batch_id: 600, cost: 0.000303, acc: 1.000000
+[Wed Oct 10 17:23:29 2018] epoch_id: 37, batch_id: 700, cost: 0.005056, acc: 1.000000
+[Wed Oct 10 17:23:31 2018] epoch_id: 37, batch_id: 800, cost: 0.016816, acc: 0.992188
+[Wed Oct 10 17:23:34 2018] epoch_id: 37, batch_id: 900, cost: 0.036067, acc: 0.984375
+[Wed Oct 10 17:23:37 2018] epoch_id: 37, batch_id: 1000, cost: 0.002430, acc: 1.000000
+[Wed Oct 10 17:23:39 2018] epoch_id: 37, batch_id: 1100, cost: 0.001621, acc: 1.000000
+[Wed Oct 10 17:23:41 2018] epoch_id: 37, batch_id: 1200, cost: 0.034505, acc: 0.992188
+[Wed Oct 10 17:23:43 2018] epoch_id: 37, batch_id: 1300, cost: 0.008605, acc: 0.992188
+[Wed Oct 10 17:23:45 2018] epoch_id: 37, batch_id: 1400, cost: 0.039387, acc: 0.984375
+[Wed Oct 10 17:23:48 2018] epoch_id: 37, batch_id: 1500, cost: 0.005761, acc: 1.000000
+[Wed Oct 10 17:23:50 2018] epoch_id: 37, batch_id: 1600, cost: 0.002905, acc: 1.000000
+[Wed Oct 10 17:23:52 2018] epoch_id: 37, batch_id: 1700, cost: 0.009640, acc: 1.000000
+[Wed Oct 10 17:23:55 2018] epoch_id: 37, batch_id: 1800, cost: 0.004734, acc: 1.000000
+[Wed Oct 10 17:23:57 2018] epoch_id: 37, batch_id: 1900, cost: 0.029191, acc: 0.992188
+[Wed Oct 10 17:23:59 2018] epoch_id: 37, batch_id: 2000, cost: 0.000724, acc: 1.000000
+[Wed Oct 10 17:24:01 2018] epoch_id: 37, batch_id: 2100, cost: 0.014325, acc: 0.992188
+[Wed Oct 10 17:24:04 2018] epoch_id: 37, batch_id: 2200, cost: 0.004239, acc: 1.000000
+[Wed Oct 10 17:24:06 2018] epoch_id: 37, batch_id: 2300, cost: 0.000597, acc: 1.000000
+[Wed Oct 10 17:24:08 2018] epoch_id: 37, batch_id: 2400, cost: 0.008226, acc: 1.000000
+[Wed Oct 10 17:24:10 2018] epoch_id: 37, batch_id: 2500, cost: 0.001601, acc: 1.000000
+[Wed Oct 10 17:24:12 2018] epoch_id: 37, batch_id: 2600, cost: 0.014527, acc: 0.992188
+[Wed Oct 10 17:24:15 2018] epoch_id: 37, batch_id: 2700, cost: 0.010813, acc: 0.992188
+[Wed Oct 10 17:24:17 2018] epoch_id: 37, batch_id: 2800, cost: 0.015832, acc: 0.992188
+[Wed Oct 10 17:24:19 2018] epoch_id: 37, batch_id: 2900, cost: 0.063636, acc: 0.976562
+[Wed Oct 10 17:24:22 2018] epoch_id: 37, batch_id: 3000, cost: 0.003993, acc: 1.000000
+[Wed Oct 10 17:24:22 2018] epoch_id: 37, train_avg_cost: 0.014056, train_avg_acc: 0.995431
+[Wed Oct 10 17:24:23 2018] epoch_id: 37, dev_cost: 1.500988, accuracy: 0.8334
+[Wed Oct 10 17:24:24 2018] epoch_id: 37, test_cost: 1.491400, accuracy: 0.8327
+[Wed Oct 10 17:24:33 2018] epoch_id: 38, batch_id: 0, cost: 0.016895, acc: 0.992188
+[Wed Oct 10 17:24:35 2018] epoch_id: 38, batch_id: 100, cost: 0.001690, acc: 1.000000
+[Wed Oct 10 17:24:38 2018] epoch_id: 38, batch_id: 200, cost: 0.009989, acc: 1.000000
+[Wed Oct 10 17:24:40 2018] epoch_id: 38, batch_id: 300, cost: 0.023480, acc: 0.984375
+[Wed Oct 10 17:24:42 2018] epoch_id: 38, batch_id: 400, cost: 0.004687, acc: 1.000000
+[Wed Oct 10 17:24:45 2018] epoch_id: 38, batch_id: 500, cost: 0.020183, acc: 0.992188
+[Wed Oct 10 17:24:47 2018] epoch_id: 38, batch_id: 600, cost: 0.028614, acc: 0.992188
+[Wed Oct 10 17:24:49 2018] epoch_id: 38, batch_id: 700, cost: 0.000448, acc: 1.000000
+[Wed Oct 10 17:24:51 2018] epoch_id: 38, batch_id: 800, cost: 0.000913, acc: 1.000000
+[Wed Oct 10 17:24:54 2018] epoch_id: 38, batch_id: 900, cost: 0.022090, acc: 0.992188
+[Wed Oct 10 17:24:56 2018] epoch_id: 38, batch_id: 1000, cost: 0.006918, acc: 0.992188
+[Wed Oct 10 17:24:58 2018] epoch_id: 38, batch_id: 1100, cost: 0.028611, acc: 0.984375
+[Wed Oct 10 17:25:00 2018] epoch_id: 38, batch_id: 1200, cost: 0.013097, acc: 0.992188
+[Wed Oct 10 17:25:03 2018] epoch_id: 38, batch_id: 1300, cost: 0.014227, acc: 0.992188
+[Wed Oct 10 17:25:05 2018] epoch_id: 38, batch_id: 1400, cost: 0.033064, acc: 0.992188
+[Wed Oct 10 17:25:07 2018] epoch_id: 38, batch_id: 1500, cost: 0.004276, acc: 1.000000
+[Wed Oct 10 17:25:09 2018] epoch_id: 38, batch_id: 1600, cost: 0.016516, acc: 0.992188
+[Wed Oct 10 17:25:12 2018] epoch_id: 38, batch_id: 1700, cost: 0.004443, acc: 1.000000
+[Wed Oct 10 17:25:14 2018] epoch_id: 38, batch_id: 1800, cost: 0.001648, acc: 1.000000
+[Wed Oct 10 17:25:17 2018] epoch_id: 38, batch_id: 1900, cost: 0.026780, acc: 0.992188
+[Wed Oct 10 17:25:20 2018] epoch_id: 38, batch_id: 2000, cost: 0.006375, acc: 0.992188
+[Wed Oct 10 17:25:22 2018] epoch_id: 38, batch_id: 2100, cost: 0.013131, acc: 0.992188
+[Wed Oct 10 17:25:24 2018] epoch_id: 38, batch_id: 2200, cost: 0.012666, acc: 1.000000
+[Wed Oct 10 17:25:26 2018] epoch_id: 38, batch_id: 2300, cost: 0.001973, acc: 1.000000
+[Wed Oct 10 17:25:29 2018] epoch_id: 38, batch_id: 2400, cost: 0.005966, acc: 1.000000
+[Wed Oct 10 17:25:31 2018] epoch_id: 38, batch_id: 2500, cost: 0.011249, acc: 0.992188
+[Wed Oct 10 17:25:33 2018] epoch_id: 38, batch_id: 2600, cost: 0.022209, acc: 0.992188
+[Wed Oct 10 17:25:36 2018] epoch_id: 38, batch_id: 2700, cost: 0.003999, acc: 1.000000
+[Wed Oct 10 17:25:38 2018] epoch_id: 38, batch_id: 2800, cost: 0.010264, acc: 0.992188
+[Wed Oct 10 17:25:40 2018] epoch_id: 38, batch_id: 2900, cost: 0.003841, acc: 1.000000
+[Wed Oct 10 17:25:42 2018] epoch_id: 38, batch_id: 3000, cost: 0.075514, acc: 0.992188
+[Wed Oct 10 17:25:43 2018] epoch_id: 38, train_avg_cost: 0.013573, train_avg_acc: 0.995548
+[Wed Oct 10 17:25:44 2018] epoch_id: 38, dev_cost: 1.577028, accuracy: 0.8317
+[Wed Oct 10 17:25:45 2018] epoch_id: 38, test_cost: 1.546861, accuracy: 0.8363
+[Wed Oct 10 17:25:54 2018] epoch_id: 39, batch_id: 0, cost: 0.000487, acc: 1.000000
+[Wed Oct 10 17:25:56 2018] epoch_id: 39, batch_id: 100, cost: 0.003988, acc: 1.000000
+[Wed Oct 10 17:25:58 2018] epoch_id: 39, batch_id: 200, cost: 0.069709, acc: 0.984375
+[Wed Oct 10 17:26:00 2018] epoch_id: 39, batch_id: 300, cost: 0.031796, acc: 0.992188
+[Wed Oct 10 17:26:03 2018] epoch_id: 39, batch_id: 400, cost: 0.007788, acc: 1.000000
+[Wed Oct 10 17:26:05 2018] epoch_id: 39, batch_id: 500, cost: 0.014854, acc: 0.992188
+[Wed Oct 10 17:26:07 2018] epoch_id: 39, batch_id: 600, cost: 0.017382, acc: 0.992188
+[Wed Oct 10 17:26:09 2018] epoch_id: 39, batch_id: 700, cost: 0.003342, acc: 1.000000
+[Wed Oct 10 17:26:12 2018] epoch_id: 39, batch_id: 800, cost: 0.003279, acc: 1.000000
+[Wed Oct 10 17:26:14 2018] epoch_id: 39, batch_id: 900, cost: 0.018283, acc: 0.992188
+[Wed Oct 10 17:26:16 2018] epoch_id: 39, batch_id: 1000, cost: 0.000697, acc: 1.000000
+[Wed Oct 10 17:26:18 2018] epoch_id: 39, batch_id: 1100, cost: 0.003188, acc: 1.000000
+[Wed Oct 10 17:26:21 2018] epoch_id: 39, batch_id: 1200, cost: 0.002884, acc: 1.000000
+[Wed Oct 10 17:26:23 2018] epoch_id: 39, batch_id: 1300, cost: 0.016443, acc: 0.992188
+[Wed Oct 10 17:26:25 2018] epoch_id: 39, batch_id: 1400, cost: 0.036063, acc: 0.992188
+[Wed Oct 10 17:26:28 2018] epoch_id: 39, batch_id: 1500, cost: 0.010849, acc: 0.992188
+[Wed Oct 10 17:26:30 2018] epoch_id: 39, batch_id: 1600, cost: 0.002218, acc: 1.000000
+[Wed Oct 10 17:26:32 2018] epoch_id: 39, batch_id: 1700, cost: 0.011184, acc: 1.000000
+[Wed Oct 10 17:26:34 2018] epoch_id: 39, batch_id: 1800, cost: 0.002410, acc: 1.000000
+[Wed Oct 10 17:26:37 2018] epoch_id: 39, batch_id: 1900, cost: 0.010422, acc: 0.992188
+[Wed Oct 10 17:26:39 2018] epoch_id: 39, batch_id: 2000, cost: 0.012162, acc: 0.992188
+[Wed Oct 10 17:26:41 2018] epoch_id: 39, batch_id: 2100, cost: 0.042420, acc: 0.984375
+[Wed Oct 10 17:26:43 2018] epoch_id: 39, batch_id: 2200, cost: 0.006210, acc: 1.000000
+[Wed Oct 10 17:26:46 2018] epoch_id: 39, batch_id: 2300, cost: 0.002905, acc: 1.000000
+[Wed Oct 10 17:26:48 2018] epoch_id: 39, batch_id: 2400, cost: 0.067472, acc: 0.992188
+[Wed Oct 10 17:26:50 2018] epoch_id: 39, batch_id: 2500, cost: 0.030382, acc: 0.992188
+[Wed Oct 10 17:26:52 2018] epoch_id: 39, batch_id: 2600, cost: 0.049727, acc: 0.992188
+[Wed Oct 10 17:26:54 2018] epoch_id: 39, batch_id: 2700, cost: 0.024157, acc: 0.984375
+[Wed Oct 10 17:26:57 2018] epoch_id: 39, batch_id: 2800, cost: 0.021991, acc: 0.992188
+[Wed Oct 10 17:26:59 2018] epoch_id: 39, batch_id: 2900, cost: 0.001997, acc: 1.000000
+[Wed Oct 10 17:27:01 2018] epoch_id: 39, batch_id: 3000, cost: 0.001907, acc: 1.000000
+[Wed Oct 10 17:27:02 2018] epoch_id: 39, train_avg_cost: 0.012756, train_avg_acc: 0.995835
+[Wed Oct 10 17:27:03 2018] epoch_id: 39, dev_cost: 1.650582, accuracy: 0.8342
+[Wed Oct 10 17:27:04 2018] epoch_id: 39, test_cost: 1.662477, accuracy: 0.8325
+[Wed Oct 10 17:27:12 2018] epoch_id: 40, batch_id: 0, cost: 0.000858, acc: 1.000000
+[Wed Oct 10 17:27:15 2018] epoch_id: 40, batch_id: 100, cost: 0.000849, acc: 1.000000
+[Wed Oct 10 17:27:17 2018] epoch_id: 40, batch_id: 200, cost: 0.016273, acc: 0.992188
+[Wed Oct 10 17:27:19 2018] epoch_id: 40, batch_id: 300, cost: 0.042659, acc: 0.992188
+[Wed Oct 10 17:27:21 2018] epoch_id: 40, batch_id: 400, cost: 0.010672, acc: 0.992188
+[Wed Oct 10 17:27:24 2018] epoch_id: 40, batch_id: 500, cost: 0.000544, acc: 1.000000
+[Wed Oct 10 17:27:26 2018] epoch_id: 40, batch_id: 600, cost: 0.005578, acc: 1.000000
+[Wed Oct 10 17:27:28 2018] epoch_id: 40, batch_id: 700, cost: 0.039266, acc: 0.992188
+[Wed Oct 10 17:27:31 2018] epoch_id: 40, batch_id: 800, cost: 0.013144, acc: 0.992188
+[Wed Oct 10 17:27:33 2018] epoch_id: 40, batch_id: 900, cost: 0.000740, acc: 1.000000
+[Wed Oct 10 17:27:35 2018] epoch_id: 40, batch_id: 1000, cost: 0.003259, acc: 1.000000
+[Wed Oct 10 17:27:37 2018] epoch_id: 40, batch_id: 1100, cost: 0.002126, acc: 1.000000
+[Wed Oct 10 17:27:40 2018] epoch_id: 40, batch_id: 1200, cost: 0.003089, acc: 1.000000
+[Wed Oct 10 17:27:42 2018] epoch_id: 40, batch_id: 1300, cost: 0.000690, acc: 1.000000
+[Wed Oct 10 17:27:44 2018] epoch_id: 40, batch_id: 1400, cost: 0.000283, acc: 1.000000
+[Wed Oct 10 17:27:46 2018] epoch_id: 40, batch_id: 1500, cost: 0.013878, acc: 0.984375
+[Wed Oct 10 17:27:49 2018] epoch_id: 40, batch_id: 1600, cost: 0.005389, acc: 1.000000
+[Wed Oct 10 17:27:51 2018] epoch_id: 40, batch_id: 1700, cost: 0.024631, acc: 0.992188
+[Wed Oct 10 17:27:53 2018] epoch_id: 40, batch_id: 1800, cost: 0.003978, acc: 1.000000
+[Wed Oct 10 17:27:55 2018] epoch_id: 40, batch_id: 1900, cost: 0.004993, acc: 1.000000
+[Wed Oct 10 17:27:58 2018] epoch_id: 40, batch_id: 2000, cost: 0.014580, acc: 0.984375
+[Wed Oct 10 17:28:00 2018] epoch_id: 40, batch_id: 2100, cost: 0.003148, acc: 1.000000
+[Wed Oct 10 17:28:02 2018] epoch_id: 40, batch_id: 2200, cost: 0.000848, acc: 1.000000
+[Wed Oct 10 17:28:04 2018] epoch_id: 40, batch_id: 2300, cost: 0.009250, acc: 1.000000
+[Wed Oct 10 17:28:06 2018] epoch_id: 40, batch_id: 2400, cost: 0.006138, acc: 1.000000
+[Wed Oct 10 17:28:09 2018] epoch_id: 40, batch_id: 2500, cost: 0.050052, acc: 0.984375
+[Wed Oct 10 17:28:11 2018] epoch_id: 40, batch_id: 2600, cost: 0.005259, acc: 1.000000
+[Wed Oct 10 17:28:13 2018] epoch_id: 40, batch_id: 2700, cost: 0.027375, acc: 0.984375
+[Wed Oct 10 17:28:17 2018] epoch_id: 40, batch_id: 2800, cost: 0.010132, acc: 0.992188
+[Wed Oct 10 17:28:19 2018] epoch_id: 40, batch_id: 2900, cost: 0.003442, acc: 1.000000
+[Wed Oct 10 17:28:21 2018] epoch_id: 40, batch_id: 3000, cost: 0.005328, acc: 1.000000
+[Wed Oct 10 17:28:22 2018] epoch_id: 40, train_avg_cost: 0.013034, train_avg_acc: 0.995832
+[Wed Oct 10 17:28:23 2018] epoch_id: 40, dev_cost: 1.424795, accuracy: 0.8311
+[Wed Oct 10 17:28:24 2018] epoch_id: 40, test_cost: 1.404285, accuracy: 0.8345
+[Wed Oct 10 17:28:32 2018] epoch_id: 41, batch_id: 0, cost: 0.023169, acc: 0.992188
+[Wed Oct 10 17:28:34 2018] epoch_id: 41, batch_id: 100, cost: 0.008356, acc: 0.992188
+[Wed Oct 10 17:28:36 2018] epoch_id: 41, batch_id: 200, cost: 0.034033, acc: 0.992188
+[Wed Oct 10 17:28:39 2018] epoch_id: 41, batch_id: 300, cost: 0.003154, acc: 1.000000
+[Wed Oct 10 17:28:41 2018] epoch_id: 41, batch_id: 400, cost: 0.000178, acc: 1.000000
+[Wed Oct 10 17:28:43 2018] epoch_id: 41, batch_id: 500, cost: 0.001488, acc: 1.000000
+[Wed Oct 10 17:28:45 2018] epoch_id: 41, batch_id: 600, cost: 0.034724, acc: 0.992188
+[Wed Oct 10 17:28:48 2018] epoch_id: 41, batch_id: 700, cost: 0.011531, acc: 0.992188
+[Wed Oct 10 17:28:50 2018] epoch_id: 41, batch_id: 800, cost: 0.003504, acc: 1.000000
+[Wed Oct 10 17:28:52 2018] epoch_id: 41, batch_id: 900, cost: 0.010360, acc: 0.992188
+[Wed Oct 10 17:28:54 2018] epoch_id: 41, batch_id: 1000, cost: 0.014474, acc: 0.992188
+[Wed Oct 10 17:28:57 2018] epoch_id: 41, batch_id: 1100, cost: 0.005857, acc: 1.000000
+[Wed Oct 10 17:28:59 2018] epoch_id: 41, batch_id: 1200, cost: 0.007621, acc: 0.992188
+[Wed Oct 10 17:29:01 2018] epoch_id: 41, batch_id: 1300, cost: 0.013386, acc: 0.992188
+[Wed Oct 10 17:29:03 2018] epoch_id: 41, batch_id: 1400, cost: 0.004675, acc: 1.000000
+[Wed Oct 10 17:29:05 2018] epoch_id: 41, batch_id: 1500, cost: 0.023563, acc: 0.984375
+[Wed Oct 10 17:29:07 2018] epoch_id: 41, batch_id: 1600, cost: 0.001719, acc: 1.000000
+[Wed Oct 10 17:29:10 2018] epoch_id: 41, batch_id: 1700, cost: 0.000334, acc: 1.000000
+[Wed Oct 10 17:29:12 2018] epoch_id: 41, batch_id: 1800, cost: 0.001468, acc: 1.000000
+[Wed Oct 10 17:29:14 2018] epoch_id: 41, batch_id: 1900, cost: 0.002295, acc: 1.000000
+[Wed Oct 10 17:29:16 2018] epoch_id: 41, batch_id: 2000, cost: 0.021738, acc: 0.984375
+[Wed Oct 10 17:29:19 2018] epoch_id: 41, batch_id: 2100, cost: 0.023329, acc: 0.984375
+[Wed Oct 10 17:29:21 2018] epoch_id: 41, batch_id: 2200, cost: 0.005678, acc: 1.000000
+[Wed Oct 10 17:29:23 2018] epoch_id: 41, batch_id: 2300, cost: 0.004800, acc: 1.000000
+[Wed Oct 10 17:29:27 2018] epoch_id: 41, batch_id: 2400, cost: 0.007035, acc: 1.000000
+[Wed Oct 10 17:29:29 2018] epoch_id: 41, batch_id: 2500, cost: 0.041456, acc: 0.976562
+[Wed Oct 10 17:29:31 2018] epoch_id: 41, batch_id: 2600, cost: 0.011735, acc: 0.992188
+[Wed Oct 10 17:29:33 2018] epoch_id: 41, batch_id: 2700, cost: 0.016611, acc: 0.992188
+[Wed Oct 10 17:29:36 2018] epoch_id: 41, batch_id: 2800, cost: 0.004084, acc: 1.000000
+[Wed Oct 10 17:29:38 2018] epoch_id: 41, batch_id: 2900, cost: 0.001111, acc: 1.000000
+[Wed Oct 10 17:29:40 2018] epoch_id: 41, batch_id: 3000, cost: 0.015571, acc: 0.992188
+[Wed Oct 10 17:29:41 2018] epoch_id: 41, train_avg_cost: 0.012473, train_avg_acc: 0.995959
+[Wed Oct 10 17:29:42 2018] epoch_id: 41, dev_cost: 1.301212, accuracy: 0.8313
+[Wed Oct 10 17:29:43 2018] epoch_id: 41, test_cost: 1.292132, accuracy: 0.8314
+[Wed Oct 10 17:29:51 2018] epoch_id: 42, batch_id: 0, cost: 0.006710, acc: 1.000000
+[Wed Oct 10 17:29:53 2018] epoch_id: 42, batch_id: 100, cost: 0.003760, acc: 1.000000
+[Wed Oct 10 17:29:56 2018] epoch_id: 42, batch_id: 200, cost: 0.007728, acc: 1.000000
+[Wed Oct 10 17:29:58 2018] epoch_id: 42, batch_id: 300, cost: 0.010997, acc: 1.000000
+[Wed Oct 10 17:30:00 2018] epoch_id: 42, batch_id: 400, cost: 0.015313, acc: 0.984375
+[Wed Oct 10 17:30:02 2018] epoch_id: 42, batch_id: 500, cost: 0.000985, acc: 1.000000
+[Wed Oct 10 17:30:05 2018] epoch_id: 42, batch_id: 600, cost: 0.001277, acc: 1.000000
+[Wed Oct 10 17:30:07 2018] epoch_id: 42, batch_id: 700, cost: 0.002231, acc: 1.000000
+[Wed Oct 10 17:30:10 2018] epoch_id: 42, batch_id: 800, cost: 0.002233, acc: 1.000000
+[Wed Oct 10 17:30:12 2018] epoch_id: 42, batch_id: 900, cost: 0.002083, acc: 1.000000
+[Wed Oct 10 17:30:15 2018] epoch_id: 42, batch_id: 1000, cost: 0.004574, acc: 1.000000
+[Wed Oct 10 17:30:17 2018] epoch_id: 42, batch_id: 1100, cost: 0.004339, acc: 1.000000
+[Wed Oct 10 17:30:19 2018] epoch_id: 42, batch_id: 1200, cost: 0.006596, acc: 1.000000
+[Wed Oct 10 17:30:22 2018] epoch_id: 42, batch_id: 1300, cost: 0.000877, acc: 1.000000
+[Wed Oct 10 17:30:24 2018] epoch_id: 42, batch_id: 1400, cost: 0.001873, acc: 1.000000
+[Wed Oct 10 17:30:26 2018] epoch_id: 42, batch_id: 1500, cost: 0.000632, acc: 1.000000
+[Wed Oct 10 17:30:29 2018] epoch_id: 42, batch_id: 1600, cost: 0.002006, acc: 1.000000
+[Wed Oct 10 17:30:31 2018] epoch_id: 42, batch_id: 1700, cost: 0.002035, acc: 1.000000
+[Wed Oct 10 17:30:33 2018] epoch_id: 42, batch_id: 1800, cost: 0.010094, acc: 1.000000
+[Wed Oct 10 17:30:35 2018] epoch_id: 42, batch_id: 1900, cost: 0.002634, acc: 1.000000
+[Wed Oct 10 17:30:38 2018] epoch_id: 42, batch_id: 2000, cost: 0.045660, acc: 0.984375
+[Wed Oct 10 17:30:40 2018] epoch_id: 42, batch_id: 2100, cost: 0.034275, acc: 0.984375
+[Wed Oct 10 17:30:42 2018] epoch_id: 42, batch_id: 2200, cost: 0.001633, acc: 1.000000
+[Wed Oct 10 17:30:44 2018] epoch_id: 42, batch_id: 2300, cost: 0.001030, acc: 1.000000
+[Wed Oct 10 17:30:47 2018] epoch_id: 42, batch_id: 2400, cost: 0.002235, acc: 1.000000
+[Wed Oct 10 17:30:49 2018] epoch_id: 42, batch_id: 2500, cost: 0.017729, acc: 0.992188
+[Wed Oct 10 17:30:51 2018] epoch_id: 42, batch_id: 2600, cost: 0.004357, acc: 1.000000
+[Wed Oct 10 17:30:53 2018] epoch_id: 42, batch_id: 2700, cost: 0.000981, acc: 1.000000
+[Wed Oct 10 17:30:56 2018] epoch_id: 42, batch_id: 2800, cost: 0.000964, acc: 1.000000
+[Wed Oct 10 17:30:58 2018] epoch_id: 42, batch_id: 2900, cost: 0.018888, acc: 0.992188
+[Wed Oct 10 17:31:00 2018] epoch_id: 42, batch_id: 3000, cost: 0.032965, acc: 0.984375
+[Wed Oct 10 17:31:01 2018] epoch_id: 42, train_avg_cost: 0.013007, train_avg_acc: 0.995946
+[Wed Oct 10 17:31:02 2018] epoch_id: 42, dev_cost: 1.701511, accuracy: 0.8335
+[Wed Oct 10 17:31:03 2018] epoch_id: 42, test_cost: 1.704458, accuracy: 0.8312
+[Wed Oct 10 17:31:12 2018] epoch_id: 43, batch_id: 0, cost: 0.002044, acc: 1.000000
+[Wed Oct 10 17:31:14 2018] epoch_id: 43, batch_id: 100, cost: 0.018454, acc: 0.992188
+[Wed Oct 10 17:31:16 2018] epoch_id: 43, batch_id: 200, cost: 0.002746, acc: 1.000000
+[Wed Oct 10 17:31:18 2018] epoch_id: 43, batch_id: 300, cost: 0.008316, acc: 0.992188
+[Wed Oct 10 17:31:21 2018] epoch_id: 43, batch_id: 400, cost: 0.009446, acc: 1.000000
+[Wed Oct 10 17:31:23 2018] epoch_id: 43, batch_id: 500, cost: 0.000336, acc: 1.000000
+[Wed Oct 10 17:31:25 2018] epoch_id: 43, batch_id: 600, cost: 0.000436, acc: 1.000000
+[Wed Oct 10 17:31:27 2018] epoch_id: 43, batch_id: 700, cost: 0.000142, acc: 1.000000
+[Wed Oct 10 17:31:30 2018] epoch_id: 43, batch_id: 800, cost: 0.001449, acc: 1.000000
+[Wed Oct 10 17:31:32 2018] epoch_id: 43, batch_id: 900, cost: 0.040274, acc: 0.992188
+[Wed Oct 10 17:31:34 2018] epoch_id: 43, batch_id: 1000, cost: 0.002314, acc: 1.000000
+[Wed Oct 10 17:31:36 2018] epoch_id: 43, batch_id: 1100, cost: 0.008140, acc: 0.992188
+[Wed Oct 10 17:31:39 2018] epoch_id: 43, batch_id: 1200, cost: 0.001320, acc: 1.000000
+[Wed Oct 10 17:31:41 2018] epoch_id: 43, batch_id: 1300, cost: 0.000427, acc: 1.000000
+[Wed Oct 10 17:31:43 2018] epoch_id: 43, batch_id: 1400, cost: 0.004985, acc: 1.000000
+[Wed Oct 10 17:31:46 2018] epoch_id: 43, batch_id: 1500, cost: 0.005165, acc: 1.000000
+[Wed Oct 10 17:31:48 2018] epoch_id: 43, batch_id: 1600, cost: 0.006397, acc: 1.000000
+[Wed Oct 10 17:31:50 2018] epoch_id: 43, batch_id: 1700, cost: 0.026334, acc: 0.984375
+[Wed Oct 10 17:31:54 2018] epoch_id: 43, batch_id: 1800, cost: 0.003058, acc: 1.000000
+[Wed Oct 10 17:31:56 2018] epoch_id: 43, batch_id: 1900, cost: 0.009215, acc: 1.000000
+[Wed Oct 10 17:31:58 2018] epoch_id: 43, batch_id: 2000, cost: 0.005750, acc: 1.000000
+[Wed Oct 10 17:32:01 2018] epoch_id: 43, batch_id: 2100, cost: 0.006973, acc: 1.000000
+[Wed Oct 10 17:32:03 2018] epoch_id: 43, batch_id: 2200, cost: 0.040183, acc: 0.984375
+[Wed Oct 10 17:32:05 2018] epoch_id: 43, batch_id: 2300, cost: 0.007980, acc: 0.992188
+[Wed Oct 10 17:32:07 2018] epoch_id: 43, batch_id: 2400, cost: 0.018794, acc: 0.992188
+[Wed Oct 10 17:32:10 2018] epoch_id: 43, batch_id: 2500, cost: 0.031288, acc: 0.984375
+[Wed Oct 10 17:32:12 2018] epoch_id: 43, batch_id: 2600, cost: 0.010219, acc: 0.992188
+[Wed Oct 10 17:32:14 2018] epoch_id: 43, batch_id: 2700, cost: 0.021514, acc: 0.984375
+[Wed Oct 10 17:32:17 2018] epoch_id: 43, batch_id: 2800, cost: 0.005614, acc: 1.000000
+[Wed Oct 10 17:32:19 2018] epoch_id: 43, batch_id: 2900, cost: 0.065875, acc: 0.984375
+[Wed Oct 10 17:32:21 2018] epoch_id: 43, batch_id: 3000, cost: 0.013279, acc: 0.992188
+[Wed Oct 10 17:32:22 2018] epoch_id: 43, train_avg_cost: 0.011822, train_avg_acc: 0.996238
+[Wed Oct 10 17:32:23 2018] epoch_id: 43, dev_cost: 1.703876, accuracy: 0.8322
+[Wed Oct 10 17:32:24 2018] epoch_id: 43, test_cost: 1.724094, accuracy: 0.8315
+[Wed Oct 10 17:32:32 2018] epoch_id: 44, batch_id: 0, cost: 0.003358, acc: 1.000000
+[Wed Oct 10 17:32:34 2018] epoch_id: 44, batch_id: 100, cost: 0.003024, acc: 1.000000
+[Wed Oct 10 17:32:37 2018] epoch_id: 44, batch_id: 200, cost: 0.038726, acc: 0.992188
+[Wed Oct 10 17:32:39 2018] epoch_id: 44, batch_id: 300, cost: 0.001766, acc: 1.000000
+[Wed Oct 10 17:32:41 2018] epoch_id: 44, batch_id: 400, cost: 0.005300, acc: 1.000000
+[Wed Oct 10 17:32:43 2018] epoch_id: 44, batch_id: 500, cost: 0.023175, acc: 0.992188
+[Wed Oct 10 17:32:46 2018] epoch_id: 44, batch_id: 600, cost: 0.002893, acc: 1.000000
+[Wed Oct 10 17:32:48 2018] epoch_id: 44, batch_id: 700, cost: 0.025870, acc: 0.976562
+[Wed Oct 10 17:32:50 2018] epoch_id: 44, batch_id: 800, cost: 0.019898, acc: 0.992188
+[Wed Oct 10 17:32:52 2018] epoch_id: 44, batch_id: 900, cost: 0.001718, acc: 1.000000
+[Wed Oct 10 17:32:55 2018] epoch_id: 44, batch_id: 1000, cost: 0.000221, acc: 1.000000
+[Wed Oct 10 17:32:57 2018] epoch_id: 44, batch_id: 1100, cost: 0.002172, acc: 1.000000
+[Wed Oct 10 17:32:59 2018] epoch_id: 44, batch_id: 1200, cost: 0.001158, acc: 1.000000
+[Wed Oct 10 17:33:02 2018] epoch_id: 44, batch_id: 1300, cost: 0.004667, acc: 1.000000
+[Wed Oct 10 17:33:04 2018] epoch_id: 44, batch_id: 1400, cost: 0.000685, acc: 1.000000
+[Wed Oct 10 17:33:06 2018] epoch_id: 44, batch_id: 1500, cost: 0.007730, acc: 1.000000
+[Wed Oct 10 17:33:08 2018] epoch_id: 44, batch_id: 1600, cost: 0.006694, acc: 1.000000
+[Wed Oct 10 17:33:11 2018] epoch_id: 44, batch_id: 1700, cost: 0.009508, acc: 0.992188
+[Wed Oct 10 17:33:13 2018] epoch_id: 44, batch_id: 1800, cost: 0.018037, acc: 0.992188
+[Wed Oct 10 17:33:15 2018] epoch_id: 44, batch_id: 1900, cost: 0.020902, acc: 0.976562
+[Wed Oct 10 17:33:18 2018] epoch_id: 44, batch_id: 2000, cost: 0.006977, acc: 0.992188
+[Wed Oct 10 17:33:20 2018] epoch_id: 44, batch_id: 2100, cost: 0.004821, acc: 1.000000
+[Wed Oct 10 17:33:22 2018] epoch_id: 44, batch_id: 2200, cost: 0.000209, acc: 1.000000
+[Wed Oct 10 17:33:25 2018] epoch_id: 44, batch_id: 2300, cost: 0.008764, acc: 0.992188
+[Wed Oct 10 17:33:27 2018] epoch_id: 44, batch_id: 2400, cost: 0.029171, acc: 0.992188
+[Wed Oct 10 17:33:29 2018] epoch_id: 44, batch_id: 2500, cost: 0.015028, acc: 0.992188
+[Wed Oct 10 17:33:31 2018] epoch_id: 44, batch_id: 2600, cost: 0.007096, acc: 1.000000
+[Wed Oct 10 17:33:33 2018] epoch_id: 44, batch_id: 2700, cost: 0.000547, acc: 1.000000
+[Wed Oct 10 17:33:36 2018] epoch_id: 44, batch_id: 2800, cost: 0.004024, acc: 1.000000
+[Wed Oct 10 17:33:38 2018] epoch_id: 44, batch_id: 2900, cost: 0.002191, acc: 1.000000
+[Wed Oct 10 17:33:40 2018] epoch_id: 44, batch_id: 3000, cost: 0.008875, acc: 1.000000
+[Wed Oct 10 17:33:41 2018] epoch_id: 44, train_avg_cost: 0.012328, train_avg_acc: 0.996076
+[Wed Oct 10 17:33:42 2018] epoch_id: 44, dev_cost: 1.575702, accuracy: 0.8331
+[Wed Oct 10 17:33:43 2018] epoch_id: 44, test_cost: 1.573283, accuracy: 0.8313
+[Wed Oct 10 17:33:52 2018] epoch_id: 45, batch_id: 0, cost: 0.002271, acc: 1.000000
+[Wed Oct 10 17:33:54 2018] epoch_id: 45, batch_id: 100, cost: 0.005500, acc: 1.000000
+[Wed Oct 10 17:33:56 2018] epoch_id: 45, batch_id: 200, cost: 0.001735, acc: 1.000000
+[Wed Oct 10 17:33:58 2018] epoch_id: 45, batch_id: 300, cost: 0.008910, acc: 1.000000
+[Wed Oct 10 17:34:01 2018] epoch_id: 45, batch_id: 400, cost: 0.010551, acc: 0.992188
+[Wed Oct 10 17:34:03 2018] epoch_id: 45, batch_id: 500, cost: 0.005958, acc: 1.000000
+[Wed Oct 10 17:34:05 2018] epoch_id: 45, batch_id: 600, cost: 0.012035, acc: 0.992188
+[Wed Oct 10 17:34:07 2018] epoch_id: 45, batch_id: 700, cost: 0.002110, acc: 1.000000
+[Wed Oct 10 17:34:10 2018] epoch_id: 45, batch_id: 800, cost: 0.014834, acc: 0.992188
+[Wed Oct 10 17:34:12 2018] epoch_id: 45, batch_id: 900, cost: 0.010944, acc: 0.992188
+[Wed Oct 10 17:34:14 2018] epoch_id: 45, batch_id: 1000, cost: 0.017574, acc: 0.992188
+[Wed Oct 10 17:34:16 2018] epoch_id: 45, batch_id: 1100, cost: 0.006877, acc: 1.000000
+[Wed Oct 10 17:34:19 2018] epoch_id: 45, batch_id: 1200, cost: 0.001731, acc: 1.000000
+[Wed Oct 10 17:34:21 2018] epoch_id: 45, batch_id: 1300, cost: 0.002963, acc: 1.000000
+[Wed Oct 10 17:34:23 2018] epoch_id: 45, batch_id: 1400, cost: 0.009798, acc: 1.000000
+[Wed Oct 10 17:34:25 2018] epoch_id: 45, batch_id: 1500, cost: 0.003309, acc: 1.000000
+[Wed Oct 10 17:34:28 2018] epoch_id: 45, batch_id: 1600, cost: 0.022402, acc: 0.984375
+[Wed Oct 10 17:34:30 2018] epoch_id: 45, batch_id: 1700, cost: 0.003854, acc: 1.000000
+[Wed Oct 10 17:34:32 2018] epoch_id: 45, batch_id: 1800, cost: 0.000418, acc: 1.000000
+[Wed Oct 10 17:34:35 2018] epoch_id: 45, batch_id: 1900, cost: 0.014512, acc: 0.992188
+[Wed Oct 10 17:34:37 2018] epoch_id: 45, batch_id: 2000, cost: 0.031922, acc: 0.992188
+[Wed Oct 10 17:34:39 2018] epoch_id: 45, batch_id: 2100, cost: 0.002671, acc: 1.000000
+[Wed Oct 10 17:34:42 2018] epoch_id: 45, batch_id: 2200, cost: 0.042934, acc: 0.984375
+[Wed Oct 10 17:34:44 2018] epoch_id: 45, batch_id: 2300, cost: 0.008559, acc: 1.000000
+[Wed Oct 10 17:34:46 2018] epoch_id: 45, batch_id: 2400, cost: 0.050518, acc: 0.984375
+[Wed Oct 10 17:34:48 2018] epoch_id: 45, batch_id: 2500, cost: 0.001887, acc: 1.000000
+[Wed Oct 10 17:34:50 2018] epoch_id: 45, batch_id: 2600, cost: 0.002196, acc: 1.000000
+[Wed Oct 10 17:34:54 2018] epoch_id: 45, batch_id: 2700, cost: 0.002765, acc: 1.000000
+[Wed Oct 10 17:34:56 2018] epoch_id: 45, batch_id: 2800, cost: 0.024691, acc: 0.992188
+[Wed Oct 10 17:34:59 2018] epoch_id: 45, batch_id: 2900, cost: 0.003790, acc: 1.000000
+[Wed Oct 10 17:35:01 2018] epoch_id: 45, batch_id: 3000, cost: 0.001317, acc: 1.000000
+[Wed Oct 10 17:35:01 2018] epoch_id: 45, train_avg_cost: 0.012084, train_avg_acc: 0.996298
+[Wed Oct 10 17:35:02 2018] epoch_id: 45, dev_cost: 1.603634, accuracy: 0.8321
+[Wed Oct 10 17:35:03 2018] epoch_id: 45, test_cost: 1.609678, accuracy: 0.8291
+[Wed Oct 10 17:35:12 2018] epoch_id: 46, batch_id: 0, cost: 0.002291, acc: 1.000000
+[Wed Oct 10 17:35:14 2018] epoch_id: 46, batch_id: 100, cost: 0.018703, acc: 0.992188
+[Wed Oct 10 17:35:16 2018] epoch_id: 46, batch_id: 200, cost: 0.004407, acc: 1.000000
+[Wed Oct 10 17:35:18 2018] epoch_id: 46, batch_id: 300, cost: 0.000953, acc: 1.000000
+[Wed Oct 10 17:35:21 2018] epoch_id: 46, batch_id: 400, cost: 0.000732, acc: 1.000000
+[Wed Oct 10 17:35:23 2018] epoch_id: 46, batch_id: 500, cost: 0.011275, acc: 0.992188
+[Wed Oct 10 17:35:25 2018] epoch_id: 46, batch_id: 600, cost: 0.009521, acc: 1.000000
+[Wed Oct 10 17:35:27 2018] epoch_id: 46, batch_id: 700, cost: 0.000671, acc: 1.000000
+[Wed Oct 10 17:35:30 2018] epoch_id: 46, batch_id: 800, cost: 0.000768, acc: 1.000000
+[Wed Oct 10 17:35:32 2018] epoch_id: 46, batch_id: 900, cost: 0.001357, acc: 1.000000
+[Wed Oct 10 17:35:34 2018] epoch_id: 46, batch_id: 1000, cost: 0.001384, acc: 1.000000
+[Wed Oct 10 17:35:37 2018] epoch_id: 46, batch_id: 1100, cost: 0.010220, acc: 0.992188
+[Wed Oct 10 17:35:39 2018] epoch_id: 46, batch_id: 1200, cost: 0.006540, acc: 1.000000
+[Wed Oct 10 17:35:41 2018] epoch_id: 46, batch_id: 1300, cost: 0.002771, acc: 1.000000
+[Wed Oct 10 17:35:44 2018] epoch_id: 46, batch_id: 1400, cost: 0.010623, acc: 0.992188
+[Wed Oct 10 17:35:46 2018] epoch_id: 46, batch_id: 1500, cost: 0.000798, acc: 1.000000
+[Wed Oct 10 17:35:48 2018] epoch_id: 46, batch_id: 1600, cost: 0.004519, acc: 1.000000
+[Wed Oct 10 17:35:50 2018] epoch_id: 46, batch_id: 1700, cost: 0.010096, acc: 1.000000
+[Wed Oct 10 17:35:53 2018] epoch_id: 46, batch_id: 1800, cost: 0.001868, acc: 1.000000
+[Wed Oct 10 17:35:55 2018] epoch_id: 46, batch_id: 1900, cost: 0.039460, acc: 0.984375
+[Wed Oct 10 17:35:57 2018] epoch_id: 46, batch_id: 2000, cost: 0.008906, acc: 1.000000
+[Wed Oct 10 17:35:59 2018] epoch_id: 46, batch_id: 2100, cost: 0.008440, acc: 0.992188
+[Wed Oct 10 17:36:02 2018] epoch_id: 46, batch_id: 2200, cost: 0.014774, acc: 0.992188
+[Wed Oct 10 17:36:05 2018] epoch_id: 46, batch_id: 2300, cost: 0.016775, acc: 0.992188
+[Wed Oct 10 17:36:07 2018] epoch_id: 46, batch_id: 2400, cost: 0.008999, acc: 0.992188
+[Wed Oct 10 17:36:10 2018] epoch_id: 46, batch_id: 2500, cost: 0.001394, acc: 1.000000
+[Wed Oct 10 17:36:12 2018] epoch_id: 46, batch_id: 2600, cost: 0.005627, acc: 1.000000
+[Wed Oct 10 17:36:14 2018] epoch_id: 46, batch_id: 2700, cost: 0.003667, acc: 1.000000
+[Wed Oct 10 17:36:16 2018] epoch_id: 46, batch_id: 2800, cost: 0.016338, acc: 0.992188
+[Wed Oct 10 17:36:19 2018] epoch_id: 46, batch_id: 2900, cost: 0.005622, acc: 1.000000
+[Wed Oct 10 17:36:21 2018] epoch_id: 46, batch_id: 3000, cost: 0.003068, acc: 1.000000
+[Wed Oct 10 17:36:21 2018] epoch_id: 46, train_avg_cost: 0.011216, train_avg_acc: 0.996266
+[Wed Oct 10 17:36:22 2018] epoch_id: 46, dev_cost: 1.772260, accuracy: 0.8309
+[Wed Oct 10 17:36:23 2018] epoch_id: 46, test_cost: 1.783967, accuracy: 0.8284
+[Wed Oct 10 17:36:32 2018] epoch_id: 47, batch_id: 0, cost: 0.001193, acc: 1.000000
+[Wed Oct 10 17:36:34 2018] epoch_id: 47, batch_id: 100, cost: 0.000584, acc: 1.000000
+[Wed Oct 10 17:36:36 2018] epoch_id: 47, batch_id: 200, cost: 0.001534, acc: 1.000000
+[Wed Oct 10 17:36:38 2018] epoch_id: 47, batch_id: 300, cost: 0.014105, acc: 0.992188
+[Wed Oct 10 17:36:41 2018] epoch_id: 47, batch_id: 400, cost: 0.000929, acc: 1.000000
+[Wed Oct 10 17:36:43 2018] epoch_id: 47, batch_id: 500, cost: 0.007649, acc: 0.992188
+[Wed Oct 10 17:36:45 2018] epoch_id: 47, batch_id: 600, cost: 0.009973, acc: 0.992188
+[Wed Oct 10 17:36:47 2018] epoch_id: 47, batch_id: 700, cost: 0.006471, acc: 1.000000
+[Wed Oct 10 17:36:49 2018] epoch_id: 47, batch_id: 800, cost: 0.002720, acc: 1.000000
+[Wed Oct 10 17:36:53 2018] epoch_id: 47, batch_id: 900, cost: 0.001402, acc: 1.000000
+[Wed Oct 10 17:36:55 2018] epoch_id: 47, batch_id: 1000, cost: 0.000697, acc: 1.000000
+[Wed Oct 10 17:36:57 2018] epoch_id: 47, batch_id: 1100, cost: 0.001998, acc: 1.000000
+[Wed Oct 10 17:36:59 2018] epoch_id: 47, batch_id: 1200, cost: 0.009035, acc: 0.992188
+[Wed Oct 10 17:37:02 2018] epoch_id: 47, batch_id: 1300, cost: 0.006139, acc: 1.000000
+[Wed Oct 10 17:37:04 2018] epoch_id: 47, batch_id: 1400, cost: 0.007283, acc: 1.000000
+[Wed Oct 10 17:37:06 2018] epoch_id: 47, batch_id: 1500, cost: 0.016960, acc: 0.992188
+[Wed Oct 10 17:37:08 2018] epoch_id: 47, batch_id: 1600, cost: 0.001158, acc: 1.000000
+[Wed Oct 10 17:37:11 2018] epoch_id: 47, batch_id: 1700, cost: 0.001425, acc: 1.000000
+[Wed Oct 10 17:37:13 2018] epoch_id: 47, batch_id: 1800, cost: 0.001285, acc: 1.000000
+[Wed Oct 10 17:37:15 2018] epoch_id: 47, batch_id: 1900, cost: 0.002734, acc: 1.000000
+[Wed Oct 10 17:37:17 2018] epoch_id: 47, batch_id: 2000, cost: 0.000576, acc: 1.000000
+[Wed Oct 10 17:37:20 2018] epoch_id: 47, batch_id: 2100, cost: 0.001285, acc: 1.000000
+[Wed Oct 10 17:37:22 2018] epoch_id: 47, batch_id: 2200, cost: 0.000798, acc: 1.000000
+[Wed Oct 10 17:37:24 2018] epoch_id: 47, batch_id: 2300, cost: 0.059468, acc: 0.984375
+[Wed Oct 10 17:37:26 2018] epoch_id: 47, batch_id: 2400, cost: 0.004177, acc: 1.000000
+[Wed Oct 10 17:37:29 2018] epoch_id: 47, batch_id: 2500, cost: 0.001915, acc: 1.000000
+[Wed Oct 10 17:37:31 2018] epoch_id: 47, batch_id: 2600, cost: 0.000491, acc: 1.000000
+[Wed Oct 10 17:37:33 2018] epoch_id: 47, batch_id: 2700, cost: 0.001129, acc: 1.000000
+[Wed Oct 10 17:37:35 2018] epoch_id: 47, batch_id: 2800, cost: 0.000988, acc: 1.000000
+[Wed Oct 10 17:37:38 2018] epoch_id: 47, batch_id: 2900, cost: 0.024258, acc: 0.992188
+[Wed Oct 10 17:37:40 2018] epoch_id: 47, batch_id: 3000, cost: 0.000902, acc: 1.000000
+[Wed Oct 10 17:37:41 2018] epoch_id: 47, train_avg_cost: 0.011536, train_avg_acc: 0.996225
+[Wed Oct 10 17:37:42 2018] epoch_id: 47, dev_cost: 2.235156, accuracy: 0.8326
+[Wed Oct 10 17:37:43 2018] epoch_id: 47, test_cost: 2.289617, accuracy: 0.8318
+[Wed Oct 10 17:37:51 2018] epoch_id: 48, batch_id: 0, cost: 0.007975, acc: 1.000000
+[Wed Oct 10 17:37:54 2018] epoch_id: 48, batch_id: 100, cost: 0.000491, acc: 1.000000
+[Wed Oct 10 17:37:56 2018] epoch_id: 48, batch_id: 200, cost: 0.015161, acc: 0.992188
+[Wed Oct 10 17:37:58 2018] epoch_id: 48, batch_id: 300, cost: 0.030692, acc: 0.992188
+[Wed Oct 10 17:38:00 2018] epoch_id: 48, batch_id: 400, cost: 0.016749, acc: 0.992188
+[Wed Oct 10 17:38:03 2018] epoch_id: 48, batch_id: 500, cost: 0.005637, acc: 1.000000
+[Wed Oct 10 17:38:05 2018] epoch_id: 48, batch_id: 600, cost: 0.014267, acc: 0.992188
+[Wed Oct 10 17:38:07 2018] epoch_id: 48, batch_id: 700, cost: 0.002352, acc: 1.000000
+[Wed Oct 10 17:38:10 2018] epoch_id: 48, batch_id: 800, cost: 0.002758, acc: 1.000000
+[Wed Oct 10 17:38:12 2018] epoch_id: 48, batch_id: 900, cost: 0.000367, acc: 1.000000
+[Wed Oct 10 17:38:14 2018] epoch_id: 48, batch_id: 1000, cost: 0.003479, acc: 1.000000
+[Wed Oct 10 17:38:16 2018] epoch_id: 48, batch_id: 1100, cost: 0.006107, acc: 1.000000
+[Wed Oct 10 17:38:19 2018] epoch_id: 48, batch_id: 1200, cost: 0.000989, acc: 1.000000
+[Wed Oct 10 17:38:21 2018] epoch_id: 48, batch_id: 1300, cost: 0.000442, acc: 1.000000
+[Wed Oct 10 17:38:23 2018] epoch_id: 48, batch_id: 1400, cost: 0.002006, acc: 1.000000
+[Wed Oct 10 17:38:25 2018] epoch_id: 48, batch_id: 1500, cost: 0.022174, acc: 0.992188
+[Wed Oct 10 17:38:28 2018] epoch_id: 48, batch_id: 1600, cost: 0.004670, acc: 1.000000
+[Wed Oct 10 17:38:30 2018] epoch_id: 48, batch_id: 1700, cost: 0.014862, acc: 0.992188
+[Wed Oct 10 17:38:32 2018] epoch_id: 48, batch_id: 1800, cost: 0.004648, acc: 1.000000
+[Wed Oct 10 17:38:36 2018] epoch_id: 48, batch_id: 1900, cost: 0.035342, acc: 0.992188
+[Wed Oct 10 17:38:38 2018] epoch_id: 48, batch_id: 2000, cost: 0.018578, acc: 0.992188
+[Wed Oct 10 17:38:40 2018] epoch_id: 48, batch_id: 2100, cost: 0.003790, acc: 1.000000
+[Wed Oct 10 17:38:42 2018] epoch_id: 48, batch_id: 2200, cost: 0.026731, acc: 0.984375
+[Wed Oct 10 17:38:45 2018] epoch_id: 48, batch_id: 2300, cost: 0.003608, acc: 1.000000
+[Wed Oct 10 17:38:47 2018] epoch_id: 48, batch_id: 2400, cost: 0.005601, acc: 1.000000
+[Wed Oct 10 17:38:49 2018] epoch_id: 48, batch_id: 2500, cost: 0.000833, acc: 1.000000
+[Wed Oct 10 17:38:52 2018] epoch_id: 48, batch_id: 2600, cost: 0.004157, acc: 1.000000
+[Wed Oct 10 17:38:54 2018] epoch_id: 48, batch_id: 2700, cost: 0.010146, acc: 0.992188
+[Wed Oct 10 17:38:56 2018] epoch_id: 48, batch_id: 2800, cost: 0.001127, acc: 1.000000
+[Wed Oct 10 17:38:58 2018] epoch_id: 48, batch_id: 2900, cost: 0.004332, acc: 1.000000
+[Wed Oct 10 17:39:01 2018] epoch_id: 48, batch_id: 3000, cost: 0.004895, acc: 1.000000
+[Wed Oct 10 17:39:01 2018] epoch_id: 48, train_avg_cost: 0.010959, train_avg_acc: 0.996475
+[Wed Oct 10 17:39:02 2018] epoch_id: 48, dev_cost: 1.764490, accuracy: 0.8343
+[Wed Oct 10 17:39:03 2018] epoch_id: 48, test_cost: 1.826369, accuracy: 0.8296
+[Wed Oct 10 17:39:12 2018] epoch_id: 49, batch_id: 0, cost: 0.004527, acc: 1.000000
+[Wed Oct 10 17:39:14 2018] epoch_id: 49, batch_id: 100, cost: 0.003537, acc: 1.000000
+[Wed Oct 10 17:39:16 2018] epoch_id: 49, batch_id: 200, cost: 0.034318, acc: 0.992188
+[Wed Oct 10 17:39:19 2018] epoch_id: 49, batch_id: 300, cost: 0.024897, acc: 0.992188
+[Wed Oct 10 17:39:21 2018] epoch_id: 49, batch_id: 400, cost: 0.002212, acc: 1.000000
+[Wed Oct 10 17:39:23 2018] epoch_id: 49, batch_id: 500, cost: 0.012678, acc: 0.992188
+[Wed Oct 10 17:39:25 2018] epoch_id: 49, batch_id: 600, cost: 0.006081, acc: 1.000000
+[Wed Oct 10 17:39:28 2018] epoch_id: 49, batch_id: 700, cost: 0.004294, acc: 1.000000
+[Wed Oct 10 17:39:30 2018] epoch_id: 49, batch_id: 800, cost: 0.000339, acc: 1.000000
+[Wed Oct 10 17:39:32 2018] epoch_id: 49, batch_id: 900, cost: 0.006350, acc: 0.992188
+[Wed Oct 10 17:39:35 2018] epoch_id: 49, batch_id: 1000, cost: 0.002183, acc: 1.000000
+[Wed Oct 10 17:39:37 2018] epoch_id: 49, batch_id: 1100, cost: 0.006977, acc: 1.000000
+[Wed Oct 10 17:39:39 2018] epoch_id: 49, batch_id: 1200, cost: 0.003140, acc: 1.000000
+[Wed Oct 10 17:39:41 2018] epoch_id: 49, batch_id: 1300, cost: 0.003361, acc: 1.000000
+[Wed Oct 10 17:39:44 2018] epoch_id: 49, batch_id: 1400, cost: 0.002039, acc: 1.000000
+[Wed Oct 10 17:39:46 2018] epoch_id: 49, batch_id: 1500, cost: 0.001850, acc: 1.000000
+[Wed Oct 10 17:39:48 2018] epoch_id: 49, batch_id: 1600, cost: 0.045419, acc: 0.992188
+[Wed Oct 10 17:39:50 2018] epoch_id: 49, batch_id: 1700, cost: 0.000883, acc: 1.000000
+[Wed Oct 10 17:39:53 2018] epoch_id: 49, batch_id: 1800, cost: 0.002086, acc: 1.000000
+[Wed Oct 10 17:39:55 2018] epoch_id: 49, batch_id: 1900, cost: 0.014964, acc: 0.992188
+[Wed Oct 10 17:39:57 2018] epoch_id: 49, batch_id: 2000, cost: 0.002001, acc: 1.000000
+[Wed Oct 10 17:39:59 2018] epoch_id: 49, batch_id: 2100, cost: 0.013663, acc: 0.984375
+[Wed Oct 10 17:40:02 2018] epoch_id: 49, batch_id: 2200, cost: 0.013116, acc: 0.992188
+[Wed Oct 10 17:40:04 2018] epoch_id: 49, batch_id: 2300, cost: 0.002713, acc: 1.000000
+[Wed Oct 10 17:40:06 2018] epoch_id: 49, batch_id: 2400, cost: 0.004193, acc: 1.000000
+[Wed Oct 10 17:40:08 2018] epoch_id: 49, batch_id: 2500, cost: 0.001507, acc: 1.000000
+[Wed Oct 10 17:40:11 2018] epoch_id: 49, batch_id: 2600, cost: 0.034837, acc: 0.992188
+[Wed Oct 10 17:40:13 2018] epoch_id: 49, batch_id: 2700, cost: 0.006245, acc: 1.000000
+[Wed Oct 10 17:40:15 2018] epoch_id: 49, batch_id: 2800, cost: 0.003659, acc: 1.000000
+[Wed Oct 10 17:40:17 2018] epoch_id: 49, batch_id: 2900, cost: 0.002175, acc: 1.000000
+[Wed Oct 10 17:40:19 2018] epoch_id: 49, batch_id: 3000, cost: 0.000767, acc: 1.000000
+[Wed Oct 10 17:40:20 2018] epoch_id: 49, train_avg_cost: 0.011233, train_avg_acc: 0.996326
+[Wed Oct 10 17:40:21 2018] epoch_id: 49, dev_cost: 1.652680, accuracy: 0.8353
+[Wed Oct 10 17:40:22 2018] epoch_id: 49, test_cost: 1.685406, accuracy: 0.8324
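The log above reports a train accuracy near 0.996 while dev and test accuracy stay around 0.83, so the per-epoch evaluation lines are the ones worth tracking. A minimal sketch of extracting them is shown here; `EVAL_LINE` and `parse_eval_lines` are hypothetical helpers, not part of this change, and the pattern assumes the line layout stays exactly as printed in this log.

```python
import re

# Pattern for the per-epoch dev/test lines as they appear in the log above.
EVAL_LINE = re.compile(
    r"epoch_id: (?P<epoch>\d+), (?P<split>dev|test)_cost: (?P<cost>[\d.]+), "
    r"accuracy: (?P<acc>[\d.]+)")


def parse_eval_lines(lines):
    """Return {(epoch, split): (cost, accuracy)} for dev/test lines only."""
    results = {}
    for line in lines:
        match = EVAL_LINE.search(line)
        if match:
            key = (int(match.group("epoch")), match.group("split"))
            results[key] = (float(match.group("cost")), float(match.group("acc")))
    return results


# Three lines copied from the end of the log; the train line is skipped by the pattern.
sample = [
    "[Wed Oct 10 17:40:20 2018] epoch_id: 49, train_avg_cost: 0.011233, train_avg_acc: 0.996326",
    "[Wed Oct 10 17:40:21 2018] epoch_id: 49, dev_cost: 1.652680, accuracy: 0.8353",
    "[Wed Oct 10 17:40:22 2018] epoch_id: 49, test_cost: 1.685406, accuracy: 0.8324",
]
parsed = parse_eval_lines(sample)
print(parsed)
```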
--- a/fluid/text_matching_on_quora/configs/__init__.py
+++ b/fluid/text_matching_on_quora/configs/__init__.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from .cdssm import cdssm_base
+from .dec_att import decatt_glove
+from .sse import sse_base
+from .infer_sent import infer_sent_v1
+from .infer_sent import infer_sent_v2
--- a/fluid/text_matching_on_quora/configs/basic_config.py
+++ b/fluid/text_matching_on_quora/configs/basic_config.py
+from __future__ import print_function
+class config(object):
+    def __init__(self):
+        self.batch_size = 128
+        self.epoch_num = 50
+        self.optimizer_type = 'adam' # sgd, adagrad
+        # pretrained word embedding
+        self.use_pretrained_word_embedding = True
+        # when using pretrained word embeddings, embeddings of
+        # out-of-vocabulary words are initialized from a uniform or normal distribution
+        self.OOV_fill = 'uniform'
+        self.embedding_norm = False
+        # if False, use padding and masks for sequence data instead of LoD tensors
+        self.use_lod_tensor = True
+        # lr = lr * lr_decay after each epoch
+        self.lr_decay = 1
+        self.learning_rate = 0.001
+        self.save_dirname = 'model_dir'
+        self.train_samples_num = 384348
+        self.duplicate_data = False
+        self.metric_type = ['accuracy']
+    def list_config(self):
+        print("config", self.__dict__)
+    def has_member(self, var_name):
+        return var_name in self.__dict__
+if __name__ == "__main__":
+    basic = config()
+    basic.list_config()
+    basic.ahh = 2
+    basic.list_config()
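The `lr_decay` comment in `basic_config.py` states that the learning rate is multiplied by `lr_decay` after each epoch. A minimal sketch of the resulting schedule in plain Python follows; `decayed_learning_rate` is a hypothetical helper, not part of this change, and it assumes the decay is applied exactly once per finished epoch.

```python
def decayed_learning_rate(base_lr, lr_decay, epoch_id):
    """Learning rate in effect during `epoch_id` (0-based), assuming
    lr = lr * lr_decay is applied once after every finished epoch."""
    return base_lr * (lr_decay ** epoch_id)


# Defaults from basic_config: lr_decay = 1 keeps the rate constant.
constant_lr = decayed_learning_rate(0.001, 1, 10)

# Values from sse_base: learning_rate = 0.0002, lr_decay = 0.7.
sse_schedule = [decayed_learning_rate(0.0002, 0.7, e) for e in range(3)]
print(constant_lr, sse_schedule)
```

With `lr_decay = 1` the schedule is a no-op, which is why only the configs that override it (`infer_sent_v1`, `infer_sent_v2`, `sse_base`) see a shrinking rate.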
--- a/fluid/text_matching_on_quora/configs/cdssm.py
+++ b/fluid/text_matching_on_quora/configs/cdssm.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from . import basic_config
+def cdssm_base():
+    """
+    set configs
+    """
+    config = basic_config.config()
+    config.learning_rate = 0.001
+    config.save_dirname = "model_dir"
+    config.use_pretrained_word_embedding = True
+    config.dict_dim = 40000 # approx_vocab_size
+    # net config
+    config.emb_dim = 300
+    config.kernel_size = 5
+    config.kernel_count = 300
+    config.fc_dim = 128
+    config.mlp_hid_dim = [128, 128]
+    config.droprate_conv = 0.1
+    config.droprate_fc = 0.1
+    config.class_dim = 2
+    return config 
--- a/fluid/text_matching_on_quora/configs/dec_att.py
+++ b/fluid/text_matching_on_quora/configs/dec_att.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from . import basic_config
+def decatt_glove():
+    """
+    use config 'decAtt_glove' in the paper 'Neural Paraphrase Identification of Questions with Noisy Pretraining'
+    """
+    config = basic_config.config()
+    config.learning_rate = 0.05
+    config.save_dirname = "model_dir"
+    config.use_pretrained_word_embedding = True
+    config.dict_dim = 40000 # approx_vocab_size
+    config.metric_type = ['accuracy', 'accuracy_with_threshold']
+    config.optimizer_type = 'sgd'
+    config.lr_decay = 1
+    config.use_lod_tensor = False
+    config.embedding_norm = False
+    config.OOV_fill = 'uniform'
+    config.duplicate_data = False
+    # net config
+    config.emb_dim = 300
+    config.proj_emb_dim = 200 #TODO: has project?
+    config.num_units = [400, 200]
+    config.word_embedding_trainable = True
+    config.droprate = 0.1
+    config.share_wight_btw_seq =  True
+    config.class_dim = 2
+    return config
+def decatt_word():
+    """
+    use config 'decAtt_word' (no pretrained word embedding) in the paper 'Neural Paraphrase Identification of Questions with Noisy Pretraining'
+    """
+    config = basic_config.config()
+    config.learning_rate = 0.05
+    config.save_dirname = "model_dir"
+    config.use_pretrained_word_embedding = False
+    config.dict_dim = 40000 # approx_vocab_size
+    config.metric_type = ['accuracy', 'accuracy_with_threshold']
+    config.optimizer_type = 'sgd'
+    config.lr_decay = 1
+    config.use_lod_tensor = False
+    config.embedding_norm = False
+    config.OOV_fill = 'uniform'
+    config.duplicate_data = False
+    # net config
+    config.emb_dim = 300
+    config.proj_emb_dim = 200 #TODO: has project?
+    config.num_units = [400, 200]
+    config.word_embedding_trainable = True
+    config.droprate = 0.1
+    config.share_wight_btw_seq =  True
+    config.class_dim = 2
+    return config 
--- a/fluid/text_matching_on_quora/configs/infer_sent.py
+++ b/fluid/text_matching_on_quora/configs/infer_sent.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from . import basic_config
+def infer_sent_v1():
+    """
+    set configs
+    """
+    config = basic_config.config()
+    config.learning_rate = 0.1
+    config.lr_decay = 0.99
+    config.optimizer_type = 'sgd'
+    config.save_dirname = "model_dir"
+    config.use_pretrained_word_embedding = True
+    config.dict_dim = 40000 # approx_vocab_size
+    config.class_dim = 2
+    # net config
+    config.emb_dim = 300 
+    config.droprate_lstm = 0.0
+    config.droprate_fc = 0.0
+    config.word_embedding_trainable = False
+    config.rnn_hid_dim = 2048
+    config.mlp_non_linear = False
+    return config
+def infer_sent_v2():
+    """
+    use our own config
+    """
+    config = basic_config.config()
+    config.learning_rate = 0.0002
+    config.lr_decay = 0.99
+    config.optimizer_type = 'adam'
+    config.save_dirname = "model_dir"
+    config.use_pretrained_word_embedding = True
+    config.dict_dim = 40000 # approx_vocab_size
+    config.class_dim = 2
+    # net config
+    config.emb_dim = 300
+    config.droprate_lstm = 0.0
+    config.droprate_fc = 0.2
+    config.word_embedding_trainable = False
+    config.rnn_hid_dim = 2048
+    config.mlp_non_linear = True
+    return config
--- a/fluid/text_matching_on_quora/configs/sse.py
+++ b/fluid/text_matching_on_quora/configs/sse.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from . import basic_config
+def sse_base():
+    """
+    use config in the paper 'Shortcut-Stacked Sentence Encoders for Multi-Domain Inference'
+    """
+    config = basic_config.config()
+    config.learning_rate = 0.0002
+    config.lr_decay = 0.7
+    config.save_dirname = "model_dir"
+    config.use_pretrained_word_embedding = True
+    config.dict_dim = 40000 # approx_vocab_size
+    config.metric_type = ['accuracy']
+    config.optimizer_type = 'adam'
+    config.use_lod_tensor = True
+    config.embedding_norm = False
+    config.OOV_fill = 'uniform'
+    config.duplicate_data = False
+    # net config
+    config.emb_dim = 300
+    config.rnn_hid_dim = [512, 1024, 2048]
+    config.fc_dim = [1600, 1600]
+    config.droprate_lstm = 0.0
+    config.droprate_fc = 0.1
+    config.class_dim = 2
+    return config
--- a/fluid/text_matching_on_quora/data/prepare_quora_data.sh
+++ b/fluid/text_matching_on_quora/data/prepare_quora_data.sh
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# Please first download the Quora dataset from https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing
+# to the ROOT_DIR: $HOME/.cache/paddle/dataset
+DATA_DIR=$HOME/.cache/paddle/dataset
+wget --directory-prefix=$DATA_DIR http://nlp.stanford.edu/data/glove.840B.300d.zip
+# Extract into $DATA_DIR so that glove.840B.300d.txt ends up in the layout listed at the end of this script
+unzip $DATA_DIR/glove.840B.300d.zip -d $DATA_DIR
+# The final dataset directory should look like
+# $HOME/.cache/paddle/dataset
+# |- Quora_question_pair_partition
+#     |- train.tsv
+#     |- test.tsv
+#     |- dev.tsv
+#     |- readme.txt
+#     |- wordvec.txt
+# |- glove.840B.300d.txt
--- a/fluid/text_matching_on_quora/imgs/README.md
+++ b/fluid/text_matching_on_quora/imgs/README.md
+Image files for this model: text_matching_on_quora
--- a/fluid/text_matching_on_quora/imgs/models_test_acc.png
+++ b/fluid/text_matching_on_quora/imgs/models_test_acc.png
--- a/fluid/text_matching_on_quora/metric.py
+++ b/fluid/text_matching_on_quora/metric.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This module defines evaluation metrics for classification tasks.
+"""
+import numpy as np
+def accuracy(y_pred, label):
+    """
+    A prediction is correct when the top-1 class in y_pred equals label.
+    """
+    y_pred = np.squeeze(y_pred)
+    y_pred_idx = np.argmax(y_pred, axis=1)
+    return 1.0 * np.sum(y_pred_idx == label) / label.shape[0]
+def accuracy_with_threshold(y_pred, label, threshold=0.5):
+    """
+    Class 1 is predicted when its probability in y_pred exceeds threshold;
+    a prediction is correct when it equals label.
+    With two classes and threshold 0.5, this function is equivalent to accuracy.
+    """
+    y_pred = np.squeeze(y_pred)
+    y_pred_idx = (y_pred[:, 1] > threshold).astype(int)
+    return 1.0 * np.sum(y_pred_idx == label) / label.shape[0]
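The relationship between the two metrics in `metric.py` can be checked on a toy batch. The sketch below re-states both functions so that it runs standalone; the `np.reshape(label, -1)` step is an addition here (an assumption that callers may pass labels shaped `[N, 1]`), and the probabilities are made up for illustration rather than taken from a model.

```python
import numpy as np


def accuracy(y_pred, label):
    # Fraction of rows whose arg-max class equals the label.
    y_pred = np.squeeze(y_pred)
    label = np.reshape(label, -1)  # assumption: flatten [N, 1] labels to [N]
    return float(np.mean(np.argmax(y_pred, axis=1) == label))


def accuracy_with_threshold(y_pred, label, threshold=0.5):
    # Predict class 1 when its probability exceeds the threshold.
    y_pred = np.squeeze(y_pred)
    label = np.reshape(label, -1)  # assumption: flatten [N, 1] labels to [N]
    return float(np.mean((y_pred[:, 1] > threshold).astype(int) == label))


# Made-up two-class softmax outputs for four question pairs.
probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 1, 1, 0])

acc = accuracy(probs, labels)
acc_default = accuracy_with_threshold(probs, labels)
acc_strict = accuracy_with_threshold(probs, labels, threshold=0.75)
print(acc, acc_default, acc_strict)
```

At the default threshold the two metrics agree on two-class outputs; raising the threshold trades positive predictions for negative ones, which is why `decatt_glove` lists both metric types.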
--- a/fluid/text_matching_on_quora/models/__init__.py
+++ b/fluid/text_matching_on_quora/models/__init__.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from .cdssm import cdssmNet
+from .dec_att import DecAttNet
+from .sse import SSENet
+from .infer_sent import InferSentNet
--- a/fluid/text_matching_on_quora/models/cdssm.py
+++ b/fluid/text_matching_on_quora/models/cdssm.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle.fluid as fluid
+class cdssmNet():
+    """cdssm net"""
+    def __init__(self, config):
+        self._config = config
+    def __call__(self, seq1, seq2, label):
+        return self.body(seq1, seq2, label, self._config)
+    def body(self, seq1, seq2, label, config):
+        """Body function"""
+        def conv_model(seq):
+            embed = fluid.layers.embedding(input=seq, size=[config.dict_dim, config.emb_dim], param_attr='emb.w')
+            conv = fluid.layers.sequence_conv(embed,
+                                        num_filters=config.kernel_count,
+                                        filter_size=config.kernel_size,
+                                        filter_stride=1,
+                                        padding=True, # TODO: what is padding
+                                        bias_attr=False,
+                                        param_attr='conv1d.w',
+                                        act='relu')
+            #print paddle.parameters.get('conv1d.w').shape
+            conv = fluid.layers.dropout(conv, dropout_prob = config.droprate_conv)
+            pool = fluid.layers.sequence_pool(conv, pool_type="max")
+            fc = fluid.layers.fc(pool,
+                             size=config.fc_dim,
+                             param_attr='fc1.w',
+                             bias_attr='fc1.b',
+                             act='relu')
+            return fc
+        def MLP(vec):
+            for dim in config.mlp_hid_dim:
+                vec = fluid.layers.fc(vec, size=dim, act='relu')
+                vec = fluid.layers.dropout(vec, dropout_prob=config.droprate_fc)
+            return vec
+        seq1_fc = conv_model(seq1)
+        seq2_fc = conv_model(seq2)
+        concated_seq = fluid.layers.concat(input=[seq1_fc, seq2_fc], axis=1)
+        mlp_res = MLP(concated_seq)
+        prediction = fluid.layers.fc(mlp_res, size=config.class_dim, act='softmax')
+        loss = fluid.layers.cross_entropy(input=prediction, label=label)
+        avg_cost = fluid.layers.mean(x=loss)
+        acc = fluid.layers.accuracy(input=prediction, label=label)
+        return avg_cost, acc, prediction
--- a/fluid/text_matching_on_quora/models/dec_att.py
+++ b/fluid/text_matching_on_quora/models/dec_att.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle.fluid as fluid
+class DecAttNet():
+    """decomposable attention net"""
+    def __init__(self, config):
+        self._config = config
+        self.initializer = fluid.initializer.Xavier(uniform=False)
+    def __call__(self, seq1, seq2, mask1, mask2, label):
+        return self.body(seq1, seq2, mask1, mask2, label)
+    def body(self, seq1, seq2, mask1, mask2, label):
+        """Body function"""
+        transformed_q1 = self.transformation(seq1)
+        transformed_q2 = self.transformation(seq2)
+        masked_q1 = self.apply_mask(transformed_q1, mask1)
+        masked_q2 = self.apply_mask(transformed_q2, mask2)
+        alpha, beta = self.attend(masked_q1, masked_q2)
+        if self._config.share_wight_btw_seq:
+            seq1_compare = self.compare(masked_q1, beta, param_prefix='compare')
+            seq2_compare = self.compare(masked_q2, alpha, param_prefix='compare')
+        else:
+            seq1_compare = self.compare(masked_q1, beta, param_prefix='compare_1')
+            seq2_compare = self.compare(masked_q2, alpha, param_prefix='compare_2')
+        aggregate_res = self.aggregate(seq1_compare, seq2_compare)
+        prediction = fluid.layers.fc(aggregate_res, size=self._config.class_dim, act='softmax')
+        loss = fluid.layers.cross_entropy(input=prediction, label=label)
+        avg_cost = fluid.layers.mean(x=loss)
+        acc = fluid.layers.accuracy(input=prediction, label=label)
+        return avg_cost, acc, prediction
+    def apply_mask(self, seq, mask):
+        """
+        apply mask on seq
+        Input: seq in shape [batch_size, seq_len, embedding_size]
+        Input: mask in shape [batch_size, seq_len]
+        Output: masked seq in shape [batch_size, seq_len, embedding_size]
+        """
+        return fluid.layers.elementwise_mul(x=seq, y=mask, axis=0)
+    def feed_forward_2d(self, vec, param_prefix):
+        """
+        Input: vec in shape [batch_size, seq_len, vec_dim]
+        Output: fc2 in shape [batch_size, seq_len, num_units[1]]
+        """
+        fc1 = fluid.layers.fc(vec, size=self._config.num_units[0], num_flatten_dims=2,
+                        param_attr=fluid.ParamAttr(name=param_prefix+'_fc1.w',
+                                                   initializer=self.initializer),
+                        bias_attr=param_prefix + '_fc1.b', act='relu')
+        fc1 = fluid.layers.dropout(fc1, dropout_prob = self._config.droprate)
+        fc2 = fluid.layers.fc(fc1, size=self._config.num_units[1], num_flatten_dims=2,
+                        param_attr=fluid.ParamAttr(name=param_prefix+'_fc2.w',
+                                                   initializer=self.initializer),
+                        bias_attr=param_prefix + '_fc2.b', act='relu')
+        fc2 = fluid.layers.dropout(fc2, dropout_prob = self._config.droprate)
+        return fc2
+    def feed_forward(self, vec, param_prefix):
+        """
+        Input: vec in shape [batch_size, vec_dim]
+        Output: fc2 in shape [batch_size, num_units[1]]
+        """
+        fc1 = fluid.layers.fc(vec, size=self._config.num_units[0], num_flatten_dims=1,
+                        param_attr=fluid.ParamAttr(name=param_prefix+'_fc1.w',
+                                                   initializer=self.initializer),
+                        bias_attr=param_prefix + '_fc1.b', act='relu')
+        fc1 = fluid.layers.dropout(fc1, dropout_prob = self._config.droprate)
+        fc2 = fluid.layers.fc(fc1, size=self._config.num_units[1], num_flatten_dims=1,
+                        param_attr=fluid.ParamAttr(name=param_prefix+'_fc2.w',
+                                                   initializer=self.initializer),
+                        bias_attr=param_prefix + '_fc2.b', act='relu')
+        fc2 = fluid.layers.dropout(fc2, dropout_prob = self._config.droprate)
+        return fc2
+    def transformation(self, seq):
+        embed = fluid.layers.embedding(input=seq, size=[self._config.dict_dim, self._config.emb_dim],
+                                       param_attr=fluid.ParamAttr(name='emb.w', trainable=self._config.word_embedding_trainable))
+        if self._config.proj_emb_dim is not None:
+            return fluid.layers.fc(embed, size=self._config.proj_emb_dim, num_flatten_dims=2,
+                        param_attr=fluid.ParamAttr(name='project' + '_fc1.w',
+                                                   initializer=self.initializer),
+                        bias_attr=False,
+                        act=None)
+        return embed
+    def attend(self, seq1, seq2):
+        """
+        Input: seq1, shape [batch_size, seq_len1, embed_size]
+        Input: seq2, shape [batch_size, seq_len2, embed_size]
+        Output: alpha, shape [batch_size, seq_len2, num_units[1]], seq1 softly aligned to each position of seq2
+        Output: beta, shape [batch_size, seq_len1, num_units[1]], seq2 softly aligned to each position of seq1
+        """
+        if self._config.share_wight_btw_seq:
+            seq1 = self.feed_forward_2d(seq1, param_prefix="attend")
+            seq2 = self.feed_forward_2d(seq2, param_prefix="attend")
+        else:
+            seq1 = self.feed_forward_2d(seq1, param_prefix="attend_1")
+            seq2 = self.feed_forward_2d(seq2, param_prefix="attend_2")
+        attention_weight = fluid.layers.matmul(seq1, seq2, transpose_y=True)
+        normalized_attention_weight = fluid.layers.softmax(attention_weight)
+        beta = fluid.layers.matmul(normalized_attention_weight, seq2)
+        attention_weight_t = fluid.layers.transpose(attention_weight, perm=[0, 2, 1])
+        normalized_attention_weight_t = fluid.layers.softmax(attention_weight_t)
+        alpha = fluid.layers.matmul(normalized_attention_weight_t, seq1)
+        return alpha, beta
+    def compare(self, seq, soft_alignment, param_prefix):
+        concat_seq = fluid.layers.concat(input=[seq, soft_alignment], axis=2)
+        # use the caller's prefix so 'compare_1' / 'compare_2' get separate weights
+        return self.feed_forward_2d(concat_seq, param_prefix=param_prefix)
+    def aggregate(self, vec1, vec2):
+        vec1 = fluid.layers.reduce_sum(vec1, dim=1)
+        vec2 = fluid.layers.reduce_sum(vec2, dim=1)
+        concat_vec = fluid.layers.concat(input=[vec1, vec2], axis=1)
+        return self.feed_forward(concat_vec, param_prefix='aggregate')
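The soft-alignment step in `DecAttNet.attend` is easier to follow with concrete shapes. Below is a NumPy sketch of only the matmul and softmax part; it deliberately omits the feed-forward transform, dropout, and masking that the Fluid code applies, and the helper names and tensor sizes are illustrative assumptions rather than code from this change.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def attend(seq1, seq2):
    """seq1: [batch, len1, dim], seq2: [batch, len2, dim]."""
    # Pairwise scores between every token of seq1 and every token of seq2.
    scores = np.matmul(seq1, np.transpose(seq2, (0, 2, 1)))  # [batch, len1, len2]
    # beta: for each seq1 token, a weighted sum over seq2 tokens.
    beta = np.matmul(softmax(scores, axis=-1), seq2)  # [batch, len1, dim]
    # alpha: for each seq2 token, a weighted sum over seq1 tokens.
    scores_t = np.transpose(scores, (0, 2, 1))  # [batch, len2, len1]
    alpha = np.matmul(softmax(scores_t, axis=-1), seq1)  # [batch, len2, dim]
    return alpha, beta


rng = np.random.RandomState(0)
seq1 = rng.rand(2, 5, 4)  # made-up sizes: batch=2, len1=5, dim=4
seq2 = rng.rand(2, 3, 4)  # len2=3
alpha, beta = attend(seq1, seq2)
print(alpha.shape, beta.shape)
```

Since `beta` has one row per token of `seq1`, it is the tensor that can be concatenated with `seq1` along the feature axis in `compare`, and likewise `alpha` with `seq2`.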
--- a/fluid/text_matching_on_quora/models/infer_sent.py
+++ b/fluid/text_matching_on_quora/models/infer_sent.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle.fluid as fluid
+from .my_layers import bi_lstm_layer
+from .match_layers import ElementwiseMatching
+class InferSentNet():
+    """
+    Based on the paper: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data:
+    https://arxiv.org/abs/1705.02364
+    """
+    def __init__(self, config):
+        self._config = config
+    def __call__(self, seq1, seq2, label):
+        return self.body(seq1, seq2, label, self._config)
+    def body(self, seq1, seq2, label, config):
+        """Body function"""
+        seq1_rnn = self.encoder(seq1)
+        seq2_rnn = self.encoder(seq2)
+        seq_match = ElementwiseMatching(seq1_rnn, seq2_rnn)
+        mlp_res = self.MLP(seq_match)
+        prediction = fluid.layers.fc(mlp_res, size=self._config.class_dim, act='softmax')
+        loss = fluid.layers.cross_entropy(input=prediction, label=label)
+        avg_cost = fluid.layers.mean(x=loss)
+        acc = fluid.layers.accuracy(input=prediction, label=label)
+        return avg_cost, acc, prediction
+    def encoder(self, seq):
+        """encoder"""
+        embed = fluid.layers.embedding(
+                    input=seq,
+                    size=[self._config.dict_dim, self._config.emb_dim],
+                    param_attr=fluid.ParamAttr(name='emb.w', trainable=self._config.word_embedding_trainable))
+        bi_lstm_h = bi_lstm_layer(
+                        embed,
+                        rnn_hid_dim=self._config.rnn_hid_dim,
+                        name='encoder')
+        bi_lstm_h = fluid.layers.dropout(bi_lstm_h, dropout_prob=self._config.droprate_lstm)
+        pool = fluid.layers.sequence_pool(input=bi_lstm_h, pool_type='max')
+        return pool
+    def MLP(self, vec):
+        if self._config.mlp_non_linear:
+            drop1 = fluid.layers.dropout(vec, dropout_prob=self._config.droprate_fc)
+            fc1 = fluid.layers.fc(drop1, size=512, act='tanh')
+            drop2 = fluid.layers.dropout(fc1, dropout_prob=self._config.droprate_fc)
+            fc2 = fluid.layers.fc(drop2, size=512, act='tanh')
+            res = fluid.layers.dropout(fc2, dropout_prob=self._config.droprate_fc)
+        else:
+            fc1 = fluid.layers.fc(vec, size=512, act=None)
+            res = fluid.layers.fc(fc1, size=512, act=None)
+        return res
--- a/fluid/text_matching_on_quora/models/match_layers.py
+++ b/fluid/text_matching_on_quora/models/match_layers.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This module provides different kinds of matching layers
+"""
+import paddle.fluid as fluid
+def MultiPerspectiveMatching(vec1, vec2, perspective_num):
+    """
+    MultiPerspectiveMatching
+    """
+    sim_res = None
+    for i in range(perspective_num):
+        vec1_res = fluid.layers.elementwise_add_with_weight(
+                       vec1,
+                       param_attr="elementwise_add_with_weight." + str(i))
+        vec2_res = fluid.layers.elementwise_add_with_weight(
+                       vec2,
+                       param_attr="elementwise_add_with_weight." + str(i))
+        m = fluid.layers.cos_sim(vec1_res, vec2_res)
+        if sim_res is None:
+            sim_res = m
+        else:
+            sim_res = fluid.layers.concat(input=[sim_res, m], axis=1)
+    return sim_res
+def ConcateMatching(vec1, vec2):
+    """
+    ConcateMatching
+    """
+    #TODO: assert shape
+    return fluid.layers.concat(input=[vec1, vec2], axis=1)
+def ElementwiseMatching(vec1, vec2):
+    """
+    reference: [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364)
+    """
+    elementwise_mul = fluid.layers.elementwise_mul(x=vec1, y=vec2)
+    elementwise_sub = fluid.layers.elementwise_sub(x=vec1, y=vec2) 
+    elementwise_abs_sub = fluid.layers.abs(elementwise_sub)
+    return fluid.layers.concat(input=[vec1, vec2, elementwise_mul, elementwise_abs_sub], axis=1)
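For reference, the feature vector built by `ElementwiseMatching` can be written out for a single pair of sentence vectors without any framework. This is only a sketch of the arithmetic on plain Python lists; the function name `elementwise_matching` is hypothetical.

```python
def elementwise_matching(u, v):
    """Concatenate [u, v, u * v, |u - v|] for two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    product = [a * b for a, b in zip(u, v)]
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    return list(u) + list(v) + product + abs_diff


features = elementwise_matching([1.0, 2.0], [3.0, -1.0])
```

The output is four times as wide as each input vector, which is the width the downstream MLP receives.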
--- a/fluid/text_matching_on_quora/models/my_layers.py
+++ b/fluid/text_matching_on_quora/models/my_layers.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This module defines some frequently used DNN layers
+"""
+import paddle.fluid as fluid
+def bi_lstm_layer(input, rnn_hid_dim, name):
+    """
+    A bi-directional LSTM (long short-term memory) layer
+    """
+    fc0 = fluid.layers.fc(input=input,              # fc for lstm
+                                size=rnn_hid_dim * 4,
+                                param_attr=name + '.fc0.w',
+                                bias_attr=False,
+                                act=None)
+    lstm_h, c = fluid.layers.dynamic_lstm(
+                 input=fc0,
+                 size=rnn_hid_dim * 4,
+                 is_reverse=False,
+                 param_attr=name + '.lstm_w',
+                 bias_attr=name + '.lstm_b')
+    reversed_lstm_h, reversed_c = fluid.layers.dynamic_lstm(
+                 input=fc0,
+                 size=rnn_hid_dim * 4,
+                 is_reverse=True,
+                 param_attr=name + '.reversed_lstm_w',
+                 bias_attr=name + '.reversed_lstm_b')
+    return fluid.layers.concat(input=[lstm_h, reversed_lstm_h], axis=1) 
--- a/fluid/text_matching_on_quora/models/pwim.py
+++ b/fluid/text_matching_on_quora/models/pwim.py
+# Just for test `git push`
--- a/fluid/text_matching_on_quora/models/sse.py
+++ b/fluid/text_matching_on_quora/models/sse.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle.fluid as fluid
+from .my_layers import bi_lstm_layer
+from .match_layers import ElementwiseMatching
+class SSENet():
+    """
+    SSE net: Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
+    https://arxiv.org/abs/1708.02312
+    """
+    def __init__(self, config):
+        self._config = config
+    def __call__(self, seq1, seq2, label):
+        return self.body(seq1, seq2, label, self._config)
+    def body(self, seq1, seq2, label, config):
+        """Body function"""
+        def stacked_bi_rnn_model(seq):
+            embed = fluid.layers.embedding(input=seq, size=[self._config.dict_dim, self._config.emb_dim], param_attr='emb.w')
+            stacked_lstm_out = [embed]
+            for i in range(len(self._config.rnn_hid_dim)):
+                if i == 0:
+                    feature = embed
+                else:
+                    feature = fluid.layers.concat(input=stacked_lstm_out, axis=1)
+                bi_lstm_h = bi_lstm_layer(feature,
+                                          rnn_hid_dim=self._config.rnn_hid_dim[i],
+                                          name="lstm_" + str(i))
+                # add dropout except for the last stacked lstm layer
+                if i != len(self._config.rnn_hid_dim) - 1:
+                    bi_lstm_h = fluid.layers.dropout(bi_lstm_h, dropout_prob=self._config.droprate_lstm)
+                stacked_lstm_out.append(bi_lstm_h)
+            pool = fluid.layers.sequence_pool(input=bi_lstm_h, pool_type='max')
+            return pool
+        def MLP(vec):
+            for i in range(len(self._config.fc_dim)):
+                vec = fluid.layers.fc(vec, size=self._config.fc_dim[i], act='relu')
+                # add dropout after every layer of MLP
+                vec = fluid.layers.dropout(vec, dropout_prob=self._config.droprate_fc)
+            return vec
+        seq1_rnn = stacked_bi_rnn_model(seq1)
+        seq2_rnn = stacked_bi_rnn_model(seq2)
+        seq_match = ElementwiseMatching(seq1_rnn, seq2_rnn)
+        mlp_res = MLP(seq_match)
+        prediction = fluid.layers.fc(mlp_res, size=self._config.class_dim, act='softmax')
+        loss = fluid.layers.cross_entropy(input=prediction, label=label)
+        avg_cost = fluid.layers.mean(x=loss)
+        acc = fluid.layers.accuracy(input=prediction, label=label)
+        return avg_cost, acc, prediction
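Because each stacked layer in `stacked_bi_rnn_model` consumes the embedding concatenated with the outputs of all earlier bi-LSTM layers, the input width grows with depth. The sketch below computes those widths, assuming each bi-LSTM layer emits `2 * rnn_hid_dim[i]` features (forward and reverse hidden states concatenated, as in `bi_lstm_layer`). The function name and the example sizes are hypothetical.

```python
def shortcut_input_dims(emb_dim, rnn_hid_dims):
    """Return the input feature width of every stacked bi-LSTM layer."""
    dims = []
    width = emb_dim
    for hid in rnn_hid_dims:
        dims.append(width)
        # Each layer appends forward + reverse hidden states to the shortcut.
        width += 2 * hid
    return dims


dims = shortcut_input_dims(300, [512, 1024, 2048])
```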
--- a/fluid/text_matching_on_quora/models/test.py
+++ b/fluid/text_matching_on_quora/models/test.py
--- a/fluid/text_matching_on_quora/pretrained_word2vec.py
+++ b/fluid/text_matching_on_quora/pretrained_word2vec.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This module provides pretrained word embeddings
+"""
+from __future__ import print_function, unicode_literals 
+import numpy as np
+import time, datetime
+import os, sys
+def Glove840B_300D(filepath, keys=None):
+    """
+    input: the "glove.840B.300d.txt" file path
+    return: a dict, key: word (unicode), value: a numpy array with shape [300]
+    """
+    if keys is not None:
+        assert(isinstance(keys, set))
+    print("loading word2vec from ", filepath)
+    print("please wait for a minute.")
+    start = time.time()
+    word2vec = {}
+    with open(filepath, "r") as f:
+        for line in f:
+            if sys.version_info <= (3, 0): # for python2
+                line = line.decode('utf-8')
+            info = line.strip("\n").split(" ")
+            word = info[0]
+            if (keys is not None) and (word not in keys):
+                continue
+            vector = info[1:]
+            assert(len(vector) == 300)
+            word2vec[word] = np.asarray(vector, dtype='float32')
+    end = time.time()
+    print("Spent ", str(datetime.timedelta(seconds=end-start)), " on loading word2vec.")
+    return word2vec
+if __name__ == '__main__':
+    from os.path import expanduser
+    home = expanduser("~")
+    embed_dict = Glove840B_300D(os.path.join(home, "./.cache/paddle/dataset/glove.840B.300d.txt"))
+    exit(0)
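One possible hardening of the line parsing in `Glove840B_300D`: splitting from the right keeps the word intact even if it contains spaces itself. Whether the file being loaded actually contains such tokens is an assumption here, and `parse_embedding_line` is a hypothetical helper, not part of the patch.

```python
def parse_embedding_line(line, dim):
    """Split one text-format embedding line into (word, vector).

    The last `dim` space-separated fields are the vector; everything
    before them is treated as the word, even if it contains spaces.
    """
    parts = line.rstrip("\n").rsplit(" ", dim)
    if len(parts) != dim + 1:
        raise ValueError("expected a word followed by %d values" % dim)
    word = parts[0]
    vector = [float(x) for x in parts[1:]]
    return word, vector


simple = parse_embedding_line("hello 0.1 0.2 0.3\n", 3)
spaced = parse_embedding_line(". . . 1 2 3", 3)
```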
--- a/fluid/text_matching_on_quora/quora_question_pairs.py
+++ b/fluid/text_matching_on_quora/quora_question_pairs.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+Readers and word-dictionary builder for the Quora Question Pairs dataset.
+"""
+import paddle.dataset.common
+import collections
+import tarfile
+import re
+import string
+import random
+import os, sys
+import nltk
+from os.path import expanduser
+__all__ = ['word_dict', 'train', 'dev', 'test']
+URL = "https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view"
+DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset')
+DATA_DIR = "Quora_question_pair_partition"
+QUORA_TRAIN_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'train.tsv')
+QUORA_DEV_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'dev.tsv')
+QUORA_TEST_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'test.tsv')
+# punctuation or nltk or space
+TOKENIZE_METHOD='space'
+COLUMN_COUNT = 4
+def tokenize(s):
+    if sys.version_info <= (3, 0): # for python2
+        s = s.decode('utf-8')
+    if TOKENIZE_METHOD == "nltk":
+        return nltk.tokenize.word_tokenize(s)
+    elif TOKENIZE_METHOD == "punctuation":
+        return s.translate({ord(char): None for char in string.punctuation}).lower().split()
+    elif TOKENIZE_METHOD == "space":
+        return s.split()
+    else:
+        raise RuntimeError("Invalid tokenize method")
+def maybe_open(file_name):
+    if not os.path.isfile(file_name):
+        msg = "file does not exist: %s\nPlease download the dataset first from: %s\n\n" % (file_name, URL) + \
+                ("# The final dataset directory should look like\n\n"
+                "$HOME/.cache/paddle/dataset\n"
+                " |- Quora_question_pair_partition\n"
+                "     |- train.tsv\n"
+                "     |- test.tsv\n"
+                "     |- dev.tsv\n"
+                "     |- readme.txt\n"
+                "     |- wordvec.txt\n")
+        raise RuntimeError(msg)
+    return open(file_name, 'r')
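+def tokenized_question_pairs(file_name):
+    """
+    Yield (tokenized question1, tokenized question2, label) for each
+    well-formed line of the given TSV file.
+    """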
+    with maybe_open(file_name) as f:
+        questions = {}
+        lines = f.readlines()
+        for line in lines:
+            info = line.strip().split('\t')
+            if len(info) != COLUMN_COUNT:
+                # formatting error
+                continue
+            (label, question1, question2, id) = info
+            question1 = tokenize(question1)
+            question2 = tokenize(question2)
+            yield question1, question2, int(label)
+def tokenized_questions(file_name):
+    """
+    Yield every question of the given TSV file as a list of tokens.
+    """
+    with maybe_open(file_name) as f:
+        lines = f.readlines()
+        for line in lines:
+            info = line.strip().split('\t')
+            if len(info) != COLUMN_COUNT:
+                # formatting error
+                continue
+            (label, question1, question2, id) = info
+            yield tokenize(question1)
+            yield tokenize(question2)
+def build_dict(file_name, cutoff):
+    """
+    Build a word dictionary from the corpus. Keys of the dictionary are words,
+    and values are zero-based IDs of these words.
+    """
+    word_freq = collections.defaultdict(int)
+    for doc in tokenized_questions(file_name):
+        for word in doc:
+            word_freq[word] += 1
+    word_freq = filter(lambda x: x[1] > cutoff, word_freq.items())
+    dictionary = sorted(word_freq, key=lambda x: (-x[1], x[0]))
+    words, _ = list(zip(*dictionary))
+    word_idx = dict(zip(words, range(len(words))))
+    word_idx['<unk>'] = len(words)
+    word_idx['<pad>'] = len(words) + 1
+    return word_idx
+def reader_creator(file_name, word_idx):
+    UNK_ID = word_idx['<unk>']
+    def reader():
+        for (q1, q2, label) in tokenized_question_pairs(file_name):
+            q1_ids = [word_idx.get(w, UNK_ID) for w in q1]
+            q2_ids = [word_idx.get(w, UNK_ID) for w in q2]
+            if q1_ids != [] and q2_ids != []: # [] is not allowed in fluid
+                assert(label in [0, 1])
+                yield q1_ids, q2_ids, label
+    return reader
+def train(word_idx):
+    """
+    Quora training set creator.
+    It returns a reader creator. Each sample in the reader consists of two lists
+    of zero-based word IDs and a label that is either 0 or 1.
+    :param word_idx: word dictionary
+    :type word_idx: dict
+    :return: Training reader creator
+    :rtype: callable
+    """   
+    return reader_creator(QUORA_TRAIN_FILE_NAME, word_idx)
+def dev(word_idx):
+    """
+    Quora development set creator.
+    It returns a reader creator. Each sample in the reader consists of two lists
+    of zero-based word IDs and a label that is either 0 or 1.
+    :param word_idx: word dictionary
+    :type word_idx: dict
+    :return: Development reader creator
+    :rtype: callable
+    """
+    return reader_creator(QUORA_DEV_FILE_NAME, word_idx)
+def test(word_idx):
+    """
+    Quora test set creator.
+    It returns a reader creator. Each sample in the reader consists of two lists
+    of zero-based word IDs and a label that is either 0 or 1.
+    :param word_idx: word dictionary
+    :type word_idx: dict
+    :return: Test reader creator
+    :rtype: callable
+    """
+    return reader_creator(QUORA_TEST_FILE_NAME, word_idx)
+def word_dict():
+    """
+    Build a word dictionary from the corpus.
+    :return: Word dictionary
+    :rtype: dict
+    """
+    return build_dict(file_name=QUORA_TRAIN_FILE_NAME, cutoff=4)
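The dictionary construction in `build_dict` and the ID lookup in `reader_creator` can be illustrated on an in-memory corpus. The sketch below follows the same rules (keep words with frequency above `cutoff`, order by descending frequency then alphabetically, append `<unk>` and `<pad>`), but uses `collections.Counter`. The names `build_word_index` and `to_ids` and the toy corpus are hypothetical.

```python
from collections import Counter


def build_word_index(token_lists, cutoff):
    """Map each word seen more than `cutoff` times to a zero-based ID."""
    freq = Counter(word for tokens in token_lists for word in tokens)
    kept = sorted(((w, c) for w, c in freq.items() if c > cutoff),
                  key=lambda item: (-item[1], item[0]))
    word_idx = {word: i for i, (word, _) in enumerate(kept)}
    # Special tokens take the two IDs after the regular vocabulary.
    word_idx['<unk>'] = len(kept)
    word_idx['<pad>'] = len(kept) + 1
    return word_idx


def to_ids(tokens, word_idx):
    """Convert tokens to IDs, sending unknown words to <unk>."""
    unk_id = word_idx['<unk>']
    return [word_idx.get(w, unk_id) for w in tokens]


corpus = [["how", "are", "you"], ["how", "old", "are", "you"], ["how"]]
word_idx = build_word_index(corpus, cutoff=1)
ids = to_ids(["how", "old", "you"], word_idx)
```

Unlike the `zip(*dictionary)` unpacking in the patch, this variant also works when no word passes the cutoff.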
--- a/fluid/text_matching_on_quora/train_and_evaluate.py
+++ b/fluid/text_matching_on_quora/train_and_evaluate.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import print_function
+import os
+import sys
+import time
+import argparse
+import unittest
+import contextlib
+import numpy as np
+import paddle.fluid as fluid
+import utils, metric, configs
+import models
+from pretrained_word2vec import Glove840B_300D 
+parser = argparse.ArgumentParser(description=__doc__)
+parser.add_argument('--model_name',       type=str,   default='cdssmNet',                  help="Which model to train")
+parser.add_argument('--config',           type=str,   default='cdssm_base',       help="The global config setting")
+DATA_DIR = os.path.join(os.path.expanduser('~'), '.cache/paddle/dataset')
+def evaluate(epoch_id, exe, inference_program, dev_reader, test_reader, fetch_list, feeder, metric_type):
+    """
+    evaluate on test/dev dataset
+    """
+    def infer(test_reader):
+        """
+        do inference function
+        """
+        total_cost = 0.0
+        total_count = 0
+        preds, labels = [], []
+        for data in test_reader():
+            avg_cost, avg_acc, batch_prediction = exe.run(inference_program,
+                          feed=feeder.feed(data),
+                          fetch_list=fetch_list,
+                          return_numpy=True)
+            total_cost += avg_cost * len(data)
+            total_count += len(data)
+            preds.append(batch_prediction)
+            labels.append(np.asarray([x[-1] for x in data], dtype=np.int64))
+        y_pred = np.concatenate(preds)
+        y_label = np.concatenate(labels)
+        metric_res = []
+        for metric_name in metric_type:
+            if metric_name == 'accuracy_with_threshold':
+                metric_res.append((metric_name, metric.accuracy_with_threshold(y_pred, y_label, threshold=0.3)))
+            elif metric_name == 'accuracy':
+                metric_res.append((metric_name, metric.accuracy(y_pred, y_label)))
+            else:
+                print("Unknown metric type: ", metric_name)
+                sys.exit(1)
+        return total_cost / (total_count * 1.0), metric_res
+    dev_cost, dev_metric_res = infer(dev_reader)
+    print("[%s] epoch_id: %d, dev_cost: %f, " % (
+                 time.asctime( time.localtime(time.time()) ),
+                 epoch_id,
+                 dev_cost)
+               + ', '.join([str(x[0]) + ": " + str(x[1]) for x in dev_metric_res]))
+    test_cost, test_metric_res = infer(test_reader)
+    print("[%s] epoch_id: %d, test_cost: %f, " % (
+                time.asctime( time.localtime(time.time()) ),
+                epoch_id,
+                test_cost)
+              + ', '.join([str(x[0]) + ": " + str(x[1]) for x in test_metric_res]))
+    print("")
+def train_and_evaluate(train_reader,
+          dev_reader,
+          test_reader,
+          network,
+          optimizer,
+          global_config,
+          pretrained_word_embedding,
+          use_cuda,
+          parallel):
+    """
+    train network
+    """
+    # define the net
+    if global_config.use_lod_tensor: 
+        # the batch dimension is added automatically
+        q1 = fluid.layers.data(
+            name="question1", shape=[1], dtype="int64", lod_level=1)
+        q2 = fluid.layers.data(
+            name="question2", shape=[1], dtype="int64", lod_level=1)
+        label = fluid.layers.data(name="label", shape=[1], dtype="int64")
+        cost, acc, prediction = network(q1, q2, label)  
+    else:
+        # shape: [batch_size, max_seq_len_in_batch, 1]
+        q1 = fluid.layers.data(
+            name="question1", shape=[-1, -1, 1], dtype="int64")
+        q2 = fluid.layers.data(
+            name="question2", shape=[-1, -1, 1], dtype="int64")
+        # shape: [batch_size, max_seq_len_in_batch]
+        mask1 = fluid.layers.data(name="mask1", shape=[-1, -1], dtype="float32")
+        mask2 = fluid.layers.data(name="mask2", shape=[-1, -1], dtype="float32")
+        label = fluid.layers.data(name="label", shape=[1], dtype="int64")
+        cost, acc, prediction = network(q1, q2, mask1, mask2, label)
+    if parallel:
+        # TODO: Parallel Training
+        print("Parallel Training is not supported for now.")
+        sys.exit(1)
+    #optimizer.minimize(cost)
+    if use_cuda:
+        print("Using GPU")
+        place = fluid.CUDAPlace(0)
+    else:
+        print("Using CPU")
+        place = fluid.CPUPlace()
+    exe = fluid.Executor(place)
+    if global_config.use_lod_tensor:
+        feeder = fluid.DataFeeder(feed_list=[q1, q2, label], place=place)
+    else:
+        feeder = fluid.DataFeeder(feed_list=[q1, q2, mask1, mask2, label], place=place)
+    # logging param info
+    for param in fluid.default_main_program().global_block().all_parameters():
+        print("param name: %s; param shape: %s" % (param.name, param.shape))
+    # define inference_program
+    inference_program = fluid.default_main_program().clone(for_test=True)
+    optimizer.minimize(cost)
+    exe.run(fluid.default_startup_program())
+    # load the embedding from a numpy array
+    if pretrained_word_embedding is not None:
+        print("loading pretrained word embedding to param")
+        embedding_name = "emb.w"
+        embedding_param = fluid.global_scope().find_var(embedding_name).get_tensor()
+        embedding_param.set(pretrained_word_embedding, place)
+    evaluate(-1,
+             exe,
+             inference_program,
+             dev_reader,
+             test_reader,
+             fetch_list=[cost, acc, prediction],
+             feeder=feeder,
+             metric_type=global_config.metric_type)
+    # start training
+    print("[%s] Start Training" % time.asctime(time.localtime(time.time())))
+    for epoch_id in range(global_config.epoch_num):
+        data_size, data_count, total_acc, total_cost = 0, 0, 0.0, 0.0
+        batch_id = 0
+        for data in train_reader():
+            avg_cost_np, avg_acc_np = exe.run(fluid.default_main_program(),
+                                              feed=feeder.feed(data),
+                                              fetch_list=[cost, acc])
+            data_size = len(data)
+            total_acc += data_size * avg_acc_np
+            total_cost += data_size * avg_cost_np
+            data_count += data_size
+            if batch_id % 100 == 0:
+                print("[%s] epoch_id: %d, batch_id: %d, cost: %f, acc: %f" % (
+                    time.asctime(time.localtime(time.time())),
+                    epoch_id, 
+                    batch_id, 
+                    avg_cost_np,
+                    avg_acc_np))
+            batch_id += 1
+        avg_cost = total_cost / data_count
+        avg_acc = total_acc / data_count
+        print("")
+        print("[%s] epoch_id: %d, train_avg_cost: %f, train_avg_acc: %f" % (
+            time.asctime( time.localtime(time.time()) ), epoch_id, avg_cost, avg_acc))
+        epoch_model = global_config.save_dirname + "/" + "epoch" + str(epoch_id)
+        fluid.io.save_inference_model(epoch_model, ["question1", "question2", "label"], acc, exe)    
+        evaluate(epoch_id, 
+                 exe, 
+                 inference_program,
+                 dev_reader,
+                 test_reader, 
+                 fetch_list=[cost, acc, prediction], 
+                 feeder=feeder, 
+                 metric_type=global_config.metric_type)
+def main():
+    """
+    This function parses arguments, prepares the data and prepares the pretrained embedding
+    """
+    args = parser.parse_args()
+    global_config = configs.__dict__[args.config]()
+    print("net_name: ", args.model_name)
+    net = models.__dict__[args.model_name](global_config)
+    # get word_dict
+    word_dict = utils.getDict(data_type="quora_question_pairs")
+    # get reader
+    train_reader, dev_reader, test_reader = utils.prepare_data(
+        "quora_question_pairs",
+         word_dict=word_dict,
+         batch_size = global_config.batch_size,
+         buf_size=800000,
+         duplicate_data=global_config.duplicate_data,
+         use_pad=(not global_config.use_lod_tensor))
+    # load pretrained_word_embedding
+    if global_config.use_pretrained_word_embedding:
+        word2vec = Glove840B_300D(filepath=os.path.join(DATA_DIR, "glove.840B.300d.txt"),
+                                  keys=set(word_dict.keys()))
+        pretrained_word_embedding = utils.get_pretrained_word_embedding(
+                                        word2vec=word2vec,
+                                        word2id=word_dict,
+                                        config=global_config)
+        print("pretrained_word_embedding to be loaded:", pretrained_word_embedding)
+    else:
+        pretrained_word_embedding = None
+    # define optimizer
+    optimizer = utils.getOptimizer(global_config)
+    # use cuda or not
+    if not global_config.has_member('use_cuda'):
+        global_config.use_cuda = 'CUDA_VISIBLE_DEVICES' in os.environ
+    global_config.list_config()
+    train_and_evaluate(
+                   train_reader,
+                   dev_reader,
+                   test_reader,
+                   net,
+                   optimizer,
+                   global_config,
+                   pretrained_word_embedding,
+                   use_cuda=global_config.use_cuda,
+                   parallel=False)
+if __name__ == "__main__":
+    main()
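Both the training loop and `infer` recover a per-sample average from per-batch averages by weighting each batch mean with its batch size, which matters when the last batch is smaller than the others. A small sketch of that bookkeeping, with a hypothetical helper name and made-up numbers:

```python
def weighted_mean(batch_means, batch_sizes):
    """Combine per-batch means into one per-sample mean."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    count = sum(batch_sizes)
    return total / float(count)


# Two batches: three samples averaging 1.0 and one sample with value 4.0.
epoch_cost = weighted_mean([1.0, 4.0], [3, 1])
```

A plain average of the two batch means would give 2.5 instead, overweighting the single-sample batch.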
--- a/fluid/text_matching_on_quora/utils.py
+++ b/fluid/text_matching_on_quora/utils.py
+# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This module provides utilities for data generator and optimizer definition 
+"""
+import sys
+import time
+import numpy as np
+import paddle.fluid as fluid
+import paddle
+import quora_question_pairs
+def to_lodtensor(data, place):
+    """
+    convert a list of sequences to a LoDTensor
+    """
+    seq_lens = [len(seq) for seq in data]
+    cur_len = 0
+    lod = [cur_len]
+    for l in seq_lens:
+        cur_len += l
+        lod.append(cur_len)
+    flattened_data = np.concatenate(data, axis=0).astype("int64")
+    flattened_data = flattened_data.reshape([len(flattened_data), 1])
+    res = fluid.LoDTensor()
+    res.set(flattened_data, place)
+    res.set_lod([lod])
+    return res
+def getOptimizer(global_config):
+    """
+    get Optimizer by config
+    """
+    # All optimizers share the same exponentially decaying learning rate;
+    # decay_steps equals the number of batches per epoch.
+    learning_rate = fluid.layers.exponential_decay(
+        learning_rate=global_config.learning_rate,
+        decay_steps=global_config.train_samples_num // global_config.batch_size,
+        decay_rate=global_config.lr_decay)
+    if global_config.optimizer_type == "adam":
+        optimizer = fluid.optimizer.Adam(learning_rate=learning_rate)
+    elif global_config.optimizer_type == "sgd":
+        optimizer = fluid.optimizer.SGD(learning_rate=learning_rate)
+    elif global_config.optimizer_type == "adagrad":
+        optimizer = fluid.optimizer.Adagrad(learning_rate=learning_rate)
+    else:
+        # Fail early instead of hitting an unbound `optimizer` below.
+        raise ValueError("Unknown optimizer type: %s" % global_config.optimizer_type)
+    return optimizer
+def get_pretrained_word_embedding(word2vec, word2id, config):
+    """get pretrained embedding in shape [config.dict_dim, config.emb_dim]"""
+    print("preparing pretrained word embedding ...")
+    assert(config.dict_dim >= len(word2id))
+    word2id = sorted(word2id.items(), key = lambda x : x[1])
+    words = [x[0] for x in word2id]
+    words = words + ['<not-a-real-words>'] * (config.dict_dim - len(words))
+    pretrained_emb = []
+    for word in words:
+        if word in word2vec:
+            assert(len(word2vec[word]) == config.emb_dim)
+            if config.embedding_norm:
+                pretrained_emb.append(word2vec[word] / np.linalg.norm(word2vec[word]))
+            else:
+                pretrained_emb.append(word2vec[word])
+        elif config.OOV_fill == 'uniform':
+            pretrained_emb.append(np.random.uniform(-0.05, 0.05, size=[config.emb_dim]).astype(np.float32))
+        elif config.OOV_fill == 'normal':
+            pretrained_emb.append(np.random.normal(loc=0.0, scale=0.1, size=[config.emb_dim]).astype(np.float32))
+        else:
+            print("Unknown OOV fill method: ", config.OOV_fill)
+            exit()
+    word_embedding = np.stack(pretrained_emb)
+    return word_embedding
+def getDict(data_type="quora_question_pairs"):
+    """
+    get word2id dict from quora dataset
+    """
+    print("Generating word dict...")
+    if data_type == "quora_question_pairs":
+        word_dict = quora_question_pairs.word_dict()
+    else:
+        raise RuntimeError("No such dataset")
+    print("Vocab size: ", len(word_dict))
+    return word_dict
+def duplicate(reader):
+    """
+    duplicate the quora question pairs, since each sample contains 2 questions
+    Input: reader, which yields (question1, question2, label)
+    Output: reader, which yields (question1, question2, label) and then (question2, question1, label)
+    """
+    def duplicated_reader():
+        for data in reader():
+            (q1, q2, label) = data
+            yield (q1, q2, label)
+            yield (q2, q1, label)
+    return duplicated_reader
+def pad(reader, PAD_ID):
+    """
+    Input: reader, which yields batches of [(question1, question2, label), ... ]
+    Output: padded_reader, which yields batches of [(padded_question1, padded_question2, mask1, mask2, label), ... ]
+    """
+    assert(isinstance(PAD_ID, int))
+    def padded_reader():
+        for batch in reader():
+            max_len1 = max([len(data[0]) for data in batch])
+            max_len2 = max([len(data[1]) for data in batch])
+            padded_batch = []
+            for data in batch:
+                question1, question2, label = data
+                seq_len1 = len(question1)
+                seq_len2 = len(question2)
+                mask1 = [1] * seq_len1 + [0] * (max_len1 - seq_len1)
+                mask2 = [1] * seq_len2 + [0] * (max_len2 - seq_len2)
+                padded_question1 = question1 + [PAD_ID] * (max_len1 - seq_len1)
+                padded_question2 = question2 + [PAD_ID] * (max_len2 - seq_len2)
+                padded_question1 = [[x] for x in padded_question1] # fluid requires the last dimension of each question to be 1
+                padded_question2 = [[x] for x in padded_question2]
+                assert(len(mask1) == max_len1)
+                assert(len(mask2) == max_len2)
+                assert(len(padded_question1) == max_len1)
+                assert(len(padded_question2) == max_len2)
+                padded_batch.append((padded_question1, padded_question2, mask1, mask2, label))
+            yield padded_batch
+    return padded_reader
+def prepare_data(data_type,
+                 word_dict,
+                 batch_size,
+                 buf_size=50000,
+                 duplicate_data=False,
+                 use_pad=False):
+    """
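+    Build shuffled, batched train/dev/test readers; optionally duplicate each pair in both orders and pad each batch.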
+    """
+    PAD_ID = word_dict['<pad>']
+    if data_type == "quora_question_pairs":
+        # train/dev/test readers are batched iterators that yield a batch of (question1, question2, label) each time
+        # question1 and question2 are lists of word IDs
+        # label is 0 or 1
+        # for example: ([1, 3, 2], [7, 5, 4, 99], 1)
+        def prepare_reader(reader):
+            if duplicate_data:
+                reader = duplicate(reader)
+            reader = paddle.batch(
+                       paddle.reader.shuffle(reader, buf_size=buf_size),
+                       batch_size=batch_size,
+                       drop_last=False)
+            if use_pad:
+                reader = pad(reader, PAD_ID=PAD_ID)
+            return reader
+        train_reader = prepare_reader(quora_question_pairs.train(word_dict))
+        dev_reader = prepare_reader(quora_question_pairs.dev(word_dict))
+        test_reader = prepare_reader(quora_question_pairs.test(word_dict))
+    else:
+        raise RuntimeError("no such dataset")
+    return train_reader, dev_reader, test_reader
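The `pad` reader above right-pads each question to the longest one in its batch and emits a 0/1 mask alongside it. A minimal standalone sketch of that step follows, in plain Python with no Paddle dependency. The helper name `pad_batch` and the sample word IDs are illustrative assumptions, not part of this commit.

```python
def pad_batch(batch, pad_id):
    """Right-pad both questions of every sample to the batch maximum.

    `batch` is a list of (question1, question2, label) tuples, where each
    question is a list of word IDs. Returns a list of
    (padded_q1, padded_q2, mask1, mask2, label) tuples.
    """
    max_len1 = max(len(q1) for q1, _, _ in batch)
    max_len2 = max(len(q2) for _, q2, _ in batch)
    padded = []
    for q1, q2, label in batch:
        gap1 = max_len1 - len(q1)
        gap2 = max_len2 - len(q2)
        # The mask is 1 for real tokens and 0 for padding positions.
        mask1 = [1] * len(q1) + [0] * gap1
        mask2 = [1] * len(q2) + [0] * gap2
        # Wrap each ID in a list so that the last dimension is 1.
        p1 = [[x] for x in q1 + [pad_id] * gap1]
        p2 = [[x] for x in q2 + [pad_id] * gap2]
        padded.append((p1, p2, mask1, mask2, label))
    return padded


# Hypothetical batch of two samples; pad ID 0 is an assumption.
batch = [([1, 3, 2], [7, 5, 4, 99], 1), ([8], [6, 2], 0)]
result = pad_batch(batch, pad_id=0)
```

The mask lets downstream layers ignore padded positions, so the choice of pad ID does not affect the result as long as the mask is applied.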
--- a/fluid/video_classification/eval.py
+++ b/fluid/video_classification/eval.py
@@ -2,7 +2,7 @@ import os
 import numpy as np
 import time
 import sys
-import paddle.v2 as paddle
+import paddle
 import paddle.fluid as fluid
 from resnet import TSN_ResNet
 import reader

--- a/fluid/video_classification/infer.py
+++ b/fluid/video_classification/infer.py
@@ -2,7 +2,7 @@ import os
 import numpy as np
 import time
 import sys
-import paddle.v2 as paddle
+import paddle
 import paddle.fluid as fluid
 from resnet import TSN_ResNet
 import reader

--- a/fluid/video_classification/reader.py
+++ b/fluid/video_classification/reader.py
 import os
+import sys
 import math
 import random
 import functools
-import cPickle
-from cStringIO import StringIO
+try:
+    import cPickle as pickle
+    from cStringIO import StringIO
+except ImportError:
+    import pickle
+    from io import BytesIO
 import numpy as np
-import paddle.v2 as paddle
+import paddle
 from PIL import Image, ImageEnhance
 random.seed(0)
-DATA_DIM = 224
 THREAD = 8
 BUF_SIZE = 1024
@@ -22,17 +25,13 @@ INFER_LIST = 'data/test.list'
 img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
 img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
+python_ver = sys.version_info
 def imageloader(buf):
    if isinstance(buf, str):
-        tempbuff = StringIO()
-        tempbuff.write(buf)
-        tempbuff.seek(0)
-        img = Image.open(tempbuff)
-    elif isinstance(buf, collections.Sequence):
-        img = Image.open(StringIO(buf[-1]))
-    else:
        img = Image.open(StringIO(buf))
+    else:
+        img = Image.open(BytesIO(buf))
    return img.convert('RGB')
@@ -98,7 +97,7 @@ def group_center_crop(img_group, target_size):
 def video_loader(frames, nsample, mode):
    videolen = len(frames)
-    average_dur = videolen / nsample
+    average_dur = videolen // nsample
    imgs = []
    for i in range(nsample):
@@ -111,12 +110,12 @@ def video_loader(frames, nsample, mode):
                idx = i
        else:
            if average_dur >= 1:
-                idx = (average_dur - 1) / 2
+                idx = (average_dur - 1) // 2
                idx += i * average_dur
            else:
                idx = i
-        imgbuf = frames[idx % videolen]
+        imgbuf = frames[int(idx % videolen)]
        img = imageloader(imgbuf)
        imgs.append(img)
@@ -125,7 +124,10 @@ def video_loader(frames, nsample, mode):
 def decode_pickle(sample, mode, seg_num, short_size, target_size):
    pickle_path = sample[0]
-    data_loaded = cPickle.load(open(pickle_path))
+    if python_ver < (3, 0):
+        data_loaded = pickle.load(open(pickle_path, 'rb'))
+    else:
+        data_loaded = pickle.load(open(pickle_path, 'rb'), encoding='bytes')
    vid, label, frames = data_loaded
    imgs = video_loader(frames, seg_num, mode)
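The `decode_pickle` change above opens the file in binary mode and, on Python 3, passes `encoding='bytes'` so that Python 2 `str` objects load as `bytes`. A minimal sketch of the same idea as a reusable helper is shown below. The name `load_pickle` and the sample tuple are assumptions for illustration, and the `with` block (which closes the file) is an addition not present in the commit.

```python
import os
import pickle
import sys
import tempfile


def load_pickle(path):
    """Load a pickle file on either Python 2 or Python 3."""
    with open(path, 'rb') as f:  # binary mode is required on Python 3
        if sys.version_info < (3, 0):
            return pickle.load(f)
        # encoding='bytes' makes Python 2 `str` objects load as `bytes`.
        return pickle.load(f, encoding='bytes')


# Round-trip a small sample shaped like (vid, label, frames).
sample = (b'video_0', 3, [b'frame-a', b'frame-b'])
tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, 'sample.pkl')
with open(tmp_path, 'wb') as f:
    pickle.dump(sample, f, protocol=2)
loaded = load_pickle(tmp_path)
```

Because the frames come back as `bytes` on Python 3, they can be wrapped in `BytesIO` before decoding, which matches the `else` branch of `imageloader`.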

--- a/fluid/video_classification/resnet.py
+++ b/fluid/video_classification/resnet.py
@@ -22,7 +22,7 @@ class TSN_ResNet():
            num_filters=num_filters,
            filter_size=filter_size,
            stride=stride,
-            padding=(filter_size - 1) / 2,
+            padding=(filter_size - 1) // 2,
            groups=groups,
            act=None,
            bias_attr=False)
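The switch from `/` to `//` matters because Python 3's `/` always returns a float, while a padding size must be an integer. A small sketch of the arithmetic follows, using the standard convolution output-size formula with dilation 1. The helper names `same_padding` and `conv_output_size` are hypothetical and do not appear in `resnet.py`.

```python
def same_padding(filter_size):
    """Padding that keeps the spatial size for stride 1 and odd filters."""
    return (filter_size - 1) // 2  # floor division stays an int on Python 3


def conv_output_size(input_size, filter_size, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1


pad3 = same_padding(3)
pad7 = same_padding(7)
same_size = conv_output_size(56, 3, 1, pad3)   # stride 1 keeps the size
halved = conv_output_size(224, 7, 2, pad7)     # stride 2 halves the size
```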

--- a/fluid/video_classification/train.py
+++ b/fluid/video_classification/train.py
@@ -2,7 +2,7 @@ import os
 import numpy as np
 import time
 import sys
-import paddle.v2 as paddle
+import paddle
 import paddle.fluid as fluid
 from resnet import TSN_ResNet
 import reader
@@ -91,7 +91,7 @@ def train(args):
        fluid.io.load_vars(exe, pretrained_model, vars=vars)
    # reader
-    train_reader = paddle.batch(reader.train(seg_num), batch_size=batch_size)
+    train_reader = paddle.batch(reader.train(seg_num), batch_size=batch_size, drop_last=True)
    # test in single GPU
    test_reader = paddle.batch(reader.test(seg_num), batch_size=batch_size / 16)
    feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
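With `drop_last=True`, the final batch is discarded when it holds fewer than `batch_size` samples, so every batch fed to training has the same size. The sketch below is a plain-Python illustration of that behaviour, not Paddle's implementation. The `batch` function here and the `numbers` reader are assumptions written for this example.

```python
def batch(reader, batch_size, drop_last=False):
    """Group items from `reader` into lists of `batch_size` items."""
    def batch_reader():
        buf = []
        for item in reader():
            buf.append(item)
            if len(buf) == batch_size:
                yield buf
                buf = []
        # Emit the incomplete tail only when drop_last is False.
        if buf and not drop_last:
            yield buf
    return batch_reader


def numbers():
    """Hypothetical reader that yields seven samples."""
    for i in range(7):
        yield i


kept = list(batch(numbers, 3, drop_last=False)())
dropped = list(batch(numbers, 3, drop_last=True)())
```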

--- a/fluid/video_classification/utility.py
+++ b/fluid/video_classification/utility.py
@@ -17,6 +17,7 @@ from __future__ import division
 from __future__ import print_function
 import distutils.util
 import numpy as np
+import six
 from paddle.fluid import core
@@ -36,7 +37,7 @@ def print_arguments(args):
    :type args: argparse.Namespace
    """
    print("-----------  Configuration Arguments -----------")
-    for arg, value in sorted(vars(args).iteritems()):
+    for arg, value in sorted(six.iteritems(vars(args))):
        print("%s: %s" % (arg, value))
    print("------------------------------------------------")
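`dict.iteritems()` exists only on Python 2, which is why the hunk above routes the call through `six.iteritems`. Where the `six` dependency is unwanted, `dict.items()` works on both versions at the cost of building a list on Python 2. A minimal sketch under that assumption follows. The helper `format_arguments` and the sample argument values are hypothetical.

```python
from __future__ import print_function

import argparse


def format_arguments(args):
    """Return 'name: value' lines for a Namespace, sorted by name."""
    # dict.items() is available on both Python 2 and Python 3.
    return ["%s: %s" % (arg, value)
            for arg, value in sorted(vars(args).items())]


args = argparse.Namespace(batch_size=128, use_gpu=True, model="TSN_ResNet")
lines = format_arguments(args)
for line in lines:
    print(line)
```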

--- a/v2/README.cn.md
+++ b/v2/README.cn.md
--- a/v2/README.md
+++ b/v2/README.md
@@ -12,23 +12,23 @@ The word embedding expresses words with a real vector. Each dimension of the vec
 In the example of word vectors, we show how to use Hierarchical-Sigmoid and Noise Contrastive Estimation (NCE) to accelerate word-vector learning.
- 1.1 [Hsigmoid Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/v2/hsigmoid)
+- 1.1 [Hsigmoid Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/hsigmoid)
- 1.2 [Noise Contrastive Estimation Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/v2/nce_cost)
+- 1.2 [Noise Contrastive Estimation Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/nce_cost)
 ## 2. RNN language model
 The language model is important in natural language processing. In addition to producing word vectors (a by-product of language model training), it can help generate text: given a number of words, the language model predicts the most likely next word. In the text-generation example, we focus on the recurrent neural network language model. By following the instructions in the document, users can quickly adapt the model to their own training corpus and build applications such as automatic poetry or prose writing.
- 2.1 [Generate text using the RNN language model](https://github.com/PaddlePaddle/models/tree/develop/v2/generate_sequence_by_rnn_lm)
+- 2.1 [Generate text using the RNN language model](https://github.com/PaddlePaddle/models/tree/develop/legacy/generate_sequence_by_rnn_lm)
 ## 3. Click-Through Rate prediction
 The click-through rate model predicts the probability that a user will click on an ad and is widely used in advertising technology. In the early stages of click-through rate prediction, logistic regression performed well on large-scale sparse features. In recent years, DNN models have gradually taken over this task because of their strong learning ability.
 In the click-through rate example, we first present Google's Wide & Deep model, which combines the strengths of DNNs with those of logistic regression, a model well suited to large-scale sparse features. We then provide the deep factorization machine for click-through rate prediction, which combines a factorization machine with deep neural networks to model both low-order and high-order interactions of input features.
- 3.1 [Click-Through Rate Model](https://github.com/PaddlePaddle/models/tree/develop/v2/ctr)
+- 3.1 [Click-Through Rate Model](https://github.com/PaddlePaddle/models/tree/develop/legacy/ctr)
- 3.2 [Deep Factorization Machine for Click-Through Rate prediction](https://github.com/PaddlePaddle/models/tree/develop/v2/deep_fm)
+- 3.2 [Deep Factorization Machine for Click-Through Rate prediction](https://github.com/PaddlePaddle/models/tree/develop/legacy/deep_fm)
 ## 4. Text classification
@@ -36,7 +36,7 @@ Text classification is one of the most basic tasks in natural language processin
 For text classification, we provide a non-sequential text classification model based on DNN and CNN. (For LSTM-based model, please refer to PaddleBook [Sentiment Analysis](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.html)).
- 4.1 [Sentiment analysis based on DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/v2/text_classification)
+- 4.1 [Sentiment analysis based on DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/legacy/text_classification)
 ## 5. Learning to rank
@@ -45,14 +45,14 @@ The depth neural network can be used to model the fractional function to form va
 The algorithms for learning to rank are usually categorized into three groups by their input representation and the loss function. These are pointwise, pairwise and listwise approaches. Here we demonstrate RankLoss loss function method (pairwise approach), and LambdaRank loss function method (listwise approach). (For Pointwise approaches, please refer to [Recommended System](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.html)).
- 5.1 [Learning to rank based on Pairwise and Listwise approches](https://github.com/PaddlePaddle/models/tree/develop/v2/ltr)
+- 5.1 [Learning to rank based on Pairwise and Listwise approaches](https://github.com/PaddlePaddle/models/tree/develop/legacy/ltr)
 ## 6. Semantic model
 The deep structured semantic model uses a DNN to learn low-dimensional vector representations in a continuous semantic space, and then models the semantic similarity between two sentences.
 In this example, we demonstrate how to use PaddlePaddle to implement a generic deep structured semantic model that measures the semantic similarity between two strings. The model supports different network structures such as CNN (Convolutional Network), FC (Fully Connected Network), and RNN (Recurrent Neural Network), and different loss functions for classification, regression, and ranking.
- 6.1 [Deep structured semantic model](https://github.com/PaddlePaddle/models/tree/develop/v2/dssm)
+- 6.1 [Deep structured semantic model](https://github.com/PaddlePaddle/models/tree/develop/legacy/dssm)
 ## 7. Sequence tagging
@@ -60,7 +60,7 @@ Given the input sequence, the sequence tagging model is one of the most basic ta
 In the example of the sequence tagging, we describe how to train an end-to-end sequence tagging model with the Named Entity Recognition (NER) task as an example.
- 7.1 [Name Entity Recognition](https://github.com/PaddlePaddle/models/tree/develop/v2/sequence_tagging_for_ner)
+- 7.1 [Named Entity Recognition](https://github.com/PaddlePaddle/models/tree/develop/legacy/sequence_tagging_for_ner)
 ## 8. Sequence to sequence learning
@@ -68,19 +68,19 @@ Sequence-to-sequence model has a wide range of applications. This includes machi
 As an example of sequence-to-sequence learning, we take the machine translation task. We demonstrate the sequence-to-sequence model without an attention mechanism, which is the basis for all sequence-to-sequence learning models. We also use scheduled sampling to mitigate error accumulation in the RNN model, and show machine translation with an external memory mechanism.
- 8.1 [Basic Sequence-to-sequence model](https://github.com/PaddlePaddle/models/tree/develop/v2/nmt_without_attention)
+- 8.1 [Basic Sequence-to-sequence model](https://github.com/PaddlePaddle/models/tree/develop/legacy/nmt_without_attention)
 ## 9. Image classification
 In the image classification example, we show how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and Xception models in PaddlePaddle. We also provide model conversion tools that convert model files trained with Caffe or TensorFlow into PaddlePaddle model files.
- 9.1 [convert Caffe model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification/caffe2paddle)
+- 9.1 [convert Caffe model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/caffe2paddle)
- 9.2 [convert TensorFlow model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification/tf2paddle)
+- 9.2 [convert TensorFlow model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/tf2paddle)
- 9.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification)
+- 9.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification)
+- 9.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification)
+- 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification)
+- 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification)
+- 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification)
+- 9.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
 This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
--- a/v2/conv_seq2seq/README.md
+++ b/v2/conv_seq2seq/README.md
--- a/v2/conv_seq2seq/beamsearch.py
+++ b/v2/conv_seq2seq/beamsearch.py
--- a/v2/conv_seq2seq/download.sh
+++ b/v2/conv_seq2seq/download.sh
--- a/v2/conv_seq2seq/infer.py
+++ b/v2/conv_seq2seq/infer.py
--- a/v2/conv_seq2seq/model.py
+++ b/v2/conv_seq2seq/model.py
--- a/v2/conv_seq2seq/preprocess.py
+++ b/v2/conv_seq2seq/preprocess.py
--- a/v2/conv_seq2seq/reader.py
+++ b/v2/conv_seq2seq/reader.py
--- a/v2/conv_seq2seq/train.py
+++ b/v2/conv_seq2seq/train.py
--- a/v2/ctr/README.cn.md
+++ b/v2/ctr/README.cn.md
--- a/v2/ctr/README.md
+++ b/v2/ctr/README.md
--- a/v2/ctr/avazu_data_processer.py
+++ b/v2/ctr/avazu_data_processer.py
--- a/v2/ctr/dataset.md
+++ b/v2/ctr/dataset.md
--- a/v2/ctr/images/lr_vs_dnn.jpg
+++ b/v2/ctr/images/lr_vs_dnn.jpg
--- a/v2/ctr/images/wide_deep.png
+++ b/v2/ctr/images/wide_deep.png
--- a/v2/ctr/infer.py
+++ b/v2/ctr/infer.py
--- a/v2/ctr/network_conf.py
+++ b/v2/ctr/network_conf.py
--- a/v2/ctr/reader.py
+++ b/v2/ctr/reader.py
--- a/v2/ctr/train.py
+++ b/v2/ctr/train.py
--- a/v2/ctr/utils.py
+++ b/v2/ctr/utils.py
--- a/v2/deep_fm/README.cn.md
+++ b/v2/deep_fm/README.cn.md
--- a/v2/deep_fm/README.md
+++ b/v2/deep_fm/README.md
--- a/v2/deep_fm/data/download.sh
+++ b/v2/deep_fm/data/download.sh
--- a/v2/deep_fm/infer.py
+++ b/v2/deep_fm/infer.py
--- a/v2/deep_fm/network_conf.py
+++ b/v2/deep_fm/network_conf.py
--- a/v2/deep_fm/preprocess.py
+++ b/v2/deep_fm/preprocess.py
--- a/v2/deep_fm/reader.py
+++ b/v2/deep_fm/reader.py
--- a/v2/deep_fm/train.py
+++ b/v2/deep_fm/train.py
--- a/v2/dssm/README.cn.md
+++ b/v2/dssm/README.cn.md
--- a/v2/dssm/README.md
+++ b/v2/dssm/README.md
--- a/v2/dssm/data/classification/test.txt
+++ b/v2/dssm/data/classification/test.txt
--- a/v2/dssm/data/classification/train.txt
+++ b/v2/dssm/data/classification/train.txt
--- a/v2/dssm/data/rank/test.txt
+++ b/v2/dssm/data/rank/test.txt
--- a/v2/dssm/data/rank/train.txt
+++ b/v2/dssm/data/rank/train.txt
--- a/v2/dssm/data/vocab.txt
+++ b/v2/dssm/data/vocab.txt
--- a/v2/dssm/images/dssm.jpg
+++ b/v2/dssm/images/dssm.jpg
--- a/v2/dssm/images/dssm.png
+++ b/v2/dssm/images/dssm.png
--- a/v2/dssm/images/dssm2.jpg
+++ b/v2/dssm/images/dssm2.jpg
--- a/v2/dssm/images/dssm2.png
+++ b/v2/dssm/images/dssm2.png
--- a/v2/dssm/images/dssm3.jpg
+++ b/v2/dssm/images/dssm3.jpg
--- a/v2/dssm/infer.py
+++ b/v2/dssm/infer.py
--- a/v2/dssm/network_conf.py
+++ b/v2/dssm/network_conf.py
--- a/v2/dssm/reader.py
+++ b/v2/dssm/reader.py
--- a/v2/dssm/train.py
+++ b/v2/dssm/train.py
--- a/v2/dssm/utils.py
+++ b/v2/dssm/utils.py
--- a/v2/generate_chinese_poetry/README.md
+++ b/v2/generate_chinese_poetry/README.md
--- a/v2/generate_chinese_poetry/README_en.md
+++ b/v2/generate_chinese_poetry/README_en.md
--- a/v2/generate_chinese_poetry/data/download.sh
+++ b/v2/generate_chinese_poetry/data/download.sh
--- a/v2/generate_chinese_poetry/generate.py
+++ b/v2/generate_chinese_poetry/generate.py
--- a/v2/generate_chinese_poetry/network_conf.py
+++ b/v2/generate_chinese_poetry/network_conf.py
--- a/v2/generate_chinese_poetry/preprocess.py
+++ b/v2/generate_chinese_poetry/preprocess.py
--- a/v2/generate_chinese_poetry/reader.py
+++ b/v2/generate_chinese_poetry/reader.py
--- a/v2/generate_chinese_poetry/train.py
+++ b/v2/generate_chinese_poetry/train.py
--- a/v2/generate_chinese_poetry/utils.py
+++ b/v2/generate_chinese_poetry/utils.py
--- a/v2/generate_sequence_by_rnn_lm/.gitignore
+++ b/v2/generate_sequence_by_rnn_lm/.gitignore
--- a/v2/generate_sequence_by_rnn_lm/README.md
+++ b/v2/generate_sequence_by_rnn_lm/README.md
--- a/v2/generate_sequence_by_rnn_lm/beam_search.py
+++ b/v2/generate_sequence_by_rnn_lm/beam_search.py
--- a/v2/generate_sequence_by_rnn_lm/config.py
+++ b/v2/generate_sequence_by_rnn_lm/config.py
--- a/v2/generate_sequence_by_rnn_lm/data/train_data_examples.txt
+++ b/v2/generate_sequence_by_rnn_lm/data/train_data_examples.txt
--- a/v2/generate_sequence_by_rnn_lm/generate.py
+++ b/v2/generate_sequence_by_rnn_lm/generate.py
--- a/v2/generate_sequence_by_rnn_lm/images/ngram.png
+++ b/v2/generate_sequence_by_rnn_lm/images/ngram.png
--- a/v2/generate_sequence_by_rnn_lm/images/rnn.png
+++ b/v2/generate_sequence_by_rnn_lm/images/rnn.png
--- a/v2/generate_sequence_by_rnn_lm/network_conf.py
+++ b/v2/generate_sequence_by_rnn_lm/network_conf.py
--- a/v2/generate_sequence_by_rnn_lm/reader.py
+++ b/v2/generate_sequence_by_rnn_lm/reader.py
--- a/v2/generate_sequence_by_rnn_lm/train.py
+++ b/v2/generate_sequence_by_rnn_lm/train.py
--- a/v2/generate_sequence_by_rnn_lm/utils.py
+++ b/v2/generate_sequence_by_rnn_lm/utils.py
--- a/v2/globally_normalized_reader/.gitignore
+++ b/v2/globally_normalized_reader/.gitignore
--- a/v2/globally_normalized_reader/README.cn.md
+++ b/v2/globally_normalized_reader/README.cn.md
--- a/v2/globally_normalized_reader/README.md
+++ b/v2/globally_normalized_reader/README.md
--- a/v2/globally_normalized_reader/basic_modules.py
+++ b/v2/globally_normalized_reader/basic_modules.py
--- a/v2/globally_normalized_reader/beam_decoding.py
+++ b/v2/globally_normalized_reader/beam_decoding.py
--- a/v2/globally_normalized_reader/config.py
+++ b/v2/globally_normalized_reader/config.py
--- a/v2/globally_normalized_reader/data/download.sh
+++ b/v2/globally_normalized_reader/data/download.sh
--- a/v2/globally_normalized_reader/evaluate.py
+++ b/v2/globally_normalized_reader/evaluate.py
--- a/v2/globally_normalized_reader/featurize.py
+++ b/v2/globally_normalized_reader/featurize.py
--- a/v2/globally_normalized_reader/infer.py
+++ b/v2/globally_normalized_reader/infer.py
--- a/v2/globally_normalized_reader/model.py
+++ b/v2/globally_normalized_reader/model.py
--- a/v2/globally_normalized_reader/reader.py
+++ b/v2/globally_normalized_reader/reader.py
--- a/v2/globally_normalized_reader/train.py
+++ b/v2/globally_normalized_reader/train.py
--- a/v2/globally_normalized_reader/vocab.py
+++ b/v2/globally_normalized_reader/vocab.py
--- a/v2/hsigmoid/.gitignore
+++ b/v2/hsigmoid/.gitignore
--- a/v2/hsigmoid/README.md
+++ b/v2/hsigmoid/README.md
--- a/v2/hsigmoid/images/binary_tree.png
+++ b/v2/hsigmoid/images/binary_tree.png
--- a/v2/hsigmoid/images/network_conf.png
+++ b/v2/hsigmoid/images/network_conf.png
--- a/v2/hsigmoid/images/path_to_1.png
+++ b/v2/hsigmoid/images/path_to_1.png
--- a/v2/hsigmoid/infer.py
+++ b/v2/hsigmoid/infer.py
--- a/v2/hsigmoid/network_conf.py
+++ b/v2/hsigmoid/network_conf.py
--- a/v2/hsigmoid/train.py
+++ b/v2/hsigmoid/train.py
--- a/v2/image_classification/README.md
+++ b/v2/image_classification/README.md
--- a/v2/image_classification/alexnet.py
+++ b/v2/image_classification/alexnet.py
--- a/v2/image_classification/caffe2paddle/README.md
+++ b/v2/image_classification/caffe2paddle/README.md
--- a/v2/image_classification/caffe2paddle/caffe2paddle.py
+++ b/v2/image_classification/caffe2paddle/caffe2paddle.py
--- a/v2/image_classification/googlenet.py
+++ b/v2/image_classification/googlenet.py
--- a/v2/image_classification/inception_resnet_v2.py
+++ b/v2/image_classification/inception_resnet_v2.py
--- a/v2/image_classification/inception_v4.py
+++ b/v2/image_classification/inception_v4.py
--- a/v2/image_classification/infer.py
+++ b/v2/image_classification/infer.py
--- a/v2/image_classification/models/model_download.sh
+++ b/v2/image_classification/models/model_download.sh
--- a/v2/image_classification/reader.py
+++ b/v2/image_classification/reader.py
--- a/v2/image_classification/resnet.py
+++ b/v2/image_classification/resnet.py
--- a/v2/image_classification/tf2paddle/README.md
+++ b/v2/image_classification/tf2paddle/README.md
--- a/v2/image_classification/tf2paddle/tf2paddle.py
+++ b/v2/image_classification/tf2paddle/tf2paddle.py
--- a/v2/image_classification/train.py
+++ b/v2/image_classification/train.py
--- a/v2/image_classification/vgg.py
+++ b/v2/image_classification/vgg.py
--- a/v2/image_classification/xception.py
+++ b/v2/image_classification/xception.py
--- a/v2/ltr/README.md
+++ b/v2/ltr/README.md
--- a/v2/ltr/README_en.md
+++ b/v2/ltr/README_en.md
--- a/v2/ltr/images/LambdaRank_EN.png
+++ b/v2/ltr/images/LambdaRank_EN.png
--- a/v2/ltr/images/lambdarank.jpg
+++ b/v2/ltr/images/lambdarank.jpg
--- a/v2/ltr/images/learning_to_rank.jpg
+++ b/v2/ltr/images/learning_to_rank.jpg
--- a/v2/ltr/images/ranknet.jpg
+++ b/v2/ltr/images/ranknet.jpg
--- a/v2/ltr/images/ranknet_en.png
+++ b/v2/ltr/images/ranknet_en.png
--- a/v2/ltr/images/search_engine_example.png
+++ b/v2/ltr/images/search_engine_example.png
--- a/v2/ltr/infer.py
+++ b/v2/ltr/infer.py
--- a/v2/ltr/lambda_rank.py
+++ b/v2/ltr/lambda_rank.py
--- a/v2/ltr/ranknet.py
+++ b/v2/ltr/ranknet.py
--- a/v2/ltr/train.py
+++ b/v2/ltr/train.py
--- a/v2/mt_with_external_memory/README.md
+++ b/v2/mt_with_external_memory/README.md
--- a/v2/mt_with_external_memory/data_utils.py
+++ b/v2/mt_with_external_memory/data_utils.py
--- a/v2/mt_with_external_memory/external_memory.py
+++ b/v2/mt_with_external_memory/external_memory.py
--- a/v2/mt_with_external_memory/image/lstm_c_state.png
+++ b/v2/mt_with_external_memory/image/lstm_c_state.png
--- a/v2/mt_with_external_memory/image/memory_enhanced_decoder.png
+++ b/v2/mt_with_external_memory/image/memory_enhanced_decoder.png
--- a/v2/mt_with_external_memory/image/neural_turing_machine_arch.png
+++ b/v2/mt_with_external_memory/image/neural_turing_machine_arch.png
--- a/v2/mt_with_external_memory/image/turing_machine_cartoon.gif
+++ b/v2/mt_with_external_memory/image/turing_machine_cartoon.gif
--- a/v2/mt_with_external_memory/infer.py
+++ b/v2/mt_with_external_memory/infer.py
--- a/v2/mt_with_external_memory/model.py
+++ b/v2/mt_with_external_memory/model.py
--- a/v2/mt_with_external_memory/train.py
+++ b/v2/mt_with_external_memory/train.py
--- a/v2/nce_cost/.gitignore
+++ b/v2/nce_cost/.gitignore
--- a/v2/nce_cost/README.md
+++ b/v2/nce_cost/README.md
--- a/v2/nce_cost/images/network_conf.png
+++ b/v2/nce_cost/images/network_conf.png
--- a/v2/nce_cost/infer.py
+++ b/v2/nce_cost/infer.py
--- a/v2/nce_cost/network_conf.py
+++ b/v2/nce_cost/network_conf.py
--- a/v2/nce_cost/train.py
+++ b/v2/nce_cost/train.py
--- a/v2/nested_sequence/README.md
+++ b/v2/nested_sequence/README.md
--- a/v2/nested_sequence/README_en.md
+++ b/v2/nested_sequence/README_en.md
--- a/v2/nested_sequence/text_classification/.gitignore
+++ b/v2/nested_sequence/text_classification/.gitignore
--- a/v2/nested_sequence/text_classification/README.md
+++ b/v2/nested_sequence/text_classification/README.md
--- a/v2/nested_sequence/text_classification/README_en.md
+++ b/v2/nested_sequence/text_classification/README_en.md
--- a/v2/nested_sequence/text_classification/config.py
+++ b/v2/nested_sequence/text_classification/config.py
--- a/v2/nested_sequence/text_classification/data/infer.txt
+++ b/v2/nested_sequence/text_classification/data/infer.txt
--- a/v2/nested_sequence/text_classification/data/test_data/test.txt
+++ b/v2/nested_sequence/text_classification/data/test_data/test.txt
--- a/v2/nested_sequence/text_classification/data/train_data/train.txt
+++ b/v2/nested_sequence/text_classification/data/train_data/train.txt
--- a/v2/nested_sequence/text_classification/images/model.jpg
+++ b/v2/nested_sequence/text_classification/images/model.jpg
--- a/v2/nested_sequence/text_classification/infer.py
+++ b/v2/nested_sequence/text_classification/infer.py
--- a/v2/nested_sequence/text_classification/network_conf.py
+++ b/v2/nested_sequence/text_classification/network_conf.py
--- a/v2/nested_sequence/text_classification/reader.py
+++ b/v2/nested_sequence/text_classification/reader.py
--- a/v2/nested_sequence/text_classification/requirements.txt
+++ b/v2/nested_sequence/text_classification/requirements.txt
--- a/v2/nested_sequence/text_classification/train.py
+++ b/v2/nested_sequence/text_classification/train.py
--- a/v2/nested_sequence/text_classification/utils.py
+++ b/v2/nested_sequence/text_classification/utils.py
--- a/v2/neural_qa/.gitignore
+++ b/v2/neural_qa/.gitignore
--- a/v2/neural_qa/README.md
+++ b/v2/neural_qa/README.md
--- a/v2/neural_qa/config.py
+++ b/v2/neural_qa/config.py
--- a/v2/neural_qa/infer.py
+++ b/v2/neural_qa/infer.py
--- a/v2/neural_qa/network.py
+++ b/v2/neural_qa/network.py
--- a/v2/neural_qa/pre-trained-models/download-models.sh
+++ b/v2/neural_qa/pre-trained-models/download-models.sh
--- a/v2/neural_qa/pre-trained-models/neural_seq_qa.pre-trained-models.2017-10-27.tar.gz.md5
+++ b/v2/neural_qa/pre-trained-models/neural_seq_qa.pre-trained-models.2017-10-27.tar.gz.md5
--- a/v2/neural_qa/reader.py
+++ b/v2/neural_qa/reader.py
--- a/v2/neural_qa/test/test_reader.py
+++ b/v2/neural_qa/test/test_reader.py
--- a/v2/neural_qa/test/trn_data.gz
+++ b/v2/neural_qa/test/trn_data.gz
--- a/v2/neural_qa/train.py
+++ b/v2/neural_qa/train.py
--- a/v2/neural_qa/utils.py
+++ b/v2/neural_qa/utils.py
--- a/v2/neural_qa/val_and_test.py
+++ b/v2/neural_qa/val_and_test.py
--- a/v2/nmt_without_attention/README.cn.md
+++ b/v2/nmt_without_attention/README.cn.md
--- a/v2/nmt_without_attention/README.md
+++ b/v2/nmt_without_attention/README.md
--- a/v2/nmt_without_attention/generate.py
+++ b/v2/nmt_without_attention/generate.py
--- a/v2/nmt_without_attention/images/bidirectional-encoder.png
+++ b/v2/nmt_without_attention/images/bidirectional-encoder.png
--- a/v2/nmt_without_attention/images/encoder-decoder.png
+++ b/v2/nmt_without_attention/images/encoder-decoder.png
--- a/v2/nmt_without_attention/images/gru.png
+++ b/v2/nmt_without_attention/images/gru.png
--- a/v2/nmt_without_attention/network_conf.py
+++ b/v2/nmt_without_attention/network_conf.py
--- a/v2/nmt_without_attention/train.py
+++ b/v2/nmt_without_attention/train.py
--- a/v2/scene_text_recognition/README.md
+++ b/v2/scene_text_recognition/README.md
--- a/v2/scene_text_recognition/config.py
+++ b/v2/scene_text_recognition/config.py
--- a/v2/scene_text_recognition/decoder.py
+++ b/v2/scene_text_recognition/decoder.py
--- a/v2/scene_text_recognition/images/503.jpg
+++ b/v2/scene_text_recognition/images/503.jpg
--- a/v2/scene_text_recognition/images/504.jpg
+++ b/v2/scene_text_recognition/images/504.jpg
--- a/v2/scene_text_recognition/images/505.jpg
+++ b/v2/scene_text_recognition/images/505.jpg
--- a/v2/scene_text_recognition/images/ctc.png
+++ b/v2/scene_text_recognition/images/ctc.png
--- a/v2/scene_text_recognition/images/feature_vector.png
+++ b/v2/scene_text_recognition/images/feature_vector.png
--- a/v2/scene_text_recognition/images/transcription.png
+++ b/v2/scene_text_recognition/images/transcription.png
--- a/v2/scene_text_recognition/infer.py
+++ b/v2/scene_text_recognition/infer.py
--- a/v2/scene_text_recognition/network_conf.py
+++ b/v2/scene_text_recognition/network_conf.py
--- a/v2/scene_text_recognition/reader.py
+++ b/v2/scene_text_recognition/reader.py
--- a/v2/scene_text_recognition/requirements.txt
+++ b/v2/scene_text_recognition/requirements.txt
--- a/v2/scene_text_recognition/train.py
+++ b/v2/scene_text_recognition/train.py
--- a/v2/scene_text_recognition/utils.py
+++ b/v2/scene_text_recognition/utils.py
--- a/v2/scheduled_sampling/README.md
+++ b/v2/scheduled_sampling/README.md
--- a/v2/scheduled_sampling/README_en.md
+++ b/v2/scheduled_sampling/README_en.md
--- a/v2/scheduled_sampling/generate.py
+++ b/v2/scheduled_sampling/generate.py
--- a/v2/scheduled_sampling/images/Scheduled_Sampling.jpg
+++ b/v2/scheduled_sampling/images/Scheduled_Sampling.jpg
--- a/v2/scheduled_sampling/images/decay.jpg
+++ b/v2/scheduled_sampling/images/decay.jpg
--- a/v2/scheduled_sampling/network_conf.py
+++ b/v2/scheduled_sampling/network_conf.py
--- a/v2/scheduled_sampling/reader.py
+++ b/v2/scheduled_sampling/reader.py
--- a/v2/scheduled_sampling/train.py
+++ b/v2/scheduled_sampling/train.py
--- a/v2/scheduled_sampling/utils.py
+++ b/v2/scheduled_sampling/utils.py
--- a/v2/sequence_tagging_for_ner/.gitignore
+++ b/v2/sequence_tagging_for_ner/.gitignore
--- a/v2/sequence_tagging_for_ner/README.md
+++ b/v2/sequence_tagging_for_ner/README.md
--- a/v2/sequence_tagging_for_ner/data/download.sh
+++ b/v2/sequence_tagging_for_ner/data/download.sh
--- a/v2/sequence_tagging_for_ner/data/target.txt
+++ b/v2/sequence_tagging_for_ner/data/target.txt
--- a/v2/sequence_tagging_for_ner/data/test
+++ b/v2/sequence_tagging_for_ner/data/test
--- a/v2/sequence_tagging_for_ner/data/train
+++ b/v2/sequence_tagging_for_ner/data/train
--- a/v2/sequence_tagging_for_ner/data/vocab.txt
+++ b/v2/sequence_tagging_for_ner/data/vocab.txt
--- a/v2/sequence_tagging_for_ner/images/BIO tag example.png
+++ b/v2/sequence_tagging_for_ner/images/BIO tag example.png
--- a/v2/sequence_tagging_for_ner/images/ner_label_ins.png
+++ b/v2/sequence_tagging_for_ner/images/ner_label_ins.png
--- a/v2/sequence_tagging_for_ner/images/ner_model_en.png
+++ b/v2/sequence_tagging_for_ner/images/ner_model_en.png
--- a/v2/sequence_tagging_for_ner/images/ner_network.png
+++ b/v2/sequence_tagging_for_ner/images/ner_network.png
--- a/v2/sequence_tagging_for_ner/infer.py
+++ b/v2/sequence_tagging_for_ner/infer.py
--- a/v2/sequence_tagging_for_ner/network_conf.py
+++ b/v2/sequence_tagging_for_ner/network_conf.py
--- a/v2/sequence_tagging_for_ner/reader.py
+++ b/v2/sequence_tagging_for_ner/reader.py
--- a/v2/sequence_tagging_for_ner/train.py
+++ b/v2/sequence_tagging_for_ner/train.py
--- a/v2/sequence_tagging_for_ner/utils.py
+++ b/v2/sequence_tagging_for_ner/utils.py
--- a/v2/ssd/README.cn.md
+++ b/v2/ssd/README.cn.md
--- a/v2/ssd/README.md
+++ b/v2/ssd/README.md
--- a/v2/ssd/config/__init__.py
+++ b/v2/ssd/config/__init__.py
--- a/v2/ssd/config/pascal_voc_conf.py
+++ b/v2/ssd/config/pascal_voc_conf.py
--- a/v2/ssd/data/label_list
+++ b/v2/ssd/data/label_list
--- a/v2/ssd/data/prepare_voc_data.py
+++ b/v2/ssd/data/prepare_voc_data.py
--- a/v2/ssd/data_provider.py
+++ b/v2/ssd/data_provider.py
--- a/v2/ssd/eval.py
+++ b/v2/ssd/eval.py
--- a/v2/ssd/image_util.py
+++ b/v2/ssd/image_util.py
--- a/v2/ssd/images/SSD300x300_map.png
+++ b/v2/ssd/images/SSD300x300_map.png
--- a/v2/ssd/images/ssd_network.png
+++ b/v2/ssd/images/ssd_network.png
--- a/v2/ssd/images/vis_1.jpg
+++ b/v2/ssd/images/vis_1.jpg
--- a/v2/ssd/images/vis_2.jpg
+++ b/v2/ssd/images/vis_2.jpg
--- a/v2/ssd/images/vis_3.jpg
+++ b/v2/ssd/images/vis_3.jpg
--- a/v2/ssd/images/vis_4.jpg
+++ b/v2/ssd/images/vis_4.jpg
--- a/v2/ssd/infer.py
+++ b/v2/ssd/infer.py
--- a/v2/ssd/train.py
+++ b/v2/ssd/train.py
--- a/v2/ssd/vgg_ssd_net.py
+++ b/v2/ssd/vgg_ssd_net.py
--- a/v2/ssd/visual.py
+++ b/v2/ssd/visual.py
--- a/v2/text_classification/.gitignore
+++ b/v2/text_classification/.gitignore
--- a/v2/text_classification/README.md
+++ b/v2/text_classification/README.md
--- a/v2/text_classification/images/cnn_net.png
+++ b/v2/text_classification/images/cnn_net.png
--- a/v2/text_classification/images/dnn_net.png
+++ b/v2/text_classification/images/dnn_net.png
--- a/v2/text_classification/infer.py
+++ b/v2/text_classification/infer.py
--- a/v2/text_classification/network_conf.py
+++ b/v2/text_classification/network_conf.py
--- a/v2/text_classification/reader.py
+++ b/v2/text_classification/reader.py
--- a/v2/text_classification/run.sh
+++ b/v2/text_classification/run.sh
--- a/v2/text_classification/train.py
+++ b/v2/text_classification/train.py
--- a/v2/text_classification/utils.py
+++ b/v2/text_classification/utils.py
--- a/v2/youtube_recall/README.cn.md
+++ b/v2/youtube_recall/README.cn.md
--- a/v2/youtube_recall/README.md
+++ b/v2/youtube_recall/README.md
--- a/v2/youtube_recall/data/data.tar
+++ b/v2/youtube_recall/data/data.tar
--- a/v2/youtube_recall/data_processor.py
+++ b/v2/youtube_recall/data_processor.py
--- a/v2/youtube_recall/images/model_network.png
+++ b/v2/youtube_recall/images/model_network.png
--- a/v2/youtube_recall/images/recommendation_system.png
+++ b/v2/youtube_recall/images/recommendation_system.png
--- a/v2/youtube_recall/infer.py
+++ b/v2/youtube_recall/infer.py
--- a/v2/youtube_recall/infer_user.py
+++ b/v2/youtube_recall/infer_user.py
--- a/v2/youtube_recall/item_vector.py
+++ b/v2/youtube_recall/item_vector.py
--- a/v2/youtube_recall/network_conf.py
+++ b/v2/youtube_recall/network_conf.py
--- a/v2/youtube_recall/reader.py
+++ b/v2/youtube_recall/reader.py
--- a/v2/youtube_recall/train.py
+++ b/v2/youtube_recall/train.py
--- a/v2/youtube_recall/user_vector.py
+++ b/v2/youtube_recall/user_vector.py
--- a/v2/youtube_recall/utils.py
+++ b/v2/youtube_recall/utils.py