提交 05403680 编写于 作者: G guosheng

Merge branch 'develop' of https://github.com/PaddlePaddle/models into cloudtest2

[submodule "fluid/SimNet"]
path = fluid/SimNet
url = https://github.com/baidu/AnyQ.git
[submodule "fluid/LAC"]
path = fluid/LAC
url = https://github.com/baidu/lac
[submodule "fluid/Senta"]
path = fluid/Senta
url = https://github.com/baidu/Senta
...@@ -8,7 +8,7 @@ PaddlePaddle provides a rich set of computational units to enable users to adopt ...@@ -8,7 +8,7 @@ PaddlePaddle provides a rich set of computational units to enable users to adopt
- [fluid models](fluid): use PaddlePaddle's Fluid APIs. We especially recommend users to use Fluid models. - [fluid models](fluid): use PaddlePaddle's Fluid APIs. We especially recommend users to use Fluid models.
- [v2 models](v2): use PaddlePaddle's v2 APIs. - [legacy models](legacy): use PaddlePaddle's v2 APIs.
## License ## License
......
...@@ -2,7 +2,6 @@ from __future__ import absolute_import ...@@ -2,7 +2,6 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import paddle.v2 as paddle
import paddle.fluid as fluid import paddle.fluid as fluid
......
Subproject commit 66660503bb6e8f34adc4715ccf42cad77ed46ded
...@@ -49,14 +49,46 @@ Network,ICNet)进行语义分割,相比其他分割算法,ICNet兼顾了准 ...@@ -49,14 +49,46 @@ Network,ICNet)进行语义分割,相比其他分割算法,ICNet兼顾了准
- `ICNet <https://github.com/PaddlePaddle/models/tree/develop/fluid/icnet>`__ - `ICNet <https://github.com/PaddlePaddle/models/tree/develop/fluid/icnet>`__
图像生成
-----------
图像生成是指根据输入向量,生成目标图像。这里的输入向量可以是随机的噪声或用户指定的条件向量。具体的应用场景有:手写体生成、人脸合成、风格迁移、图像修复等。当前的图像生成任务主要是借助生成对抗网络(GAN)来实现。
生成对抗网络(GAN)由两种子网络组成:生成器和识别器。生成器的输入是随机噪声或条件向量,输出是目标图像。识别器是一个分类器,输入是一张图像,输出是该图像是否是真实的图像。在训练过程中,生成器和识别器通过不断的相互博弈提升自己的能力。
在图像生成任务中,我们介绍了如何使用DCGAN和ConditioanlGAN来进行手写数字的生成,另外还介绍了用于风格迁移的CycleGAN.
- `DCGAN & ConditionalGAN <https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/c_gan>`__
- `CycleGAN <https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/cycle_gan>`__
场景文字识别 场景文字识别
------------ ------------
许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。 许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。
在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成端到端地无约束字符定位和识别。当前,介绍了CRNN-CTC模型,后续会引入基于注意力机制的序列到序列模型。 在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成字符识别。当前,介绍了CRNN-CTC模型和基于注意力机制的序列到序列模型。
- `CRNN-CTC模型 <https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition>`__ - `CRNN-CTC模型 <https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition>`__
- `Attention模型 <https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition>`__
度量学习
-------
度量学习也称作距离度量学习、相似度学习,通过学习对象之间的距离,度量学习能够用于分析对象时间的关联、比较关系,在实际问题中应用较为广泛,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域。以往,针对不同的任务,需要选择合适的特征并手动构建距离函数,而度量学习可根据不同的任务来自主学习出针对特定任务的度量距离函数。度量学习和深度学习的结合,在人脸识别/验证、行人再识别(human Re-ID)、图像检索等领域均取得较好的性能,在这个任务中我们主要介绍了基于Fluid的深度度量学习模型,包含了三元组、四元组等损失函数。
- `Metric Learning <https://github.com/PaddlePaddle/models/tree/develop/fluid/metric_learning>`__
视频分类
-------
视频分类是视频理解任务的基础,与图像分类不同的是,分类的对象不再是静止的图像,而是一个由多帧图像构成的、包含语音数据、包含运动信息等的视频对象,因此理解视频需要获得更多的上下文信息,不仅要理解每帧图像是什么、包含什么,还需要结合不同帧,知道上下文的关联信息。视频分类方法主要包含基于卷积神经网络、基于循环神经网络、或将这两者结合的方法。该任务中我们介绍基于Fluid的视频分类模型,目前包含Temporal Segment Network(TSN)模型,后续会持续增加更多模型。
- `TSN <https://github.com/PaddlePaddle/models/tree/develop/fluid/video_classification>`__
语音识别 语音识别
-------- --------
...@@ -124,6 +156,15 @@ DQN 及其变体,并测试了它们在 Atari 游戏中的表现。 ...@@ -124,6 +156,15 @@ DQN 及其变体,并测试了它们在 Atari 游戏中的表现。
- `Senta <https://github.com/baidu/Senta/blob/master/README.md>`__ - `Senta <https://github.com/baidu/Senta/blob/master/README.md>`__
语义匹配
--------
在自然语言处理很多场景中,需要度量两个文本在语义上的相似度,这类任务通常被称为语义匹配。例如在搜索中根据查询与候选文档的相似度对搜索结果进行排序,文本去重中文本与文本相似度的计算,自动问答中候选答案与问题的匹配等。
本例所开放的DAM (Deep Attention Matching Network)为百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择。DAM受Transformer的启发,其网络结构完全基于注意力(attention)机制,利用栈式的self-attention结构分别学习不同粒度下应答和语境的语义表示,然后利用cross-attention获取应答与语境之间的相关性,在两个大规模多轮对话数据集上的表现均好于其它模型。
- `Deep Attention Matching Network <https://github.com/PaddlePaddle/models/tree/develop/fluid/deep_attention_matching_net>`__
AnyQ AnyQ
---- ----
...@@ -135,3 +176,12 @@ SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架 ...@@ -135,3 +176,12 @@ SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架
- `SimNet in PaddlePaddle - `SimNet in PaddlePaddle
Fluid <https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md>`__ Fluid <https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md>`__
机器阅读理解
----
机器阅读理解(MRC)是自然语言处理(NLP)中的核心任务之一,最终目标是让机器像人类一样阅读文本,提炼文本信息并回答相关问题。深度学习近年来在NLP中得到广泛使用,也使得机器阅读理解能力在近年有了大幅提高,但是目前研究的机器阅读理解都采用人工构造的数据集,以及回答一些相对简单的问题,和人类处理的数据还有明显差距,因此亟需大规模真实训练数据推动MRC的进一步发展。
百度阅读理解数据集是由百度自然语言处理部开源的一个真实世界数据集,所有的问题、原文都来源于实际数据(百度搜索引擎数据和百度知道问答社区),答案是由人类回答的。每个问题都对应多个答案,数据集包含200k问题、1000k原文和420k答案,是目前最大的中文MRC数据集。百度同时开源了对应的阅读理解模型,称为DuReader,采用当前通用的网络分层结构,通过双向attention机制捕捉问题和原文之间的交互关系,生成query-aware的原文表示,最终基于query-aware的原文表示通过point network预测答案范围。
- `DuReader in PaddlePaddle Fluid] <https://github.com/PaddlePaddle/models/blob/develop/fluid/machine_reading_comprehension/README.md>`__
...@@ -28,8 +28,11 @@ Fluid模型配置和参数文件的工具。 ...@@ -28,8 +28,11 @@ Fluid模型配置和参数文件的工具。
开放环境中的检测人脸,尤其是小的、模糊的和部分遮挡的人脸也是一个具有挑战的任务。我们也介绍了如何基于 [WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace) 数据训练百度自研的人脸检测PyramidBox模型,该算法于2018年3月份在WIDER FACE的多项评测中均获得 [第一名](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html) 开放环境中的检测人脸,尤其是小的、模糊的和部分遮挡的人脸也是一个具有挑战的任务。我们也介绍了如何基于 [WIDER FACE](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace) 数据训练百度自研的人脸检测PyramidBox模型,该算法于2018年3月份在WIDER FACE的多项评测中均获得 [第一名](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/WiderFace_Results.html)
Faster RCNN 是典型的两阶段目标检测器,相较于传统提取区域的方法,Faster RCNN中RPN网络通过共享卷积层参数大幅提高提取区域的效率,并提出高质量的候选区域。
- [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/blob/develop/fluid/object_detection/README_cn.md) - [Single Shot MultiBox Detector](https://github.com/PaddlePaddle/models/blob/develop/fluid/object_detection/README_cn.md)
- [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/face_detection/README_cn.md) - [Face Detector: PyramidBox](https://github.com/PaddlePaddle/models/tree/develop/fluid/face_detection/README_cn.md)
- [Faster RCNN](https://github.com/PaddlePaddle/models/tree/develop/fluid/faster_rcnn/README_cn.md)
图像语义分割 图像语义分割
------------ ------------
...@@ -41,14 +44,45 @@ Network,ICNet)进行语义分割,相比其他分割算法,ICNet兼顾了准 ...@@ -41,14 +44,45 @@ Network,ICNet)进行语义分割,相比其他分割算法,ICNet兼顾了准
- [ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/icnet) - [ICNet](https://github.com/PaddlePaddle/models/tree/develop/fluid/icnet)
图像生成
-----------
图像生成是指根据输入向量,生成目标图像。这里的输入向量可以是随机的噪声或用户指定的条件向量。具体的应用场景有:手写体生成、人脸合成、风格迁移、图像修复等。当前的图像生成任务主要是借助生成对抗网络(GAN)来实现。
生成对抗网络(GAN)由两种子网络组成:生成器和识别器。生成器的输入是随机噪声或条件向量,输出是目标图像。识别器是一个分类器,输入是一张图像,输出是该图像是否是真实的图像。在训练过程中,生成器和识别器通过不断的相互博弈提升自己的能力。
在图像生成任务中,我们介绍了如何使用DCGAN和ConditioanlGAN来进行手写数字的生成,另外还介绍了用于风格迁移的CycleGAN.
- [DCGAN & ConditionalGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/c_gan)
- [CycleGAN](https://github.com/PaddlePaddle/models/tree/develop/fluid/gan/cycle_gan)
场景文字识别 场景文字识别
------------ ------------
许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。 许多场景图像中包含着丰富的文本信息,对理解图像信息有着重要作用,能够极大地帮助人们认知和理解场景图像的内容。场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。场景图像文字识别技术的发展也促进了一些新型应用的产生,如通过自动识别路牌中的文字帮助街景应用获取更加准确的地址信息等。
在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成端到端地无约束字符定位和识别。当前,介绍了CRNN-CTC模型,后续会引入基于注意力机制的序列到序列模型。 在场景文字识别任务中,我们介绍如何将基于CNN的图像特征提取和基于RNN的序列翻译技术结合,免除人工定义特征,避免字符分割,使用自动学习到的图像特征,完成字符识别。当前,介绍了CRNN-CTC模型和基于注意力机制的序列到序列模型。
- [CRNN-CTC模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition)
- [Attention模型](https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition)
度量学习
-------
度量学习也称作距离度量学习、相似度学习,通过学习对象之间的距离,度量学习能够用于分析对象时间的关联、比较关系,在实际问题中应用较为广泛,可应用于辅助分类、聚类问题,也广泛用于图像检索、人脸识别等领域。以往,针对不同的任务,需要选择合适的特征并手动构建距离函数,而度量学习可根据不同的任务来自主学习出针对特定任务的度量距离函数。度量学习和深度学习的结合,在人脸识别/验证、行人再识别(human Re-ID)、图像检索等领域均取得较好的性能,在这个任务中我们主要介绍了基于Fluid的深度度量学习模型,包含了三元组、四元组等损失函数。
- [Metric Learning](https://github.com/PaddlePaddle/models/tree/develop/fluid/metric_learning)
视频分类
-------
视频分类是视频理解任务的基础,与图像分类不同的是,分类的对象不再是静止的图像,而是一个由多帧图像构成的、包含语音数据、包含运动信息等的视频对象,因此理解视频需要获得更多的上下文信息,不仅要理解每帧图像是什么、包含什么,还需要结合不同帧,知道上下文的关联信息。视频分类方法主要包含基于卷积神经网络、基于循环神经网络、或将这两者结合的方法。该任务中我们介绍基于Fluid的视频分类模型,目前包含Temporal Segment Network(TSN)模型,后续会持续增加更多模型。
- [TSN](https://github.com/PaddlePaddle/models/tree/develop/fluid/video_classification)
- [CRNN-CTC模](https://github.com/PaddlePaddle/models/tree/develop/fluid/ocr_recognition)
语音识别 语音识别
-------- --------
...@@ -94,6 +128,15 @@ Machine Translation, NMT)等阶段。在 NMT 成熟后,机器翻译才真正 ...@@ -94,6 +128,15 @@ Machine Translation, NMT)等阶段。在 NMT 成熟后,机器翻译才真正
- [Senta](https://github.com/baidu/Senta/blob/master/README.md) - [Senta](https://github.com/baidu/Senta/blob/master/README.md)
语义匹配
--------
在自然语言处理很多场景中,需要度量两个文本在语义上的相似度,这类任务通常被称为语义匹配。例如在搜索中根据查询与候选文档的相似度对搜索结果进行排序,文本去重中文本与文本相似度的计算,自动问答中候选答案与问题的匹配等。
本例所开放的DAM (Deep Attention Matching Network)为百度自然语言处理部发表于ACL-2018的工作,用于检索式聊天机器人多轮对话中应答的选择。DAM受Transformer的启发,其网络结构完全基于注意力(attention)机制,利用栈式的self-attention结构分别学习不同粒度下应答和语境的语义表示,然后利用cross-attention获取应答与语境之间的相关性,在两个大规模多轮对话数据集上的表现均好于其它模型。
- [Deep Attention Matching Network](https://github.com/PaddlePaddle/models/tree/develop/fluid/deep_attention_matching_net)
AnyQ AnyQ
---- ----
...@@ -102,3 +145,12 @@ AnyQ ...@@ -102,3 +145,12 @@ AnyQ
SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架,该框架在百度各产品上广泛应用,主要包括BOW、CNN、RNN、MM-DNN等核心网络结构形式,同时基于该框架也集成了学术界主流的语义匹配模型,如MatchPyramid、MV-LSTM、K-NRM等模型。使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力。 SimNet是百度自然语言处理部于2013年自主研发的语义匹配框架,该框架在百度各产品上广泛应用,主要包括BOW、CNN、RNN、MM-DNN等核心网络结构形式,同时基于该框架也集成了学术界主流的语义匹配模型,如MatchPyramid、MV-LSTM、K-NRM等模型。使用SimNet构建出的模型可以便捷的加入AnyQ系统中,增强AnyQ系统的语义匹配能力。
- [SimNet in PaddlePaddle Fluid](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md) - [SimNet in PaddlePaddle Fluid](https://github.com/baidu/AnyQ/blob/master/tools/simnet/train/paddle/README.md)
机器阅读理解
----------
机器阅读理解(MRC)是自然语言处理(NLP)中的核心任务之一,最终目标是让机器像人类一样阅读文本,提炼文本信息并回答相关问题。深度学习近年来在NLP中得到广泛使用,也使得机器阅读理解能力在近年有了大幅提高,但是目前研究的机器阅读理解都采用人工构造的数据集,以及回答一些相对简单的问题,和人类处理的数据还有明显差距,因此亟需大规模真实训练数据推动MRC的进一步发展。
百度阅读理解数据集是由百度自然语言处理部开源的一个真实世界数据集,所有的问题、原文都来源于实际数据(百度搜索引擎数据和百度知道问答社区),答案是由人类回答的。每个问题都对应多个答案,数据集包含200k问题、1000k原文和420k答案,是目前最大的中文MRC数据集。百度同时开源了对应的阅读理解模型,称为DuReader,采用当前通用的网络分层结构,通过双向attention机制捕捉问题和原文之间的交互关系,生成query-aware的原文表示,最终基于query-aware的原文表示通过point network预测答案范围。
- [DuReader in PaddlePaddle Fluid](https://github.com/PaddlePaddle/models/blob/develop/fluid/machine_reading_comprehension/README.md)
Subproject commit 870651e257750f2c237f0b0bc9a27e5d062d1909
Subproject commit 4dbe7f7b0e76c188eb7f448d104f0165f0a12229
""" """
CNN on mnist data using fluid api of paddlepaddle CNN on mnist data using fluid api of paddlepaddle
""" """
import paddle.v2 as paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
......
...@@ -8,7 +8,7 @@ sys.path.append("..") ...@@ -8,7 +8,7 @@ sys.path.append("..")
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from advbox.adversary import Adversary from advbox.adversary import Adversary
from advbox.attacks.gradient_method import BIM from advbox.attacks.gradient_method import BIM
......
...@@ -8,7 +8,7 @@ sys.path.append("..") ...@@ -8,7 +8,7 @@ sys.path.append("..")
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from advbox.adversary import Adversary from advbox.adversary import Adversary
from advbox.attacks.deepfool import DeepFoolAttack from advbox.attacks.deepfool import DeepFoolAttack
......
...@@ -8,7 +8,7 @@ sys.path.append("..") ...@@ -8,7 +8,7 @@ sys.path.append("..")
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from advbox.adversary import Adversary from advbox.adversary import Adversary
from advbox.attacks.gradient_method import FGSM from advbox.attacks.gradient_method import FGSM
......
...@@ -7,7 +7,7 @@ sys.path.append("..") ...@@ -7,7 +7,7 @@ sys.path.append("..")
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from advbox.adversary import Adversary from advbox.adversary import Adversary
from advbox.attacks.gradient_method import ILCM from advbox.attacks.gradient_method import ILCM
......
...@@ -7,7 +7,7 @@ sys.path.append("..") ...@@ -7,7 +7,7 @@ sys.path.append("..")
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from advbox.adversary import Adversary from advbox.adversary import Adversary
from advbox.attacks.saliency import JSMA from advbox.attacks.saliency import JSMA
......
...@@ -7,7 +7,7 @@ sys.path.append("..") ...@@ -7,7 +7,7 @@ sys.path.append("..")
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from advbox.adversary import Adversary from advbox.adversary import Adversary
from advbox.attacks.lbfgs import LBFGS from advbox.attacks.lbfgs import LBFGS
......
...@@ -9,7 +9,7 @@ sys.path.append("..") ...@@ -9,7 +9,7 @@ sys.path.append("..")
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from advbox.adversary import Adversary from advbox.adversary import Adversary
from advbox.attacks.gradient_method import MIFGSM from advbox.attacks.gradient_method import MIFGSM
......
import cPickle as pickle import six
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
import utils.layers as layers import utils.layers as layers
...@@ -29,7 +29,7 @@ class Net(object): ...@@ -29,7 +29,7 @@ class Net(object):
mask_cache = dict() if self.use_mask_cache else None mask_cache = dict() if self.use_mask_cache else None
turns_data = [] turns_data = []
for i in xrange(self._max_turn_num): for i in six.moves.xrange(self._max_turn_num):
turn = fluid.layers.data( turn = fluid.layers.data(
name="turn_%d" % i, name="turn_%d" % i,
shape=[self._max_turn_len, 1], shape=[self._max_turn_len, 1],
...@@ -37,7 +37,7 @@ class Net(object): ...@@ -37,7 +37,7 @@ class Net(object):
turns_data.append(turn) turns_data.append(turn)
turns_mask = [] turns_mask = []
for i in xrange(self._max_turn_num): for i in six.moves.xrange(self._max_turn_num):
turn_mask = fluid.layers.data( turn_mask = fluid.layers.data(
name="turn_mask_%d" % i, name="turn_mask_%d" % i,
shape=[self._max_turn_len, 1], shape=[self._max_turn_len, 1],
...@@ -64,7 +64,7 @@ class Net(object): ...@@ -64,7 +64,7 @@ class Net(object):
Hr = response_emb Hr = response_emb
Hr_stack = [Hr] Hr_stack = [Hr]
for index in range(self._stack_num): for index in six.moves.xrange(self._stack_num):
Hr = layers.block( Hr = layers.block(
name="response_self_stack" + str(index), name="response_self_stack" + str(index),
query=Hr, query=Hr,
...@@ -78,7 +78,7 @@ class Net(object): ...@@ -78,7 +78,7 @@ class Net(object):
# context part # context part
sim_turns = [] sim_turns = []
for t in xrange(self._max_turn_num): for t in six.moves.xrange(self._max_turn_num):
Hu = fluid.layers.embedding( Hu = fluid.layers.embedding(
input=turns_data[t], input=turns_data[t],
size=[self._vocab_size + 1, self._emb_size], size=[self._vocab_size + 1, self._emb_size],
...@@ -88,7 +88,7 @@ class Net(object): ...@@ -88,7 +88,7 @@ class Net(object):
initializer=fluid.initializer.Normal(scale=0.1))) initializer=fluid.initializer.Normal(scale=0.1)))
Hu_stack = [Hu] Hu_stack = [Hu]
for index in range(self._stack_num): for index in six.moves.xrange(self._stack_num):
# share parameters # share parameters
Hu = layers.block( Hu = layers.block(
name="turn_self_stack" + str(index), name="turn_self_stack" + str(index),
...@@ -104,7 +104,7 @@ class Net(object): ...@@ -104,7 +104,7 @@ class Net(object):
# cross attention # cross attention
r_a_t_stack = [] r_a_t_stack = []
t_a_r_stack = [] t_a_r_stack = []
for index in range(self._stack_num + 1): for index in six.moves.xrange(self._stack_num + 1):
t_a_r = layers.block( t_a_r = layers.block(
name="t_attend_r_" + str(index), name="t_attend_r_" + str(index),
query=Hu_stack[index], query=Hu_stack[index],
...@@ -134,7 +134,7 @@ class Net(object): ...@@ -134,7 +134,7 @@ class Net(object):
t_a_r = fluid.layers.stack(t_a_r_stack, axis=1) t_a_r = fluid.layers.stack(t_a_r_stack, axis=1)
r_a_t = fluid.layers.stack(r_a_t_stack, axis=1) r_a_t = fluid.layers.stack(r_a_t_stack, axis=1)
else: else:
for index in xrange(len(t_a_r_stack)): for index in six.moves.xrange(len(t_a_r_stack)):
t_a_r_stack[index] = fluid.layers.unsqueeze( t_a_r_stack[index] = fluid.layers.unsqueeze(
input=t_a_r_stack[index], axes=[1]) input=t_a_r_stack[index], axes=[1])
r_a_t_stack[index] = fluid.layers.unsqueeze( r_a_t_stack[index] = fluid.layers.unsqueeze(
...@@ -151,7 +151,7 @@ class Net(object): ...@@ -151,7 +151,7 @@ class Net(object):
if self.use_stack_op: if self.use_stack_op:
sim = fluid.layers.stack(sim_turns, axis=2) sim = fluid.layers.stack(sim_turns, axis=2)
else: else:
for index in xrange(len(sim_turns)): for index in six.moves.xrange(len(sim_turns)):
sim_turns[index] = fluid.layers.unsqueeze( sim_turns[index] = fluid.layers.unsqueeze(
input=sim_turns[index], axes=[2]) input=sim_turns[index], axes=[2])
# sim shape: [batch_size, 2*(stack_num+1), max_turn_num, max_turn_len, max_turn_len] # sim shape: [batch_size, 2*(stack_num+1), max_turn_num, max_turn_len, max_turn_len]
......
import os import os
import six
import numpy as np import numpy as np
import time import time
import argparse import argparse
...@@ -6,8 +7,12 @@ import multiprocessing ...@@ -6,8 +7,12 @@ import multiprocessing
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
import utils.reader as reader import utils.reader as reader
import cPickle as pickle from utils.util import print_arguments, mkdir
from utils.util import print_arguments
try:
import cPickle as pickle #python 2
except ImportError as e:
import pickle #python 3
from model import Net from model import Net
...@@ -107,7 +112,7 @@ def parse_args(): ...@@ -107,7 +112,7 @@ def parse_args():
def test(args): def test(args):
if not os.path.exists(args.save_path): if not os.path.exists(args.save_path):
raise ValueError("Invalid save path %s" % args.save_path) mkdir(args.save_path)
if not os.path.exists(args.model_path): if not os.path.exists(args.model_path):
raise ValueError("Invalid model init path %s" % args.model_path) raise ValueError("Invalid model init path %s" % args.model_path)
# data data_config # data data_config
...@@ -158,7 +163,11 @@ def test(args): ...@@ -158,7 +163,11 @@ def test(args):
use_cuda=args.use_cuda, main_program=test_program) use_cuda=args.use_cuda, main_program=test_program)
print("start loading data ...") print("start loading data ...")
train_data, val_data, test_data = pickle.load(open(args.data_path, 'rb')) with open(args.data_path, 'rb') as f:
if six.PY2:
train_data, val_data, test_data = pickle.load(f)
else:
train_data, val_data, test_data = pickle.load(f, encoding="bytes")
print("finish loading data ...") print("finish loading data ...")
if args.ext_eval: if args.ext_eval:
...@@ -178,9 +187,9 @@ def test(args): ...@@ -178,9 +187,9 @@ def test(args):
score_path = os.path.join(args.save_path, 'score.txt') score_path = os.path.join(args.save_path, 'score.txt')
score_file = open(score_path, 'w') score_file = open(score_path, 'w')
for it in xrange(test_batch_num // dev_count): for it in six.moves.xrange(test_batch_num // dev_count):
feed_list = [] feed_list = []
for dev in xrange(dev_count): for dev in six.moves.xrange(dev_count):
index = it * dev_count + dev index = it * dev_count + dev
feed_dict = reader.make_one_batch_input(test_batches, index) feed_dict = reader.make_one_batch_input(test_batches, index)
feed_list.append(feed_dict) feed_list.append(feed_dict)
...@@ -190,9 +199,9 @@ def test(args): ...@@ -190,9 +199,9 @@ def test(args):
scores = np.array(predicts[0]) scores = np.array(predicts[0])
print("step = %d" % it) print("step = %d" % it)
for dev in xrange(dev_count): for dev in six.moves.xrange(dev_count):
index = it * dev_count + dev index = it * dev_count + dev
for i in xrange(args.batch_size): for i in six.moves.xrange(args.batch_size):
score_file.write( score_file.write(
str(scores[args.batch_size * dev + i][0]) + '\t' + str( str(scores[args.batch_size * dev + i][0]) + '\t' + str(
test_batches["label"][index][i]) + '\n') test_batches["label"][index][i]) + '\n')
......
import os import os
import six
import numpy as np import numpy as np
import time import time
import argparse import argparse
...@@ -6,9 +7,13 @@ import multiprocessing ...@@ -6,9 +7,13 @@ import multiprocessing
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
import utils.reader as reader import utils.reader as reader
import cPickle as pickle
from utils.util import print_arguments from utils.util import print_arguments
try:
import cPickle as pickle #python 2
except ImportError as e:
import pickle #python 3
from model import Net from model import Net
...@@ -164,35 +169,45 @@ def train(args): ...@@ -164,35 +169,45 @@ def train(args):
if args.word_emb_init is not None: if args.word_emb_init is not None:
print("start loading word embedding init ...") print("start loading word embedding init ...")
word_emb = np.array(pickle.load(open(args.word_emb_init, 'rb'))).astype( if six.PY2:
'float32') word_emb = np.array(pickle.load(open(args.word_emb_init,
'rb'))).astype('float32')
else:
word_emb = np.array(
pickle.load(
open(args.word_emb_init, 'rb'), encoding="bytes")).astype(
'float32')
dam.set_word_embedding(word_emb, place) dam.set_word_embedding(word_emb, place)
print("finish init word embedding ...") print("finish init word embedding ...")
print("start loading data ...") print("start loading data ...")
train_data, val_data, test_data = pickle.load(open(args.data_path, 'rb')) with open(args.data_path, 'rb') as f:
if six.PY2:
train_data, val_data, test_data = pickle.load(f)
else:
train_data, val_data, test_data = pickle.load(f, encoding="bytes")
print("finish loading data ...") print("finish loading data ...")
val_batches = reader.build_batches(val_data, data_conf) val_batches = reader.build_batches(val_data, data_conf)
batch_num = len(train_data['y']) / args.batch_size batch_num = len(train_data[six.b('y')]) // args.batch_size
val_batch_num = len(val_batches["response"]) val_batch_num = len(val_batches["response"])
print_step = max(1, batch_num / (dev_count * 100)) print_step = max(1, batch_num // (dev_count * 100))
save_step = max(1, batch_num / (dev_count * 10)) save_step = max(1, batch_num // (dev_count * 10))
print("begin model training ...") print("begin model training ...")
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))) print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time())))
step = 0 step = 0
for epoch in xrange(args.num_scan_data): for epoch in six.moves.xrange(args.num_scan_data):
shuffle_train = reader.unison_shuffle(train_data) shuffle_train = reader.unison_shuffle(train_data)
train_batches = reader.build_batches(shuffle_train, data_conf) train_batches = reader.build_batches(shuffle_train, data_conf)
ave_cost = 0.0 ave_cost = 0.0
for it in xrange(batch_num // dev_count): for it in six.moves.xrange(batch_num // dev_count):
feed_list = [] feed_list = []
for dev in xrange(dev_count): for dev in six.moves.xrange(dev_count):
index = it * dev_count + dev index = it * dev_count + dev
feed_dict = reader.make_one_batch_input(train_batches, index) feed_dict = reader.make_one_batch_input(train_batches, index)
feed_list.append(feed_dict) feed_list.append(feed_dict)
...@@ -215,9 +230,9 @@ def train(args): ...@@ -215,9 +230,9 @@ def train(args):
score_path = os.path.join(args.save_path, 'score.' + str(step)) score_path = os.path.join(args.save_path, 'score.' + str(step))
score_file = open(score_path, 'w') score_file = open(score_path, 'w')
for it in xrange(val_batch_num // dev_count): for it in six.moves.xrange(val_batch_num // dev_count):
feed_list = [] feed_list = []
for dev in xrange(dev_count): for dev in six.moves.xrange(dev_count):
val_index = it * dev_count + dev val_index = it * dev_count + dev
feed_dict = reader.make_one_batch_input(val_batches, feed_dict = reader.make_one_batch_input(val_batches,
val_index) val_index)
...@@ -227,9 +242,9 @@ def train(args): ...@@ -227,9 +242,9 @@ def train(args):
fetch_list=[logits.name]) fetch_list=[logits.name])
scores = np.array(predicts[0]) scores = np.array(predicts[0])
for dev in xrange(dev_count): for dev in six.moves.xrange(dev_count):
val_index = it * dev_count + dev val_index = it * dev_count + dev
for i in xrange(args.batch_size): for i in six.moves.xrange(args.batch_size):
score_file.write( score_file.write(
str(scores[args.batch_size * dev + i][0]) + '\t' str(scores[args.batch_size * dev + i][0]) + '\t'
+ str(val_batches["label"][val_index][ + str(val_batches["label"][val_index][
......
import sys import sys
import six
import numpy as np import numpy as np
from sklearn.metrics import average_precision_score from sklearn.metrics import average_precision_score
...@@ -7,7 +8,7 @@ def mean_average_precision(sort_data): ...@@ -7,7 +8,7 @@ def mean_average_precision(sort_data):
#to do #to do
count_1 = 0 count_1 = 0
sum_precision = 0 sum_precision = 0
for index in range(len(sort_data)): for index in six.moves.xrange(len(sort_data)):
if sort_data[index][1] == 1: if sort_data[index][1] == 1:
count_1 += 1 count_1 += 1
sum_precision += 1.0 * count_1 / (index + 1) sum_precision += 1.0 * count_1 / (index + 1)
......
import sys import sys
import six
def get_p_at_n_in_m(data, n, m, ind): def get_p_at_n_in_m(data, n, m, ind):
...@@ -30,9 +31,9 @@ def evaluate(file_path): ...@@ -30,9 +31,9 @@ def evaluate(file_path):
p_at_2_in_10 = 0.0 p_at_2_in_10 = 0.0
p_at_5_in_10 = 0.0 p_at_5_in_10 = 0.0
length = len(data) / 10 length = len(data) // 10
for i in xrange(0, length): for i in six.moves.xrange(0, length):
ind = i * 10 ind = i * 10
assert data[ind][1] == 1 assert data[ind][1] == 1
......
import cPickle as pickle import six
import numpy as np import numpy as np
try:
import cPickle as pickle #python 2
except ImportError as e:
import pickle #python 3
def unison_shuffle(data, seed=None): def unison_shuffle(data, seed=None):
if seed is not None: if seed is not None:
np.random.seed(seed) np.random.seed(seed)
y = np.array(data['y']) y = np.array(data[six.b('y')])
c = np.array(data['c']) c = np.array(data[six.b('c')])
r = np.array(data['r']) r = np.array(data[six.b('r')])
assert len(y) == len(c) == len(r) assert len(y) == len(c) == len(r)
p = np.random.permutation(len(y)) p = np.random.permutation(len(y))
shuffle_data = {'y': y[p], 'c': c[p], 'r': r[p]} shuffle_data = {six.b('y'): y[p], six.b('c'): c[p], six.b('r'): r[p]}
return shuffle_data return shuffle_data
...@@ -65,9 +70,9 @@ def produce_one_sample(data, ...@@ -65,9 +70,9 @@ def produce_one_sample(data,
max_turn_len=50 max_turn_len=50
return y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len return y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len
''' '''
c = data['c'][index] c = data[six.b('c')][index]
r = data['r'][index][:] r = data[six.b('r')][index][:]
y = data['y'][index] y = data[six.b('y')][index]
turns = split_c(c, split_id) turns = split_c(c, split_id)
#normalize turns_c length, nor_turns length is max_turn_num #normalize turns_c length, nor_turns length is max_turn_num
...@@ -101,7 +106,7 @@ def build_one_batch(data, ...@@ -101,7 +106,7 @@ def build_one_batch(data,
_label = [] _label = []
for i in range(conf['batch_size']): for i in six.moves.xrange(conf['batch_size']):
index = batch_index * conf['batch_size'] + i index = batch_index * conf['batch_size'] + i
y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len = produce_one_sample( y, nor_turns_nor_c, nor_r, turn_len, term_len, r_len = produce_one_sample(
data, index, conf['_EOS_'], conf['max_turn_num'], data, index, conf['_EOS_'], conf['max_turn_num'],
...@@ -145,8 +150,8 @@ def build_batches(data, conf, turn_cut_type='tail', term_cut_type='tail'): ...@@ -145,8 +150,8 @@ def build_batches(data, conf, turn_cut_type='tail', term_cut_type='tail'):
_label_batches = [] _label_batches = []
batch_len = len(data['y']) / conf['batch_size'] batch_len = len(data[six.b('y')]) // conf['batch_size']
for batch_index in range(batch_len): for batch_index in six.moves.range(batch_len):
_turns, _tt_turns_len, _every_turn_len, _response, _response_len, _label = build_one_batch( _turns, _tt_turns_len, _every_turn_len, _response, _response_len, _label = build_one_batch(
data, batch_index, conf, turn_cut_type='tail', term_cut_type='tail') data, batch_index, conf, turn_cut_type='tail', term_cut_type='tail')
...@@ -192,8 +197,10 @@ def make_one_batch_input(data_batches, index): ...@@ -192,8 +197,10 @@ def make_one_batch_input(data_batches, index):
max_turn_num = turns.shape[1] max_turn_num = turns.shape[1]
max_turn_len = turns.shape[2] max_turn_len = turns.shape[2]
turns_list = [turns[:, i, :] for i in xrange(max_turn_num)] turns_list = [turns[:, i, :] for i in six.moves.xrange(max_turn_num)]
every_turn_len_list = [every_turn_len[:, i] for i in xrange(max_turn_num)] every_turn_len_list = [
every_turn_len[:, i] for i in six.moves.xrange(max_turn_num)
]
feed_dict = {} feed_dict = {}
for i, turn in enumerate(turns_list): for i, turn in enumerate(turns_list):
...@@ -204,7 +211,7 @@ def make_one_batch_input(data_batches, index): ...@@ -204,7 +211,7 @@ def make_one_batch_input(data_batches, index):
for i, turn_len in enumerate(every_turn_len_list): for i, turn_len in enumerate(every_turn_len_list):
feed_dict["turn_mask_%d" % i] = np.ones( feed_dict["turn_mask_%d" % i] = np.ones(
(batch_size, max_turn_len, 1)).astype("float32") (batch_size, max_turn_len, 1)).astype("float32")
for row in xrange(batch_size): for row in six.moves.xrange(batch_size):
feed_dict["turn_mask_%d" % i][row, turn_len[row]:, 0] = 0 feed_dict["turn_mask_%d" % i][row, turn_len[row]:, 0] = 0
feed_dict["response"] = response feed_dict["response"] = response
...@@ -212,7 +219,7 @@ def make_one_batch_input(data_batches, index): ...@@ -212,7 +219,7 @@ def make_one_batch_input(data_batches, index):
feed_dict["response_mask"] = np.ones( feed_dict["response_mask"] = np.ones(
(batch_size, max_turn_len, 1)).astype("float32") (batch_size, max_turn_len, 1)).astype("float32")
for row in xrange(batch_size): for row in six.moves.xrange(batch_size):
feed_dict["response_mask"][row, response_len[row]:, 0] = 0 feed_dict["response_mask"][row, response_len[row]:, 0] = 0
feed_dict["label"] = np.array([data_batches["label"][index]]).reshape( feed_dict["label"] = np.array([data_batches["label"][index]]).reshape(
...@@ -228,14 +235,14 @@ if __name__ == '__main__': ...@@ -228,14 +235,14 @@ if __name__ == '__main__':
"max_turn_len": 50, "max_turn_len": 50,
"_EOS_": 28270, "_EOS_": 28270,
} }
train, val, test = pickle.load(open('../data/ubuntu/data_small.pkl', 'rb')) with open('../ubuntu/data/data_small.pkl', 'rb') as f:
if six.PY2:
train, val, test = pickle.load(f)
else:
train, val, test = pickle.load(f, encoding="bytes")
print('load data success') print('load data success')
train_batches = build_batches(train, conf) train_batches = build_batches(train, conf)
val_batches = build_batches(val, conf) val_batches = build_batches(val, conf)
test_batches = build_batches(test, conf) test_batches = build_batches(test, conf)
print('build batches success') print('build batches success')
pickle.dump([train_batches, val_batches, test_batches],
open('../data/ubuntu/data_small_xxx.pkl', 'wb'))
print('dump success')
import six
import os
def print_arguments(args): def print_arguments(args):
print('----------- Configuration Arguments -----------') print('----------- Configuration Arguments -----------')
for arg, value in sorted(vars(args).iteritems()): for arg, value in sorted(six.iteritems(vars(args))):
print('%s: %s' % (arg, value)) print('%s: %s' % (arg, value))
print('------------------------------------------------') print('------------------------------------------------')
def mkdir(path):
if not os.path.isdir(path):
mkdir(os.path.split(path)[0])
else:
return
os.mkdir(path)
def pos_encoding_init(): def pos_encoding_init():
pass pass
......
deeplabv3plus_xception65_initialize.params
deeplabv3plus.params
deeplabv3plus.tar.gz
DeepLab运行本目录下的程序示例需要使用PaddlePaddle develop最新版本。如果您的PaddlePaddle安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新PaddlePaddle安装版本。 DeepLab运行本目录下的程序示例需要使用PaddlePaddle Fluid v1.0.0版本或以上。如果您的PaddlePaddle安装版本低于此要求,请按照安装文档中的说明更新PaddlePaddle安装版本,如果使用GPU,该程序需要使用cuDNN v7版本。
## 代码结构 ## 代码结构
...@@ -41,10 +41,12 @@ data/cityscape/ ...@@ -41,10 +41,12 @@ data/cityscape/
如果需要从头开始训练模型,用户需要下载我们的初始化模型 如果需要从头开始训练模型,用户需要下载我们的初始化模型
``` ```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus_xception65_initialize.tar.gz
tar -xf deeplabv3plus_xception65_initialize.tar.gz && rm deeplabv3plus_xception65_initialize.tar.gz
``` ```
如果需要最终训练模型进行fine tune或者直接用于预测,请下载我们的最终模型 如果需要最终训练模型进行fine tune或者直接用于预测,请下载我们的最终模型
``` ```
wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz wget http://paddlemodels.cdn.bcebos.com/deeplab/deeplabv3plus.tar.gz
tar -xf deeplabv3plus.tar.gz && rm deeplabv3plus.tar.gz
``` ```
...@@ -70,11 +72,11 @@ python train.py --help ...@@ -70,11 +72,11 @@ python train.py --help
``` ```
python ./train.py \ python ./train.py \
--batch_size=8 \ --batch_size=8 \
--parallel=true --parallel=true \
--train_crop_size=769 \ --train_crop_size=769 \
--total_step=90000 \ --total_step=90000 \
--init_weights_path=$INIT_WEIGHTS_PATH \ --init_weights_path=deeplabv3plus_xception65_initialize.params \
--save_weights_path=$SAVE_WEIGHTS_PATH \ --save_weights_path=output \
--dataset_path=$DATASET_PATH --dataset_path=$DATASET_PATH
``` ```
...@@ -82,11 +84,10 @@ python ./train.py \ ...@@ -82,11 +84,10 @@ python ./train.py \
执行以下命令在`Cityscape`测试数据集上进行测试: 执行以下命令在`Cityscape`测试数据集上进行测试:
``` ```
python ./eval.py \ python ./eval.py \
--init_weights_path=$INIT_WEIGHTS_PATH \ --init_weights=deeplabv3plus.params \
--dataset_path=$DATASET_PATH --dataset_path=$DATASET_PATH
``` ```
需要通过选项`--model_path`指定模型文件。 需要通过选项`--model_path`指定模型文件。测试脚本的输出的评估指标为mean IoU。
测试脚本的输出的评估指标为[mean IoU]()。
## 实验结果 ## 实验结果
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98' os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
...@@ -91,7 +94,7 @@ exe = fluid.Executor(place) ...@@ -91,7 +94,7 @@ exe = fluid.Executor(place)
exe.run(sp) exe.run(sp)
if args.init_weights_path: if args.init_weights_path:
print "load from:", args.init_weights_path print("load from:", args.init_weights_path)
load_model() load_model()
dataset = CityscapeDataset(args.dataset_path, 'val') dataset = CityscapeDataset(args.dataset_path, 'val')
...@@ -118,7 +121,7 @@ for i, imgs, labels, names in batches: ...@@ -118,7 +121,7 @@ for i, imgs, labels, names in batches:
mp = (wrong + right) != 0 mp = (wrong + right) != 0
miou2 = np.mean((right[mp] * 1.0 / (right[mp] + wrong[mp]))) miou2 = np.mean((right[mp] * 1.0 / (right[mp] + wrong[mp])))
if args.verbose: if args.verbose:
print 'step: %s, mIoU: %s' % (i + 1, miou2) print('step: %s, mIoU: %s' % (i + 1, miou2))
else: else:
print '\rstep: %s, mIoU: %s' % (i + 1, miou2), print('\rstep: %s, mIoU: %s' % (i + 1, miou2))
sys.stdout.flush() sys.stdout.flush()
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
...@@ -50,7 +53,7 @@ def append_op_result(result, name): ...@@ -50,7 +53,7 @@ def append_op_result(result, name):
def conv(*args, **kargs): def conv(*args, **kargs):
kargs['param_attr'] = name_scope + 'weights' kargs['param_attr'] = name_scope + 'weights'
if kargs.has_key('bias_attr') and kargs['bias_attr']: if 'bias_attr' in kargs and kargs['bias_attr']:
kargs['bias_attr'] = name_scope + 'biases' kargs['bias_attr'] = name_scope + 'biases'
else: else:
kargs['bias_attr'] = False kargs['bias_attr'] = False
...@@ -62,7 +65,7 @@ def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None): ...@@ -62,7 +65,7 @@ def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
N, C, H, W = input.shape N, C, H, W = input.shape
if C % G != 0: if C % G != 0:
print "group can not divide channle:", C, G print("group can not divide channle:", C, G)
for d in range(10): for d in range(10):
for t in [d, -d]: for t in [d, -d]:
if G + t <= 0: continue if G + t <= 0: continue
...@@ -70,7 +73,7 @@ def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None): ...@@ -70,7 +73,7 @@ def group_norm(input, G, eps=1e-5, param_attr=None, bias_attr=None):
G = G + t G = G + t
break break
if C % G == 0: if C % G == 0:
print "use group size:", G print("use group size:", G)
break break
assert C % G == 0 assert C % G == 0
param_shape = (G, ) param_shape = (G, )
...@@ -139,7 +142,7 @@ def seq_conv(input, channel, stride, filter, dilation=1, act=None): ...@@ -139,7 +142,7 @@ def seq_conv(input, channel, stride, filter, dilation=1, act=None):
filter, filter,
stride, stride,
groups=input.shape[1], groups=input.shape[1],
padding=(filter / 2) * dilation, padding=(filter // 2) * dilation,
dilation=dilation) dilation=dilation)
input = bn(input) input = bn(input)
if act: input = act(input) if act: input = act(input)
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2 import cv2
import numpy as np import numpy as np
import os
import six
default_config = { default_config = {
"shuffle": True, "shuffle": True,
...@@ -30,7 +35,7 @@ def slice_with_pad(a, s, value=0): ...@@ -30,7 +35,7 @@ def slice_with_pad(a, s, value=0):
pr = 0 pr = 0
pads.append([pl, pr]) pads.append([pl, pr])
slices.append([l, r]) slices.append([l, r])
slices = map(lambda x: slice(x[0], x[1], 1), slices) slices = list(map(lambda x: slice(x[0], x[1], 1), slices))
a = a[slices] a = a[slices]
a = np.pad(a, pad_width=pads, mode='constant', constant_values=value) a = np.pad(a, pad_width=pads, mode='constant', constant_values=value)
return a return a
...@@ -38,11 +43,17 @@ def slice_with_pad(a, s, value=0): ...@@ -38,11 +43,17 @@ def slice_with_pad(a, s, value=0):
class CityscapeDataset: class CityscapeDataset:
def __init__(self, dataset_dir, subset='train', config=default_config): def __init__(self, dataset_dir, subset='train', config=default_config):
import commands label_dirname = os.path.join(dataset_dir, 'gtFine/' + subset)
label_dirname = dataset_dir + 'gtFine/' + subset if six.PY2:
label_files = commands.getoutput( import commands
"find %s -type f | grep labelTrainIds | sort" % label_files = commands.getoutput(
label_dirname).splitlines() "find %s -type f | grep labelTrainIds | sort" %
label_dirname).splitlines()
else:
import subprocess
label_files = subprocess.getstatusoutput(
"find %s -type f | grep labelTrainIds | sort" %
label_dirname)[-1].splitlines()
self.label_files = label_files self.label_files = label_files
self.label_dirname = label_dirname self.label_dirname = label_dirname
self.index = 0 self.index = 0
...@@ -50,7 +61,7 @@ class CityscapeDataset: ...@@ -50,7 +61,7 @@ class CityscapeDataset:
self.dataset_dir = dataset_dir self.dataset_dir = dataset_dir
self.config = config self.config = config
self.reset() self.reset()
print "total number", len(label_files) print("total number", len(label_files))
def reset(self, shuffle=False): def reset(self, shuffle=False):
self.index = 0 self.index = 0
...@@ -66,13 +77,14 @@ class CityscapeDataset: ...@@ -66,13 +77,14 @@ class CityscapeDataset:
shape = self.config["crop_size"] shape = self.config["crop_size"]
while True: while True:
ln = self.label_files[self.index] ln = self.label_files[self.index]
img_name = self.dataset_dir + 'leftImg8bit/' + self.subset + ln[len( img_name = os.path.join(
self.label_dirname):] self.dataset_dir,
'leftImg8bit/' + self.subset + ln[len(self.label_dirname):])
img_name = img_name.replace('gtFine_labelTrainIds', 'leftImg8bit') img_name = img_name.replace('gtFine_labelTrainIds', 'leftImg8bit')
label = cv2.imread(ln) label = cv2.imread(ln)
img = cv2.imread(img_name) img = cv2.imread(img_name)
if img is None: if img is None:
print "load img failed:", img_name print("load img failed:", img_name)
self.next_img() self.next_img()
else: else:
break break
...@@ -128,5 +140,7 @@ class CityscapeDataset: ...@@ -128,5 +140,7 @@ class CityscapeDataset:
from prefetch_generator import BackgroundGenerator from prefetch_generator import BackgroundGenerator
batches = BackgroundGenerator(batches, 100) batches = BackgroundGenerator(batches, 100)
except: except:
print "You can install 'prefetch_generator' for acceleration of data reading." print(
"You can install 'prefetch_generator' for acceleration of data reading."
)
return batches return batches
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os import os
os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98' os.environ['FLAGS_fraction_of_gpu_memory_to_use'] = '0.98'
...@@ -126,13 +129,12 @@ exe = fluid.Executor(place) ...@@ -126,13 +129,12 @@ exe = fluid.Executor(place)
exe.run(sp) exe.run(sp)
if args.init_weights_path: if args.init_weights_path:
print "load from:", args.init_weights_path print("load from:", args.init_weights_path)
load_model() load_model()
dataset = CityscapeDataset(args.dataset_path, 'train') dataset = CityscapeDataset(args.dataset_path, 'train')
if args.parallel: if args.parallel:
print "Using ParallelExecutor."
exe_p = fluid.ParallelExecutor( exe_p = fluid.ParallelExecutor(
use_cuda=True, loss_name=loss_mean.name, main_program=tp) use_cuda=True, loss_name=loss_mean.name, main_program=tp)
...@@ -149,9 +151,9 @@ for i, imgs, labels, names in batches: ...@@ -149,9 +151,9 @@ for i, imgs, labels, names in batches:
'label': labels}, 'label': labels},
fetch_list=[pred, loss_mean]) fetch_list=[pred, loss_mean])
if i % 100 == 0: if i % 100 == 0:
print "Model is saved to", args.save_weights_path print("Model is saved to", args.save_weights_path)
save_model() save_model()
print "step %s, loss: %s" % (i, np.mean(retv[1])) print("step %s, loss: %s" % (i, np.mean(retv[1])))
print "Training done. Model is saved to", args.save_weights_path print("Training done. Model is saved to", args.save_weights_path)
save_model() save_model()
...@@ -10,3 +10,4 @@ output* ...@@ -10,3 +10,4 @@ output*
pred pred
eval_tools eval_tools
box* box*
PyramidBox_WiderFace*
...@@ -9,6 +9,7 @@ import time ...@@ -9,6 +9,7 @@ import time
import numpy as np import numpy as np
import threading import threading
import multiprocessing import multiprocessing
import traceback
try: try:
import queue import queue
except ImportError: except ImportError:
...@@ -71,6 +72,7 @@ class GeneratorEnqueuer(object): ...@@ -71,6 +72,7 @@ class GeneratorEnqueuer(object):
try: try:
task() task()
except Exception: except Exception:
traceback.print_exc()
self._stop_event.set() self._stop_event.set()
break break
else: else:
...@@ -78,6 +80,7 @@ class GeneratorEnqueuer(object): ...@@ -78,6 +80,7 @@ class GeneratorEnqueuer(object):
try: try:
task() task()
except Exception: except Exception:
traceback.print_exc()
self._stop_event.set() self._stop_event.set()
break break
......
...@@ -427,6 +427,7 @@ class PyramidBox(object): ...@@ -427,6 +427,7 @@ class PyramidBox(object):
overlap_threshold=0.35, overlap_threshold=0.35,
neg_overlap=0.35) neg_overlap=0.35)
loss = fluid.layers.reduce_sum(loss) loss = fluid.layers.reduce_sum(loss)
loss.persistable = True
return loss return loss
def train(self): def train(self):
......
...@@ -250,6 +250,10 @@ def train_generator(settings, file_list, batch_size, shuffle=True): ...@@ -250,6 +250,10 @@ def train_generator(settings, file_list, batch_size, shuffle=True):
ymin = float(temp_info_box[1]) ymin = float(temp_info_box[1])
w = float(temp_info_box[2]) w = float(temp_info_box[2])
h = float(temp_info_box[3]) h = float(temp_info_box[3])
# Filter out wrong labels
if w < 0 or h < 0:
continue
xmax = xmin + w xmax = xmin + w
ymax = ymin + h ymax = ymin + h
...@@ -294,7 +298,7 @@ def train(settings, ...@@ -294,7 +298,7 @@ def train(settings,
generator_output = enqueuer.queue.get() generator_output = enqueuer.queue.get()
break break
else: else:
time.sleep(0.02) time.sleep(0.01)
yield generator_output yield generator_output
generator_output = None generator_output = None
finally: finally:
......
...@@ -167,7 +167,7 @@ def train(args, config, train_params, train_file_list): ...@@ -167,7 +167,7 @@ def train(args, config, train_params, train_file_list):
shutil.rmtree(model_path) shutil.rmtree(model_path)
print('save models to %s' % (model_path)) print('save models to %s' % (model_path))
fluid.io.save_persistables(exe, model_path) fluid.io.save_persistables(exe, model_path, main_program=program)
train_py_reader.start() train_py_reader.start()
try: try:
...@@ -189,13 +189,13 @@ def train(args, config, train_params, train_file_list): ...@@ -189,13 +189,13 @@ def train(args, config, train_params, train_file_list):
fetch_vars = [np.mean(np.array(v)) for v in fetch_vars] fetch_vars = [np.mean(np.array(v)) for v in fetch_vars]
if batch_id % 10 == 0: if batch_id % 10 == 0:
if not args.use_pyramidbox: if not args.use_pyramidbox:
print("Pass {0}, batch {1}, loss {2}, time {3}".format( print("Pass {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format(
pass_id, batch_id, fetch_vars[0], pass_id, batch_id, fetch_vars[0],
start_time - prev_start_time)) start_time - prev_start_time))
else: else:
print("Pass {0}, batch {1}, face loss {2}, " \ print("Pass {:d}, batch {:d}, face loss {:.6f}, " \
"head loss {3}, " \ "head loss {:.6f}, " \
"time {4}".format(pass_id, "time {:.5f}".format(pass_id,
batch_id, fetch_vars[0], fetch_vars[1], batch_id, fetch_vars[0], fetch_vars[1],
start_time - prev_start_time)) start_time - prev_start_time))
if pass_id % 1 == 0 or pass_id == epoc_num - 1: if pass_id % 1 == 0 or pass_id == epoc_num - 1:
......
...@@ -82,9 +82,6 @@ def save_widerface_bboxes(image_path, bboxes_scores, output_dir): ...@@ -82,9 +82,6 @@ def save_widerface_bboxes(image_path, bboxes_scores, output_dir):
image_name = image_path.split('/')[-1] image_name = image_path.split('/')[-1]
image_class = image_path.split('/')[-2] image_class = image_path.split('/')[-2]
image_name = image_name.encode('utf-8')
image_class = image_class.encode('utf-8')
odir = os.path.join(output_dir, image_class) odir = os.path.join(output_dir, image_class)
if not os.path.exists(odir): if not os.path.exists(odir):
os.makedirs(odir) os.makedirs(odir)
......
...@@ -43,7 +43,7 @@ After data preparation, one can start the training step by: ...@@ -43,7 +43,7 @@ After data preparation, one can start the training step by:
python train.py \ python train.py \
--max_size=1333 \ --max_size=1333 \
--scales=800 \ --scales=[800] \
--batch_size=8 \ --batch_size=8 \
--model_save_dir=output/ --model_save_dir=output/
...@@ -57,6 +57,22 @@ After data preparation, one can start the training step by: ...@@ -57,6 +57,22 @@ After data preparation, one can start the training step by:
sh ./pretrained/download.sh sh ./pretrained/download.sh
Set `pretrained_model` to load pre-trained model. In addition, this parameter is used to load trained model when finetuning as well. Set `pretrained_model` to load pre-trained model. In addition, this parameter is used to load trained model when finetuning as well.
Please make sure that pretrained_model is downloaded and loaded correctly, otherwise, the loss may be NAN during training.
**Install the [cocoapi](https://github.com/cocodataset/cocoapi):**
To train the model, [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install the cocoapi:
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
**data reader introduction:** **data reader introduction:**
...@@ -103,18 +119,7 @@ Finetuning is to finetune model weights in a specific task by loading pretrained ...@@ -103,18 +119,7 @@ Finetuning is to finetune model weights in a specific task by loading pretrained
## Evaluation ## Evaluation
Evaluation is to evaluate the performance of a trained model. This sample provides `eval_coco_map.py` which uses a COCO-specific mAP metric defined by [COCO committee](http://cocodataset.org/#detections-eval). To use `eval_coco_map.py` , [cocoapi](https://github.com/cocodataset/cocoapi) is needed. Install the cocoapi: Evaluation is to evaluate the performance of a trained model. This sample provides `eval_coco_map.py` which uses a COCO-specific mAP metric defined by [COCO committee](http://cocodataset.org/#detections-eval).
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
`eval_coco_map.py` is the main executor for evalution, one can start evalution step by: `eval_coco_map.py` is the main executor for evalution, one can start evalution step by:
...@@ -136,7 +141,7 @@ Faster RCNN mAP ...@@ -136,7 +141,7 @@ Faster RCNN mAP
| Detectron | 8 | 180000 | 0.315 | | Detectron | 8 | 180000 | 0.315 |
| Fluid minibatch padding | 8 | 180000 | 0.314 | | Fluid minibatch padding | 8 | 180000 | 0.314 |
| Fluid all padding | 8 | 180000 | 0.308 | | Fluid all padding | 8 | 180000 | 0.308 |
| Fluid no padding |6 | 240000 | 0.317 | | Fluid no padding |8 | 180000 | 0.316 |
* Fluid all padding: Each image padding to 1333\*1333. * Fluid all padding: Each image padding to 1333\*1333.
* Fluid minibatch padding: Images in one batch padding to the same size. This method is same as detectron. * Fluid minibatch padding: Images in one batch padding to the same size. This method is same as detectron.
......
...@@ -42,7 +42,7 @@ Faster RCNN 目标检测模型 ...@@ -42,7 +42,7 @@ Faster RCNN 目标检测模型
python train.py \ python train.py \
--max_size=1333 \ --max_size=1333 \
--scales=800 \ --scales=[800] \
--batch_size=8 \ --batch_size=8 \
--model_save_dir=output/ \ --model_save_dir=output/ \
--pretrained_model=${path_to_pretrain_model} --pretrained_model=${path_to_pretrain_model}
...@@ -57,6 +57,22 @@ Faster RCNN 目标检测模型 ...@@ -57,6 +57,22 @@ Faster RCNN 目标检测模型
sh ./pretrained/download.sh sh ./pretrained/download.sh
通过初始化`pretrained_model` 加载预训练模型。同时在参数微调时也采用该设置加载已训练模型。 通过初始化`pretrained_model` 加载预训练模型。同时在参数微调时也采用该设置加载已训练模型。
请在训练前确认预训练模型下载与加载正确,否则训练过程中损失可能会出现NAN。
**安装[cocoapi](https://github.com/cocodataset/cocoapi):**
训练前需要首先下载[cocoapi](https://github.com/cocodataset/cocoapi)
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
**数据读取器说明:** 数据读取器定义在reader.py中。所有图像将短边等比例缩放至`scales`,若长边大于`max_size`, 则再次将长边等比例缩放至`max_iter`。在训练阶段,对图像采用水平翻转。支持将同一个batch内的图像padding为相同尺寸。 **数据读取器说明:** 数据读取器定义在reader.py中。所有图像将短边等比例缩放至`scales`,若长边大于`max_size`, 则再次将长边等比例缩放至`max_iter`。在训练阶段,对图像采用水平翻转。支持将同一个batch内的图像padding为相同尺寸。
...@@ -87,18 +103,7 @@ Faster RCNN 训练loss ...@@ -87,18 +103,7 @@ Faster RCNN 训练loss
## 模型评估 ## 模型评估
模型评估是指对训练完毕的模型评估各类性能指标。本示例采用[COCO官方评估](http://cocodataset.org/#detections-eval),使用前需要首先下载[cocoapi](https://github.com/cocodataset/cocoapi) 模型评估是指对训练完毕的模型评估各类性能指标。本示例采用[COCO官方评估](http://cocodataset.org/#detections-eval)
# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# if cython is not installed
pip install Cython
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python2 setup.py install --user
`eval_coco_map.py`是评估模块的主要执行程序,调用示例如下: `eval_coco_map.py`是评估模块的主要执行程序,调用示例如下:
...@@ -120,7 +125,7 @@ Faster RCNN mAP ...@@ -120,7 +125,7 @@ Faster RCNN mAP
| Detectron | 8 | 180000 | 0.315 | | Detectron | 8 | 180000 | 0.315 |
| Fluid minibatch padding | 8 | 180000 | 0.314 | | Fluid minibatch padding | 8 | 180000 | 0.314 |
| Fluid all padding | 8 | 180000 | 0.308 | | Fluid all padding | 8 | 180000 | 0.308 |
| Fluid no padding |6 | 240000 | 0.317 | | Fluid no padding |8 | 180000 | 0.316 |
* Fluid all padding: 每张图像填充为1333\*1333大小。 * Fluid all padding: 每张图像填充为1333\*1333大小。
* Fluid minibatch padding: 同一个batch内的图像填充为相同尺寸。该方法与detectron处理相同。 * Fluid minibatch padding: 同一个batch内的图像填充为相同尺寸。该方法与detectron处理相同。
......
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
from edict import AttrDict
import six
import numpy as np
_C = AttrDict()
cfg = _C
#
# Training options
#
_C.TRAIN = AttrDict()
# scales an image's shortest side
_C.TRAIN.scales = [800]
# max size of longest side
_C.TRAIN.max_size = 1333
# images per GPU in minibatch
_C.TRAIN.im_per_batch = 1
# roi minibatch size per image
_C.TRAIN.batch_size_per_im = 512
# target fraction of foreground roi minibatch
_C.TRAIN.fg_fractrion = 0.25
# overlap threshold for a foreground roi
_C.TRAIN.fg_thresh = 0.5
# overlap threshold for a background roi
_C.TRAIN.bg_thresh_hi = 0.5
_C.TRAIN.bg_thresh_lo = 0.0
# If False, only resize image and not pad, image shape is different between
# GPUs in one mini-batch. If True, image shape is the same in one mini-batch.
_C.TRAIN.padding_minibatch = False
# Snapshot period
_C.TRAIN.snapshot_iter = 10000
# number of RPN proposals to keep before NMS
_C.TRAIN.rpn_pre_nms_top_n = 12000
# number of RPN proposals to keep after NMS
_C.TRAIN.rpn_post_nms_top_n = 2000
# NMS threshold used on RPN proposals
_C.TRAIN.rpn_nms_thresh = 0.7
# min size in RPN proposals
_C.TRAIN.rpn_min_size = 0.0
# eta for adaptive NMS in RPN
_C.TRAIN.rpn_eta = 1.0
# number of RPN examples per image
_C.TRAIN.rpn_batch_size_per_im = 256
# remove anchors out of the image
_C.TRAIN.rpn_straddle_thresh = 0.
# target fraction of foreground examples pre RPN minibatch
_C.TRAIN.rpn_fg_fraction = 0.5
# min overlap between anchor and gt box to be a positive examples
_C.TRAIN.rpn_positive_overlap = 0.7
# max overlap between anchor and gt box to be a negative examples
_C.TRAIN.rpn_negative_overlap = 0.3
# stopgrad at a specified stage
_C.TRAIN.freeze_at = 2
# min area of ground truth box
_C.TRAIN.gt_min_area = -1
#
# Inference options
#
_C.TEST = AttrDict()
# scales an image's shortest side
_C.TEST.scales = [800]
# max size of longest side
_C.TEST.max_size = 1333
# eta for adaptive NMS in RPN
_C.TEST.rpn_eta = 1.0
# min score threshold to infer
_C.TEST.score_thresh = 0.05
# overlap threshold used for NMS
_C.TEST.nms_thresh = 0.5
# number of RPN proposals to keep before NMS
_C.TEST.rpn_pre_nms_top_n = 6000
# number of RPN proposals to keep after NMS
_C.TEST.rpn_post_nms_top_n = 1000
# min size in RPN proposals
_C.TEST.rpn_min_size = 0.0
# max number of detections
_C.TEST.detectiions_per_im = 100
# NMS threshold used on RPN proposals
_C.TEST.rpn_nms_thresh = 0.7
#
# Model options
#
# weight for bbox regression targets
_C.bbox_reg_weights = [0.1, 0.1, 0.2, 0.2]
# RPN anchor sizes
_C.anchor_sizes = [32, 64, 128, 256, 512]
# RPN anchor ratio
_C.aspect_ratio = [0.5, 1, 2]
# variance of anchors
_C.variances = [1., 1., 1., 1.]
# stride of feature map
_C.rpn_stride = [16.0, 16.0]
# Use roi pool or roi align, 'RoIPool' or 'RoIAlign'
_C.roi_func = 'RoIAlign'
# sampling ratio for roi align
_C.sampling_ratio = 0
# pooled width and pooled height
_C.roi_resolution = 14
# spatial scale
_C.spatial_scale = 1. / 16.
#
# SOLVER options
#
# derived learning rate the to get the final learning rate.
_C.learning_rate = 0.01
# maximum number of iterations
_C.max_iter = 180000
# warm up to learning rate
_C.warm_up_iter = 500
_C.warm_up_factor = 1. / 3.
# lr steps_with_decay
_C.lr_steps = [120000, 160000]
_C.lr_gamma = 0.1
# L2 regularization hyperparameter
_C.weight_decay = 0.0001
# momentum with SGD
_C.momentum = 0.9
#
# ENV options
#
# support both CPU and GPU
_C.use_gpu = True
# Whether use parallel
_C.parallel = True
# Class number
_C.class_num = 81
# support pyreader
_C.use_pyreader = True
# pixel mean values
_C.pixel_means = [102.9801, 115.9465, 122.7717]
# clip box to prevent overflowing
_C.bbox_clip = np.log(1000. / 16.)
# dataset path
_C.train_file_list = 'annotations/instances_train2017.json'
_C.train_data_dir = 'train2017'
_C.val_file_list = 'annotations/instances_val2017.json'
_C.val_data_dir = 'val2017'
def merge_cfg_from_args(args, mode):
"""Merge config keys, values in args into the global config."""
if mode == 'train':
sub_d = _C.TRAIN
else:
sub_d = _C.TEST
for k, v in sorted(six.iteritems(vars(args))):
d = _C
try:
value = eval(v)
except:
value = v
if k in sub_d:
sub_d[k] = value
else:
d[k] = value
...@@ -27,21 +27,27 @@ from __future__ import unicode_literals ...@@ -27,21 +27,27 @@ from __future__ import unicode_literals
import cv2 import cv2
import numpy as np import numpy as np
from config import cfg
def get_image_blob(roidb, settings): def get_image_blob(roidb, mode):
"""Builds an input blob from the images in the roidb at the specified """Builds an input blob from the images in the roidb at the specified
scales. scales.
""" """
scale_ind = np.random.randint(0, high=len(settings.scales)) if mode == 'train':
scales = cfg.TRAIN.scales
scale_ind = np.random.randint(0, high=len(scales))
target_size = scales[scale_ind]
max_size = cfg.TRAIN.max_size
else:
target_size = cfg.TEST.scales[0]
max_size = cfg.TEST.max_size
im = cv2.imread(roidb['image']) im = cv2.imread(roidb['image'])
assert im is not None, \ assert im is not None, \
'Failed to read image \'{}\''.format(roidb['image']) 'Failed to read image \'{}\''.format(roidb['image'])
if roidb['flipped']: if roidb['flipped']:
im = im[:, ::-1, :] im = im[:, ::-1, :]
target_size = settings.scales[scale_ind] im, im_scale = prep_im_for_blob(im, cfg.pixel_means, target_size, max_size)
im, im_scale = prep_im_for_blob(im, settings.mean_value, target_size,
settings.max_size)
return im, im_scale return im, im_scale
......
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
...@@ -29,18 +29,20 @@ import models.resnet as resnet ...@@ -29,18 +29,20 @@ import models.resnet as resnet
import json import json
from pycocotools.coco import COCO from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval, Params from pycocotools.cocoeval import COCOeval, Params
from config import cfg
def eval(cfg): def eval():
if '2014' in cfg.dataset: if '2014' in cfg.dataset:
test_list = 'annotations/instances_val2014.json' test_list = 'annotations/instances_val2014.json'
elif '2017' in cfg.dataset: elif '2017' in cfg.dataset:
test_list = 'annotations/instances_val2017.json' test_list = 'annotations/instances_val2017.json'
image_shape = [3, cfg.max_size, cfg.max_size] image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
class_nums = cfg.class_num class_nums = cfg.class_num
batch_size = cfg.batch_size devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(","))
total_batch_size = devices_num * cfg.TRAIN.im_per_batch
cocoGt = COCO(os.path.join(cfg.data_dir, test_list)) cocoGt = COCO(os.path.join(cfg.data_dir, test_list))
numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())} numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())}
category_ids = cocoGt.getCatIds() category_ids = cocoGt.getCatIds()
...@@ -51,7 +53,6 @@ def eval(cfg): ...@@ -51,7 +53,6 @@ def eval(cfg):
label_list[0] = ['background'] label_list[0] = ['background']
model = model_builder.FasterRCNN( model = model_builder.FasterRCNN(
cfg=cfg,
add_conv_body_func=resnet.add_ResNet50_conv4_body, add_conv_body_func=resnet.add_ResNet50_conv4_body,
add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
use_pyreader=False, use_pyreader=False,
...@@ -66,7 +67,7 @@ def eval(cfg): ...@@ -66,7 +67,7 @@ def eval(cfg):
return os.path.exists(os.path.join(cfg.pretrained_model, var.name)) return os.path.exists(os.path.join(cfg.pretrained_model, var.name))
fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist) fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist)
# yapf: enable # yapf: enable
test_reader = reader.test(cfg, batch_size) test_reader = reader.test(total_batch_size)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
dts_res = [] dts_res = []
...@@ -80,11 +81,11 @@ def eval(cfg): ...@@ -80,11 +81,11 @@ def eval(cfg):
fetch_list=[v.name for v in fetch_list], fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(batch_data), feed=feeder.feed(batch_data),
return_numpy=False) return_numpy=False)
new_lod, nmsed_out = get_nmsed_box(cfg, rpn_rois_v, confs_v, locs_v, new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v,
class_nums, im_info, class_nums, im_info,
numId_to_catId_map) numId_to_catId_map)
dts_res += get_dt_res(batch_size, new_lod, nmsed_out, batch_data) dts_res += get_dt_res(total_batch_size, new_lod, nmsed_out, batch_data)
end = time.time() end = time.time()
print('batch id: {}, time: {}'.format(batch_id, end - start)) print('batch id: {}, time: {}'.format(batch_id, end - start))
with open("detection_result.json", 'w') as outfile: with open("detection_result.json", 'w') as outfile:
...@@ -100,6 +101,4 @@ def eval(cfg): ...@@ -100,6 +101,4 @@ def eval(cfg):
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
print_arguments(args) print_arguments(args)
eval()
data_args = reader.Settings(args)
eval(data_args)
...@@ -20,6 +20,7 @@ import box_utils ...@@ -20,6 +20,7 @@ import box_utils
from PIL import Image from PIL import Image
from PIL import ImageDraw from PIL import ImageDraw
from PIL import ImageFont from PIL import ImageFont
from config import cfg
def box_decoder(target_box, prior_box, prior_box_var): def box_decoder(target_box, prior_box, prior_box_var):
...@@ -31,10 +32,8 @@ def box_decoder(target_box, prior_box, prior_box_var): ...@@ -31,10 +32,8 @@ def box_decoder(target_box, prior_box, prior_box_var):
prior_box_loc[:, 3] = (prior_box[:, 3] + prior_box[:, 1]) / 2 prior_box_loc[:, 3] = (prior_box[:, 3] + prior_box[:, 1]) / 2
pred_bbox = np.zeros_like(target_box, dtype=np.float32) pred_bbox = np.zeros_like(target_box, dtype=np.float32)
for i in range(prior_box.shape[0]): for i in range(prior_box.shape[0]):
dw = np.minimum(prior_box_var[2] * target_box[i, 2::4], dw = np.minimum(prior_box_var[2] * target_box[i, 2::4], cfg.bbox_clip)
np.log(1000. / 16.)) dh = np.minimum(prior_box_var[3] * target_box[i, 3::4], cfg.bbox_clip)
dh = np.minimum(prior_box_var[3] * target_box[i, 3::4],
np.log(1000. / 16.))
pred_bbox[i, 0::4] = prior_box_var[0] * target_box[ pred_bbox[i, 0::4] = prior_box_var[0] * target_box[
i, 0::4] * prior_box_loc[i, 0] + prior_box_loc[i, 2] i, 0::4] * prior_box_loc[i, 0] + prior_box_loc[i, 2]
pred_bbox[i, 1::4] = prior_box_var[1] * target_box[ pred_bbox[i, 1::4] = prior_box_var[1] * target_box[
...@@ -67,11 +66,11 @@ def clip_tiled_boxes(boxes, im_shape): ...@@ -67,11 +66,11 @@ def clip_tiled_boxes(boxes, im_shape):
return boxes return boxes
def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info, def get_nmsed_box(rpn_rois, confs, locs, class_nums, im_info,
numId_to_catId_map): numId_to_catId_map):
lod = rpn_rois.lod()[0] lod = rpn_rois.lod()[0]
rpn_rois_v = np.array(rpn_rois) rpn_rois_v = np.array(rpn_rois)
variance_v = np.array([0.1, 0.1, 0.2, 0.2]) variance_v = np.array(cfg.bbox_reg_weights)
confs_v = np.array(confs) confs_v = np.array(confs)
locs_v = np.array(locs) locs_v = np.array(locs)
rois = box_decoder(locs_v, rpn_rois_v, variance_v) rois = box_decoder(locs_v, rpn_rois_v, variance_v)
...@@ -89,12 +88,12 @@ def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info, ...@@ -89,12 +88,12 @@ def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info,
cls_boxes = [[] for _ in range(class_nums)] cls_boxes = [[] for _ in range(class_nums)]
scores_n = confs_v[start:end, :] scores_n = confs_v[start:end, :]
for j in range(1, class_nums): for j in range(1, class_nums):
inds = np.where(scores_n[:, j] > args.score_threshold)[0] inds = np.where(scores_n[:, j] > cfg.TEST.score_thresh)[0]
scores_j = scores_n[inds, j] scores_j = scores_n[inds, j]
rois_j = rois_n[inds, j * 4:(j + 1) * 4] rois_j = rois_n[inds, j * 4:(j + 1) * 4]
dets_j = np.hstack((rois_j, scores_j[:, np.newaxis])).astype( dets_j = np.hstack((rois_j, scores_j[:, np.newaxis])).astype(
np.float32, copy=False) np.float32, copy=False)
keep = box_utils.nms(dets_j, args.nms_threshold) keep = box_utils.nms(dets_j, cfg.TEST.nms_thresh)
nms_dets = dets_j[keep, :] nms_dets = dets_j[keep, :]
#add labels #add labels
cat_id = numId_to_catId_map[j] cat_id = numId_to_catId_map[j]
...@@ -105,8 +104,8 @@ def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info, ...@@ -105,8 +104,8 @@ def get_nmsed_box(args, rpn_rois, confs, locs, class_nums, im_info,
# Limit to max_per_image detections **over all classes** # Limit to max_per_image detections **over all classes**
image_scores = np.hstack( image_scores = np.hstack(
[cls_boxes[j][:, -2] for j in range(1, class_nums)]) [cls_boxes[j][:, -2] for j in range(1, class_nums)])
if len(image_scores) > 100: if len(image_scores) > cfg.TEST.detectiions_per_im:
image_thresh = np.sort(image_scores)[-100] image_thresh = np.sort(image_scores)[-cfg.TEST.detectiions_per_im]
for j in range(1, class_nums): for j in range(1, class_nums):
keep = np.where(cls_boxes[j][:, -2] >= image_thresh)[0] keep = np.where(cls_boxes[j][:, -2] >= image_thresh)[0]
cls_boxes[j] = cls_boxes[j][keep, :] cls_boxes[j] = cls_boxes[j][keep, :]
......
fluid/faster_rcnn/image/mAP.jpg

80.0 KB | W: | H:

fluid/faster_rcnn/image/mAP.jpg

41.0 KB | W: | H:

fluid/faster_rcnn/image/mAP.jpg
fluid/faster_rcnn/image/mAP.jpg
fluid/faster_rcnn/image/mAP.jpg
fluid/faster_rcnn/image/mAP.jpg
  • 2-up
  • Swipe
  • Onion skin
import os
import time
import numpy as np
from eval_helper import get_nmsed_box
from eval_helper import get_dt_res
from eval_helper import draw_bounding_box_on_image
import paddle
import paddle.fluid as fluid
import reader
from utility import print_arguments, parse_args
import models.model_builder as model_builder
import models.resnet as resnet
import json
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval, Params
from config import cfg
def infer():
if '2014' in cfg.dataset:
test_list = 'annotations/instances_val2014.json'
elif '2017' in cfg.dataset:
test_list = 'annotations/instances_val2017.json'
cocoGt = COCO(os.path.join(cfg.data_dir, test_list))
numId_to_catId_map = {i + 1: v for i, v in enumerate(cocoGt.getCatIds())}
category_ids = cocoGt.getCatIds()
label_list = {
item['id']: item['name']
for item in cocoGt.loadCats(category_ids)
}
label_list[0] = ['background']
image_shape = [3, cfg.TEST.max_size, cfg.TEST.max_size]
class_nums = cfg.class_num
model = model_builder.FasterRCNN(
add_conv_body_func=resnet.add_ResNet50_conv4_body,
add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
use_pyreader=False,
is_train=False)
model.build_model(image_shape)
rpn_rois, confs, locs = model.eval_out()
place = fluid.CUDAPlace(0) if cfg.use_gpu else fluid.CPUPlace()
exe = fluid.Executor(place)
# yapf: disable
if cfg.pretrained_model:
def if_exist(var):
return os.path.exists(os.path.join(cfg.pretrained_model, var.name))
fluid.io.load_vars(exe, cfg.pretrained_model, predicate=if_exist)
# yapf: enable
infer_reader = reader.infer()
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
dts_res = []
fetch_list = [rpn_rois, confs, locs]
data = next(infer_reader())
im_info = [data[0][1]]
rpn_rois_v, confs_v, locs_v = exe.run(
fetch_list=[v.name for v in fetch_list],
feed=feeder.feed(data),
return_numpy=False)
new_lod, nmsed_out = get_nmsed_box(rpn_rois_v, confs_v, locs_v, class_nums,
im_info, numId_to_catId_map)
path = os.path.join(cfg.image_path, cfg.image_name)
draw_bounding_box_on_image(path, nmsed_out, cfg.draw_threshold, label_list)
if __name__ == '__main__':
args = parse_args()
print_arguments(args)
infer()
...@@ -17,11 +17,11 @@ from paddle.fluid.param_attr import ParamAttr ...@@ -17,11 +17,11 @@ from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant from paddle.fluid.initializer import Constant
from paddle.fluid.initializer import Normal from paddle.fluid.initializer import Normal
from paddle.fluid.regularizer import L2Decay from paddle.fluid.regularizer import L2Decay
from config import cfg
class FasterRCNN(object): class FasterRCNN(object):
def __init__(self, def __init__(self,
cfg=None,
add_conv_body_func=None, add_conv_body_func=None,
add_roi_box_head_func=None, add_roi_box_head_func=None,
is_train=True, is_train=True,
...@@ -29,7 +29,6 @@ class FasterRCNN(object): ...@@ -29,7 +29,6 @@ class FasterRCNN(object):
use_random=True): use_random=True):
self.add_conv_body_func = add_conv_body_func self.add_conv_body_func = add_conv_body_func
self.add_roi_box_head_func = add_roi_box_head_func self.add_roi_box_head_func = add_roi_box_head_func
self.cfg = cfg
self.is_train = is_train self.is_train = is_train
self.use_pyreader = use_pyreader self.use_pyreader = use_pyreader
self.use_random = use_random self.use_random = use_random
...@@ -111,10 +110,10 @@ class FasterRCNN(object): ...@@ -111,10 +110,10 @@ class FasterRCNN(object):
name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.))) name="conv_rpn_b", learning_rate=2., regularizer=L2Decay(0.)))
self.anchor, self.var = fluid.layers.anchor_generator( self.anchor, self.var = fluid.layers.anchor_generator(
input=rpn_conv, input=rpn_conv,
anchor_sizes=self.cfg.anchor_sizes, anchor_sizes=cfg.anchor_sizes,
aspect_ratios=self.cfg.aspect_ratios, aspect_ratios=cfg.aspect_ratio,
variance=self.cfg.variance, variance=cfg.variances,
stride=[16.0, 16.0]) stride=cfg.rpn_stride)
num_anchor = self.anchor.shape[2] num_anchor = self.anchor.shape[2]
# Proposal classification scores # Proposal classification scores
self.rpn_cls_score = fluid.layers.conv2d( self.rpn_cls_score = fluid.layers.conv2d(
...@@ -152,8 +151,12 @@ class FasterRCNN(object): ...@@ -152,8 +151,12 @@ class FasterRCNN(object):
rpn_cls_score_prob = fluid.layers.sigmoid( rpn_cls_score_prob = fluid.layers.sigmoid(
self.rpn_cls_score, name='rpn_cls_score_prob') self.rpn_cls_score, name='rpn_cls_score_prob')
pre_nms_top_n = 12000 if self.is_train else 6000 param_obj = cfg.TRAIN if self.is_train else cfg.TEST
post_nms_top_n = 2000 if self.is_train else 1000 pre_nms_top_n = param_obj.rpn_pre_nms_top_n
post_nms_top_n = param_obj.rpn_post_nms_top_n
nms_thresh = param_obj.rpn_nms_thresh
min_size = param_obj.rpn_min_size
eta = param_obj.rpn_eta
rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals( rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
scores=rpn_cls_score_prob, scores=rpn_cls_score_prob,
bbox_deltas=self.rpn_bbox_pred, bbox_deltas=self.rpn_bbox_pred,
...@@ -162,9 +165,9 @@ class FasterRCNN(object): ...@@ -162,9 +165,9 @@ class FasterRCNN(object):
variances=self.var, variances=self.var,
pre_nms_top_n=pre_nms_top_n, pre_nms_top_n=pre_nms_top_n,
post_nms_top_n=post_nms_top_n, post_nms_top_n=post_nms_top_n,
nms_thresh=0.7, nms_thresh=nms_thresh,
min_size=0.0, min_size=min_size,
eta=1.0) eta=eta)
self.rpn_rois = rpn_rois self.rpn_rois = rpn_rois
if self.is_train: if self.is_train:
outs = fluid.layers.generate_proposal_labels( outs = fluid.layers.generate_proposal_labels(
...@@ -173,13 +176,13 @@ class FasterRCNN(object): ...@@ -173,13 +176,13 @@ class FasterRCNN(object):
is_crowd=self.is_crowd, is_crowd=self.is_crowd,
gt_boxes=self.gt_box, gt_boxes=self.gt_box,
im_info=self.im_info, im_info=self.im_info,
batch_size_per_im=self.cfg.batch_size_per_im, batch_size_per_im=cfg.TRAIN.batch_size_per_im,
fg_fraction=0.25, fg_fraction=cfg.TRAIN.fg_fractrion,
fg_thresh=0.5, fg_thresh=cfg.TRAIN.fg_thresh,
bg_thresh_hi=0.5, bg_thresh_hi=cfg.TRAIN.bg_thresh_hi,
bg_thresh_lo=0.0, bg_thresh_lo=cfg.TRAIN.bg_thresh_lo,
bbox_reg_weights=[0.1, 0.1, 0.2, 0.2], bbox_reg_weights=cfg.bbox_reg_weights,
class_nums=self.cfg.class_num, class_nums=cfg.class_num,
use_random=self.use_random) use_random=self.use_random)
self.rois = outs[0] self.rois = outs[0]
...@@ -193,15 +196,24 @@ class FasterRCNN(object): ...@@ -193,15 +196,24 @@ class FasterRCNN(object):
pool_rois = self.rois pool_rois = self.rois
else: else:
pool_rois = self.rpn_rois pool_rois = self.rpn_rois
pool = fluid.layers.roi_pool( if cfg.roi_func == 'RoIPool':
input=roi_input, pool = fluid.layers.roi_pool(
rois=pool_rois, input=roi_input,
pooled_height=14, rois=pool_rois,
pooled_width=14, pooled_height=cfg.roi_resolution,
spatial_scale=0.0625) pooled_width=cfg.roi_resolution,
spatial_scale=cfg.spatial_scale)
elif cfg.roi_func == 'RoIAlign':
pool = fluid.layers.roi_align(
input=roi_input,
rois=pool_rois,
pooled_height=cfg.roi_resolution,
pooled_width=cfg.roi_resolution,
spatial_scale=cfg.spatial_scale,
sampling_ratio=cfg.sampling_ratio)
rcnn_out = self.add_roi_box_head_func(pool) rcnn_out = self.add_roi_box_head_func(pool)
self.cls_score = fluid.layers.fc(input=rcnn_out, self.cls_score = fluid.layers.fc(input=rcnn_out,
size=self.cfg.class_num, size=cfg.class_num,
act=None, act=None,
name='cls_score', name='cls_score',
param_attr=ParamAttr( param_attr=ParamAttr(
...@@ -213,7 +225,7 @@ class FasterRCNN(object): ...@@ -213,7 +225,7 @@ class FasterRCNN(object):
learning_rate=2., learning_rate=2.,
regularizer=L2Decay(0.))) regularizer=L2Decay(0.)))
self.bbox_pred = fluid.layers.fc(input=rcnn_out, self.bbox_pred = fluid.layers.fc(input=rcnn_out,
size=4 * self.cfg.class_num, size=4 * cfg.class_num,
act=None, act=None,
name='bbox_pred', name='bbox_pred',
param_attr=ParamAttr( param_attr=ParamAttr(
...@@ -257,8 +269,7 @@ class FasterRCNN(object): ...@@ -257,8 +269,7 @@ class FasterRCNN(object):
x=rpn_cls_score_reshape, shape=(0, -1, 1)) x=rpn_cls_score_reshape, shape=(0, -1, 1))
rpn_bbox_pred_reshape = fluid.layers.reshape( rpn_bbox_pred_reshape = fluid.layers.reshape(
x=rpn_bbox_pred_reshape, shape=(0, -1, 4)) x=rpn_bbox_pred_reshape, shape=(0, -1, 4))
score_pred, loc_pred, score_tgt, loc_tgt, bbox_weight = \
score_pred, loc_pred, score_tgt, loc_tgt = \
fluid.layers.rpn_target_assign( fluid.layers.rpn_target_assign(
bbox_pred=rpn_bbox_pred_reshape, bbox_pred=rpn_bbox_pred_reshape,
cls_logits=rpn_cls_score_reshape, cls_logits=rpn_cls_score_reshape,
...@@ -267,11 +278,11 @@ class FasterRCNN(object): ...@@ -267,11 +278,11 @@ class FasterRCNN(object):
gt_boxes=self.gt_box, gt_boxes=self.gt_box,
is_crowd=self.is_crowd, is_crowd=self.is_crowd,
im_info=self.im_info, im_info=self.im_info,
rpn_batch_size_per_im=256, rpn_batch_size_per_im=cfg.TRAIN.rpn_batch_size_per_im,
rpn_straddle_thresh=0.0, rpn_straddle_thresh=cfg.TRAIN.rpn_straddle_thresh,
rpn_fg_fraction=0.5, rpn_fg_fraction=cfg.TRAIN.rpn_fg_fraction,
rpn_positive_overlap=0.7, rpn_positive_overlap=cfg.TRAIN.rpn_positive_overlap,
rpn_negative_overlap=0.3, rpn_negative_overlap=cfg.TRAIN.rpn_negative_overlap,
use_random=self.use_random) use_random=self.use_random)
score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32') score_tgt = fluid.layers.cast(x=score_tgt, dtype='float32')
rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits( rpn_cls_loss = fluid.layers.sigmoid_cross_entropy_with_logits(
...@@ -279,7 +290,12 @@ class FasterRCNN(object): ...@@ -279,7 +290,12 @@ class FasterRCNN(object):
rpn_cls_loss = fluid.layers.reduce_mean( rpn_cls_loss = fluid.layers.reduce_mean(
rpn_cls_loss, name='loss_rpn_cls') rpn_cls_loss, name='loss_rpn_cls')
rpn_reg_loss = fluid.layers.smooth_l1(x=loc_pred, y=loc_tgt, sigma=3.0) rpn_reg_loss = fluid.layers.smooth_l1(
x=loc_pred,
y=loc_tgt,
sigma=3.0,
inside_weight=bbox_weight,
outside_weight=bbox_weight)
rpn_reg_loss = fluid.layers.reduce_sum( rpn_reg_loss = fluid.layers.reduce_sum(
rpn_reg_loss, name='loss_rpn_bbox') rpn_reg_loss, name='loss_rpn_bbox')
score_shape = fluid.layers.shape(score_tgt) score_shape = fluid.layers.shape(score_tgt)
......
...@@ -16,6 +16,7 @@ import paddle.fluid as fluid ...@@ -16,6 +16,7 @@ import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Constant from paddle.fluid.initializer import Constant
from paddle.fluid.regularizer import L2Decay from paddle.fluid.regularizer import L2Decay
from config import cfg
def conv_bn_layer(input, def conv_bn_layer(input,
...@@ -88,8 +89,7 @@ def conv_affine_layer(input, ...@@ -88,8 +89,7 @@ def conv_affine_layer(input,
default_initializer=Constant(0.)) default_initializer=Constant(0.))
bias.stop_gradient = True bias.stop_gradient = True
elt_mul = fluid.layers.elementwise_mul(x=conv, y=scale, axis=1) out = fluid.layers.affine_channel(x=conv, scale=scale, bias=bias)
out = fluid.layers.elementwise_add(x=elt_mul, y=bias, axis=1)
if act == 'relu': if act == 'relu':
out = fluid.layers.relu(x=out) out = fluid.layers.relu(x=out)
return out return out
...@@ -137,7 +137,7 @@ ResNet_cfg = { ...@@ -137,7 +137,7 @@ ResNet_cfg = {
} }
def add_ResNet50_conv4_body(body_input, freeze_at=2): def add_ResNet50_conv4_body(body_input):
stages, block_func = ResNet_cfg[50] stages, block_func = ResNet_cfg[50]
stages = stages[0:3] stages = stages[0:3]
conv1 = conv_affine_layer( conv1 = conv_affine_layer(
...@@ -149,13 +149,13 @@ def add_ResNet50_conv4_body(body_input, freeze_at=2): ...@@ -149,13 +149,13 @@ def add_ResNet50_conv4_body(body_input, freeze_at=2):
pool_stride=2, pool_stride=2,
pool_padding=1) pool_padding=1)
res2 = layer_warp(block_func, pool1, 64, stages[0], 1, name="res2") res2 = layer_warp(block_func, pool1, 64, stages[0], 1, name="res2")
if freeze_at == 2: if cfg.TRAIN.freeze_at == 2:
res2.stop_gradient = True res2.stop_gradient = True
res3 = layer_warp(block_func, res2, 128, stages[1], 2, name="res3") res3 = layer_warp(block_func, res2, 128, stages[1], 2, name="res3")
if freeze_at == 3: if cfg.TRAIN.freeze_at == 3:
res3.stop_gradient = True res3.stop_gradient = True
res4 = layer_warp(block_func, res3, 256, stages[2], 2, name="res4") res4 = layer_warp(block_func, res3, 256, stages[2], 2, name="res4")
if freeze_at == 4: if cfg.TRAIN.freeze_at == 4:
res4.stop_gradient = True res4.stop_gradient = True
return res4 return res4
......
DIR="$( cd "$(dirname "$0")" ; pwd -P )" DIR="$(dirname "$PWD -P")"
cd "$DIR" cd "$DIR"
# Download the data. # Download the data.
......
...@@ -26,19 +26,18 @@ import paddle.fluid.profiler as profiler ...@@ -26,19 +26,18 @@ import paddle.fluid.profiler as profiler
import models.model_builder as model_builder import models.model_builder as model_builder
import models.resnet as resnet import models.resnet as resnet
from learning_rate import exponential_with_warmup_decay from learning_rate import exponential_with_warmup_decay
from config import cfg
def train(cfg): def train():
batch_size = cfg.batch_size
learning_rate = cfg.learning_rate learning_rate = cfg.learning_rate
image_shape = [3, cfg.max_size, cfg.max_size] image_shape = [3, cfg.TRAIN.max_size, cfg.TRAIN.max_size]
num_iterations = cfg.max_iter num_iterations = cfg.max_iter
devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(",")) devices_num = len(devices.split(","))
total_batch_size = devices_num * cfg.TRAIN.im_per_batch
model = model_builder.FasterRCNN( model = model_builder.FasterRCNN(
cfg=cfg,
add_conv_body_func=resnet.add_ResNet50_conv4_body, add_conv_body_func=resnet.add_ResNet50_conv4_body,
add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
use_pyreader=cfg.use_pyreader, use_pyreader=cfg.use_pyreader,
...@@ -51,8 +50,10 @@ def train(cfg): ...@@ -51,8 +50,10 @@ def train(cfg):
rpn_reg_loss.persistable = True rpn_reg_loss.persistable = True
loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss
boundaries = [120000, 160000] boundaries = cfg.lr_steps
values = [learning_rate, learning_rate * 0.1, learning_rate * 0.01] gamma = cfg.lr_gamma
step_num = len(lr_steps)
values = [learning_rate * (gamma**i) for i in range(step_num + 1)]
optimizer = fluid.optimizer.Momentum( optimizer = fluid.optimizer.Momentum(
learning_rate=exponential_with_warmup_decay( learning_rate=exponential_with_warmup_decay(
...@@ -82,22 +83,16 @@ def train(cfg): ...@@ -82,22 +83,16 @@ def train(cfg):
train_exe = fluid.ParallelExecutor( train_exe = fluid.ParallelExecutor(
use_cuda=bool(cfg.use_gpu), loss_name=loss.name) use_cuda=bool(cfg.use_gpu), loss_name=loss.name)
assert cfg.batch_size % devices_num == 0, \
"batch_size = %d, devices_num = %d" %(cfg.batch_size, devices_num)
batch_size_per_dev = cfg.batch_size / devices_num
if cfg.use_pyreader: if cfg.use_pyreader:
train_reader = reader.train( train_reader = reader.train(
cfg, batch_size=cfg.TRAIN.im_per_batch,
batch_size=batch_size_per_dev, total_batch_size=total_batch_size,
total_batch_size=cfg.batch_size, padding_total=cfg.TRAIN.padding_minibatch,
padding_total=cfg.padding_minibatch,
shuffle=False) shuffle=False)
py_reader = model.py_reader py_reader = model.py_reader
py_reader.decorate_paddle_reader(train_reader) py_reader.decorate_paddle_reader(train_reader)
else: else:
train_reader = reader.train( train_reader = reader.train(batch_size=total_batch_size, shuffle=False)
cfg, batch_size=cfg.batch_size, shuffle=False)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
fetch_list = [loss, loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss] fetch_list = [loss, loss_cls, loss_bbox, rpn_cls_loss, rpn_reg_loss]
...@@ -109,7 +104,7 @@ def train(cfg): ...@@ -109,7 +104,7 @@ def train(cfg):
for batch_id in range(iterations): for batch_id in range(iterations):
start_time = time.time() start_time = time.time()
data = train_reader().next() data = next(train_reader())
end_time = time.time() end_time = time.time()
reader_time.append(end_time - start_time) reader_time.append(end_time - start_time)
start_time = time.time() start_time = time.time()
...@@ -163,8 +158,7 @@ def train(cfg): ...@@ -163,8 +158,7 @@ def train(cfg):
run_func(2) run_func(2)
# profiling # profiling
start = time.time() start = time.time()
use_profile = False if cfg.use_profile:
if use_profile:
with profiler.profiler('GPU', 'total', '/tmp/profile_file'): with profiler.profiler('GPU', 'total', '/tmp/profile_file'):
reader_time, run_time, total_images = run_func(num_iterations) reader_time, run_time, total_images = run_func(num_iterations)
else: else:
...@@ -181,6 +175,4 @@ def train(cfg): ...@@ -181,6 +175,4 @@ def train(cfg):
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
print_arguments(args) print_arguments(args)
train()
data_args = reader.Settings(args)
train(data_args)
...@@ -26,58 +26,45 @@ from collections import deque ...@@ -26,58 +26,45 @@ from collections import deque
from roidbs import JsonDataset from roidbs import JsonDataset
import data_utils import data_utils
from config import cfg
class Settings(object): def coco(mode,
def __init__(self, args=None):
for arg, value in sorted(six.iteritems(vars(args))):
setattr(self, arg, value)
if 'coco2014' in args.dataset:
self.class_nums = 81
self.train_file_list = 'annotations/instances_train2014.json'
self.train_data_dir = 'train2014'
self.val_file_list = 'annotations/instances_val2014.json'
self.val_data_dir = 'val2014'
elif 'coco2017' in args.dataset:
self.class_nums = 81
self.train_file_list = 'annotations/instances_train2017.json'
self.train_data_dir = 'train2017'
self.val_file_list = 'annotations/instances_val2017.json'
self.val_data_dir = 'val2017'
else:
raise NotImplementedError('Dataset {} not supported'.format(
self.dataset))
self.mean_value = np.array(self.mean_value)[
np.newaxis, np.newaxis, :].astype('float32')
def coco(settings,
mode,
batch_size=None, batch_size=None,
total_batch_size=None, total_batch_size=None,
padding_total=False, padding_total=False,
shuffle=False): shuffle=False):
if 'coco2014' in cfg.dataset:
cfg.train_file_list = 'annotations/instances_train2014.json'
cfg.train_data_dir = 'train2014'
cfg.val_file_list = 'annotations/instances_val2014.json'
cfg.val_data_dir = 'val2014'
elif 'coco2017' in cfg.dataset:
cfg.train_file_list = 'annotations/instances_train2017.json'
cfg.train_data_dir = 'train2017'
cfg.val_file_list = 'annotations/instances_val2017.json'
cfg.val_data_dir = 'val2017'
else:
raise NotImplementedError('Dataset {} not supported'.format(
cfg.dataset))
cfg.mean_value = np.array(cfg.pixel_means)[np.newaxis,
np.newaxis, :].astype('float32')
total_batch_size = total_batch_size if total_batch_size else batch_size total_batch_size = total_batch_size if total_batch_size else batch_size
if mode != 'infer': if mode != 'infer':
assert total_batch_size % batch_size == 0 assert total_batch_size % batch_size == 0
if mode == 'train': if mode == 'train':
settings.train_file_list = os.path.join(settings.data_dir, cfg.train_file_list = os.path.join(cfg.data_dir, cfg.train_file_list)
settings.train_file_list) cfg.train_data_dir = os.path.join(cfg.data_dir, cfg.train_data_dir)
settings.train_data_dir = os.path.join(settings.data_dir,
settings.train_data_dir)
elif mode == 'test' or mode == 'infer': elif mode == 'test' or mode == 'infer':
settings.val_file_list = os.path.join(settings.data_dir, cfg.val_file_list = os.path.join(cfg.data_dir, cfg.val_file_list)
settings.val_file_list) cfg.val_data_dir = os.path.join(cfg.data_dir, cfg.val_data_dir)
settings.val_data_dir = os.path.join(settings.data_dir, json_dataset = JsonDataset(train=(mode == 'train'))
settings.val_data_dir)
json_dataset = JsonDataset(settings, train=(mode == 'train'))
roidbs = json_dataset.get_roidb() roidbs = json_dataset.get_roidb()
print("{} on {} with {} roidbs".format(mode, settings.dataset, len(roidbs))) print("{} on {} with {} roidbs".format(mode, cfg.dataset, len(roidbs)))
def roidb_reader(roidb, mode): def roidb_reader(roidb, mode):
im, im_scales = data_utils.get_image_blob(roidb, settings) im, im_scales = data_utils.get_image_blob(roidb, mode)
im_id = roidb['id'] im_id = roidb['id']
im_height = np.round(roidb['height'] * im_scales) im_height = np.round(roidb['height'] * im_scales)
im_width = np.round(roidb['width'] * im_scales) im_width = np.round(roidb['width'] * im_scales)
...@@ -150,7 +137,7 @@ def coco(settings, ...@@ -150,7 +137,7 @@ def coco(settings,
else: else:
for roidb in roidbs: for roidb in roidbs:
if settings.image_name not in roidb['image']: if cfg.image_name not in roidb['image']:
continue continue
im, im_info, im_id = roidb_reader(roidb, mode) im, im_info, im_id = roidb_reader(roidb, mode)
batch_out = [(im, im_info, im_id)] batch_out = [(im, im_info, im_id)]
...@@ -159,23 +146,14 @@ def coco(settings, ...@@ -159,23 +146,14 @@ def coco(settings,
return reader return reader
def train(settings, def train(batch_size, total_batch_size=None, padding_total=False, shuffle=True):
batch_size,
total_batch_size=None,
padding_total=False,
shuffle=True):
return coco( return coco(
settings, 'train', batch_size, total_batch_size, padding_total, shuffle=shuffle)
'train',
batch_size,
total_batch_size,
padding_total,
shuffle=shuffle)
def test(settings, batch_size, total_batch_size=None, padding_total=False): def test(batch_size, total_batch_size=None, padding_total=False):
return coco(settings, 'test', batch_size, total_batch_size, shuffle=False) return coco('test', batch_size, total_batch_size, shuffle=False)
def infer(settings): def infer():
return coco(settings, 'infer') return coco('infer')
...@@ -26,7 +26,6 @@ from __future__ import print_function ...@@ -26,7 +26,6 @@ from __future__ import print_function
from __future__ import unicode_literals from __future__ import unicode_literals
import copy import copy
import cPickle as pickle
import logging import logging
import numpy as np import numpy as np
import os import os
...@@ -37,6 +36,7 @@ import matplotlib ...@@ -37,6 +36,7 @@ import matplotlib
matplotlib.use('Agg') matplotlib.use('Agg')
from pycocotools.coco import COCO from pycocotools.coco import COCO
import box_utils import box_utils
from config import cfg
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
...@@ -44,16 +44,16 @@ logger = logging.getLogger(__name__) ...@@ -44,16 +44,16 @@ logger = logging.getLogger(__name__)
class JsonDataset(object): class JsonDataset(object):
"""A class representing a COCO json dataset.""" """A class representing a COCO json dataset."""
def __init__(self, args, train=False): def __init__(self, train=False):
print('Creating: {}'.format(args.dataset)) print('Creating: {}'.format(cfg.dataset))
self.name = args.dataset self.name = cfg.dataset
self.is_train = train self.is_train = train
if self.is_train: if self.is_train:
data_dir = args.train_data_dir data_dir = cfg.train_data_dir
file_list = args.train_file_list file_list = cfg.train_file_list
else: else:
data_dir = args.val_data_dir data_dir = cfg.val_data_dir
file_list = args.val_file_list file_list = cfg.val_file_list
self.image_directory = data_dir self.image_directory = data_dir
self.COCO = COCO(file_list) self.COCO = COCO(file_list)
# Set up dataset classes # Set up dataset classes
...@@ -91,7 +91,6 @@ class JsonDataset(object): ...@@ -91,7 +91,6 @@ class JsonDataset(object):
end_time = time.time() end_time = time.time()
print('_add_gt_annotations took {:.3f}s'.format(end_time - print('_add_gt_annotations took {:.3f}s'.format(end_time -
start_time)) start_time))
print('Appending horizontally-flipped training examples...') print('Appending horizontally-flipped training examples...')
self._extend_with_flipped_entries(roidb) self._extend_with_flipped_entries(roidb)
print('Loaded dataset: {:s}'.format(self.name)) print('Loaded dataset: {:s}'.format(self.name))
...@@ -130,7 +129,7 @@ class JsonDataset(object): ...@@ -130,7 +129,7 @@ class JsonDataset(object):
width = entry['width'] width = entry['width']
height = entry['height'] height = entry['height']
for obj in objs: for obj in objs:
if obj['area'] < -1: #cfg.TRAIN.GT_MIN_AREA: if obj['area'] < cfg.TRAIN.gt_min_area:
continue continue
if 'ignore' in obj and obj['ignore'] == 1: if 'ignore' in obj and obj['ignore'] == 1:
continue continue
......
...@@ -28,11 +28,12 @@ import reader ...@@ -28,11 +28,12 @@ import reader
import models.model_builder as model_builder import models.model_builder as model_builder
import models.resnet as resnet import models.resnet as resnet
from learning_rate import exponential_with_warmup_decay from learning_rate import exponential_with_warmup_decay
from config import cfg
def train(cfg): def train():
learning_rate = cfg.learning_rate learning_rate = cfg.learning_rate
image_shape = [3, cfg.max_size, cfg.max_size] image_shape = [3, cfg.TRAIN.max_size, cfg.TRAIN.max_size]
if cfg.debug: if cfg.debug:
fluid.default_startup_program().random_seed = 1000 fluid.default_startup_program().random_seed = 1000
...@@ -43,9 +44,9 @@ def train(cfg): ...@@ -43,9 +44,9 @@ def train(cfg):
devices = os.getenv("CUDA_VISIBLE_DEVICES") or "" devices = os.getenv("CUDA_VISIBLE_DEVICES") or ""
devices_num = len(devices.split(",")) devices_num = len(devices.split(","))
total_batch_size = devices_num * cfg.TRAIN.im_per_batch
model = model_builder.FasterRCNN( model = model_builder.FasterRCNN(
cfg=cfg,
add_conv_body_func=resnet.add_ResNet50_conv4_body, add_conv_body_func=resnet.add_ResNet50_conv4_body,
add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head, add_roi_box_head_func=resnet.add_ResNet_roi_conv5_head,
use_pyreader=cfg.use_pyreader, use_pyreader=cfg.use_pyreader,
...@@ -58,18 +59,20 @@ def train(cfg): ...@@ -58,18 +59,20 @@ def train(cfg):
rpn_reg_loss.persistable = True rpn_reg_loss.persistable = True
loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss loss = loss_cls + loss_bbox + rpn_cls_loss + rpn_reg_loss
boundaries = [120000, 160000] boundaries = cfg.lr_steps
values = [learning_rate, learning_rate * 0.1, learning_rate * 0.01] gamma = cfg.lr_gamma
step_num = len(cfg.lr_steps)
values = [learning_rate * (gamma**i) for i in range(step_num + 1)]
optimizer = fluid.optimizer.Momentum( optimizer = fluid.optimizer.Momentum(
learning_rate=exponential_with_warmup_decay( learning_rate=exponential_with_warmup_decay(
learning_rate=learning_rate, learning_rate=learning_rate,
boundaries=boundaries, boundaries=boundaries,
values=values, values=values,
warmup_iter=500, warmup_iter=cfg.warm_up_iter,
warmup_factor=1.0 / 3.0), warmup_factor=cfg.warm_up_factor),
regularization=fluid.regularizer.L2Decay(0.0001), regularization=fluid.regularizer.L2Decay(cfg.weight_decay),
momentum=0.9) momentum=cfg.momentum)
optimizer.minimize(loss) optimizer.minimize(loss)
fluid.memory_optimize(fluid.default_main_program()) fluid.memory_optimize(fluid.default_main_program())
...@@ -89,20 +92,16 @@ def train(cfg): ...@@ -89,20 +92,16 @@ def train(cfg):
train_exe = fluid.ParallelExecutor( train_exe = fluid.ParallelExecutor(
use_cuda=bool(cfg.use_gpu), loss_name=loss.name) use_cuda=bool(cfg.use_gpu), loss_name=loss.name)
assert cfg.batch_size % devices_num == 0
batch_size_per_dev = cfg.batch_size / devices_num
if cfg.use_pyreader: if cfg.use_pyreader:
train_reader = reader.train( train_reader = reader.train(
cfg, batch_size=cfg.TRAIN.im_per_batch,
batch_size=batch_size_per_dev, total_batch_size=total_batch_size,
total_batch_size=cfg.batch_size, padding_total=cfg.TRAIN.padding_minibatch,
padding_total=cfg.padding_minibatch,
shuffle=True) shuffle=True)
py_reader = model.py_reader py_reader = model.py_reader
py_reader.decorate_paddle_reader(train_reader) py_reader.decorate_paddle_reader(train_reader)
else: else:
train_reader = reader.train( train_reader = reader.train(batch_size=total_batch_size, shuffle=True)
cfg, batch_size=cfg.batch_size, shuffle=True)
feeder = fluid.DataFeeder(place=place, feed_list=model.feeds()) feeder = fluid.DataFeeder(place=place, feed_list=model.feeds())
def save_model(postfix): def save_model(postfix):
...@@ -133,7 +132,7 @@ def train(cfg): ...@@ -133,7 +132,7 @@ def train(cfg):
smoothed_loss.get_median_value( smoothed_loss.get_median_value(
), start_time - prev_start_time)) ), start_time - prev_start_time))
sys.stdout.flush() sys.stdout.flush()
if (iter_id + 1) % cfg.snapshot_stride == 0: if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0:
save_model("model_iter{}".format(iter_id)) save_model("model_iter{}".format(iter_id))
except fluid.core.EOFException: except fluid.core.EOFException:
py_reader.reset() py_reader.reset()
...@@ -159,7 +158,7 @@ def train(cfg): ...@@ -159,7 +158,7 @@ def train(cfg):
iter_id, lr[0], iter_id, lr[0],
smoothed_loss.get_median_value(), start_time - prev_start_time)) smoothed_loss.get_median_value(), start_time - prev_start_time))
sys.stdout.flush() sys.stdout.flush()
if (iter_id + 1) % cfg.snapshot_stride == 0: if (iter_id + 1) % cfg.TRAIN.snapshot_iter == 0:
save_model("model_iter{}".format(iter_id)) save_model("model_iter{}".format(iter_id))
if (iter_id + 1) == cfg.max_iter: if (iter_id + 1) == cfg.max_iter:
break break
...@@ -175,6 +174,4 @@ def train(cfg): ...@@ -175,6 +174,4 @@ def train(cfg):
if __name__ == '__main__': if __name__ == '__main__':
args = parse_args() args = parse_args()
print_arguments(args) print_arguments(args)
train()
data_args = reader.Settings(args)
train(data_args)
...@@ -18,7 +18,7 @@ Contains common utility functions. ...@@ -18,7 +18,7 @@ Contains common utility functions.
from __future__ import absolute_import from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import sys
import distutils.util import distutils.util
import numpy as np import numpy as np
import six import six
...@@ -26,6 +26,7 @@ from collections import deque ...@@ -26,6 +26,7 @@ from collections import deque
from paddle.fluid import core from paddle.fluid import core
import argparse import argparse
import functools import functools
from config import *
def print_arguments(args): def print_arguments(args):
...@@ -96,31 +97,33 @@ def parse_args(): ...@@ -96,31 +97,33 @@ def parse_args():
add_arg('model_save_dir', str, 'output', "The path to save model.") add_arg('model_save_dir', str, 'output', "The path to save model.")
add_arg('pretrained_model', str, 'imagenet_resnet50_fusebn', "The init model path.") add_arg('pretrained_model', str, 'imagenet_resnet50_fusebn', "The init model path.")
add_arg('dataset', str, 'coco2017', "coco2014, coco2017.") add_arg('dataset', str, 'coco2017', "coco2014, coco2017.")
add_arg('data_dir', str, 'data/COCO17', "The data root path.")
add_arg('class_num', int, 81, "Class number.") add_arg('class_num', int, 81, "Class number.")
add_arg('data_dir', str, 'data/COCO17', "The data root path.")
add_arg('use_pyreader', bool, True, "Use pyreader.") add_arg('use_pyreader', bool, True, "Use pyreader.")
add_arg('use_profile', bool, False, "Whether use profiler.")
add_arg('padding_minibatch',bool, False, add_arg('padding_minibatch',bool, False,
"If False, only resize image and not pad, image shape is different between" "If False, only resize image and not pad, image shape is different between"
" GPUs in one mini-batch. If True, image shape is the same in one mini-batch.") " GPUs in one mini-batch. If True, image shape is the same in one mini-batch.")
#SOLVER #SOLVER
add_arg('learning_rate', float, 0.01, "Learning rate.") add_arg('learning_rate', float, 0.01, "Learning rate.")
add_arg('max_iter', int, 180000, "Iter number.") add_arg('max_iter', int, 180000, "Iter number.")
add_arg('log_window', int, 1, "Log smooth window, set 1 for debug, set 20 for train.") add_arg('log_window', int, 20, "Log smooth window, set 1 for debug, set 20 for train.")
add_arg('snapshot_stride', int, 10000, "save model every snapshot stride.") # FAST RCNN
# RPN # RPN
add_arg('anchor_sizes', int, [32,64,128,256,512], "The size of anchors.") add_arg('anchor_sizes', int, [32,64,128,256,512], "The size of anchors.")
add_arg('aspect_ratios', float, [0.5,1.0,2.0], "The ratio of anchors.") add_arg('aspect_ratios', float, [0.5,1.0,2.0], "The ratio of anchors.")
add_arg('variance', float, [1.,1.,1.,1.], "The variance of anchors.") add_arg('variance', float, [1.,1.,1.,1.], "The variance of anchors.")
add_arg('rpn_stride', float, 16., "Stride of the feature map that RPN is attached.") add_arg('rpn_stride', float, [16.,16.], "Stride of the feature map that RPN is attached.")
# FAST RCNN add_arg('rpn_nms_thresh', float, 0.7, "NMS threshold used on RPN proposals")
# TRAIN TEST INFER # TRAIN TEST INFER
add_arg('batch_size', int, 1, "Minibatch size.") add_arg('im_per_batch', int, 1, "Minibatch size.")
add_arg('max_size', int, 1333, "The resized image height.") add_arg('max_size', int, 1333, "The resized image height.")
add_arg('scales', int, [800], "The resized image height.") add_arg('scales', int, [800], "The resized image height.")
add_arg('batch_size_per_im',int, 512, "fast rcnn head batch size") add_arg('batch_size_per_im',int, 512, "fast rcnn head batch size")
add_arg('mean_value', float, [102.9801, 115.9465, 122.7717], "pixel mean") add_arg('pixel_means', float, [102.9801, 115.9465, 122.7717], "pixel mean")
add_arg('nms_threshold', float, 0.5, "NMS threshold.") add_arg('nms_thresh', float, 0.5, "NMS threshold.")
add_arg('score_threshold', float, 0.05, "score threshold for NMS.") add_arg('score_thresh', float, 0.05, "score threshold for NMS.")
add_arg('snapshot_stride', int, 10000, "save model every snapshot stride.")
add_arg('debug', bool, False, "Debug mode") add_arg('debug', bool, False, "Debug mode")
# SINGLE EVAL AND DRAW # SINGLE EVAL AND DRAW
add_arg('draw_threshold', float, 0.8, "Confidence threshold to draw bbox.") add_arg('draw_threshold', float, 0.8, "Confidence threshold to draw bbox.")
...@@ -128,4 +131,9 @@ def parse_args(): ...@@ -128,4 +131,9 @@ def parse_args():
add_arg('image_name', str, '', "The single image used to inference and visualize.") add_arg('image_name', str, '', "The single image used to inference and visualize.")
# yapf: enable # yapf: enable
args = parser.parse_args() args = parser.parse_args()
file_name = sys.argv[0]
if 'train' in file_name or 'profile' in file_name:
merge_cfg_from_args(args, 'train')
else:
merge_cfg_from_args(args, 'test')
return args return args
...@@ -12,8 +12,12 @@ ...@@ -12,8 +12,12 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys import sys
import os import os
import six
import argparse import argparse
import functools import functools
import matplotlib import matplotlib
...@@ -102,7 +106,7 @@ def train(args): ...@@ -102,7 +106,7 @@ def train(args):
noise_data = np.random.uniform( noise_data = np.random.uniform(
low=-1.0, high=1.0, low=-1.0, high=1.0,
size=[args.batch_size, NOISE_SIZE]).astype('float32') size=[args.batch_size, NOISE_SIZE]).astype('float32')
real_image = np.array(map(lambda x: x[0], data)).reshape( real_image = np.array(list(map(lambda x: x[0], data))).reshape(
-1, 784).astype('float32') -1, 784).astype('float32')
conditions_data = np.array([x[1] for x in data]).reshape( conditions_data = np.array([x[1] for x in data]).reshape(
[-1, 1]).astype("float32") [-1, 1]).astype("float32")
...@@ -138,7 +142,7 @@ def train(args): ...@@ -138,7 +142,7 @@ def train(args):
d_loss_np = [d_loss_1[0][0], d_loss_2[0][0]] d_loss_np = [d_loss_1[0][0], d_loss_2[0][0]]
for _ in xrange(NUM_TRAIN_TIMES_OF_DG): for _ in six.moves.xrange(NUM_TRAIN_TIMES_OF_DG):
noise_data = np.random.uniform( noise_data = np.random.uniform(
low=-1.0, high=1.0, low=-1.0, high=1.0,
size=[args.batch_size, NOISE_SIZE]).astype('float32') size=[args.batch_size, NOISE_SIZE]).astype('float32')
...@@ -159,8 +163,9 @@ def train(args): ...@@ -159,8 +163,9 @@ def train(args):
total_images = np.concatenate([real_image, generated_images]) total_images = np.concatenate([real_image, generated_images])
fig = plot(total_images) fig = plot(total_images)
msg = "Epoch ID={0}\n Batch ID={1}\n D-Loss={2}\n DG-Loss={3}\n gen={4}".format( msg = "Epoch ID={0}\n Batch ID={1}\n D-Loss={2}\n DG-Loss={3}\n gen={4}".format(
pass_id, batch_id, d_loss_np, dg_loss_np, pass_id, batch_id,
check(generated_images)) np.sum(d_loss_np),
np.sum(dg_loss_np), check(generated_images))
print(msg) print(msg)
plt.title(msg) plt.title(msg)
plt.savefig( plt.savefig(
......
...@@ -12,11 +12,15 @@ ...@@ -12,11 +12,15 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys import sys
import os import os
import argparse import argparse
import functools import functools
import matplotlib import matplotlib
import six
import numpy as np import numpy as np
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
...@@ -98,7 +102,7 @@ def train(args): ...@@ -98,7 +102,7 @@ def train(args):
noise_data = np.random.uniform( noise_data = np.random.uniform(
low=-1.0, high=1.0, low=-1.0, high=1.0,
size=[args.batch_size, NOISE_SIZE]).astype('float32') size=[args.batch_size, NOISE_SIZE]).astype('float32')
real_image = np.array(map(lambda x: x[0], data)).reshape( real_image = np.array(list(map(lambda x: x[0], data))).reshape(
-1, 784).astype('float32') -1, 784).astype('float32')
real_labels = np.ones( real_labels = np.ones(
shape=[real_image.shape[0], 1], dtype='float32') shape=[real_image.shape[0], 1], dtype='float32')
...@@ -128,7 +132,7 @@ def train(args): ...@@ -128,7 +132,7 @@ def train(args):
d_loss_np = [d_loss_1[0][0], d_loss_2[0][0]] d_loss_np = [d_loss_1[0][0], d_loss_2[0][0]]
for _ in xrange(NUM_TRAIN_TIMES_OF_DG): for _ in six.moves.xrange(NUM_TRAIN_TIMES_OF_DG):
noise_data = np.random.uniform( noise_data = np.random.uniform(
low=-1.0, high=1.0, low=-1.0, high=1.0,
size=[args.batch_size, NOISE_SIZE]).astype('float32') size=[args.batch_size, NOISE_SIZE]).astype('float32')
...@@ -146,7 +150,8 @@ def train(args): ...@@ -146,7 +150,8 @@ def train(args):
fig = plot(total_images) fig = plot(total_images)
msg = "Epoch ID={0} Batch ID={1} D-Loss={2} DG-Loss={3}\n gen={4}".format( msg = "Epoch ID={0} Batch ID={1} D-Loss={2} DG-Loss={3}\n gen={4}".format(
pass_id, batch_id, pass_id, batch_id,
np.sum(d_loss_np), dg_loss_np, check(generated_images)) np.sum(d_loss_np),
np.sum(dg_loss_np), check(generated_images))
print(msg) print(msg)
plt.title(msg) plt.title(msg)
plt.savefig( plt.savefig(
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
from utility import get_parent_function_name from utility import get_parent_function_name
...@@ -98,19 +101,19 @@ def D_cond(image, y): ...@@ -98,19 +101,19 @@ def D_cond(image, y):
h2 = bn(fc(h1, dfc_dim), act='leaky_relu') h2 = bn(fc(h1, dfc_dim), act='leaky_relu')
h2 = fluid.layers.concat([h2, y], 1) h2 = fluid.layers.concat([h2, y], 1)
h3 = fc(h2, 1) h3 = fc(h2, 1, act='sigmoid')
return h3 return h3
def G_cond(z, y): def G_cond(z, y):
s_h, s_w = output_height, output_width s_h, s_w = output_height, output_width
s_h2, s_h4 = int(s_h / 2), int(s_h / 4) s_h2, s_h4 = int(s_h // 2), int(s_h // 4)
s_w2, s_w4 = int(s_w / 2), int(s_w / 4) s_w2, s_w4 = int(s_w // 2), int(s_w // 4)
yb = fluid.layers.reshape(y, [-1, y_dim, 1, 1]) #NCHW yb = fluid.layers.reshape(y, [-1, y_dim, 1, 1]) #NCHW
z = fluid.layers.concat([z, y], 1) z = fluid.layers.concat([z, y], 1)
h0 = bn(fc(z, gfc_dim / 2), act='relu') h0 = bn(fc(z, gfc_dim // 2), act='relu')
h0 = fluid.layers.concat([h0, y], 1) h0 = fluid.layers.concat([h0, y], 1)
h1 = bn(fc(h0, gf_dim * 2 * s_h4 * s_w4), act='relu') h1 = bn(fc(h0, gf_dim * 2 * s_h4 * s_w4), act='relu')
...@@ -128,14 +131,14 @@ def D(x): ...@@ -128,14 +131,14 @@ def D(x):
x = conv(x, df_dim, act='leaky_relu') x = conv(x, df_dim, act='leaky_relu')
x = bn(conv(x, df_dim * 2), act='leaky_relu') x = bn(conv(x, df_dim * 2), act='leaky_relu')
x = bn(fc(x, dfc_dim), act='leaky_relu') x = bn(fc(x, dfc_dim), act='leaky_relu')
x = fc(x, 1, act=None) x = fc(x, 1, act='sigmoid')
return x return x
def G(x): def G(x):
x = bn(fc(x, gfc_dim)) x = bn(fc(x, gfc_dim))
x = bn(fc(x, gf_dim * 2 * img_dim / 4 * img_dim / 4)) x = bn(fc(x, gf_dim * 2 * img_dim // 4 * img_dim // 4))
x = fluid.layers.reshape(x, [-1, gf_dim * 2, img_dim / 4, img_dim / 4]) x = fluid.layers.reshape(x, [-1, gf_dim * 2, img_dim // 4, img_dim // 4])
x = deconv(x, gf_dim * 2, act='relu', output_size=[14, 14]) x = deconv(x, gf_dim * 2, act='relu', output_size=[14, 14])
x = deconv(x, 1, filter_size=5, padding=2, act='tanh', output_size=[28, 28]) x = deconv(x, 1, filter_size=5, padding=2, act='tanh', output_size=[28, 28])
x = fluid.layers.reshape(x, shape=[-1, 28 * 28]) x = fluid.layers.reshape(x, shape=[-1, 28 * 28])
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math import math
import distutils.util import distutils.util
import numpy as np import numpy as np
import inspect import inspect
import matplotlib import matplotlib
import six
matplotlib.use('agg') matplotlib.use('agg')
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec import matplotlib.gridspec as gridspec
...@@ -54,7 +58,7 @@ def print_arguments(args): ...@@ -54,7 +58,7 @@ def print_arguments(args):
:type args: argparse.Namespace :type args: argparse.Namespace
""" """
print("----------- Configuration Arguments -----------") print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()): for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value)) print("%s: %s" % (arg, value))
print("------------------------------------------------") print("------------------------------------------------")
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os import os
from PIL import Image from PIL import Image
import numpy as np import numpy as np
from itertools import izip
A_LIST_FILE = "./data/horse2zebra/trainA.txt" A_LIST_FILE = "./data/horse2zebra/trainA.txt"
B_LIST_FILE = "./data/horse2zebra/trainB.txt" B_LIST_FILE = "./data/horse2zebra/trainB.txt"
...@@ -70,11 +72,3 @@ def b_test_reader(): ...@@ -70,11 +72,3 @@ def b_test_reader():
Reader of images with B style for test. Reader of images with B style for test.
""" """
return reader_creater(B_TEST_LIST_FILE, cycle=False, return_name=True) return reader_creater(B_TEST_LIST_FILE, cycle=False, return_name=True)
if __name__ == "__main__":
for A, B in izip(a_test_reader()(), a_test_reader()()):
print A[0].shape
print A[1]
print B[0].shape
print B[1]
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import data_reader import data_reader
import os import os
import random import random
...@@ -9,7 +12,6 @@ import paddle.fluid as fluid ...@@ -9,7 +12,6 @@ import paddle.fluid as fluid
import numpy as np import numpy as np
from paddle.fluid import core from paddle.fluid import core
from trainer import * from trainer import *
from itertools import izip
from scipy.misc import imsave from scipy.misc import imsave
import paddle.fluid.profiler as profiler import paddle.fluid.profiler as profiler
from utility import add_arguments, print_arguments, ImagePool from utility import add_arguments, print_arguments, ImagePool
...@@ -66,7 +68,7 @@ def train(args): ...@@ -66,7 +68,7 @@ def train(args):
if not os.path.exists(out_path): if not os.path.exists(out_path):
os.makedirs(out_path) os.makedirs(out_path)
i = 0 i = 0
for data_A, data_B in izip(A_test_reader(), B_test_reader()): for data_A, data_B in zip(A_test_reader(), B_test_reader()):
A_name = data_A[1] A_name = data_A[1]
B_name = data_B[1] B_name = data_B[1]
tensor_A = core.LoDTensor() tensor_A = core.LoDTensor()
...@@ -114,7 +116,7 @@ def train(args): ...@@ -114,7 +116,7 @@ def train(args):
exe, out_path + "/d_a", main_program=d_A_trainer.program) exe, out_path + "/d_a", main_program=d_A_trainer.program)
fluid.io.save_persistables( fluid.io.save_persistables(
exe, out_path + "/d_b", main_program=d_B_trainer.program) exe, out_path + "/d_b", main_program=d_B_trainer.program)
print "saved checkpoint to [%s]" % out_path print("saved checkpoint to {}".format(out_path))
sys.stdout.flush() sys.stdout.flush()
def init_model(): def init_model():
...@@ -128,7 +130,7 @@ def train(args): ...@@ -128,7 +130,7 @@ def train(args):
exe, args.init_model + "/d_a", main_program=d_A_trainer.program) exe, args.init_model + "/d_a", main_program=d_A_trainer.program)
fluid.io.load_persistables( fluid.io.load_persistables(
exe, args.init_model + "/d_b", main_program=d_B_trainer.program) exe, args.init_model + "/d_b", main_program=d_B_trainer.program)
print "Load model from [%s]" % args.init_model print("Load model from {}".format(args.init_model))
if args.init_model: if args.init_model:
init_model() init_model()
...@@ -136,8 +138,8 @@ def train(args): ...@@ -136,8 +138,8 @@ def train(args):
for epoch in range(args.epoch): for epoch in range(args.epoch):
batch_id = 0 batch_id = 0
for i in range(max_images_num): for i in range(max_images_num):
data_A = A_reader.next() data_A = next(A_reader)
data_B = B_reader.next() data_B = next(B_reader)
tensor_A = core.LoDTensor() tensor_A = core.LoDTensor()
tensor_B = core.LoDTensor() tensor_B = core.LoDTensor()
tensor_A.set(data_A, place) tensor_A.set(data_A, place)
...@@ -174,9 +176,9 @@ def train(args): ...@@ -174,9 +176,9 @@ def train(args):
feed={"input_A": tensor_A, feed={"input_A": tensor_A,
"fake_pool_A": fake_pool_A}) "fake_pool_A": fake_pool_A})
print "epoch[%d]; batch[%d]; g_A_loss: %s; d_B_loss: %s; g_B_loss: %s; d_A_loss: %s;" % ( print("epoch{}; batch{}; g_A_loss: {}; d_B_loss: {}; g_B_loss: {}; d_A_loss: {};".format(
epoch, batch_id, g_A_loss[0], d_B_loss[0], g_B_loss[0], epoch, batch_id, g_A_loss[0], d_B_loss[0], g_B_loss[0],
d_A_loss[0]) d_A_loss[0]))
sys.stdout.flush() sys.stdout.flush()
batch_id += 1 batch_id += 1
......
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from model import * from model import *
import paddle.fluid as fluid import paddle.fluid as fluid
......
...@@ -17,6 +17,7 @@ from __future__ import absolute_import ...@@ -17,6 +17,7 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import distutils.util import distutils.util
import six
import random import random
import glob import glob
import numpy as np import numpy as np
...@@ -39,7 +40,7 @@ def print_arguments(args): ...@@ -39,7 +40,7 @@ def print_arguments(args):
:type args: argparse.Namespace :type args: argparse.Namespace
""" """
print("----------- Configuration Arguments -----------") print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()): for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value)) print("%s: %s" % (arg, value))
print("------------------------------------------------") print("------------------------------------------------")
......
...@@ -8,7 +8,7 @@ import os ...@@ -8,7 +8,7 @@ import os
import cv2 import cv2
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
from icnet import icnet from icnet import icnet
from utils import add_arguments, print_arguments, get_feeder_data from utils import add_arguments, print_arguments, get_feeder_data
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
...@@ -111,10 +111,10 @@ def infer(args): ...@@ -111,10 +111,10 @@ def infer(args):
for line in open(args.images_list): for line in open(args.images_list):
image_file = args.images_path + "/" + line.strip() image_file = args.images_path + "/" + line.strip()
filename = os.path.basename(image_file) filename = os.path.basename(image_file)
image = paddle.image.load_image( image = paddle.dataset.image.load_image(
image_file, is_color=True).astype("float32") image_file, is_color=True).astype("float32")
image -= IMG_MEAN image -= IMG_MEAN
img = paddle.image.to_chw(image)[np.newaxis, :] img = paddle.dataset.image.to_chw(image)[np.newaxis, :]
image_t = fluid.core.LoDTensor() image_t = fluid.core.LoDTensor()
image_t.set(img, place) image_t.set(img, place)
result = exe.run(inference_program, result = exe.run(inference_program,
......
...@@ -17,6 +17,7 @@ from paddle.fluid.initializer import init_on_cpu ...@@ -17,6 +17,7 @@ from paddle.fluid.initializer import init_on_cpu
if 'ce_mode' in os.environ: if 'ce_mode' in os.environ:
np.random.seed(10) np.random.seed(10)
fluid.default_startup_program().random_seed = 90
parser = argparse.ArgumentParser(description=__doc__) parser = argparse.ArgumentParser(description=__doc__)
add_arg = functools.partial(add_arguments, argparser=parser) add_arg = functools.partial(add_arguments, argparser=parser)
...@@ -91,9 +92,6 @@ def train(args): ...@@ -91,9 +92,6 @@ def train(args):
place = fluid.CUDAPlace(0) place = fluid.CUDAPlace(0)
exe = fluid.Executor(place) exe = fluid.Executor(place)
if 'ce_mode' in os.environ:
fluid.default_startup_program().random_seed = 90
exe.run(fluid.default_startup_program()) exe.run(fluid.default_startup_program())
if args.init_model is not None: if args.init_model is not None:
...@@ -126,9 +124,10 @@ def train(args): ...@@ -126,9 +124,10 @@ def train(args):
sub124_loss += results[3] sub124_loss += results[3]
# training log # training log
if iter_id % LOG_PERIOD == 0: if iter_id % LOG_PERIOD == 0:
print("Iter[%d]; train loss: %.3f; sub4_loss: %.3f; sub24_loss: %.3f; sub124_loss: %.3f" % ( print(
iter_id, t_loss / LOG_PERIOD, sub4_loss / LOG_PERIOD, "Iter[%d]; train loss: %.3f; sub4_loss: %.3f; sub24_loss: %.3f; sub124_loss: %.3f"
sub24_loss / LOG_PERIOD, sub124_loss / LOG_PERIOD)) % (iter_id, t_loss / LOG_PERIOD, sub4_loss / LOG_PERIOD,
sub24_loss / LOG_PERIOD, sub124_loss / LOG_PERIOD))
print("kpis train_cost %f" % (t_loss / LOG_PERIOD)) print("kpis train_cost %f" % (t_loss / LOG_PERIOD))
t_loss = 0. t_loss = 0.
......
...@@ -14,7 +14,7 @@ ...@@ -14,7 +14,7 @@
## 安装 ## 安装
在当前目录下运行样例代码需要PadddlePaddle Fluid的v0.13.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明来更新PaddlePaddle。 在当前目录下运行样例代码需要PadddlePaddle Fluid的v0.13.0或以上的版本。如果你的运行环境中的PaddlePaddle低于此版本,请根据安装文档中的说明来更新PaddlePaddle。
## 数据准备 ## 数据准备
......
...@@ -19,7 +19,7 @@ test_acc_top1_kpi = AccKpi( ...@@ -19,7 +19,7 @@ test_acc_top1_kpi = AccKpi(
test_acc_top5_kpi = AccKpi( test_acc_top5_kpi = AccKpi(
'test_acc_top5', 0.02, 0, actived=True, desc='TOP5 ACC') 'test_acc_top5', 0.02, 0, actived=True, desc='TOP5 ACC')
test_cost_kpi = CostKpi('test_cost', 0.02, 0, actived=True, desc='train cost') test_cost_kpi = CostKpi('test_cost', 0.02, 0, actived=True, desc='train cost')
train_speed_kpi = AccKpi( train_speed_kpi = DurationKpi(
'train_speed', 'train_speed',
0.05, 0.05,
0, 0,
...@@ -38,7 +38,7 @@ test_acc_top5_card4_kpi = AccKpi( ...@@ -38,7 +38,7 @@ test_acc_top5_card4_kpi = AccKpi(
'test_acc_top5_card4', 0.02, 0, actived=True, desc='TOP5 ACC') 'test_acc_top5_card4', 0.02, 0, actived=True, desc='TOP5 ACC')
test_cost_card4_kpi = CostKpi( test_cost_card4_kpi = CostKpi(
'test_cost_card4', 0.02, 0, actived=True, desc='train cost') 'test_cost_card4', 0.02, 0, actived=True, desc='train cost')
train_speed_card4_kpi = AccKpi( train_speed_card4_kpi = DurationKpi(
'train_speed_card4', 'train_speed_card4',
0.05, 0.05,
0, 0,
......
...@@ -19,7 +19,7 @@ This tool is used to convert a Caffe model to a Fluid model ...@@ -19,7 +19,7 @@ This tool is used to convert a Caffe model to a Fluid model
- Download one from github directly - Download one from github directly
``` ```
cd proto/ && wget https://github.com/ethereon/caffe-tensorflow/blob/master/kaffe/caffe/caffepb.py cd proto/ && wget https://raw.githubusercontent.com/ethereon/caffe-tensorflow/master/kaffe/caffe/caffepb.py
``` ```
2. Convert the Caffe model to Fluid model 2. Convert the Caffe model to Fluid model
......
...@@ -8,7 +8,7 @@ import sys ...@@ -8,7 +8,7 @@ import sys
import os import os
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
import paddle.v2 as paddle import paddle
def test_model(exe, test_program, fetch_list, test_reader, feeder): def test_model(exe, test_program, fetch_list, test_reader, feeder):
......
...@@ -21,6 +21,11 @@ def parse_args(): ...@@ -21,6 +21,11 @@ def parse_args():
action='store_true', action='store_true',
help='If set, run \ help='If set, run \
the task with continuous evaluation logs.') the task with continuous evaluation logs.')
parser.add_argument(
'--num_devices',
type=int,
default=1,
help='Number of GPU devices')
args = parser.parse_args() args = parser.parse_args()
return args return args
...@@ -132,7 +137,7 @@ def train(train_reader, ...@@ -132,7 +137,7 @@ def train(train_reader,
"src_wordseq": lod_src_wordseq, "src_wordseq": lod_src_wordseq,
"dst_wordseq": lod_dst_wordseq "dst_wordseq": lod_dst_wordseq
}, },
fetch_list=fetch_list) fetch_list=fetch_list)
avg_ppl = np.exp(ret_avg_cost[0]) avg_ppl = np.exp(ret_avg_cost[0])
newest_ppl = np.mean(avg_ppl) newest_ppl = np.mean(avg_ppl)
if i % 100 == 0: if i % 100 == 0:
...@@ -153,7 +158,7 @@ def train(train_reader, ...@@ -153,7 +158,7 @@ def train(train_reader,
print("kpis imikolov_20_avg_ppl %s" % newest_ppl) print("kpis imikolov_20_avg_ppl %s" % newest_ppl)
else: else:
print("kpis imikolov_20_pass_duration_card%s %s" % \ print("kpis imikolov_20_pass_duration_card%s %s" % \
(gpu_num, total_time / epoch_idx)) (gpu_num, total_time / epoch_idx))
print("kpis imikolov_20_avg_ppl_card%s %s" % print("kpis imikolov_20_avg_ppl_card%s %s" %
(gpu_num, newest_ppl)) (gpu_num, newest_ppl))
save_dir = "%s/epoch_%d" % (model_dir, epoch_idx) save_dir = "%s/epoch_%d" % (model_dir, epoch_idx)
...@@ -165,13 +170,13 @@ def train(train_reader, ...@@ -165,13 +170,13 @@ def train(train_reader,
print("finish training") print("finish training")
def get_cards(enable_ce): def get_cards(args):
if enable_ce: if args.enable_ce:
cards = os.environ.get('CUDA_VISIBLE_DEVICES') cards = os.environ.get('CUDA_VISIBLE_DEVICES')
num = len(cards.split(",")) num = len(cards.split(","))
return num return num
else: else:
return fluid.core.get_cuda_device_count() return args.num_devices
def train_net(): def train_net():
...@@ -179,7 +184,7 @@ def train_net(): ...@@ -179,7 +184,7 @@ def train_net():
batch_size = 20 batch_size = 20
args = parse_args() args = parse_args()
vocab, train_reader, test_reader = utils.prepare_data( vocab, train_reader, test_reader = utils.prepare_data(
batch_size=batch_size * get_cards(args.enable_ce), buffer_size=1000, \ batch_size=batch_size * get_cards(args), buffer_size=1000, \
word_freq_threshold=0, enable_ce = args.enable_ce) word_freq_threshold=0, enable_ce = args.enable_ce)
train( train(
train_reader=train_reader, train_reader=train_reader,
......
# Abstract
Dureader is an end-to-end neural network model for machine reading comprehension style question answering, which aims to answer questions from given passages. We first match the question and passages with a bidireactional attention flow network to obtrain the question-aware passages represenation. Then we employ a pointer network to locate the positions of answers from passages. Our experimental evalutions show that DuReader model achieves the state-of-the-art results in DuReader Dadaset.
# Dataset
DuReader Dataset is a new large-scale real-world and human sourced MRC dataset in Chinese. DuReader focuses on real-world open-domain question answering. The advantages of DuReader over existing datasets are concluded as follows:
- Real question
- Real article
- Real answer
- Real application scenario
- Rich annotation
# Network
DuReader model is inspired by 3 classic reading comprehension models([BiDAF](https://arxiv.org/abs/1611.01603), [Match-LSTM](https://arxiv.org/abs/1608.07905), [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf)).
DuReader model is a hierarchical multi-stage process and consists of five layers
- **Word Embedding Layer** maps each word to a vector using a pre-trained word embedding model.
- **Encoding Layer** extracts context infomation for each position in question and passages with a bi-directional LSTM network.
- **Attention Flow Layer** couples the query and context vectors and produces a set of query-aware feature vectors for each word in the context. Please refer to [BiDAF](https://arxiv.org/abs/1611.01603) for more details.
- **Fusion Layer** employs a layer of bi-directional LSTM to capture the interaction among context words independent of the query.
- **Decode Layer** employs an answer point network with attention pooling of the quesiton to locate the positions of answers from passages. Please refer to [Match-LSTM](https://arxiv.org/abs/1608.07905) and [R-NET](https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf) for more details.
## How to Run
### Download the Dataset
To Download DuReader dataset:
```
cd data && bash download.sh
```
For more details about DuReader dataset please refer to [DuReader Dataset Homepage](https://ai.baidu.com//broad/subordinate?dataset=dureader).
### Download Thirdparty Dependencies
We use Bleu and Rouge as evaluation metrics, the calculation of these metrics relies on the scoring scripts under [coco-caption](https://github.com/tylin/coco-caption), to download them, run:
```
cd utils && bash download_thirdparty.sh
```
### Environment Requirements
For now we've only tested on PaddlePaddle v1.0, to install PaddlePaddle and for more details about PaddlePaddle, see [PaddlePaddle Homepage](http://paddlepaddle.org).
### Preparation
Before training the model, we have to make sure that the data is ready. For preparation, we will check the data files, make directories and extract a vocabulary for later use. You can run the following command to do this with a specified task name:
```
sh run.sh --prepare
```
You can specify the files for train/dev/test by setting the `trainset`/`devset`/`testset`.
### Training
To train the model and you can also set the hyper-parameters such as the learning rate by using `--learning_rate NUM`. For example, to train the model for 10 passes, you can run:
```
sh run.sh --train --pass_num 10
```
The training process includes an evaluation on the dev set after each training epoch. By default, the model with the least Bleu-4 score on the dev set will be saved.
### Evaluation
To conduct a single evaluation on the dev set with the the model already trained, you can run the following command:
```
sh run.sh --evaluate --load_dir models/1
```
### Prediction
You can also predict answers for the samples in some files using the following command:
```
sh run.sh --predict --load_dir models/1 --testset ../data/preprocessed/testset/search.dev.json
```
By default, the results are saved at `../data/results/` folder. You can change this by specifying `--result_dir DIR_PATH`.
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import distutils.util
def parse_args():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
'--prepare',
action='store_true',
help='create the directories, prepare the vocabulary and embeddings')
parser.add_argument('--train', action='store_true', help='train the model')
parser.add_argument(
'--evaluate', action='store_true', help='evaluate the model on dev set')
parser.add_argument(
'--predict',
action='store_true',
help='predict the answers for test set with trained model')
parser.add_argument(
"--embed_size",
type=int,
default=300,
help="The dimension of embedding table. (default: %(default)d)")
parser.add_argument(
"--hidden_size",
type=int,
default=300,
help="The size of rnn hidden unit. (default: %(default)d)")
parser.add_argument(
"--batch_size",
type=int,
default=32,
help="The sequence number of a mini-batch data. (default: %(default)d)")
parser.add_argument(
"--pass_num",
type=int,
default=5,
help="The pass number to train. (default: %(default)d)")
parser.add_argument(
"--learning_rate",
type=float,
default=0.001,
help="Learning rate used to train the model. (default: %(default)f)")
parser.add_argument(
"--weight_decay",
type=float,
default=0.0001,
help="Weight decay. (default: %(default)f)")
parser.add_argument(
"--use_gpu",
type=distutils.util.strtobool,
default=True,
help="Whether to use gpu. (default: %(default)d)")
parser.add_argument(
"--save_dir",
type=str,
default="model",
help="Specify the path to save trained models.")
parser.add_argument(
"--load_dir",
type=str,
default="",
help="Specify the path to load trained models.")
parser.add_argument(
"--save_interval",
type=int,
default=1,
help="Save the trained model every n passes."
"(default: %(default)d)")
parser.add_argument(
"--log_interval",
type=int,
default=50,
help="log the train loss every n batches."
"(default: %(default)d)")
parser.add_argument(
"--dev_interval",
type=int,
default=1000,
help="cal dev loss every n batches."
"(default: %(default)d)")
parser.add_argument('--optim', default='adam', help='optimizer type')
parser.add_argument('--trainset', nargs='+', help='train dataset')
parser.add_argument('--devset', nargs='+', help='dev dataset')
parser.add_argument('--testset', nargs='+', help='test dataset')
parser.add_argument('--vocab_dir', help='dict')
parser.add_argument('--max_p_num', type=int, default=5)
parser.add_argument('--max_a_len', type=int, default=200)
parser.add_argument('--max_p_len', type=int, default=500)
parser.add_argument('--max_q_len', type=int, default=9)
parser.add_argument('--doc_num', type=int, default=5)
parser.add_argument('--para_print', action='store_true')
parser.add_argument('--drop_rate', type=float, default=0.0)
parser.add_argument('--random_seed', type=int, default=123)
parser.add_argument(
'--log_path',
help='path of the log file. If not set, logs are printed to console')
parser.add_argument(
'--result_dir',
default='../data/results/',
help='the dir to output the results')
parser.add_argument(
'--result_name',
default='test_result',
help='the file name of the results')
args = parser.parse_args()
return args
#!/bin/bash
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
if [[ -d preprocessed ]] && [[ -d raw ]]; then
echo "data exist"
exit 0
else
wget -c --no-check-certificate http://dureader.gz.bcebos.com/dureader_preprocessed.zip
fi
if md5sum --status -c md5sum.txt; then
unzip dureader_preprocessed.zip
else
echo "download data error!" >> /dev/stderr
exit 1
fi
7a4c28026f7dc94e8135d17203c63664 dureader_preprocessed.zip
# -*- coding:utf8 -*-
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This module implements data process strategies.
"""
import os
import json
import logging
import numpy as np
from collections import Counter
class BRCDataset(object):
"""
This module implements the APIs for loading and using baidu reading comprehension dataset
"""
def __init__(self,
max_p_num,
max_p_len,
max_q_len,
train_files=[],
dev_files=[],
test_files=[]):
self.logger = logging.getLogger("brc")
self.max_p_num = max_p_num
self.max_p_len = max_p_len
self.max_q_len = max_q_len
self.train_set, self.dev_set, self.test_set = [], [], []
if train_files:
for train_file in train_files:
self.train_set += self._load_dataset(train_file, train=True)
self.logger.info('Train set size: {} questions.'.format(
len(self.train_set)))
if dev_files:
for dev_file in dev_files:
self.dev_set += self._load_dataset(dev_file)
self.logger.info('Dev set size: {} questions.'.format(
len(self.dev_set)))
if test_files:
for test_file in test_files:
self.test_set += self._load_dataset(test_file)
self.logger.info('Test set size: {} questions.'.format(
len(self.test_set)))
def _load_dataset(self, data_path, train=False):
"""
Loads the dataset
Args:
data_path: the data file to load
"""
with open(data_path) as fin:
data_set = []
for lidx, line in enumerate(fin):
sample = json.loads(line.strip())
if train:
if len(sample['answer_spans']) == 0:
continue
if sample['answer_spans'][0][1] >= self.max_p_len:
continue
if 'answer_docs' in sample:
sample['answer_passages'] = sample['answer_docs']
sample['question_tokens'] = sample['segmented_question']
sample['passages'] = []
for d_idx, doc in enumerate(sample['documents']):
if train:
most_related_para = doc['most_related_para']
sample['passages'].append({
'passage_tokens':
doc['segmented_paragraphs'][most_related_para],
'is_selected': doc['is_selected']
})
else:
para_infos = []
for para_tokens in doc['segmented_paragraphs']:
question_tokens = sample['segmented_question']
common_with_question = Counter(
para_tokens) & Counter(question_tokens)
correct_preds = sum(common_with_question.values())
if correct_preds == 0:
recall_wrt_question = 0
else:
recall_wrt_question = float(
correct_preds) / len(question_tokens)
para_infos.append((para_tokens, recall_wrt_question,
len(para_tokens)))
para_infos.sort(key=lambda x: (-x[1], x[2]))
fake_passage_tokens = []
for para_info in para_infos[:1]:
fake_passage_tokens += para_info[0]
sample['passages'].append({
'passage_tokens': fake_passage_tokens
})
data_set.append(sample)
return data_set
def _one_mini_batch(self, data, indices, pad_id):
"""
Get one mini batch
Args:
data: all data
indices: the indices of the samples to be selected
pad_id:
Returns:
one batch of data
"""
batch_data = {
'raw_data': [data[i] for i in indices],
'question_token_ids': [],
'question_length': [],
'passage_token_ids': [],
'passage_length': [],
'start_id': [],
'end_id': []
}
max_passage_num = max(
[len(sample['passages']) for sample in batch_data['raw_data']])
#max_passage_num = min(self.max_p_num, max_passage_num)
max_passage_num = self.max_p_num
for sidx, sample in enumerate(batch_data['raw_data']):
for pidx in range(max_passage_num):
if pidx < len(sample['passages']):
batch_data['question_token_ids'].append(sample[
'question_token_ids'])
batch_data['question_length'].append(
len(sample['question_token_ids']))
passage_token_ids = sample['passages'][pidx][
'passage_token_ids']
batch_data['passage_token_ids'].append(passage_token_ids)
batch_data['passage_length'].append(
min(len(passage_token_ids), self.max_p_len))
else:
batch_data['question_token_ids'].append([])
batch_data['question_length'].append(0)
batch_data['passage_token_ids'].append([])
batch_data['passage_length'].append(0)
batch_data, padded_p_len, padded_q_len = self._dynamic_padding(
batch_data, pad_id)
for sample in batch_data['raw_data']:
if 'answer_passages' in sample and len(sample['answer_passages']):
gold_passage_offset = padded_p_len * sample['answer_passages'][
0]
batch_data['start_id'].append(gold_passage_offset + sample[
'answer_spans'][0][0])
batch_data['end_id'].append(gold_passage_offset + sample[
'answer_spans'][0][1])
else:
# fake span for some samples, only valid for testing
batch_data['start_id'].append(0)
batch_data['end_id'].append(0)
return batch_data
def _dynamic_padding(self, batch_data, pad_id):
"""
Dynamically pads the batch_data with pad_id
"""
pad_p_len = min(self.max_p_len, max(batch_data['passage_length']))
pad_q_len = min(self.max_q_len, max(batch_data['question_length']))
batch_data['passage_token_ids'] = [
(ids + [pad_id] * (pad_p_len - len(ids)))[:pad_p_len]
for ids in batch_data['passage_token_ids']
]
batch_data['question_token_ids'] = [
(ids + [pad_id] * (pad_q_len - len(ids)))[:pad_q_len]
for ids in batch_data['question_token_ids']
]
return batch_data, pad_p_len, pad_q_len
def word_iter(self, set_name=None):
"""
Iterates over all the words in the dataset
Args:
set_name: if it is set, then the specific set will be used
Returns:
a generator
"""
if set_name is None:
data_set = self.train_set + self.dev_set + self.test_set
elif set_name == 'train':
data_set = self.train_set
elif set_name == 'dev':
data_set = self.dev_set
elif set_name == 'test':
data_set = self.test_set
else:
raise NotImplementedError('No data set named as {}'.format(
set_name))
if data_set is not None:
for sample in data_set:
for token in sample['question_tokens']:
yield token
for passage in sample['passages']:
for token in passage['passage_tokens']:
yield token
def convert_to_ids(self, vocab):
"""
Convert the question and passage in the original dataset to ids
Args:
vocab: the vocabulary on this dataset
"""
for data_set in [self.train_set, self.dev_set, self.test_set]:
if data_set is None:
continue
for sample in data_set:
sample['question_token_ids'] = vocab.convert_to_ids(sample[
'question_tokens'])
for passage in sample['passages']:
passage['passage_token_ids'] = vocab.convert_to_ids(passage[
'passage_tokens'])
def gen_mini_batches(self, set_name, batch_size, pad_id, shuffle=True):
"""
Generate data batches for a specific dataset (train/dev/test)
Args:
set_name: train/dev/test to indicate the set
batch_size: number of samples in one batch
pad_id: pad id
shuffle: if set to be true, the data is shuffled.
Returns:
a generator for all batches
"""
if set_name == 'train':
data = self.train_set
elif set_name == 'dev':
data = self.dev_set
elif set_name == 'test':
data = self.test_set
else:
raise NotImplementedError('No data set named as {}'.format(
set_name))
data_size = len(data)
indices = np.arange(data_size)
if shuffle:
np.random.shuffle(indices)
for batch_start in np.arange(0, data_size, batch_size):
batch_indices = indices[batch_start:batch_start + batch_size]
yield self._one_mini_batch(data, batch_indices, pad_id)
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid.layers as layers
import paddle.fluid as fluid
import numpy as np
def dropout(input, args):
if args.drop_rate:
return layers.dropout(
input,
dropout_prob=args.drop_rate,
seed=args.random_seed,
is_test=False)
else:
return input
def bi_lstm_encoder(input_seq, gate_size, para_name, args):
# A bi-directional lstm encoder implementation.
# Linear transformation part for input gate, output gate, forget gate
# and cell activation vectors need be done outside of dynamic_lstm.
# So the output size is 4 times of gate_size.
input_forward_proj = layers.fc(
input=input_seq,
param_attr=fluid.ParamAttr(name=para_name + '_fw_gate_w'),
size=gate_size * 4,
act=None,
bias_attr=False)
input_reversed_proj = layers.fc(
input=input_seq,
param_attr=fluid.ParamAttr(name=para_name + '_bw_gate_w'),
size=gate_size * 4,
act=None,
bias_attr=False)
forward, _ = layers.dynamic_lstm(
input=input_forward_proj,
size=gate_size * 4,
use_peepholes=False,
param_attr=fluid.ParamAttr(name=para_name + '_fw_lstm_w'),
bias_attr=fluid.ParamAttr(name=para_name + '_fw_lstm_b'))
reversed, _ = layers.dynamic_lstm(
input=input_reversed_proj,
param_attr=fluid.ParamAttr(name=para_name + '_bw_lstm_w'),
bias_attr=fluid.ParamAttr(name=para_name + '_bw_lstm_b'),
size=gate_size * 4,
is_reverse=True,
use_peepholes=False)
encoder_out = layers.concat(input=[forward, reversed], axis=1)
return encoder_out
def encoder(input_name, para_name, shape, hidden_size, args):
input_ids = layers.data(
name=input_name, shape=[1], dtype='int64', lod_level=1)
input_embedding = layers.embedding(
input=input_ids,
size=shape,
dtype='float32',
is_sparse=True,
param_attr=fluid.ParamAttr(name='embedding_para'))
encoder_out = bi_lstm_encoder(
input_seq=input_embedding,
gate_size=hidden_size,
para_name=para_name,
args=args)
return dropout(encoder_out, args)
def attn_flow(q_enc, p_enc, p_ids_name, args):
tag = p_ids_name + "::"
drnn = layers.DynamicRNN()
with drnn.block():
h_cur = drnn.step_input(p_enc)
u_all = drnn.static_input(q_enc)
h_expd = layers.sequence_expand(x=h_cur, y=u_all)
s_t_mul = layers.elementwise_mul(x=u_all, y=h_expd, axis=0)
s_t_sum = layers.reduce_sum(input=s_t_mul, dim=1, keep_dim=True)
s_t_re = layers.reshape(s_t_sum, shape=[-1, 0])
s_t = layers.sequence_softmax(input=s_t_re)
u_expr = layers.elementwise_mul(x=u_all, y=s_t, axis=0)
u_expr = layers.sequence_pool(input=u_expr, pool_type='sum')
b_t = layers.sequence_pool(input=s_t_sum, pool_type='max')
drnn.output(u_expr, b_t)
U_expr, b = drnn()
b_norm = layers.sequence_softmax(input=b)
h_expr = layers.elementwise_mul(x=p_enc, y=b_norm, axis=0)
h_expr = layers.sequence_pool(input=h_expr, pool_type='sum')
H_expr = layers.sequence_expand(x=h_expr, y=p_enc)
H_expr = layers.lod_reset(x=H_expr, y=p_enc)
h_u = layers.elementwise_mul(x=p_enc, y=U_expr, axis=0)
h_h = layers.elementwise_mul(x=p_enc, y=H_expr, axis=0)
g = layers.concat(input=[p_enc, U_expr, h_u, h_h], axis=1)
return dropout(g, args)
def lstm_step(x_t, hidden_t_prev, cell_t_prev, size, para_name, args):
def linear(inputs, para_name, args):
return layers.fc(input=inputs,
size=size,
param_attr=fluid.ParamAttr(name=para_name + '_w'),
bias_attr=fluid.ParamAttr(name=para_name + '_b'))
input_cat = layers.concat([hidden_t_prev, x_t], axis=1)
forget_gate = layers.sigmoid(x=linear(input_cat, para_name + '_lstm_f',
args))
input_gate = layers.sigmoid(x=linear(input_cat, para_name + '_lstm_i',
args))
output_gate = layers.sigmoid(x=linear(input_cat, para_name + '_lstm_o',
args))
cell_tilde = layers.tanh(x=linear(input_cat, para_name + '_lstm_c', args))
cell_t = layers.sums(input=[
layers.elementwise_mul(
x=forget_gate, y=cell_t_prev), layers.elementwise_mul(
x=input_gate, y=cell_tilde)
])
hidden_t = layers.elementwise_mul(x=output_gate, y=layers.tanh(x=cell_t))
return hidden_t, cell_t
#point network
def point_network_decoder(p_vec, q_vec, hidden_size, args):
tag = 'pn_decoder:'
init_random = fluid.initializer.Normal(loc=0.0, scale=1.0)
random_attn = layers.create_parameter(
shape=[1, hidden_size],
dtype='float32',
default_initializer=init_random)
random_attn = layers.fc(
input=random_attn,
size=hidden_size,
act=None,
param_attr=fluid.ParamAttr(name=tag + 'random_attn_fc_w'),
bias_attr=fluid.ParamAttr(name=tag + 'random_attn_fc_b'))
random_attn = layers.reshape(random_attn, shape=[-1])
U = layers.fc(input=q_vec,
param_attr=fluid.ParamAttr(name=tag + 'q_vec_fc_w'),
bias_attr=False,
size=hidden_size,
act=None) + random_attn
U = layers.tanh(U)
logits = layers.fc(input=U,
param_attr=fluid.ParamAttr(name=tag + 'logits_fc_w'),
bias_attr=fluid.ParamAttr(name=tag + 'logits_fc_b'),
size=1,
act=None)
scores = layers.sequence_softmax(input=logits)
pooled_vec = layers.elementwise_mul(x=q_vec, y=scores, axis=0)
pooled_vec = layers.sequence_pool(input=pooled_vec, pool_type='sum')
init_state = layers.fc(
input=pooled_vec,
param_attr=fluid.ParamAttr(name=tag + 'init_state_fc_w'),
bias_attr=fluid.ParamAttr(name=tag + 'init_state_fc_b'),
size=hidden_size,
act=None)
def custom_dynamic_rnn(p_vec, init_state, hidden_size, para_name, args):
tag = para_name + "custom_dynamic_rnn:"
def static_rnn(step,
p_vec=p_vec,
init_state=None,
para_name='',
args=args):
tag = para_name + "static_rnn:"
ctx = layers.fc(
input=p_vec,
param_attr=fluid.ParamAttr(name=tag + 'context_fc_w'),
bias_attr=fluid.ParamAttr(name=tag + 'context_fc_b'),
size=hidden_size,
act=None)
beta = []
c_prev = init_state
m_prev = init_state
for i in range(step):
m_prev0 = layers.fc(
input=m_prev,
size=hidden_size,
act=None,
param_attr=fluid.ParamAttr(name=tag + 'm_prev0_fc_w'),
bias_attr=fluid.ParamAttr(name=tag + 'm_prev0_fc_b'))
m_prev1 = layers.sequence_expand(x=m_prev0, y=ctx)
Fk = ctx + m_prev1
Fk = layers.tanh(Fk)
logits = layers.fc(
input=Fk,
size=1,
act=None,
param_attr=fluid.ParamAttr(name=tag + 'logits_fc_w'),
bias_attr=fluid.ParamAttr(name=tag + 'logits_fc_b'))
scores = layers.sequence_softmax(input=logits)
attn_ctx = layers.elementwise_mul(x=p_vec, y=scores, axis=0)
attn_ctx = layers.sequence_pool(input=attn_ctx, pool_type='sum')
hidden_t, cell_t = lstm_step(
attn_ctx,
hidden_t_prev=m_prev,
cell_t_prev=c_prev,
size=hidden_size,
para_name=tag,
args=args)
m_prev = hidden_t
c_prev = cell_t
beta.append(scores)
return beta
return static_rnn(
2, p_vec=p_vec, init_state=init_state, para_name=para_name)
fw_outputs = custom_dynamic_rnn(p_vec, init_state, hidden_size, tag + "fw:",
args)
bw_outputs = custom_dynamic_rnn(p_vec, init_state, hidden_size, tag + "bw:",
args)
start_prob = layers.elementwise_add(
x=fw_outputs[0], y=bw_outputs[1], axis=0) / 2
end_prob = layers.elementwise_add(
x=fw_outputs[1], y=bw_outputs[0], axis=0) / 2
return start_prob, end_prob
def fusion(g, args):
m = bi_lstm_encoder(
input_seq=g, gate_size=args.hidden_size, para_name='fusion', args=args)
return dropout(m, args)
def rc_model(hidden_size, vocab, args):
emb_shape = [vocab.size(), vocab.embed_dim]
# stage 1:encode
p_ids_names = []
q_ids_names = []
ms = []
gs = []
qs = []
for i in range(args.doc_num):
p_ids_name = "pids_%d" % i
p_ids_names.append(p_ids_name)
p_enc_i = encoder(p_ids_name, 'p_enc', emb_shape, hidden_size, args)
q_ids_name = "qids_%d" % i
q_ids_names.append(q_ids_name)
q_enc_i = encoder(q_ids_name, 'q_enc', emb_shape, hidden_size, args)
# stage 2:match
g_i = attn_flow(q_enc_i, p_enc_i, p_ids_name, args)
# stage 3:fusion
m_i = fusion(g_i, args)
ms.append(m_i)
gs.append(g_i)
qs.append(q_enc_i)
m = layers.sequence_concat(input=ms)
g = layers.sequence_concat(input=gs)
q_vec = layers.sequence_concat(input=qs)
# stage 4:decode
start_probs, end_probs = point_network_decoder(
p_vec=m, q_vec=q_vec, hidden_size=hidden_size, args=args)
start_labels = layers.data(
name="start_lables", shape=[1], dtype='float32', lod_level=1)
end_labels = layers.data(
name="end_lables", shape=[1], dtype='float32', lod_level=1)
cost0 = layers.sequence_pool(
layers.cross_entropy(
input=start_probs, label=start_labels, soft_label=True),
'sum')
cost1 = layers.sequence_pool(
layers.cross_entropy(
input=end_probs, label=end_labels, soft_label=True),
'sum')
cost0 = layers.mean(cost0)
cost1 = layers.mean(cost1)
cost = cost0 + cost1
cost.persistable = True
feeding_list = q_ids_names + ["start_lables", "end_lables"] + p_ids_names
return cost, start_probs, end_probs, feeding_list
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import time
import os
import random
import json
import six
import paddle
import paddle.fluid as fluid
import paddle.fluid.core as core
import paddle.fluid.framework as framework
from paddle.fluid.executor import Executor
import sys
if sys.version[0] == '2':
reload(sys)
sys.setdefaultencoding("utf-8")
sys.path.append('..')
from args import *
import rc_model
from dataset import BRCDataset
import logging
import pickle
from utils import normalize
from utils import compute_bleu_rouge
from vocab import Vocab
def prepare_batch_input(insts, args):
doc_num = args.doc_num
batch_size = len(insts['raw_data'])
new_insts = []
for i in range(batch_size):
p_id = []
q_id = []
p_ids = []
q_ids = []
p_len = 0
for j in range(i * doc_num, (i + 1) * doc_num):
p_ids.append(insts['passage_token_ids'][j])
p_id = p_id + insts['passage_token_ids'][j]
q_ids.append(insts['question_token_ids'][j])
q_id = q_id + insts['question_token_ids'][j]
p_len = len(p_id)
def _get_label(idx, ref_len):
ret = [0.0] * ref_len
if idx >= 0 and idx < ref_len:
ret[idx] = 1.0
return [[x] for x in ret]
start_label = _get_label(insts['start_id'][i], p_len)
end_label = _get_label(insts['end_id'][i], p_len)
new_inst = q_ids + [start_label, end_label] + p_ids
new_insts.append(new_inst)
return new_insts
def LodTensor_Array(lod_tensor):
lod = lod_tensor.lod()
array = np.array(lod_tensor)
new_array = []
for i in range(len(lod[0]) - 1):
new_array.append(array[lod[0][i]:lod[0][i + 1]])
return new_array
def print_para(train_prog, train_exe, logger, args):
if args.para_print:
param_list = train_prog.block(0).all_parameters()
param_name_list = [p.name for p in param_list]
num_sum = 0
for p_name in param_name_list:
p_array = np.array(train_exe.scope.find_var(p_name).get_tensor())
param_num = np.prod(p_array.shape)
num_sum = num_sum + param_num
logger.info(
"param: {0}, mean={1} max={2} min={3} num={4} {5}".format(
p_name,
p_array.mean(),
p_array.max(), p_array.min(), p_array.shape, param_num))
logger.info("total param num: {0}".format(num_sum))
def find_best_answer_for_passage(start_probs, end_probs, passage_len, args):
"""
Finds the best answer with the maximum start_prob * end_prob from a single passage
"""
if passage_len is None:
passage_len = len(start_probs)
else:
passage_len = min(len(start_probs), passage_len)
best_start, best_end, max_prob = -1, -1, 0
for start_idx in range(passage_len):
for ans_len in range(args.max_a_len):
end_idx = start_idx + ans_len
if end_idx >= passage_len:
continue
prob = start_probs[start_idx] * end_probs[end_idx]
if prob > max_prob:
best_start = start_idx
best_end = end_idx
max_prob = prob
return (best_start, best_end), max_prob
def find_best_answer(sample, start_prob, end_prob, padded_p_len, args):
"""
Finds the best answer for a sample given start_prob and end_prob for each position.
This will call find_best_answer_for_passage because there are multiple passages in a sample
"""
best_p_idx, best_span, best_score = None, None, 0
for p_idx, passage in enumerate(sample['passages']):
if p_idx >= args.max_p_num:
continue
passage_len = min(args.max_p_len, len(passage['passage_tokens']))
answer_span, score = find_best_answer_for_passage(
start_prob[p_idx * padded_p_len:(p_idx + 1) * padded_p_len],
end_prob[p_idx * padded_p_len:(p_idx + 1) * padded_p_len],
passage_len, args)
if score > best_score:
best_score = score
best_p_idx = p_idx
best_span = answer_span
if best_p_idx is None or best_span is None:
best_answer = ''
else:
best_answer = ''.join(sample['passages'][best_p_idx]['passage_tokens'][
best_span[0]:best_span[1] + 1])
return best_answer
def validation(inference_program, avg_cost, s_probs, e_probs, feed_order, place,
vocab, brc_data, logger, args):
"""
"""
parallel_executor = fluid.ParallelExecutor(
main_program=inference_program,
use_cuda=bool(args.use_gpu),
loss_name=avg_cost.name)
print_para(inference_program, parallel_executor, logger, args)
# Use test set as validation each pass
total_loss = 0.0
count = 0
pred_answers, ref_answers = [], []
val_feed_list = [
inference_program.global_block().var(var_name)
for var_name in feed_order
]
val_feeder = fluid.DataFeeder(val_feed_list, place)
pad_id = vocab.get_id(vocab.pad_token)
dev_batches = brc_data.gen_mini_batches(
'dev', args.batch_size, pad_id, shuffle=False)
for batch_id, batch in enumerate(dev_batches, 1):
feed_data = prepare_batch_input(batch, args)
val_fetch_outs = parallel_executor.run(
feed=val_feeder.feed(feed_data),
fetch_list=[avg_cost.name, s_probs.name, e_probs.name],
return_numpy=False)
total_loss += np.array(val_fetch_outs[0])[0]
start_probs = LodTensor_Array(val_fetch_outs[1])
end_probs = LodTensor_Array(val_fetch_outs[2])
count += len(batch['raw_data'])
padded_p_len = len(batch['passage_token_ids'][0])
for sample, start_prob, end_prob in zip(batch['raw_data'], start_probs,
end_probs):
best_answer = find_best_answer(sample, start_prob, end_prob,
padded_p_len, args)
pred_answers.append({
'question_id': sample['question_id'],
'question_type': sample['question_type'],
'answers': [best_answer],
'entity_answers': [[]],
'yesno_answers': []
})
if 'answers' in sample:
ref_answers.append({
'question_id': sample['question_id'],
'question_type': sample['question_type'],
'answers': sample['answers'],
'entity_answers': [[]],
'yesno_answers': []
})
if args.result_dir is not None and args.result_name is not None:
if not os.path.exists(args.result_dir):
os.makedirs(args.result_dir)
result_file = os.path.join(args.result_dir, args.result_name + '.json')
with open(result_file, 'w') as fout:
for pred_answer in pred_answers:
fout.write(json.dumps(pred_answer, ensure_ascii=False) + '\n')
logger.info('Saving {} results to {}'.format(args.result_name,
result_file))
ave_loss = 1.0 * total_loss / count
# compute the bleu and rouge scores if reference answers is provided
if len(ref_answers) > 0:
pred_dict, ref_dict = {}, {}
for pred, ref in zip(pred_answers, ref_answers):
question_id = ref['question_id']
if len(ref['answers']) > 0:
pred_dict[question_id] = normalize(pred['answers'])
ref_dict[question_id] = normalize(ref['answers'])
bleu_rouge = compute_bleu_rouge(pred_dict, ref_dict)
else:
bleu_rouge = None
return ave_loss, bleu_rouge
def train(logger, args):
logger.info('Load data_set and vocab...')
with open(os.path.join(args.vocab_dir, 'vocab.data'), 'rb') as fin:
if six.PY2:
vocab = pickle.load(fin)
else:
vocab = pickle.load(fin, encoding='bytes')
logger.info('vocab size is {} and embed dim is {}'.format(vocab.size(
), vocab.embed_dim))
brc_data = BRCDataset(args.max_p_num, args.max_p_len, args.max_q_len,
args.trainset, args.devset)
logger.info('Converting text into ids...')
brc_data.convert_to_ids(vocab)
logger.info('Initialize the model...')
# build model
main_program = fluid.Program()
startup_prog = fluid.Program()
main_program.random_seed = args.random_seed
startup_prog.random_seed = args.random_seed
with fluid.program_guard(main_program, startup_prog):
with fluid.unique_name.guard():
avg_cost, s_probs, e_probs, feed_order = rc_model.rc_model(
args.hidden_size, vocab, args)
# clone from default main program and use it as the validation program
inference_program = main_program.clone(for_test=True)
# build optimizer
if args.optim == 'sgd':
optimizer = fluid.optimizer.SGD(
learning_rate=args.learning_rate,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=args.weight_decay))
elif args.optim == 'adam':
optimizer = fluid.optimizer.Adam(
learning_rate=args.learning_rate,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=args.weight_decay))
elif args.optim == 'rprop':
optimizer = fluid.optimizer.RMSPropOptimizer(
learning_rate=args.learning_rate,
regularization=fluid.regularizer.L2DecayRegularizer(
regularization_coeff=args.weight_decay))
else:
logger.error('Unsupported optimizer: {}'.format(args.optim))
exit(-1)
optimizer.minimize(avg_cost)
# initialize parameters
place = core.CUDAPlace(0) if args.use_gpu else core.CPUPlace()
exe = Executor(place)
if args.load_dir:
logger.info('load from {}'.format(args.load_dir))
fluid.io.load_persistables(
exe, args.load_dir, main_program=main_program)
else:
exe.run(startup_prog)
embedding_para = fluid.global_scope().find_var(
'embedding_para').get_tensor()
embedding_para.set(vocab.embeddings.astype(np.float32), place)
# prepare data
feed_list = [
main_program.global_block().var(var_name)
for var_name in feed_order
]
feeder = fluid.DataFeeder(feed_list, place)
logger.info('Training the model...')
parallel_executor = fluid.ParallelExecutor(
main_program=main_program,
use_cuda=bool(args.use_gpu),
loss_name=avg_cost.name)
print_para(main_program, parallel_executor, logger, args)
for pass_id in range(1, args.pass_num + 1):
pass_start_time = time.time()
pad_id = vocab.get_id(vocab.pad_token)
train_batches = brc_data.gen_mini_batches(
'train', args.batch_size, pad_id, shuffle=True)
log_every_n_batch, n_batch_loss = args.log_interval, 0
total_num, total_loss = 0, 0
for batch_id, batch in enumerate(train_batches, 1):
input_data_dict = prepare_batch_input(batch, args)
fetch_outs = parallel_executor.run(
feed=feeder.feed(input_data_dict),
fetch_list=[avg_cost.name],
return_numpy=False)
cost_train = np.array(fetch_outs[0])[0]
total_num += len(batch['raw_data'])
n_batch_loss += cost_train
total_loss += cost_train * len(batch['raw_data'])
if log_every_n_batch > 0 and batch_id % log_every_n_batch == 0:
print_para(main_program, parallel_executor, logger,
args)
logger.info(
'Average loss from batch {} to {} is {}'.format(
batch_id - log_every_n_batch + 1, batch_id,
"%.10f" % (n_batch_loss / log_every_n_batch)))
n_batch_loss = 0
if args.dev_interval > 0 and batch_id % args.dev_interval == 0:
eval_loss, bleu_rouge = validation(
inference_program, avg_cost, s_probs, e_probs,
feed_order, place, vocab, brc_data, logger, args)
logger.info('Dev eval loss {}'.format(eval_loss))
logger.info('Dev eval result: {}'.format(bleu_rouge))
pass_end_time = time.time()
logger.info('Evaluating the model after epoch {}'.format(
pass_id))
if brc_data.dev_set is not None:
eval_loss, bleu_rouge = validation(
inference_program, avg_cost, s_probs, e_probs,
feed_order, place, vocab, brc_data, logger, args)
logger.info('Dev eval loss {}'.format(eval_loss))
logger.info('Dev eval result: {}'.format(bleu_rouge))
else:
logger.warning(
'No dev set is loaded for evaluation in the dataset!')
time_consumed = pass_end_time - pass_start_time
logger.info('Average train loss for epoch {} is {}'.format(
pass_id, "%.10f" % (1.0 * total_loss / total_num)))
if pass_id % args.save_interval == 0:
model_path = os.path.join(args.save_dir, str(pass_id))
if not os.path.isdir(model_path):
os.makedirs(model_path)
fluid.io.save_persistables(
executor=exe,
dirname=model_path,
main_program=main_program)
def evaluate(logger, args):
logger.info('Load data_set and vocab...')
with open(os.path.join(args.vocab_dir, 'vocab.data'), 'rb') as fin:
vocab = pickle.load(fin)
logger.info('vocab size is {} and embed dim is {}'.format(vocab.size(
), vocab.embed_dim))
brc_data = BRCDataset(
args.max_p_num, args.max_p_len, args.max_q_len, dev_files=args.devset)
logger.info('Converting text into ids...')
brc_data.convert_to_ids(vocab)
logger.info('Initialize the model...')
# build model
main_program = fluid.Program()
startup_prog = fluid.Program()
main_program.random_seed = args.random_seed
startup_prog.random_seed = args.random_seed
with fluid.program_guard(main_program, startup_prog):
with fluid.unique_name.guard():
avg_cost, s_probs, e_probs, feed_order = rc_model.rc_model(
args.hidden_size, vocab, args)
# initialize parameters
place = core.CUDAPlace(0) if args.use_gpu else core.CPUPlace()
exe = Executor(place)
if args.load_dir:
logger.info('load from {}'.format(args.load_dir))
fluid.io.load_persistables(
exe, args.load_dir, main_program=main_program)
else:
logger.error('No model file to load ...')
return
# prepare data
feed_list = [
main_program.global_block().var(var_name)
for var_name in feed_order
]
feeder = fluid.DataFeeder(feed_list, place)
inference_program = main_program.clone(for_test=True)
eval_loss, bleu_rouge = validation(
inference_program, avg_cost, s_probs, e_probs, feed_order,
place, vocab, brc_data, logger, args)
logger.info('Dev eval loss {}'.format(eval_loss))
logger.info('Dev eval result: {}'.format(bleu_rouge))
logger.info('Predicted answers are saved to {}'.format(
os.path.join(args.result_dir)))
def predict(logger, args):
logger.info('Load data_set and vocab...')
with open(os.path.join(args.vocab_dir, 'vocab.data'), 'rb') as fin:
vocab = pickle.load(fin)
logger.info('vocab size is {} and embed dim is {}'.format(vocab.size(
), vocab.embed_dim))
brc_data = BRCDataset(
args.max_p_num, args.max_p_len, args.max_q_len, dev_files=args.testset)
logger.info('Converting text into ids...')
brc_data.convert_to_ids(vocab)
logger.info('Initialize the model...')
# build model
main_program = fluid.Program()
startup_prog = fluid.Program()
main_program.random_seed = args.random_seed
startup_prog.random_seed = args.random_seed
with fluid.program_guard(main_program, startup_prog):
with fluid.unique_name.guard():
avg_cost, s_probs, e_probs, feed_order = rc_model.rc_model(
args.hidden_size, vocab, args)
# initialize parameters
place = core.CUDAPlace(0) if args.use_gpu else core.CPUPlace()
exe = Executor(place)
if args.load_dir:
logger.info('load from {}'.format(args.load_dir))
fluid.io.load_persistables(
exe, args.load_dir, main_program=main_program)
else:
logger.error('No model file to load ...')
return
# prepare data
feed_list = [
main_program.global_block().var(var_name)
for var_name in feed_order
]
feeder = fluid.DataFeeder(feed_list, place)
inference_program = main_program.clone(for_test=True)
eval_loss, bleu_rouge = validation(
inference_program, avg_cost, s_probs, e_probs, feed_order,
place, vocab, brc_data, logger, args)
def prepare(logger, args):
"""
checks data, creates the directories, prepare the vocabulary and embeddings
"""
logger.info('Checking the data files...')
for data_path in args.trainset + args.devset + args.testset:
assert os.path.exists(data_path), '{} file does not exist.'.format(
data_path)
logger.info('Preparing the directories...')
for dir_path in [args.vocab_dir, args.save_dir, args.result_dir]:
if not os.path.exists(dir_path):
os.makedirs(dir_path)
logger.info('Building vocabulary...')
brc_data = BRCDataset(args.max_p_num, args.max_p_len, args.max_q_len,
args.trainset, args.devset, args.testset)
vocab = Vocab(lower=True)
for word in brc_data.word_iter('train'):
vocab.add(word)
unfiltered_vocab_size = vocab.size()
vocab.filter_tokens_by_cnt(min_cnt=2)
filtered_num = unfiltered_vocab_size - vocab.size()
logger.info('After filter {} tokens, the final vocab size is {}'.format(
filtered_num, vocab.size()))
logger.info('Assigning embeddings...')
vocab.randomly_init_embeddings(args.embed_size)
logger.info('Saving vocab...')
with open(os.path.join(args.vocab_dir, 'vocab.data'), 'wb') as fout:
pickle.dump(vocab, fout)
logger.info('Done with preparing!')
if __name__ == '__main__':
args = parse_args()
random.seed(args.random_seed)
np.random.seed(args.random_seed)
logger = logging.getLogger("brc")
logger.setLevel(logging.INFO)
formatter = logging.Formatter(
'%(asctime)s - %(name)s - %(levelname)s - %(message)s')
if args.log_path:
file_handler = logging.FileHandler(args.log_path)
file_handler.setLevel(logging.INFO)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
else:
console_handler = logging.StreamHandler()
console_handler.setLevel(logging.INFO)
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
args = parse_args()
logger.info('Running with args : {}'.format(args))
if args.prepare:
prepare(logger, args)
if args.train:
train(logger, args)
if args.evaluate:
evaluate(logger, args)
if args.predict:
predict(logger, args)
export CUDA_VISIBLE_DEVICES=1
python run.py \
--trainset 'data/preprocessed/trainset/search.train.json' \
'data/preprocessed/trainset/zhidao.train.json' \
--devset 'data/preprocessed/devset/search.dev.json' \
'data/preprocessed/devset/zhidao.dev.json' \
--testset 'data/preprocessed/testset/search.test.json' \
'data/preprocessed/testset/zhidao.test.json' \
--vocab_dir 'data/vocab' \
--use_gpu true \
--save_dir ./models \
--pass_num 10 \
--learning_rate 0.001 \
--batch_size 8 \
--embed_size 300 \
--hidden_size 150 \
--max_p_num 5 \
--max_p_len 500 \
--max_q_len 60 \
--max_a_len 200 \
--drop_rate 0.2 $@\
# coding:utf8
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This package implements some utility functions shared by PaddlePaddle
and Tensorflow model implementations.
Authors: liuyuan(liuyuan04@baidu.com)
Date: 2017/10/06 18:23:06
"""
from .dureader_eval import compute_bleu_rouge
from .dureader_eval import normalize
from .preprocess import find_fake_answer
from .preprocess import find_best_question_match
__all__ = [
'compute_bleu_rouge',
'normalize',
'find_fake_answer',
'find_best_question_match',
]
#!/bin/bash
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# We use Bleu and Rouge as evaluation metrics, the calculation of these metrics
# relies on the scoring scripts under "https://github.com/tylin/coco-caption"
bleu_base_url='https://raw.githubusercontent.com/tylin/coco-caption/master/pycocoevalcap/bleu'
bleu_files=("LICENSE" "__init__.py" "bleu.py" "bleu_scorer.py")
rouge_base_url="https://raw.githubusercontent.com/tylin/coco-caption/master/pycocoevalcap/rouge"
rouge_files=("__init__.py" "rouge.py")
download() {
local metric=$1; shift;
local base_url=$1; shift;
local fnames=($@);
mkdir -p ${metric}
for fname in ${fnames[@]};
do
printf "downloading: %s\n" ${base_url}/${fname}
wget --no-check-certificate ${base_url}/${fname} -O ${metric}/${fname}
done
}
# prepare rouge
download "rouge_metric" ${rouge_base_url} ${rouge_files[@]}
# prepare bleu
download "bleu_metric" ${bleu_base_url} ${bleu_files[@]}
# convert python 2.x source code to python 3.x
2to3 -w "../utils/bleu_metric/bleu_scorer.py"
2to3 -w "../utils/bleu_metric/bleu.py"
# -*- coding:utf8 -*-
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This module computes evaluation metrics for DuReader dataset.
"""
import argparse
import json
import sys
import zipfile
from collections import Counter
from .bleu_metric.bleu import Bleu
from .rouge_metric.rouge import Rouge
EMPTY = ''
YESNO_LABELS = set(['Yes', 'No', 'Depends'])
def normalize(s):
"""
Normalize strings to space joined chars.
Args:
s: a list of strings.
Returns:
A list of normalized strings.
"""
if not s:
return s
normalized = []
for ss in s:
tokens = [c for c in list(ss) if len(c.strip()) != 0]
normalized.append(' '.join(tokens))
return normalized
def data_check(obj, task):
"""
Check data.
Raises:
Raises AssertionError when data is not legal.
"""
assert 'question_id' in obj, "Missing 'question_id' field."
assert 'question_type' in obj, \
"Missing 'question_type' field. question_id: {}".format(obj['question_type'])
assert 'yesno_answers' in obj, \
"Missing 'yesno_answers' field. question_id: {}".format(obj['question_id'])
assert isinstance(obj['yesno_answers'], list), \
r"""'yesno_answers' field must be a list, if the 'question_type' is not
'YES_NO', then this field should be an empty list.
question_id: {}""".format(obj['question_id'])
assert 'entity_answers' in obj, \
"Missing 'entity_answers' field. question_id: {}".format(obj['question_id'])
assert isinstance(obj['entity_answers'], list) \
and len(obj['entity_answers']) > 0, \
r"""'entity_answers' field must be a list, and has at least one element,
which can be a empty list. question_id: {}""".format(obj['question_id'])
def read_file(file_name, task, is_ref=False):
"""
Read predict answers or reference answers from file.
Args:
file_name: the name of the file containing predict result or reference
result.
Returns:
A dictionary mapping question_id to the result information. The result
information itself is also a dictionary with has four keys:
- question_type: type of the query.
- yesno_answers: A list of yesno answers corresponding to 'answers'.
- answers: A list of predicted answers.
- entity_answers: A list, each element is also a list containing the entities
tagged out from the corresponding answer string.
"""
def _open(file_name, mode, zip_obj=None):
if zip_obj is not None:
return zip_obj.open(file_name, mode)
return open(file_name, mode)
results = {}
keys = ['answers', 'yesno_answers', 'entity_answers', 'question_type']
if is_ref:
keys += ['source']
zf = zipfile.ZipFile(file_name, 'r') if file_name.endswith('.zip') else None
file_list = [file_name] if zf is None else zf.namelist()
for fn in file_list:
for line in _open(fn, 'r', zip_obj=zf):
try:
obj = json.loads(line.strip())
except ValueError:
raise ValueError("Every line of data should be legal json")
data_check(obj, task)
qid = obj['question_id']
assert qid not in results, "Duplicate question_id: {}".format(qid)
results[qid] = {}
for k in keys:
results[qid][k] = obj[k]
return results
def compute_bleu_rouge(pred_dict, ref_dict, bleu_order=4):
"""
Compute bleu and rouge scores.
"""
assert set(pred_dict.keys()) == set(ref_dict.keys()), \
"missing keys: {}".format(set(ref_dict.keys()) - set(pred_dict.keys()))
scores = {}
bleu_scores, _ = Bleu(bleu_order).compute_score(ref_dict, pred_dict)
for i, bleu_score in enumerate(bleu_scores):
scores['Bleu-%d' % (i + 1)] = bleu_score
rouge_score, _ = Rouge().compute_score(ref_dict, pred_dict)
scores['Rouge-L'] = rouge_score
return scores
def local_prf(pred_list, ref_list):
"""
Compute local precision recall and f1-score,
given only one prediction list and one reference list
"""
common = Counter(pred_list) & Counter(ref_list)
num_same = sum(common.values())
if num_same == 0:
return 0, 0, 0
p = 1.0 * num_same / len(pred_list)
r = 1.0 * num_same / len(ref_list)
f1 = (2 * p * r) / (p + r)
return p, r, f1
def compute_prf(pred_dict, ref_dict):
"""
Compute precision recall and f1-score.
"""
pred_question_ids = set(pred_dict.keys())
ref_question_ids = set(ref_dict.keys())
correct_preds, total_correct, total_preds = 0, 0, 0
for question_id in ref_question_ids:
pred_entity_list = pred_dict.get(question_id, [[]])
assert len(pred_entity_list) == 1, \
'the number of entity list for question_id {} is not 1.'.format(question_id)
pred_entity_list = pred_entity_list[0]
all_ref_entity_lists = ref_dict[question_id]
best_local_f1 = 0
best_ref_entity_list = None
for ref_entity_list in all_ref_entity_lists:
local_f1 = local_prf(pred_entity_list, ref_entity_list)[2]
if local_f1 > best_local_f1:
best_ref_entity_list = ref_entity_list
best_local_f1 = local_f1
if best_ref_entity_list is None:
if len(all_ref_entity_lists) > 0:
best_ref_entity_list = sorted(
all_ref_entity_lists, key=lambda x: len(x))[0]
else:
best_ref_entity_list = []
gold_entities = set(best_ref_entity_list)
pred_entities = set(pred_entity_list)
correct_preds += len(gold_entities & pred_entities)
total_preds += len(pred_entities)
total_correct += len(gold_entities)
p = float(correct_preds) / total_preds if correct_preds > 0 else 0
r = float(correct_preds) / total_correct if correct_preds > 0 else 0
f1 = 2 * p * r / (p + r) if correct_preds > 0 else 0
return {'Precision': p, 'Recall': r, 'F1': f1}
def prepare_prf(pred_dict, ref_dict):
"""
Prepares data for calculation of prf scores.
"""
preds = {k: v['entity_answers'] for k, v in pred_dict.items()}
refs = {k: v['entity_answers'] for k, v in ref_dict.items()}
return preds, refs
def filter_dict(result_dict, key_tag):
"""
Filter a subset of the result_dict, where keys ends with 'key_tag'.
"""
filtered = {}
for k, v in result_dict.items():
if k.endswith(key_tag):
filtered[k] = v
return filtered
def get_metrics(pred_result, ref_result, task, source):
"""
Computes metrics.
"""
metrics = {}
ref_result_filtered = {}
pred_result_filtered = {}
if source == 'both':
ref_result_filtered = ref_result
pred_result_filtered = pred_result
else:
for question_id, info in ref_result.items():
if info['source'] == source:
ref_result_filtered[question_id] = info
if question_id in pred_result:
pred_result_filtered[question_id] = pred_result[question_id]
if task == 'main' or task == 'all' \
or task == 'description':
pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
ref_result_filtered, task)
metrics = compute_bleu_rouge(pred_dict, ref_dict)
elif task == 'yesno':
pred_dict, ref_dict = prepare_bleu(pred_result_filtered,
ref_result_filtered, task)
keys = ['Yes', 'No', 'Depends']
preds = [filter_dict(pred_dict, k) for k in keys]
refs = [filter_dict(ref_dict, k) for k in keys]
metrics = compute_bleu_rouge(pred_dict, ref_dict)
for k, pred, ref in zip(keys, preds, refs):
m = compute_bleu_rouge(pred, ref)
k_metric = [(k + '|' + key, v) for key, v in m.items()]
metrics.update(k_metric)
elif task == 'entity':
pred_dict, ref_dict = prepare_prf(pred_result_filtered,
ref_result_filtered)
pred_dict_bleu, ref_dict_bleu = prepare_bleu(pred_result_filtered,
ref_result_filtered, task)
metrics = compute_prf(pred_dict, ref_dict)
metrics.update(compute_bleu_rouge(pred_dict_bleu, ref_dict_bleu))
else:
raise ValueError("Illegal task name: {}".format(task))
return metrics
def prepare_bleu(pred_result, ref_result, task):
"""
Prepares data for calculation of bleu and rouge scores.
"""
pred_list, ref_list = [], []
qids = ref_result.keys()
for qid in qids:
if task == 'main':
pred, ref = get_main_result(qid, pred_result, ref_result)
elif task == 'yesno':
pred, ref = get_yesno_result(qid, pred_result, ref_result)
elif task == 'all':
pred, ref = get_all_result(qid, pred_result, ref_result)
elif task == 'entity':
pred, ref = get_entity_result(qid, pred_result, ref_result)
elif task == 'description':
pred, ref = get_desc_result(qid, pred_result, ref_result)
else:
raise ValueError("Illegal task name: {}".format(task))
if pred and ref:
pred_list += pred
ref_list += ref
pred_dict = dict(pred_list)
ref_dict = dict(ref_list)
for qid, ans in ref_dict.items():
ref_dict[qid] = normalize(ref_dict[qid])
pred_dict[qid] = normalize(pred_dict.get(qid, [EMPTY]))
if not ans or ans == [EMPTY]:
del ref_dict[qid]
del pred_dict[qid]
for k, v in pred_dict.items():
assert len(v) == 1, \
"There should be only one predict answer. question_id: {}".format(k)
return pred_dict, ref_dict
def get_main_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'main'.
Args:
qid: question_id.
pred_result: A dict include all question_id's result information read
from args.pred_file.
ref_result: A dict incluce all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
ref_ans = ref_result[qid]['answers']
if not ref_ans:
ref_ans = [EMPTY]
pred_ans = pred_result.get(qid, {}).get('answers', [])[:1]
if not pred_ans:
pred_ans = [EMPTY]
return [(qid, pred_ans)], [(qid, ref_ans)]
def get_entity_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'entity'.
Args:
qid: question_id.
pred_result: A dict include all question_id's result information read
from args.pred_file.
ref_result: A dict incluce all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
if ref_result[qid]['question_type'] != 'ENTITY':
return None, None
return get_main_result(qid, pred_result, ref_result)
def get_desc_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'description'.
Args:
qid: question_id.
pred_result: A dict include all question_id's result information read
from args.pred_file.
ref_result: A dict incluce all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
if ref_result[qid]['question_type'] != 'DESCRIPTION':
return None, None
return get_main_result(qid, pred_result, ref_result)
def get_yesno_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'yesno'.
Args:
qid: question_id.
pred_result: A dict include all question_id's result information read
from args.pred_file.
ref_result: A dict incluce all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
def _uniq(li, is_ref):
uniq_li = []
left = []
keys = set()
for k, v in li:
if k not in keys:
uniq_li.append((k, v))
keys.add(k)
else:
left.append((k, v))
if is_ref:
dict_li = dict(uniq_li)
for k, v in left:
dict_li[k] += v
uniq_li = [(k, v) for k, v in dict_li.items()]
return uniq_li
def _expand_result(uniq_li):
expanded = uniq_li[:]
keys = set([x[0] for x in uniq_li])
for k in YESNO_LABELS - keys:
expanded.append((k, [EMPTY]))
return expanded
def _get_yesno_ans(qid, result_dict, is_ref=False):
if qid not in result_dict:
return [(str(qid) + '_' + k, v) for k, v in _expand_result([])]
yesno_answers = result_dict[qid]['yesno_answers']
answers = result_dict[qid]['answers']
lbl_ans = _uniq([(k, [v]) for k, v in zip(yesno_answers, answers)],
is_ref)
ret = [(str(qid) + '_' + k, v) for k, v in _expand_result(lbl_ans)]
return ret
if ref_result[qid]['question_type'] != 'YES_NO':
return None, None
ref_ans = _get_yesno_ans(qid, ref_result, is_ref=True)
pred_ans = _get_yesno_ans(qid, pred_result)
return pred_ans, ref_ans
def get_all_result(qid, pred_result, ref_result):
"""
Prepare answers for task 'all'.
Args:
qid: question_id.
pred_result: A dict include all question_id's result information read
from args.pred_file.
ref_result: A dict incluce all question_id's result information read
from args.ref_file.
Returns:
Two lists, the first one contains predict result, the second
one contains reference result of the same question_id. Each list has
elements of tuple (question_id, answers), 'answers' is a list of strings.
"""
if ref_result[qid]['question_type'] == 'YES_NO':
return get_yesno_result(qid, pred_result, ref_result)
return get_main_result(qid, pred_result, ref_result)
def format_metrics(metrics, task, err_msg):
"""
Format metrics. 'err' field returns any error occured during evaluation.
Args:
metrics: A dict object contains metrics for different tasks.
task: Task name.
err_msg: Exception raised during evaluation.
Returns:
Formatted result.
"""
result = {}
sources = ["both", "search", "zhidao"]
if err_msg is not None:
return {'errorMsg': str(err_msg), 'errorCode': 1, 'data': []}
data = []
if task != 'all' and task != 'main':
sources = ["both"]
if task == 'entity':
metric_names = ["Bleu-4", "Rouge-L"]
metric_names_prf = ["F1", "Precision", "Recall"]
for name in metric_names + metric_names_prf:
for src in sources:
obj = {
"name": name,
"value": round(metrics[src].get(name, 0) * 100, 2),
"type": src,
}
data.append(obj)
elif task == 'yesno':
metric_names = ["Bleu-4", "Rouge-L"]
details = ["Yes", "No", "Depends"]
src = sources[0]
for name in metric_names:
obj = {
"name": name,
"value": round(metrics[src].get(name, 0) * 100, 2),
"type": 'All',
}
data.append(obj)
for d in details:
obj = {
"name": name,
"value": \
round(metrics[src].get(d + '|' + name, 0) * 100, 2),
"type": d,
}
data.append(obj)
else:
metric_names = ["Bleu-4", "Rouge-L"]
for name in metric_names:
for src in sources:
obj = {
"name": name,
"value": \
round(metrics[src].get(name, 0) * 100, 2),
"type": src,
}
data.append(obj)
result["data"] = data
result["errorCode"] = 0
result["errorMsg"] = "success"
return result
def main(args):
"""
Do evaluation.
"""
err = None
metrics = {}
try:
pred_result = read_file(args.pred_file, args.task)
ref_result = read_file(args.ref_file, args.task, is_ref=True)
sources = ['both', 'search', 'zhidao']
if args.task not in set(['main', 'all']):
sources = sources[:1]
for source in sources:
metrics[source] = get_metrics(pred_result, ref_result, args.task,
source)
except ValueError as ve:
err = ve
except AssertionError as ae:
err = ae
print(json.dumps(
format_metrics(metrics, args.task, err), ensure_ascii=False).encode(
'utf8'))
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('pred_file', help='predict file')
parser.add_argument('ref_file', help='reference file')
parser.add_argument(
'task', help='task name: Main|Yes_No|All|Entity|Description')
args = parser.parse_args()
args.task = args.task.lower().replace('_', '')
main(args)
# -*- coding:utf8 -*-
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
Utility function to generate vocabulary file.
"""
import argparse
import sys
import json
from itertools import chain
def get_vocab(files, vocab_file):
"""
Builds vocabulary file from field 'segmented_paragraphs'
and 'segmented_question'.
Args:
files: A list of file names.
vocab_file: The file that stores the vocabulary.
"""
vocab = {}
for f in files:
with open(f, 'r') as fin:
for line in fin:
obj = json.loads(line.strip())
paras = [
chain(*d['segmented_paragraphs']) for d in obj['documents']
]
doc_tokens = chain(*paras)
question_tokens = obj['segmented_question']
for t in list(doc_tokens) + question_tokens:
vocab[t] = vocab.get(t, 0) + 1
# output
sorted_vocab = sorted(
[(v, c) for v, c in vocab.items()], key=lambda x: x[1], reverse=True)
with open(vocab_file, 'w') as outf:
for w, c in sorted_vocab:
print >> outf, '{}\t{}'.format(w.encode('utf8'), c)
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument(
'--files',
nargs='+',
required=True,
help='file list to count vocab from.')
parser.add_argument(
'--vocab', required=True, help='file to store counted vocab.')
args = parser.parse_args()
get_vocab(args.files, args.vocab)
#coding=utf8
import os, sys, json
import nltk
def _nltk_tokenize(sequence):
tokens = nltk.word_tokenize(sequence)
cur_char_offset = 0
token_offsets = []
token_words = []
for token in tokens:
cur_char_offset = sequence.find(token, cur_char_offset)
token_offsets.append(
[cur_char_offset, cur_char_offset + len(token) - 1])
token_words.append(token)
return token_offsets, token_words
def segment(input_js):
_, input_js['segmented_question'] = _nltk_tokenize(input_js['question'])
for doc_id, doc in enumerate(input_js['documents']):
doc['segmented_title'] = []
doc['segmented_paragraphs'] = []
for para_id, para in enumerate(doc['paragraphs']):
_, seg_para = _nltk_tokenize(para)
doc['segmented_paragraphs'].append(seg_para)
if 'answers' in input_js:
input_js['segmented_answers'] = []
for answer_id, answer in enumerate(input_js['answers']):
_, seg_answer = _nltk_tokenize(answer)
input_js['segmented_answers'].append(seg_answer)
if __name__ == '__main__':
if len(sys.argv) != 2:
print('Usage: tokenize_data.py <input_path>')
exit()
nltk.download('punkt')
for line in open(sys.argv[1]):
dureader_js = json.loads(line.strip())
segment(dureader_js)
print(json.dumps(dureader_js))
#coding=utf8
import sys
import json
import pandas as pd
def trans(input_js):
output_js = {}
output_js['question'] = input_js['query']
output_js['question_type'] = input_js['query_type']
output_js['question_id'] = input_js['query_id']
output_js['fact_or_opinion'] = ""
output_js['documents'] = []
for para_id, para in enumerate(input_js['passages']):
doc = {}
doc['title'] = ""
if 'is_selected' in para:
doc['is_selected'] = True if para['is_selected'] != 0 else False
doc['paragraphs'] = [para['passage_text']]
output_js['documents'].append(doc)
if 'answers' in input_js:
output_js['answers'] = input_js['answers']
return output_js
if __name__ == '__main__':
if len(sys.argv) != 2:
print('Usage: marcov1_to_dureader.py <input_path>')
exit()
df = pd.read_json(sys.argv[1])
for row in df.iterrows():
marco_js = json.loads(row[1].to_json())
dureader_js = trans(marco_js)
print(json.dumps(dureader_js))
import sys
import json
import pandas as pd
if __name__ == '__main__':
if len(sys.argv) != 3:
print('Usage: tojson.py <input_path> <output_path>')
exit()
infile = sys.argv[1]
outfile = sys.argv[2]
df = pd.read_json(infile)
with open(outfile, 'w') as f:
for row in df.iterrows():
f.write(row[1].to_json() + '\n')
###############################################################################
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This module finds the most related paragraph of each document according to recall.
"""
import sys
if sys.version[0] == '2':
reload(sys)
sys.setdefaultencoding("utf-8")
import json
from collections import Counter
def precision_recall_f1(prediction, ground_truth):
"""
This function calculates and returns the precision, recall and f1-score
Args:
prediction: prediction string or list to be matched
ground_truth: golden string or list reference
Returns:
floats of (p, r, f1)
Raises:
None
"""
if not isinstance(prediction, list):
prediction_tokens = prediction.split()
else:
prediction_tokens = prediction
if not isinstance(ground_truth, list):
ground_truth_tokens = ground_truth.split()
else:
ground_truth_tokens = ground_truth
common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
num_same = sum(common.values())
if num_same == 0:
return 0, 0, 0
p = 1.0 * num_same / len(prediction_tokens)
r = 1.0 * num_same / len(ground_truth_tokens)
f1 = (2 * p * r) / (p + r)
return p, r, f1
def recall(prediction, ground_truth):
"""
This function calculates and returns the recall
Args:
prediction: prediction string or list to be matched
ground_truth: golden string or list reference
Returns:
floats of recall
Raises:
None
"""
return precision_recall_f1(prediction, ground_truth)[1]
def f1_score(prediction, ground_truth):
"""
This function calculates and returns the f1-score
Args:
prediction: prediction string or list to be matched
ground_truth: golden string or list reference
Returns:
floats of f1
Raises:
None
"""
return precision_recall_f1(prediction, ground_truth)[2]
def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
"""
This function calculates and returns the precision, recall and f1-score
Args:
metric_fn: metric function pointer which calculates scores according to corresponding logic.
prediction: prediction string or list to be matched
ground_truth: golden string or list reference
Returns:
floats of (p, r, f1)
Raises:
None
"""
scores_for_ground_truths = []
for ground_truth in ground_truths:
score = metric_fn(prediction, ground_truth)
scores_for_ground_truths.append(score)
return max(scores_for_ground_truths)
def find_best_question_match(doc, question, with_score=False):
"""
For each docment, find the paragraph that matches best to the question.
Args:
doc: The document object.
question: The question tokens.
with_score: If True then the match score will be returned,
otherwise False.
Returns:
The index of the best match paragraph, if with_score=False,
otherwise returns a tuple of the index of the best match paragraph
and the match score of that paragraph.
"""
most_related_para = -1
max_related_score = 0
most_related_para_len = 0
for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
if len(question) > 0:
related_score = metric_max_over_ground_truths(recall, para_tokens,
question)
else:
related_score = 0
if related_score > max_related_score \
or (related_score == max_related_score \
and len(para_tokens) < most_related_para_len):
most_related_para = p_idx
max_related_score = related_score
most_related_para_len = len(para_tokens)
if most_related_para == -1:
most_related_para = 0
if with_score:
return most_related_para, max_related_score
return most_related_para
def find_fake_answer(sample):
"""
For each document, finds the most related paragraph based on recall,
then finds a span that maximize the f1_score compared with the gold answers
and uses this span as a fake answer span
Args:
sample: a sample in the dataset
Returns:
None
Raises:
None
"""
for doc in sample['documents']:
most_related_para = -1
most_related_para_len = 999999
max_related_score = 0
for p_idx, para_tokens in enumerate(doc['segmented_paragraphs']):
if len(sample['segmented_answers']) > 0:
related_score = metric_max_over_ground_truths(
recall, para_tokens, sample['segmented_answers'])
else:
continue
if related_score > max_related_score \
or (related_score == max_related_score
and len(para_tokens) < most_related_para_len):
most_related_para = p_idx
most_related_para_len = len(para_tokens)
max_related_score = related_score
doc['most_related_para'] = most_related_para
sample['answer_docs'] = []
sample['answer_spans'] = []
sample['fake_answers'] = []
sample['match_scores'] = []
best_match_score = 0
best_match_d_idx, best_match_span = -1, [-1, -1]
best_fake_answer = None
answer_tokens = set()
for segmented_answer in sample['segmented_answers']:
answer_tokens = answer_tokens | set(
[token for token in segmented_answer])
for d_idx, doc in enumerate(sample['documents']):
if not doc['is_selected']:
continue
if doc['most_related_para'] == -1:
doc['most_related_para'] = 0
most_related_para_tokens = doc['segmented_paragraphs'][doc[
'most_related_para']][:1000]
for start_tidx in range(len(most_related_para_tokens)):
if most_related_para_tokens[start_tidx] not in answer_tokens:
continue
for end_tidx in range(
len(most_related_para_tokens) - 1, start_tidx - 1, -1):
span_tokens = most_related_para_tokens[start_tidx:end_tidx + 1]
if len(sample['segmented_answers']) > 0:
match_score = metric_max_over_ground_truths(
f1_score, span_tokens, sample['segmented_answers'])
else:
match_score = 0
if match_score == 0:
break
if match_score > best_match_score:
best_match_d_idx = d_idx
best_match_span = [start_tidx, end_tidx]
best_match_score = match_score
best_fake_answer = ''.join(span_tokens)
if best_match_score > 0:
sample['answer_docs'].append(best_match_d_idx)
sample['answer_spans'].append(best_match_span)
sample['fake_answers'].append(best_fake_answer)
sample['match_scores'].append(best_match_score)
if __name__ == '__main__':
for line in sys.stdin:
sample = json.loads(line)
find_fake_answer(sample)
print(json.dumps(sample, encoding='utf8', ensure_ascii=False))
#!/bin/bash
input_file=$1
output_file=$2
# convert the data from MARCO V2 (json) format to MARCO V1 (jsonl) format.
# the script was forked from MARCO repo.
# the format of MARCO V1 is much more easier to explore.
python3 marcov2_to_v1_tojsonl.py $input_file $input_file.marcov1
# convert the data from MARCO V1 format to DuReader format.
python3 marcov1_to_dureader.py $input_file.marcov1 >$input_file.dureader_raw
# tokenize the data.
python3 marco_tokenize_data.py $input_file.dureader_raw >$input_file.segmented
# find fake answers (indicating the start and end positions of answers in the document) for train and dev sets.
# note that this should not be applied for test set, since there is no ground truth in test set.
python preprocess.py $input_file.segmented >$output_file
# remove the temporal data files.
rm -rf $input_file.dureader_raw $input_file.segmented
# -*- coding:utf8 -*-
# ==============================================================================
# Copyright 2017 Baidu.com, Inc. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""
This module implements the Vocab class for converting string to id and back
"""
import numpy as np
class Vocab(object):
"""
Implements a vocabulary to store the tokens in the data, with their corresponding embeddings.
"""
def __init__(self, filename=None, initial_tokens=None, lower=False):
self.id2token = {}
self.token2id = {}
self.token_cnt = {}
self.lower = lower
self.embed_dim = None
self.embeddings = None
self.pad_token = '<blank>'
self.unk_token = '<unk>'
self.initial_tokens = initial_tokens if initial_tokens is not None else []
self.initial_tokens.extend([self.pad_token, self.unk_token])
for token in self.initial_tokens:
self.add(token)
if filename is not None:
self.load_from_file(filename)
def size(self):
"""
get the size of vocabulary
Returns:
an integer indicating the size
"""
return len(self.id2token)
def load_from_file(self, file_path):
"""
loads the vocab from file_path
Args:
file_path: a file with a word in each line
"""
for line in open(file_path, 'r'):
token = line.rstrip('\n')
self.add(token)
def get_id(self, token):
"""
gets the id of a token, returns the id of unk token if token is not in vocab
Args:
key: a string indicating the word
Returns:
an integer
"""
token = token.lower() if self.lower else token
try:
return self.token2id[token]
except KeyError:
return self.token2id[self.unk_token]
def get_token(self, idx):
"""
gets the token corresponding to idx, returns unk token if idx is not in vocab
Args:
idx: an integer
returns:
a token string
"""
try:
return self.id2token[idx]
except KeyError:
return self.unk_token
def add(self, token, cnt=1):
"""
adds the token to vocab
Args:
token: a string
cnt: a num indicating the count of the token to add, default is 1
"""
token = token.lower() if self.lower else token
if token in self.token2id:
idx = self.token2id[token]
else:
idx = len(self.id2token)
self.id2token[idx] = token
self.token2id[token] = idx
if cnt > 0:
if token in self.token_cnt:
self.token_cnt[token] += cnt
else:
self.token_cnt[token] = cnt
return idx
def filter_tokens_by_cnt(self, min_cnt):
"""
filter the tokens in vocab by their count
Args:
min_cnt: tokens with frequency less than min_cnt is filtered
"""
filtered_tokens = [
token for token in self.token2id if self.token_cnt[token] >= min_cnt
]
# rebuild the token x id map
self.token2id = {}
self.id2token = {}
for token in self.initial_tokens:
self.add(token, cnt=0)
for token in filtered_tokens:
self.add(token, cnt=0)
def randomly_init_embeddings(self, embed_dim):
"""
randomly initializes the embeddings for each token
Args:
embed_dim: the size of the embedding for each token
"""
self.embed_dim = embed_dim
self.embeddings = np.random.rand(self.size(), embed_dim)
for token in [self.pad_token, self.unk_token]:
self.embeddings[self.get_id(token)] = np.zeros([self.embed_dim])
def load_pretrained_embeddings(self, embedding_path):
"""
loads the pretrained embeddings from embedding_path,
tokens not in pretrained embeddings will be filtered
Args:
embedding_path: the path of the pretrained embedding file
"""
trained_embeddings = {}
with open(embedding_path, 'r') as fin:
for line in fin:
contents = line.strip().split()
token = contents[0].decode('utf8')
if token not in self.token2id:
continue
trained_embeddings[token] = list(map(float, contents[1:]))
if self.embed_dim is None:
self.embed_dim = len(contents) - 1
filtered_tokens = trained_embeddings.keys()
# rebuild the token x id map
self.token2id = {}
self.id2token = {}
for token in self.initial_tokens:
self.add(token, cnt=0)
for token in filtered_tokens:
self.add(token, cnt=0)
# load embeddings
self.embeddings = np.zeros([self.size(), self.embed_dim])
for token in self.token2id.keys():
if token in trained_embeddings:
self.embeddings[self.get_id(token)] = trained_embeddings[token]
def convert_to_ids(self, tokens):
"""
Convert a list of tokens to ids, use unk_token if the token is not in vocab.
Args:
tokens: a list of token
Returns:
a list of ids
"""
vec = [self.get_id(label) for label in tokens]
return vec
def recover_from_ids(self, ids, stop_id=None):
"""
Convert a list of ids to tokens, stop converting if the stop_id is encountered
Args:
ids: a list of ids to convert
stop_id: the stop id, default is None
Returns:
a list of tokens
"""
tokens = []
for i in ids:
tokens += [self.get_token(i)]
if stop_id is not None and i == stop_id:
break
return tokens
import os import os
import math import math
import random import random
import cPickle
import functools import functools
import numpy as np import numpy as np
#import paddle.v2 as paddle
import paddle import paddle
from PIL import Image, ImageEnhance from PIL import Image, ImageEnhance
...@@ -45,9 +43,9 @@ for i, item in enumerate(test_list): ...@@ -45,9 +43,9 @@ for i, item in enumerate(test_list):
test_data[label] = [] test_data[label] = []
test_data[label].append(path) test_data[label].append(path)
print "train_data size:", len(train_data) print("train_data size:", len(train_data))
print "test_data size:", len(test_data) print("test_data size:", len(test_data))
print "test_data image number:", len(test_image_list) print("test_data image number:", len(test_image_list))
random.shuffle(test_image_list) random.shuffle(test_image_list)
...@@ -214,11 +212,11 @@ def eml_iterator(data, ...@@ -214,11 +212,11 @@ def eml_iterator(data,
color_jitter=False, color_jitter=False,
rotate=False): rotate=False):
def reader(): def reader():
labs = data.keys() labs = list(data.keys())
lab_num = len(labs) lab_num = len(labs)
ind = range(0, lab_num) ind = list(range(0, lab_num))
assert batch_size % samples_each_class == 0, "batch_size % samples_each_class != 0" assert batch_size % samples_each_class == 0, "batch_size % samples_each_class != 0"
num_class = batch_size/samples_each_class num_class = batch_size // samples_each_class
for i in range(iter_size): for i in range(iter_size):
random.shuffle(ind) random.shuffle(ind)
for n in range(num_class): for n in range(num_class):
...@@ -245,9 +243,9 @@ def quadruplet_iterator(data, ...@@ -245,9 +243,9 @@ def quadruplet_iterator(data,
color_jitter=False, color_jitter=False,
rotate=False): rotate=False):
def reader(): def reader():
labs = data.keys() labs = list(data.keys())
lab_num = len(labs) lab_num = len(labs)
ind = range(0, lab_num) ind = list(range(0, lab_num))
for i in range(iter_size): for i in range(iter_size):
random.shuffle(ind) random.shuffle(ind)
ind_sample = ind[:class_num] ind_sample = ind[:class_num]
...@@ -255,7 +253,7 @@ def quadruplet_iterator(data, ...@@ -255,7 +253,7 @@ def quadruplet_iterator(data,
for ind_i in ind_sample: for ind_i in ind_sample:
lab = labs[ind_i] lab = labs[ind_i]
data_list = data[lab] data_list = data[lab]
data_ind = range(0, len(data_list)) data_ind = list(range(0, len(data_list)))
random.shuffle(data_ind) random.shuffle(data_ind)
anchor_ind = data_ind[:samples_each_class] anchor_ind = data_ind[:samples_each_class]
...@@ -277,15 +275,15 @@ def triplet_iterator(data, ...@@ -277,15 +275,15 @@ def triplet_iterator(data,
color_jitter=False, color_jitter=False,
rotate=False): rotate=False):
def reader(): def reader():
labs = data.keys() labs = list(data.keys())
lab_num = len(labs) lab_num = len(labs)
ind = range(0, lab_num) ind = list(range(0, lab_num))
for i in range(iter_size): for i in range(iter_size):
random.shuffle(ind) random.shuffle(ind)
ind_pos, ind_neg = ind[:2] ind_pos, ind_neg = ind[:2]
lab_pos = labs[ind_pos] lab_pos = labs[ind_pos]
pos_data_list = data[lab_pos] pos_data_list = data[lab_pos]
data_ind = range(0, len(pos_data_list)) data_ind = list(range(0, len(pos_data_list)))
random.shuffle(data_ind) random.shuffle(data_ind)
anchor_ind, pos_ind = data_ind[:2] anchor_ind, pos_ind = data_ind[:2]
...@@ -346,7 +344,7 @@ def quadruplet_train(class_num, samples_each_class): ...@@ -346,7 +344,7 @@ def quadruplet_train(class_num, samples_each_class):
def triplet_train(batch_size): def triplet_train(batch_size):
assert(batch_size % 3 == 0) assert(batch_size % 3 == 0)
return triplet_iterator(train_data, 'train', batch_size, iter_size = batch_size/3 * 100, \ return triplet_iterator(train_data, 'train', batch_size, iter_size = batch_size//3 * 100, \
shuffle=True, color_jitter=False, rotate=False) shuffle=True, color_jitter=False, rotate=False)
def test(): def test():
......
import datareader as reader
import math import math
import numpy as np import numpy as np
import paddle.fluid as fluid import paddle.fluid as fluid
from metrics import calculate_order_dist_matrix from . import datareader as reader
from metrics import get_gpu_num from .metrics import calculate_order_dist_matrix
from .metrics import get_gpu_num
class emlloss(): class emlloss():
def __init__(self, train_batch_size = 40, samples_each_class=2): def __init__(self, train_batch_size = 40, samples_each_class=2):
...@@ -11,9 +11,9 @@ class emlloss(): ...@@ -11,9 +11,9 @@ class emlloss():
self.samples_each_class = samples_each_class self.samples_each_class = samples_each_class
self.train_batch_size = train_batch_size self.train_batch_size = train_batch_size
assert(train_batch_size % num_gpus == 0) assert(train_batch_size % num_gpus == 0)
self.cal_loss_batch_size = train_batch_size / num_gpus self.cal_loss_batch_size = train_batch_size // num_gpus
assert(self.cal_loss_batch_size % samples_each_class == 0) assert(self.cal_loss_batch_size % samples_each_class == 0)
class_num = train_batch_size / samples_each_class class_num = train_batch_size // samples_each_class
self.train_reader = reader.eml_train(train_batch_size, samples_each_class) self.train_reader = reader.eml_train(train_batch_size, samples_each_class)
self.test_reader = reader.test() self.test_reader = reader.test()
......
...@@ -20,12 +20,14 @@ def recall_topk(fea, lab, k = 1): ...@@ -20,12 +20,14 @@ def recall_topk(fea, lab, k = 1):
import subprocess import subprocess
import os import os
def get_gpu_num(): def get_gpu_num():
visibledevice = os.getenv('CUDA_VISIBLE_DEVICES') visibledevice = os.getenv('CUDA_VISIBLE_DEVICES')
if visibledevice: if visibledevice:
devicenum = len(visibledevice.split(',')) devicenum = len(visibledevice.split(','))
else: else:
devicenum = subprocess.check_output(['nvidia-smi', '-L']).count('\n') devicenum = subprocess.check_output(
[str.encode('nvidia-smi'), str.encode('-L')]).decode('utf-8').count('\n')
return devicenum return devicenum
import paddle as paddle import paddle as paddle
......
import numpy as np import numpy as np
import datareader as reader
import paddle.fluid as fluid import paddle.fluid as fluid
from metrics import calculate_order_dist_matrix from . import datareader as reader
from metrics import get_gpu_num from .metrics import calculate_order_dist_matrix
from .metrics import get_gpu_num
class quadrupletloss(): class quadrupletloss():
def __init__(self, def __init__(self,
...@@ -14,9 +14,9 @@ class quadrupletloss(): ...@@ -14,9 +14,9 @@ class quadrupletloss():
self.samples_each_class = samples_each_class self.samples_each_class = samples_each_class
self.train_batch_size = train_batch_size self.train_batch_size = train_batch_size
assert(train_batch_size % num_gpus == 0) assert(train_batch_size % num_gpus == 0)
self.cal_loss_batch_size = train_batch_size / num_gpus self.cal_loss_batch_size = train_batch_size // num_gpus
assert(self.cal_loss_batch_size % samples_each_class == 0) assert(self.cal_loss_batch_size % samples_each_class == 0)
class_num = train_batch_size / samples_each_class class_num = train_batch_size // samples_each_class
self.train_reader = reader.quadruplet_train(class_num, samples_each_class) self.train_reader = reader.quadruplet_train(class_num, samples_each_class)
self.test_reader = reader.test() self.test_reader = reader.test()
......
import datareader as reader from . import datareader as reader
import paddle.fluid as fluid import paddle.fluid as fluid
class tripletloss(): class tripletloss():
......
...@@ -75,7 +75,7 @@ class ResNet(): ...@@ -75,7 +75,7 @@ class ResNet():
num_filters=num_filters, num_filters=num_filters,
filter_size=filter_size, filter_size=filter_size,
stride=stride, stride=stride,
padding=(filter_size - 1) / 2, padding=(filter_size - 1) // 2,
groups=groups, groups=groups,
act=None, act=None,
bias_attr=False) bias_attr=False)
......
...@@ -127,7 +127,7 @@ class SE_ResNeXt(): ...@@ -127,7 +127,7 @@ class SE_ResNeXt():
num_filters=num_filters, num_filters=num_filters,
filter_size=filter_size, filter_size=filter_size,
stride=stride, stride=stride,
padding=(filter_size - 1) / 2, padding=(filter_size - 1) // 2,
groups=groups, groups=groups,
act=None, act=None,
bias_attr=False) bias_attr=False)
......
...@@ -93,7 +93,7 @@ def train(args): ...@@ -93,7 +93,7 @@ def train(args):
elif loss_name == "emlloss": elif loss_name == "emlloss":
metricloss = emlloss( metricloss = emlloss(
train_batch_size = args.train_batch_size, train_batch_size = args.train_batch_size,
samples_each_class=2 samples_each_class = args.samples_each_class
) )
cost_metric = metricloss.loss(out[0]) cost_metric = metricloss.loss(out[0])
avg_cost_metric = fluid.layers.mean(x=cost_metric) avg_cost_metric = fluid.layers.mean(x=cost_metric)
......
...@@ -17,6 +17,7 @@ from __future__ import absolute_import ...@@ -17,6 +17,7 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import distutils.util import distutils.util
import six
import numpy as np import numpy as np
from paddle.fluid import core from paddle.fluid import core
...@@ -37,7 +38,7 @@ def print_arguments(args): ...@@ -37,7 +38,7 @@ def print_arguments(args):
:type args: argparse.Namespace :type args: argparse.Namespace
""" """
print("----------- Configuration Arguments -----------") print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()): for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value)) print("%s: %s" % (arg, value))
print("------------------------------------------------") print("------------------------------------------------")
......
运行本目录下的范例模型需要安装PaddlePaddle Fluid 1.0版。如果您的 PaddlePaddle 安装版本低于此要求,请按照[安装文档](http://www.paddlepaddle.org/docs/develop/documentation/zh/build_and_install/pip_install_cn.html)中的说明更新 PaddlePaddle 安装版本。
# 机器翻译:RNN Search
以下是本范例模型的简要目录结构及说明:
```text
.
├── README.md # 文档,本文件
├── args.py # 训练、预测以及模型参数
├── train.py # 训练主程序
├── infer.py # 预测主程序
├── attention_model.py # 带注意力机制的翻译模型配置
└── no_attention_model.py # 无注意力机制的翻译模型配置
```
## 简介
机器翻译(machine translation, MT)是用计算机来实现不同语言之间翻译的技术。被翻译的语言通常称为源语言(source language),翻译成的结果语言称为目标语言(target language)。机器翻译即实现从源语言到目标语言转换的过程,是自然语言处理的重要研究领域之一。
近年来,深度学习技术的发展不断为机器翻译任务带来新的突破。直接用神经网络将源语言映射到目标语言,即端到端的神经网络机器翻译(End-to-End Neural Machine Translation, End-to-End NMT)模型逐渐成为主流,此类模型一般简称为NMT模型。
本目录包含一个经典的机器翻译模型[RNN Search](https://arxiv.org/pdf/1409.0473.pdf)的Paddle Fluid实现。事实上,RNN search是一个较为传统的NMT模型,在现阶段,其表现已被很多新模型(如[Transformer](https://arxiv.org/abs/1706.03762))超越。但除机器翻译外,该模型是许多序列到序列(sequence to sequence, 以下简称Seq2Seq)类模型的基础,很多解决其他NLP问题的模型均以此模型为基础;因此其在NLP领域具有重要意义,并被广泛用作Baseline.
本目录下此范例模型的实现,旨在展示如何用Paddle Fluid实现一个带有注意力机制(Attention)的RNN模型来解决Seq2Seq类问题,以及如何使用带有Beam Search算法的解码器。如果您仅仅只是需要在机器翻译方面有着较好翻译效果的模型,则建议您参考[Transformer的Paddle Fluid实现](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer)
## 模型概览
RNN Search模型使用了经典的编码器-解码器(Encoder-Decoder)的框架结构来解决Seq2Seq类问题。这种方法先用编码器将源序列编码成vector,再用解码器将该vector解码为目标序列。这其实模拟了人类在进行翻译类任务时的行为:先解析源语言,理解其含义,再根据该含义来写出目标语言的语句。编码器和解码器往往都使用RNN来实现。关于此方法的具体原理和数学表达式,可以参考[深度学习101](http://www.paddlepaddle.org/documentation/docs/zh/0.15.0/beginners_guide/basics/machine_translation/index.html).
本模型中,在编码器方面,我们的实现使用了双向循环神经网络(Bi-directional Recurrent Neural Network);在解码器方面,我们使用了带注意力(Attention)机制的RNN解码器,并同时提供了一个不带注意力机制的解码器实现作为对比;而在预测方面我们使用柱搜索(beam search)算法来生成翻译的目标语句。以下将分别介绍用到的这些方法。
### 双向循环神经网络
这里介绍Bengio团队在论文\[[2](#参考文献),[4](#参考文献)\]中提出的一种双向循环网络结构。该结构的目的是输入一个序列,得到其在每个时刻的特征表示,即输出的每个时刻都用定长向量表示到该时刻的上下文语义信息。
具体来说,该双向循环神经网络分别在时间维以顺序和逆序——即前向(forward)和后向(backward)——依次处理输入序列,并将每个时间步RNN的输出拼接成为最终的输出层。这样每个时间步的输出节点,都包含了输入序列中当前时刻完整的过去和未来的上下文信息。下图展示的是一个按时间步展开的双向循环神经网络。该网络包含一个前向和一个后向RNN,其中有六个权重矩阵:输入到前向隐层和后向隐层的权重矩阵($W_1, W_3$),隐层到隐层自己的权重矩阵($W_2,W_5$),前向隐层和后向隐层到输出层的权重矩阵($W_4, W_6$)。注意,该网络的前向隐层和后向隐层之间没有连接。
<p align="center">
<img src="images/bi_rnn.png" width=450><br/>
图1. 按时间步展开的双向循环神经网络
</p>
<p align="center">
<img src="images/encoder_attention.png" width=500><br/>
图2. 使用双向LSTM的编码器
</p>
### 注意力机制
如果编码阶段的输出是一个固定维度的向量,会带来以下两个问题:1)不论源语言序列的长度是5个词还是50个词,如果都用固定维度的向量去编码其中的语义和句法结构信息,对模型来说是一个非常高的要求,特别是对长句子序列而言;2)直觉上,当人类翻译一句话时,会对与当前译文更相关的源语言片段上给予更多关注,且关注点会随着翻译的进行而改变。而固定维度的向量则相当于,任何时刻都对源语言所有信息给予了同等程度的关注,这是不合理的。因此,Bahdanau等人\[[4](#参考文献)\]引入注意力(attention)机制,可以对编码后的上下文片段进行解码,以此来解决长句子的特征学习问题。下面介绍在注意力机制下的解码器结构。
与简单的解码器不同,这里$z_i$的计算公式为:
$$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$
可见,源语言句子的编码向量表示为第$i$个词的上下文片段$c_i$,即针对每一个目标语言中的词$u_i$,都有一个特定的$c_i$与之对应。$c_i$的计算公式如下:
$$c_i=\sum _{j=1}^{T}a_{ij}h_j, a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$
从公式中可以看出,注意力机制是通过对编码器中各时刻的RNN状态$h_j$进行加权平均实现的。权重$a_{ij}$表示目标语言中第$i$个词对源语言中第$j$个词的注意力大小,$a_{ij}$的计算公式如下:
$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$
$$e_{ij} = {align(z_i, h_j)}$$
其中,$align$可以看作是一个对齐模型,用来衡量目标语言中第$i$个词和源语言中第$j$个词的匹配程度。具体而言,这个程度是通过解码RNN的第$i$个隐层状态$z_i$和源语言句子的第$j$个上下文片段$h_j$计算得到的。传统的对齐模型中,目标语言的每个词明确对应源语言的一个或多个词(hard alignment);而在注意力模型中采用的是soft alignment,即任何两个目标语言和源语言词间均存在一定的关联,且这个关联强度是由模型计算得到的实数,因此可以融入整个NMT框架,并通过反向传播算法进行训练。
<p align="center">
<img src="images/decoder_attention.png" width=500><br/>
图3. 基于注意力机制的解码器
</p>
### 柱搜索算法
柱搜索([beam search](http://en.wikipedia.org/wiki/Beam_search))是一种启发式图搜索算法,用于在图或树中搜索有限集合中的最优扩展节点,通常用在解空间非常大的系统(如机器翻译、语音识别)中,原因是内存无法装下图或树中所有展开的解。如在机器翻译任务中希望翻译“`<s>你好<e>`”,就算目标语言字典中只有3个词(`<s>`, `<e>`, `hello`),也可能生成无限句话(`hello`循环出现的次数不定),为了找到其中较好的翻译结果,我们可采用柱搜索算法。
柱搜索算法使用广度优先策略建立搜索树,在树的每一层,按照启发代价(heuristic cost)(本教程中,为生成词的log概率之和)对节点进行排序,然后仅留下预先确定的个数(文献中通常称为beam width、beam size、柱宽度等)的节点。只有这些节点会在下一层继续扩展,其他节点就被剪掉了,也就是说保留了质量较高的节点,剪枝了质量较差的节点。因此,搜索所占用的空间和时间大幅减少,但缺点是无法保证一定获得最优解。
使用柱搜索算法的解码阶段,目标是最大化生成序列的概率。思路是:
1. 每一个时刻,根据源语言句子的编码信息$c$、生成的第$i$个目标语言序列单词$u_i$和$i$时刻RNN的隐层状态$z_i$,计算出下一个隐层状态$z_{i+1}$。
2. 将$z_{i+1}$通过`softmax`归一化,得到目标语言序列的第$i+1$个单词的概率分布$p_{i+1}$。
3. 根据$p_{i+1}$采样出单词$u_{i+1}$。
4. 重复步骤1~3,直到获得句子结束标记`<e>`或超过句子的最大生成长度为止。
注意:$z_{i+1}$和$p_{i+1}$的计算公式同解码器中的一样。且由于生成时的每一步都是通过贪心法实现的,因此并不能保证得到全局最优解。
## 数据介绍
本教程使用[WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/)数据集中的[bitexts(after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz)作为训练集,[dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz)作为测试集和生成集。
### 数据预处理
我们的预处理流程包括两步:
- 将每个源语言到目标语言的平行语料库文件合并为一个文件:
- 合并每个`XXX.src``XXX.trg`文件为`XXX`
- `XXX`中的第$i$行内容为`XXX.src`中的第$i$行和`XXX.trg`中的第$i$行连接,用'\t'分隔。
- 创建训练数据的“源字典”和“目标字典”。每个字典都有**DICTSIZE**个单词,包括:语料中词频最高的(DICTSIZE - 3)个单词,和3个特殊符号`<s>`(序列的开始)、`<e>`(序列的结束)和`<unk>`(未登录词)。
### 示例数据
因为完整的数据集数据量较大,为了验证训练流程,PaddlePaddle接口paddle.dataset.wmt14中默认提供了一个经过预处理的[较小规模的数据集](http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz)
该数据集有193319条训练数据,6003条测试数据,词典长度为30000。因为数据规模限制,使用该数据集训练出来的模型效果无法保证。
## 训练模型
`train.py`包含训练程序的主函数,要使用默认参数开始训练,只需要简单地执行:
```sh
python train.py
```
您可以使用命令行参数来设置模型训练时的参数。要显示所有可用的命令行参数,执行:
```sh
python train.py -h
```
这样会显示所有的命令行参数的描述,以及其默认值。默认的模型是带有注意力机制的。您也可以尝试运行无注意力机制的模型,命令如下:
```sh
python train.py --no_attention
```
训练好的模型默认会被保存到`./models`路径下。您可以用命令行参数`--save_dir`来指定模型的保存路径。默认每个pass结束时会保存一个模型。
## 生成预测结果
在模型训练好后,可以用`infer.py`来生成预测结果。同样的,使用默认参数,只需要执行:
```sh
python infer.py
```
您也可以同样用命令行来指定各参数。注意,预测时的参数设置必须与训练时完全一致,否则载入模型会失败。您可以用`--pass_num`参数来选择读取哪个pass结束时保存的模型。同时您可以使用`--beam_width`参数来选择beam search宽度。
## 参考文献
1. Koehn P. [Statistical machine translation](https://books.google.com.hk/books?id=4v_Cx1wIMLkC&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false)[M]. Cambridge University Press, 2009.
2. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://www.aclweb.org/anthology/D/D14/D14-1179.pdf)[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1724-1734.
3. Chung J, Gulcehre C, Cho K H, et al. [Empirical evaluation of gated recurrent neural networks on sequence modeling](https://arxiv.org/abs/1412.3555)[J]. arXiv preprint arXiv:1412.3555, 2014.
4. Bahdanau D, Cho K, Bengio Y. [Neural machine translation by jointly learning to align and translate](https://arxiv.org/abs/1409.0473)[C]//Proceedings of ICLR 2015, 2015.
5. Papineni K, Roukos S, Ward T, et al. [BLEU: a method for automatic evaluation of machine translation](http://dl.acm.org/citation.cfm?id=1073135)[C]//Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002: 311-318.
<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">本教程</span><a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> 创作,采用 <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">知识共享 署名-相同方式共享 4.0 国际 许可协议</a>进行许可。
\ No newline at end of file
...@@ -7,9 +7,9 @@ from kpi import CostKpi, DurationKpi, AccKpi ...@@ -7,9 +7,9 @@ from kpi import CostKpi, DurationKpi, AccKpi
#### NOTE kpi.py should shared in models in some way!!!! #### NOTE kpi.py should shared in models in some way!!!!
train_cost_kpi = CostKpi('train_cost', 0.02, 0, actived=True) train_cost_kpi = CostKpi('train_cost', 0.02, 0, actived=False)
test_cost_kpi = CostKpi('test_cost', 0.005, 0, actived=True) test_cost_kpi = CostKpi('test_cost', 0.005, 0, actived=False)
train_duration_kpi = DurationKpi('train_duration', 0.06, 0, actived=True) train_duration_kpi = DurationKpi('train_duration', 0.06, 0, actived=False)
tracking_kpis = [ tracking_kpis = [
train_cost_kpi, train_cost_kpi,
......
...@@ -52,7 +52,8 @@ def parse_args(): ...@@ -52,7 +52,8 @@ def parse_args():
"--pass_num", "--pass_num",
type=int, type=int,
default=5, default=5,
help="The pass number to train. (default: %(default)d)") help="The pass number to train. In inference mode, load the saved model"
" at the end of given pass.(default: %(default)d)")
parser.add_argument( parser.add_argument(
"--learning_rate", "--learning_rate",
type=float, type=float,
...@@ -66,17 +67,17 @@ def parse_args(): ...@@ -66,17 +67,17 @@ def parse_args():
"--beam_size", "--beam_size",
type=int, type=int,
default=3, default=3,
help="The width for beam searching. (default: %(default)d)") help="The width for beam search. (default: %(default)d)")
parser.add_argument( parser.add_argument(
"--use_gpu", "--use_gpu",
type=distutils.util.strtobool, type=distutils.util.strtobool,
default=True, default=True,
help="Whether to use gpu. (default: %(default)d)") help="Whether to use gpu or not. (default: %(default)d)")
parser.add_argument( parser.add_argument(
"--max_length", "--max_length",
type=int, type=int,
default=50, default=50,
help="The maximum length of sequence when doing generation. " help="The maximum sequence length for translation result."
"(default: %(default)d)") "(default: %(default)d)")
parser.add_argument( parser.add_argument(
"--save_dir", "--save_dir",
......
...@@ -122,9 +122,8 @@ def seq_to_seq_net(embedding_dim, encoder_size, decoder_size, source_dict_dim, ...@@ -122,9 +122,8 @@ def seq_to_seq_net(embedding_dim, encoder_size, decoder_size, source_dict_dim,
decoder_state_expand = fluid.layers.sequence_expand( decoder_state_expand = fluid.layers.sequence_expand(
x=decoder_state_proj, y=encoder_proj) x=decoder_state_proj, y=encoder_proj)
# concated lod should inherit from encoder_proj # concated lod should inherit from encoder_proj
concated = fluid.layers.concat( mixed_state = encoder_proj + decoder_state_expand
input=[encoder_proj, decoder_state_expand], axis=1) attention_weights = fluid.layers.fc(input=mixed_state,
attention_weights = fluid.layers.fc(input=concated,
size=1, size=1,
bias_attr=False) bias_attr=False)
attention_weights = fluid.layers.sequence_softmax( attention_weights = fluid.layers.sequence_softmax(
......
...@@ -16,7 +16,7 @@ ...@@ -16,7 +16,7 @@
├── reader.py # 数据读取接口 ├── reader.py # 数据读取接口
├── README.md # 文档 ├── README.md # 文档
├── train.py # 训练脚本 ├── train.py # 训练脚本
└── util.py # wordpiece 数据解码工具 └── gen_data.sh # 数据生成脚本
``` ```
### 简介 ### 简介
...@@ -59,36 +59,29 @@ Decoder 具有和 Encoder 类似的结构,只是相比于组成 Encoder 的 la ...@@ -59,36 +59,29 @@ Decoder 具有和 Encoder 类似的结构,只是相比于组成 Encoder 的 la
### 数据准备 ### 数据准备
WMT 数据集是机器翻译领域公认的主流数据集;WMT 英德和英法数据集也是 Transformer 论文中所用数据集,其中英德数据集使用了 BPE(byte-pair encoding)[4]编码的数据,英法数据集使用了 wordpiece [5]的数据。我们这里也将使用 WMT 英德和英法翻译数据,并和论文保持一致使用 BPE 和 wordpiece 的数据,下面给出了使用的方法。对于其他自定义数据,参照下文遵循或转换为类似的数据格式即可。 WMT 数据集是机器翻译领域公认的主流数据集[WMT'16 EN-DE 数据集](http://www.statmt.org/wmt16/translation-task.html)是其中一个中等规模的数据集,也是 Transformer 论文中用到的一个数据集,这里将其作为示例,可以直接运行 `gen_data.sh` 脚本进行 WMT'16 EN-DE 数据集的下载和预处理。数据处理过程主要包括 Tokenize 和 BPE 编码(byte-pair encoding);BPE 编码的数据能够较好的解决未登录词(out-of-vocabulary,OOV)的问题[4],其在 Transformer 论文中也被使用。运行成功后,将会生成文件夹 `gen_data`,其目录结构如下(可在 `gen_data.sh` 中修改):
#### WMT 英德翻译数据 ```text
.
[WMT'16 EN-DE 数据集](http://www.statmt.org/wmt16/translation-task.html)是一个中等规模的数据集。参照论文,英德数据集我们使用 BPE 编码的数据,这能够更好的解决未登录词(out-of-vocabulary,OOV)的问题[4]。用到的 BPE 数据可以参照[这里](https://github.com/google/seq2seq/blob/master/docs/data.md)进行下载(如果希望在自定义数据中使用 BPE 编码,可以参照[这里](https://github.com/rsennrich/subword-nmt)进行预处理),下载后解压,其中 `train.tok.clean.bpe.32000.en``train.tok.clean.bpe.32000.de` 为使用 BPE 的训练数据(平行语料,分别对应了英语和德语,经过了 tokenize 和 BPE 的处理),`newstest2016.tok.bpe.32000.en``newstest2016.tok.bpe.32000.de` 等为测试数据(`newstest2016.tok.en``newstest2016.tok.de` 等则为对应的未使用 BPE 的测试数据),`vocab.bpe.32000` 为相应的词典文件(源语言和目标语言共享该词典文件)。 ├── wmt16_ende_data # WMT16 英德翻译数据
├── wmt16_ende_data_bpe # BPE 编码的 WMT16 英德翻译数据
由于本示例中的数据读取脚本 `reader.py` 默认使用的样本数据的格式为 `\t` 分隔的的源语言和目标语言句子对(默认句子中的词之间使用空格分隔),因此需要将源语言到目标语言的平行语料库文件合并为一个文件,可以执行以下命令进行合并: ├── mosesdecoder # Moses 机器翻译工具集,包含了 Tokenize、BLEU 评估等脚本
```sh └── subword-nmt # BPE 编码的代码
paste -d '\t' train.tok.clean.bpe.32000.en train.tok.clean.bpe.32000.de > train.tok.clean.bpe.32000.en-de
```
此外,下载的词典文件 `vocab.bpe.32000` 中未包含表示序列开始、序列结束和未登录词的特殊符号,可以使用如下命令在词典中加入 `<s>``<e>``<unk>` 作为这三个特殊符号(用 BPE 表示数据已有效避免了未登录词的问题,这里加入只是做通用处理)。
```sh
sed -i '1i\<s>\n<e>\n<unk>' vocab.bpe.32000
``` ```
#### WMT 英法翻译数据 `gen_data/wmt16_ende_data_bpe` 中是我们最终使用的英德翻译数据,其中 `train.tok.clean.bpe.32000.en-de` 为训练数据,`newstest2016.tok.bpe.32000.en-de` 等为验证和测试数据,。`vocab_all.bpe.32000` 为相应的词典文件(已加入 `<s>``<e>``<unk>` 这三个特殊符号,源语言和目标语言共享该词典文件)。
[WMT'14 EN-FR 数据集](http://www.statmt.org/wmt14/translation-task.html)是一个较大规模的数据集。参照论文,英法数据我们使用 wordpiece 表示的数据,wordpiece 和 BPE 类似同为采用 sub-word units 来解决 OOV 问题的方法[5]。我们提供了已完成预处理的 wordpiece 数据的下载,可以从[这里](http://transformer-data.bj.bcebos.com/wmt14_enfr.tar)下载,其中 `train.wordpiece.en-fr` 为使用 wordpiece 的训练数据,`newstest2014.wordpiece.en-fr` 为测试数据(`newstest2014.tok.en``newstest2014.tok.fr` 为对应的未经 wordpiece 处理过的测试数据,使用[脚本](https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl)进行了 tokenize 的处理),`vocab.wordpiece.en-fr` 为相应的词典文件(源语言和目标语言共享该词典文件)。
提供的英法翻译数据无需进行额外的处理,可以直接使用;需要注意的是,这些用 wordpiece 表示的数据中句子内的 token 之间使用 `\x01` 而非空格进行分隔(因部分 token 内包含空格),这需要在训练时进行指定 对于其他自定义数据,转换为类似 `train.tok.clean.bpe.32000.en-de` 的数据格式(`\t` 分隔的源语言和目标语言句子对,句子中的 token 之间使用空格分隔)即可;如需使用 BPE 编码,可参考,亦可以使用类似 WMT,使用 `gen_data.sh` 进行处理
### 模型训练 ### 模型训练
`train.py` 是模型训练脚本。以英德翻译数据为例,可以执行以下命令进行模型训练: `train.py` 是模型训练脚本。以英德翻译数据为例,可以执行以下命令进行模型训练:
```sh ```sh
python -u train.py \ python -u train.py \
--src_vocab_fpath data/vocab.bpe.32000 \ --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--trg_vocab_fpath data/vocab.bpe.32000 \ --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--special_token '<s>' '<e>' '<unk>' \ --special_token '<s>' '<e>' '<unk>' \
--train_file_pattern data/train.tok.clean.bpe.32000.en-de \ --train_file_pattern gen_data/wmt16_ende_data_bpe/train.tok.clean.bpe.32000.en-de \
--token_delimiter ' ' \ --token_delimiter ' ' \
--use_token_batch True \ --use_token_batch True \
--batch_size 4096 \ --batch_size 4096 \
...@@ -104,10 +97,10 @@ python train.py --help ...@@ -104,10 +97,10 @@ python train.py --help
```sh ```sh
python -u train.py \ python -u train.py \
--src_vocab_fpath data/vocab.bpe.32000 \ --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--trg_vocab_fpath data/vocab.bpe.32000 \ --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--special_token '<s>' '<e>' '<unk>' \ --special_token '<s>' '<e>' '<unk>' \
--train_file_pattern data/train.tok.clean.bpe.32000.en-de \ --train_file_pattern gen_data/wmt16_ende_data_bpe/train.tok.clean.bpe.32000.en-de \
--token_delimiter ' ' \ --token_delimiter ' ' \
--use_token_batch True \ --use_token_batch True \
--batch_size 3200 \ --batch_size 3200 \
...@@ -120,7 +113,7 @@ python -u train.py \ ...@@ -120,7 +113,7 @@ python -u train.py \
n_head 16 \ n_head 16 \
prepostprocess_dropout 0.3 prepostprocess_dropout 0.3
``` ```
有关这些参数更详细信息的请参考 `config.py` 中的注释说明。对于英法翻译数据,执行训练和英德翻译训练类似,修改命令中的词典和数据文件为英法数据相应文件的路径,另外要注意的是由于英法翻译数据 token 间不是使用空格进行分隔,需要修改 `token_delimiter` 参数的设置为 `--token_delimiter '\x01'` 有关这些参数更详细信息的请参考 `config.py` 中的注释说明。
训练时默认使用所有 GPU,可以通过 `CUDA_VISIBLE_DEVICES` 环境变量来设置使用的 GPU 数目。也可以只使用 CPU 训练(通过参数 `--divice CPU` 设置),训练速度相对较慢。在训练过程中,每隔一定 iteration 后(通过参数 `save_freq` 设置,默认为10000)保存模型到参数 `model_dir` 指定的目录,每个 epoch 结束后也会保存 checkpiont 到 `ckpt_dir` 指定的目录,每个 iteration 将打印如下的日志到标准输出: 训练时默认使用所有 GPU,可以通过 `CUDA_VISIBLE_DEVICES` 环境变量来设置使用的 GPU 数目。也可以只使用 CPU 训练(通过参数 `--divice CPU` 设置),训练速度相对较慢。在训练过程中,每隔一定 iteration 后(通过参数 `save_freq` 设置,默认为10000)保存模型到参数 `model_dir` 指定的目录,每个 epoch 结束后也会保存 checkpiont 到 `ckpt_dir` 指定的目录,每个 iteration 将打印如下的日志到标准输出:
```txt ```txt
...@@ -141,10 +134,10 @@ step_idx: 9, epoch: 0, batch: 9, avg loss: 10.993434, normalized loss: 9.616467, ...@@ -141,10 +134,10 @@ step_idx: 9, epoch: 0, batch: 9, avg loss: 10.993434, normalized loss: 9.616467,
`infer.py` 是模型预测脚本。以英德翻译数据为例,模型训练完成后可以执行以下命令对指定文件中的文本进行翻译: `infer.py` 是模型预测脚本。以英德翻译数据为例,模型训练完成后可以执行以下命令对指定文件中的文本进行翻译:
```sh ```sh
python -u infer.py \ python -u infer.py \
--src_vocab_fpath data/vocab.bpe.32000 \ --src_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--trg_vocab_fpath data/vocab.bpe.32000 \ --trg_vocab_fpath gen_data/wmt16_ende_data_bpe/vocab_all.bpe.32000 \
--special_token '<s>' '<e>' '<unk>' \ --special_token '<s>' '<e>' '<unk>' \
--test_file_pattern data/newstest2016.tok.bpe.32000.en-de \ --test_file_pattern gen_data/wmt16_ende_data_bpe/newstest2016.tok.bpe.32000.en-de \
--use_wordpiece False \ --use_wordpiece False \
--token_delimiter ' ' \ --token_delimiter ' ' \
--batch_size 32 \ --batch_size 32 \
...@@ -159,14 +152,9 @@ python -u infer.py \ ...@@ -159,14 +152,9 @@ python -u infer.py \
sed -r 's/(@@ )|(@@ ?$)//g' predict.txt > predict.tok.txt sed -r 's/(@@ )|(@@ ?$)//g' predict.txt > predict.tok.txt
``` ```
对于英法翻译的 wordpiece 数据,执行预测和英德翻译预测类似,修改命令中的词典和数据文件为英法数据相应文件的路径,另外需要注意修改 `token_delimiter` 参数的设置为 `--token_delimiter '\x01'`;同时要修改 `use_wordpiece` 参数的设置为 `--use_wordpiece True`,这会在预测时将翻译得到的 wordpiece 数据还原为原始数据输出。为了使用 tokenize 的数据进行评估,还需要对翻译结果进行 tokenize 的处理,[Moses](https://github.com/moses-smt/mosesdecoder) 提供了一系列机器翻译相关的脚本。执行 `git clone https://github.com/moses-smt/mosesdecoder.git` 克隆 mosesdecoder 仓库后,可以使用其中的 `tokenizer.perl` 脚本对 `predict.txt` 内的翻译结果进行 tokenize 处理并输出到 `predict.tok.txt` 中,如下: 接下来就可以使用参考翻译对翻译结果进行 BLEU 指标的评估了。以英德翻译 `newstest2016.tok.de` 数据为例,执行如下命令:
```sh
perl mosesdecoder/scripts/tokenizer/tokenizer.perl -l fr < predict.txt > predict.tok.txt
```
接下来就可以使用参考翻译对翻译结果进行 BLEU 指标的评估了。计算 BLEU 值的脚本也在 Moses 中包含,以英德翻译 `newstest2016.tok.de` 数据为例,执行如下命令:
```sh ```sh
perl mosesdecoder/scripts/generic/multi-bleu.perl data/newstest2016.tok.de < predict.tok.txt perl gen_data/mosesdecoder/scripts/generic/multi-bleu.perl gen_data/wmt16_ende_data/newstest2016.tok.de < predict.tok.txt
``` ```
可以看到类似如下的结果(为单机两卡训练 200K 个 iteration 后模型的预测结果)。 可以看到类似如下的结果(为单机两卡训练 200K 个 iteration 后模型的预测结果)。
``` ```
...@@ -174,11 +162,10 @@ BLEU = 33.08, 64.2/39.2/26.4/18.5 (BP=0.994, ratio=0.994, hyp_len=61971, ref_len ...@@ -174,11 +162,10 @@ BLEU = 33.08, 64.2/39.2/26.4/18.5 (BP=0.994, ratio=0.994, hyp_len=61971, ref_len
``` ```
目前在未使用 model average 的情况下,英德翻译 base model 八卡训练 100K 个 iteration 后测试 BLEU 值如下: 目前在未使用 model average 的情况下,英德翻译 base model 八卡训练 100K 个 iteration 后测试 BLEU 值如下:
| 测试集 | newstest2013 | newstest2014 | newstest2015 | newstest2016 | | 测试集 | newstest2014 | newstest2015 | newstest2016 |
|-|-|-|-|-| |-|-|-|-|
| BLEU | 25.27 | 26.05 | 28.75 | 33.27 | | BLEU | 26.05 | 28.75 | 33.27 |
英法翻译 base model 八卡训练 100K 个 iteration 后在 `newstest2014` 上测试 BLEU 值为36.。
### 分布式训练 ### 分布式训练
...@@ -260,4 +247,3 @@ export PADDLE_PORT=6177 ...@@ -260,4 +247,3 @@ export PADDLE_PORT=6177
2. He K, Zhang X, Ren S, et al. [Deep residual learning for image recognition](http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf)[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778. 2. He K, Zhang X, Ren S, et al. [Deep residual learning for image recognition](http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf)[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
3. Ba J L, Kiros J R, Hinton G E. [Layer normalization](https://arxiv.org/pdf/1607.06450.pdf)[J]. arXiv preprint arXiv:1607.06450, 2016. 3. Ba J L, Kiros J R, Hinton G E. [Layer normalization](https://arxiv.org/pdf/1607.06450.pdf)[J]. arXiv preprint arXiv:1607.06450, 2016.
4. Sennrich R, Haddow B, Birch A. [Neural machine translation of rare words with subword units](https://arxiv.org/pdf/1508.07909)[J]. arXiv preprint arXiv:1508.07909, 2015. 4. Sennrich R, Haddow B, Birch A. [Neural machine translation of rare words with subword units](https://arxiv.org/pdf/1508.07909)[J]. arXiv preprint arXiv:1508.07909, 2015.
5. Wu Y, Schuster M, Chen Z, et al. [Google's neural machine translation system: Bridging the gap between human and machine translation](https://arxiv.org/pdf/1609.08144.pdf)[J]. arXiv preprint arXiv:1609.08144, 2016.
#! /usr/bin/env bash
set -e
OUTPUT_DIR=$PWD/gen_data
###############################################################################
# change these variables for other WMT data
###############################################################################
OUTPUT_DIR_DATA="${OUTPUT_DIR}/wmt16_ende_data"
OUTPUT_DIR_BPE_DATA="${OUTPUT_DIR}/wmt16_ende_data_bpe"
LANG1="en"
LANG2="de"
# each of TRAIN_DATA: data_url data_file_lang1 data_file_lang2
TRAIN_DATA=(
'http://www.statmt.org/europarl/v7/de-en.tgz'
'europarl-v7.de-en.en' 'europarl-v7.de-en.de'
'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz'
'commoncrawl.de-en.en' 'commoncrawl.de-en.de'
'http://data.statmt.org/wmt16/translation-task/training-parallel-nc-v11.tgz'
'news-commentary-v11.de-en.en' 'news-commentary-v11.de-en.de'
)
# each of DEV_TEST_DATA: data_url data_file_lang1 data_file_lang2
DEV_TEST_DATA=(
'http://data.statmt.org/wmt16/translation-task/dev.tgz'
'newstest201[45]-deen-ref.en.sgm' 'newstest201[45]-deen-src.de.sgm'
'http://data.statmt.org/wmt16/translation-task/test.tgz'
'newstest2016-deen-ref.en.sgm' 'newstest2016-deen-src.de.sgm'
)
###############################################################################
###############################################################################
# change these variables for other WMT data
###############################################################################
# OUTPUT_DIR_DATA="${OUTPUT_DIR}/wmt14_enfr_data"
# OUTPUT_DIR_BPE_DATA="${OUTPUT_DIR}/wmt14_enfr_data_bpe"
# LANG1="en"
# LANG2="fr"
# # each of TRAIN_DATA: ata_url data_tgz data_file
# TRAIN_DATA=(
# 'http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz'
# 'commoncrawl.fr-en.en' 'commoncrawl.fr-en.fr'
# 'http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz'
# 'training/europarl-v7.fr-en.en' 'training/europarl-v7.fr-en.fr'
# 'http://www.statmt.org/wmt14/training-parallel-nc-v9.tgz'
# 'training/news-commentary-v9.fr-en.en' 'training/news-commentary-v9.fr-en.fr'
# 'http://www.statmt.org/wmt10/training-giga-fren.tar'
# 'giga-fren.release2.fixed.en.*' 'giga-fren.release2.fixed.fr.*'
# 'http://www.statmt.org/wmt13/training-parallel-un.tgz'
# 'un/undoc.2000.fr-en.en' 'un/undoc.2000.fr-en.fr'
# )
# # each of DEV_TEST_DATA: data_url data_tgz data_file_lang1 data_file_lang2
# DEV_TEST_DATA=(
# 'http://data.statmt.org/wmt16/translation-task/dev.tgz'
# '.*/newstest201[45]-fren-ref.en.sgm' '.*/newstest201[45]-fren-src.fr.sgm'
# 'http://data.statmt.org/wmt16/translation-task/test.tgz'
# '.*/newstest2016-fren-ref.en.sgm' '.*/newstest2016-fren-src.fr.sgm'
# )
###############################################################################
mkdir -p $OUTPUT_DIR_DATA $OUTPUT_DIR_BPE_DATA
# Extract training data
for ((i=0;i<${#TRAIN_DATA[@]};i+=3)); do
data_url=${TRAIN_DATA[i]}
data_tgz=${data_url##*/} # training-parallel-commoncrawl.tgz
data=${data_tgz%.*} # training-parallel-commoncrawl
data_lang1=${TRAIN_DATA[i+1]}
data_lang2=${TRAIN_DATA[i+2]}
if [ ! -e ${OUTPUT_DIR_DATA}/${data_tgz} ]; then
echo "Download "${data_url}
wget -O ${OUTPUT_DIR_DATA}/${data_tgz} ${data_url}
fi
if [ ! -d ${OUTPUT_DIR_DATA}/${data} ]; then
echo "Extract "${data_tgz}
mkdir -p ${OUTPUT_DIR_DATA}/${data}
tar_type=${data_tgz:0-3}
if [ ${tar_type} == "tar" ]; then
tar -xvf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
else
tar -xvzf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
fi
fi
# concatenate all training data
for data_lang in $data_lang1 $data_lang2; do
for f in `find ${OUTPUT_DIR_DATA}/${data} -regex ".*/${data_lang}"`; do
data_dir=`dirname $f`
data_file=`basename $f`
f_base=${f%.*}
f_ext=${f##*.}
if [ $f_ext == "gz" ]; then
gunzip $f
l=${f_base##*.}
f_base=${f_base%.*}
else
l=${f_ext}
fi
if [ $i -eq 0 ]; then
cat ${f_base}.$l > ${OUTPUT_DIR_DATA}/train.$l
else
cat ${f_base}.$l >> ${OUTPUT_DIR_DATA}/train.$l
fi
done
done
done
# Clone mosesdecoder
if [ ! -d ${OUTPUT_DIR}/mosesdecoder ]; then
echo "Cloning moses for data processing"
git clone https://github.com/moses-smt/mosesdecoder.git ${OUTPUT_DIR}/mosesdecoder
fi
# Extract develop and test data
dev_test_data=""
for ((i=0;i<${#DEV_TEST_DATA[@]};i+=3)); do
data_url=${DEV_TEST_DATA[i]}
data_tgz=${data_url##*/} # training-parallel-commoncrawl.tgz
data=${data_tgz%.*} # training-parallel-commoncrawl
data_lang1=${DEV_TEST_DATA[i+1]}
data_lang2=${DEV_TEST_DATA[i+2]}
if [ ! -e ${OUTPUT_DIR_DATA}/${data_tgz} ]; then
echo "Download "${data_url}
wget -O ${OUTPUT_DIR_DATA}/${data_tgz} ${data_url}
fi
if [ ! -d ${OUTPUT_DIR_DATA}/${data} ]; then
echo "Extract "${data_tgz}
mkdir -p ${OUTPUT_DIR_DATA}/${data}
tar_type=${data_tgz:0-3}
if [ ${tar_type} == "tar" ]; then
tar -xvf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
else
tar -xvzf ${OUTPUT_DIR_DATA}/${data_tgz} -C ${OUTPUT_DIR_DATA}/${data}
fi
fi
for data_lang in $data_lang1 $data_lang2; do
for f in `find ${OUTPUT_DIR_DATA}/${data} -regex ".*/${data_lang}"`; do
data_dir=`dirname $f`
data_file=`basename $f`
data_out=`echo ${data_file} | cut -d '-' -f 1` # newstest2016
l=`echo ${data_file} | cut -d '.' -f 2` # en
dev_test_data="${dev_test_data}\|${data_out}" # to make regexp
if [ ! -e ${OUTPUT_DIR_DATA}/${data_out}.$l ]; then
${OUTPUT_DIR}/mosesdecoder/scripts/ems/support/input-from-sgm.perl \
< $f > ${OUTPUT_DIR_DATA}/${data_out}.$l
fi
done
done
done
# Tokenize data
for l in ${LANG1} ${LANG2}; do
for f in `ls ${OUTPUT_DIR_DATA}/*.$l | grep "\(train${dev_test_data}\)\.$l$"`; do
f_base=${f%.*} # dir/train dir/newstest2016
f_out=$f_base.tok.$l
if [ ! -e $f_out ]; then
echo "Tokenize "$f
${OUTPUT_DIR}/mosesdecoder/scripts/tokenizer/tokenizer.perl -q -l $l -threads 8 < $f > $f_out
fi
done
done
# Clean data
for f in ${OUTPUT_DIR_DATA}/train.${LANG1} ${OUTPUT_DIR_DATA}/train.tok.${LANG1}; do
f_base=${f%.*} # dir/train dir/train.tok
f_out=${f_base}.clean
if [ ! -e $f_out.${LANG1} ] && [ ! -e $f_out.${LANG2} ]; then
echo "Clean "${f_base}
${OUTPUT_DIR}/mosesdecoder/scripts/training/clean-corpus-n.perl $f_base ${LANG1} ${LANG2} ${f_out} 1 80
fi
done
# Clone subword-nmt and generate BPE data
if [ ! -d ${OUTPUT_DIR}/subword-nmt ]; then
git clone https://github.com/rsennrich/subword-nmt.git ${OUTPUT_DIR}/subword-nmt
fi
# Generate BPE data and vocabulary
for num_operations in 32000; do
if [ ! -e ${OUTPUT_DIR_BPE_DATA}/bpe.${num_operations} ]; then
echo "Learn BPE with ${num_operations} merge operations"
cat ${OUTPUT_DIR_DATA}/train.tok.clean.${LANG1} ${OUTPUT_DIR_DATA}/train.tok.clean.${LANG2} | \
${OUTPUT_DIR}/subword-nmt/learn_bpe.py -s $num_operations > ${OUTPUT_DIR_BPE_DATA}/bpe.${num_operations}
fi
for l in ${LANG1} ${LANG2}; do
for f in `ls ${OUTPUT_DIR_DATA}/*.$l | grep "\(train${dev_test_data}\)\.tok\(\.clean\)\?\.$l$"`; do
f_base=${f%.*} # dir/train.tok dir/train.tok.clean dir/newstest2016.tok
f_base=${f_base##*/} # train.tok train.tok.clean newstest2016.tok
f_out=${OUTPUT_DIR_BPE_DATA}/${f_base}.bpe.${num_operations}.$l
if [ ! -e $f_out ]; then
echo "Apply BPE to "$f
${OUTPUT_DIR}/subword-nmt/apply_bpe.py -c ${OUTPUT_DIR_BPE_DATA}/bpe.${num_operations} < $f > $f_out
fi
done
done
if [ ! -e ${OUTPUT_DIR_BPE_DATA}/vocab.bpe.${num_operations} ]; then
echo "Create vocabulary for BPE data"
cat ${OUTPUT_DIR_BPE_DATA}/train.tok.clean.bpe.${num_operations}.${LANG1} ${OUTPUT_DIR_BPE_DATA}/train.tok.clean.bpe.${num_operations}.${LANG2} | \
${OUTPUT_DIR}/subword-nmt/get_vocab.py | cut -f1 -d ' ' > ${OUTPUT_DIR_BPE_DATA}/vocab.bpe.${num_operations}
fi
done
# Adapt to the reader
for f in ${OUTPUT_DIR_BPE_DATA}/*.bpe.${num_operations}.${LANG1}; do
f_base=${f%.*} # dir/train.tok.clean.bpe.32000 dir/newstest2016.tok.bpe.32000
f_out=${f_base}.${LANG1}-${LANG2}
if [ ! -e $f_out ]; then
paste -d '\t' $f_base.${LANG1} $f_base.${LANG2} > $f_out
fi
done
if [ ! -e ${OUTPUT_DIR_BPE_DATA}/vocab_all.bpe.${num_operations} ]; then
sed '1i\<s>\n<e>\n<unk>' ${OUTPUT_DIR_BPE_DATA}/vocab.bpe.${num_operations} > ${OUTPUT_DIR_BPE_DATA}/vocab_all.bpe.${num_operations}
fi
echo "All done."
...@@ -13,7 +13,6 @@ from model import fast_decode as fast_decoder ...@@ -13,7 +13,6 @@ from model import fast_decode as fast_decoder
from config import * from config import *
from train import pad_batch_data from train import pad_batch_data
import reader import reader
import util
def parse_args(): def parse_args():
...@@ -49,21 +48,12 @@ def parse_args(): ...@@ -49,21 +48,12 @@ def parse_args():
default=["<s>", "<e>", "<unk>"], default=["<s>", "<e>", "<unk>"],
nargs=3, nargs=3,
help="The <bos>, <eos> and <unk> tokens in the dictionary.") help="The <bos>, <eos> and <unk> tokens in the dictionary.")
parser.add_argument(
"--use_wordpiece",
type=ast.literal_eval,
default=False,
help="The flag indicating if the data in wordpiece. The EN-FR data "
"we provided is wordpiece data. For wordpiece data, converting ids to "
"original words is a little different and some special codes are "
"provided in util.py to do this.")
parser.add_argument( parser.add_argument(
"--token_delimiter", "--token_delimiter",
type=lambda x: str(x.encode().decode("unicode-escape")), type=lambda x: str(x.encode().decode("unicode-escape")),
default=" ", default=" ",
help="The delimiter used to split tokens in source or target sentences. " help="The delimiter used to split tokens in source or target sentences. "
"For EN-DE BPE data we provided, use spaces as token delimiter.; " "For EN-DE BPE data we provided, use spaces as token delimiter. ")
"For EN-FR wordpiece data we provided, use '\x01' as token delimiter.")
parser.add_argument( parser.add_argument(
'opts', 'opts',
help='See config.py for all options', help='See config.py for all options',
...@@ -144,7 +134,7 @@ def prepare_batch_input(insts, data_input_names, src_pad_idx, bos_idx, n_head, ...@@ -144,7 +134,7 @@ def prepare_batch_input(insts, data_input_names, src_pad_idx, bos_idx, n_head,
return input_dict return input_dict
def fast_infer(test_data, trg_idx2word, use_wordpiece): def fast_infer(test_data, trg_idx2word):
""" """
Inference by beam search decoder based solely on Fluid operators. Inference by beam search decoder based solely on Fluid operators.
""" """
...@@ -202,9 +192,7 @@ def fast_infer(test_data, trg_idx2word, use_wordpiece): ...@@ -202,9 +192,7 @@ def fast_infer(test_data, trg_idx2word, use_wordpiece):
trg_idx2word[idx] trg_idx2word[idx]
for idx in post_process_seq( for idx in post_process_seq(
np.array(seq_ids)[sub_start:sub_end]) np.array(seq_ids)[sub_start:sub_end])
]) if not use_wordpiece else util.subtoken_ids_to_str( ]))
post_process_seq(np.array(seq_ids)[sub_start:sub_end]),
trg_idx2word))
scores[i].append(np.array(seq_scores)[sub_end - 1]) scores[i].append(np.array(seq_scores)[sub_end - 1])
print(hyps[i][-1]) print(hyps[i][-1])
if len(hyps[i]) >= InferTaskConfig.n_best: if len(hyps[i]) >= InferTaskConfig.n_best:
...@@ -232,7 +220,7 @@ def infer(args, inferencer=fast_infer): ...@@ -232,7 +220,7 @@ def infer(args, inferencer=fast_infer):
clip_last_batch=False) clip_last_batch=False)
trg_idx2word = test_data.load_dict( trg_idx2word = test_data.load_dict(
dict_path=args.trg_vocab_fpath, reverse=True) dict_path=args.trg_vocab_fpath, reverse=True)
inferencer(test_data, trg_idx2word, args.use_wordpiece) inferencer(test_data, trg_idx2word)
if __name__ == "__main__": if __name__ == "__main__":
......
...@@ -219,6 +219,7 @@ def prepare_encoder_decoder(src_word, ...@@ -219,6 +219,7 @@ def prepare_encoder_decoder(src_word,
size=[src_max_len, src_emb_dim], size=[src_max_len, src_emb_dim],
param_attr=fluid.ParamAttr( param_attr=fluid.ParamAttr(
name=pos_enc_param_name, trainable=False)) name=pos_enc_param_name, trainable=False))
src_pos_enc.stop_gradient = True
enc_input = src_word_emb + src_pos_enc enc_input = src_word_emb + src_pos_enc
return layers.dropout( return layers.dropout(
enc_input, enc_input,
...@@ -458,7 +459,7 @@ def transformer(src_vocab_size, ...@@ -458,7 +459,7 @@ def transformer(src_vocab_size,
use_py_reader=False, use_py_reader=False,
is_test=False): is_test=False):
if weight_sharing: if weight_sharing:
assert src_vocab_size == src_vocab_size, ( assert src_vocab_size == trg_vocab_size, (
"Vocabularies in source and target should be same for weight sharing." "Vocabularies in source and target should be same for weight sharing."
) )
......
import glob import glob
import six
import os import os
import tarfile import tarfile
...@@ -262,8 +263,10 @@ class DataReader(object): ...@@ -262,8 +263,10 @@ class DataReader(object):
if not os.path.isfile(fpath): if not os.path.isfile(fpath):
raise IOError("Invalid file: %s" % fpath) raise IOError("Invalid file: %s" % fpath)
with open(fpath, "r") as f: with open(fpath, "rb") as f:
for line in f: for line in f:
if six.PY3:
line = line.decode()
fields = line.strip("\n").split(self._field_delimiter) fields = line.strip("\n").split(self._field_delimiter)
if (not self._only_src and len(fields) == 2) or ( if (not self._only_src and len(fields) == 2) or (
self._only_src and len(fields) == 1): self._only_src and len(fields) == 1):
...@@ -272,8 +275,10 @@ class DataReader(object): ...@@ -272,8 +275,10 @@ class DataReader(object):
@staticmethod @staticmethod
def load_dict(dict_path, reverse=False): def load_dict(dict_path, reverse=False):
word_dict = {} word_dict = {}
with open(dict_path, "r") as fdict: with open(dict_path, "rb") as fdict:
for idx, line in enumerate(fdict): for idx, line in enumerate(fdict):
if six.PY3:
line = line.decode()
if reverse: if reverse:
word_dict[idx] = line.strip("\n") word_dict[idx] = line.strip("\n")
else: else:
......
...@@ -19,6 +19,7 @@ import logging ...@@ -19,6 +19,7 @@ import logging
import sys import sys
import copy import copy
def parse_args(): def parse_args():
parser = argparse.ArgumentParser("Training for Transformer.") parser = argparse.ArgumentParser("Training for Transformer.")
parser.add_argument( parser.add_argument(
...@@ -86,8 +87,7 @@ def parse_args(): ...@@ -86,8 +87,7 @@ def parse_args():
type=lambda x: str(x.encode().decode("unicode-escape")), type=lambda x: str(x.encode().decode("unicode-escape")),
default=" ", default=" ",
help="The delimiter used to split tokens in source or target sentences. " help="The delimiter used to split tokens in source or target sentences. "
"For EN-DE BPE data we provided, use spaces as token delimiter. " "For EN-DE BPE data we provided, use spaces as token delimiter. ")
"For EN-FR wordpiece data we provided, use '\x01' as token delimiter.")
parser.add_argument( parser.add_argument(
'opts', 'opts',
help='See config.py for all options', help='See config.py for all options',
...@@ -128,15 +128,11 @@ def parse_args(): ...@@ -128,15 +128,11 @@ def parse_args():
default=True, default=True,
help="The flag indicating whether to use py_reader.") help="The flag indicating whether to use py_reader.")
parser.add_argument( parser.add_argument(
"--fetch_steps", "--fetch_steps", type=int, default=100, help="Fetch outputs steps.")
type=int,
default=100,
help="Fetch outputs steps.")
#parser.add_argument( #parser.add_argument(
# '--profile', action='store_true', help='If set, profile a few steps.') # '--profile', action='store_true', help='If set, profile a few steps.')
args = parser.parse_args() args = parser.parse_args()
# Append args related to dict # Append args related to dict
src_dict = reader.DataReader.load_dict(args.src_vocab_fpath) src_dict = reader.DataReader.load_dict(args.src_vocab_fpath)
...@@ -151,16 +147,14 @@ def parse_args(): ...@@ -151,16 +147,14 @@ def parse_args():
[TrainTaskConfig, ModelHyperParams]) [TrainTaskConfig, ModelHyperParams])
return args return args
def append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint): def append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint):
assert(trainer_id >= 0 and assert (trainer_id >= 0 and len(worker_endpoints) > 1 and
len(worker_endpoints) > 1 and current_endpoint in worker_endpoints)
current_endpoint in worker_endpoints)
eps = copy.deepcopy(worker_endpoints) eps = copy.deepcopy(worker_endpoints)
eps.remove(current_endpoint) eps.remove(current_endpoint)
nccl_id_var = fluid.default_startup_program().global_block().create_var( nccl_id_var = fluid.default_startup_program().global_block().create_var(
name="NCCLID", name="NCCLID", persistable=True, type=fluid.core.VarDesc.VarType.RAW)
persistable=True,
type=fluid.core.VarDesc.VarType.RAW)
fluid.default_startup_program().global_block().append_op( fluid.default_startup_program().global_block().append_op(
type="gen_nccl_id", type="gen_nccl_id",
inputs={}, inputs={},
...@@ -172,6 +166,7 @@ def append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint): ...@@ -172,6 +166,7 @@ def append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint):
}) })
return nccl_id_var return nccl_id_var
def pad_batch_data(insts, def pad_batch_data(insts,
pad_idx, pad_idx,
n_head, n_head,
...@@ -385,8 +380,11 @@ def py_reader_provider_wrapper(data_reader): ...@@ -385,8 +380,11 @@ def py_reader_provider_wrapper(data_reader):
def test_context(exe, train_exe, dev_count): def test_context(exe, train_exe, dev_count):
# Context to do validation. # Context to do validation.
startup_prog = fluid.Program()
test_prog = fluid.Program() test_prog = fluid.Program()
startup_prog = fluid.Program()
if args.enable_ce:
test_prog.random_seed = 1000
startup_prog.random_seed = 1000
with fluid.program_guard(test_prog, startup_prog): with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard(): with fluid.unique_name.guard():
sum_cost, avg_cost, predict, token_num, pyreader = transformer( sum_cost, avg_cost, predict, token_num, pyreader = transformer(
...@@ -448,8 +446,17 @@ def test_context(exe, train_exe, dev_count): ...@@ -448,8 +446,17 @@ def test_context(exe, train_exe, dev_count):
return test return test
def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost, def train_loop(exe,
token_num, predict, pyreader, nccl2_num_trainers=1, nccl2_trainer_id=0): train_prog,
startup_prog,
dev_count,
sum_cost,
avg_cost,
token_num,
predict,
pyreader,
nccl2_num_trainers=1,
nccl2_trainer_id=0):
# Initialize the parameters. # Initialize the parameters.
if TrainTaskConfig.ckpt_path: if TrainTaskConfig.ckpt_path:
fluid.io.load_persistables(exe, TrainTaskConfig.ckpt_path) fluid.io.load_persistables(exe, TrainTaskConfig.ckpt_path)
...@@ -483,7 +490,8 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost, ...@@ -483,7 +490,8 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
main_program=train_prog, main_program=train_prog,
build_strategy=build_strategy, build_strategy=build_strategy,
exec_strategy=exec_strategy, exec_strategy=exec_strategy,
num_trainers=nccl2_num_trainers, trainer_id=nccl2_trainer_id) num_trainers=nccl2_num_trainers,
trainer_id=nccl2_trainer_id)
if args.val_file_pattern is not None: if args.val_file_pattern is not None:
test = test_context(exe, train_exe, dev_count) test = test_context(exe, train_exe, dev_count)
...@@ -509,7 +517,7 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost, ...@@ -509,7 +517,7 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
data_generator = train_data() data_generator = train_data()
batch_id = 0 batch_id = 0
avg_batch_time=time.time() avg_batch_time = time.time()
while True: while True:
try: try:
feed_dict_list = prepare_feed_dict_list(data_generator, feed_dict_list = prepare_feed_dict_list(data_generator,
...@@ -522,28 +530,33 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost, ...@@ -522,28 +530,33 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
elif TrainTaskConfig.profile and batch_id == 10: elif TrainTaskConfig.profile and batch_id == 10:
logging.info("end profiler") logging.info("end profiler")
#logging.info("profiling total time: ", time.time() - start_time) #logging.info("profiling total time: ", time.time() - start_time)
profiler.stop_profiler("total", "./transformer_local_profile_{}_pass{}".format(batch_id, pass_id)) profiler.stop_profiler(
"total", "./transformer_local_profile_{}_pass{}".format(
batch_id, pass_id))
sys.exit(0) sys.exit(0)
logging.info("batch_id:{}".format(batch_id)) logging.info("batch_id:{}".format(batch_id))
outs = train_exe.run( outs = train_exe.run(
fetch_list=[sum_cost.name, token_num.name] if (batch_id % args.fetch_steps == 0 or TrainTaskConfig.profile) else[], fetch_list=[sum_cost.name, token_num.name]
feed=feed_dict_list) if (batch_id % args.fetch_steps == 0 or
TrainTaskConfig.profile) else [],
feed=feed_dict_list)
if (batch_id % args.fetch_steps == 0 and batch_id > 0): if (batch_id % args.fetch_steps == 0 and batch_id > 0):
sum_cost_val, token_num_val = np.array(outs[0]), np.array(outs[ sum_cost_val, token_num_val = np.array(outs[0]), np.array(
1]) outs[1])
# sum the cost from multi-devices # sum the cost from multi-devices
total_sum_cost = sum_cost_val.sum() total_sum_cost = sum_cost_val.sum()
total_token_num = token_num_val.sum() total_token_num = token_num_val.sum()
total_avg_cost = total_sum_cost / total_token_num total_avg_cost = total_sum_cost / total_token_num
logging.info("step_idx: %d, epoch: %d, batch: %d, avg loss: %f, " logging.info(
"normalized loss: %f, ppl: %f, speed: %.2f step/s" % "step_idx: %d, epoch: %d, batch: %d, avg loss: %f, "
(step_idx, pass_id, batch_id, total_avg_cost, "normalized loss: %f, ppl: %f, speed: %.2f step/s" %
total_avg_cost - loss_normalizer, (step_idx, pass_id, batch_id, total_avg_cost,
np.exp([min(total_avg_cost, 100)]), total_avg_cost - loss_normalizer,
args.fetch_steps / (time.time() - avg_batch_time))) np.exp([min(total_avg_cost, 100)]),
args.fetch_steps / (time.time() - avg_batch_time)))
if step_idx % int(TrainTaskConfig. if step_idx % int(TrainTaskConfig.
save_freq) == TrainTaskConfig.save_freq - 1: save_freq) == TrainTaskConfig.save_freq - 1:
...@@ -557,7 +570,7 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost, ...@@ -557,7 +570,7 @@ def train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, avg_cost,
"iter_" + str(step_idx) + ".infer.model"), "iter_" + str(step_idx) + ".infer.model"),
train_prog) train_prog)
if batch_id % args.fetch_steps == 0 and batch_id > 0: if batch_id % args.fetch_steps == 0 and batch_id > 0:
avg_batch_time=time.time() avg_batch_time = time.time()
init_flag = False init_flag = False
batch_id += 1 batch_id += 1
step_idx += 1 step_idx += 1
...@@ -640,7 +653,7 @@ def train(args): ...@@ -640,7 +653,7 @@ def train(args):
use_py_reader=args.use_py_reader, use_py_reader=args.use_py_reader,
is_test=False) is_test=False)
optimizer=None optimizer = None
if args.sync: if args.sync:
lr_decay = fluid.layers.learning_rate_scheduler.noam_decay( lr_decay = fluid.layers.learning_rate_scheduler.noam_decay(
ModelHyperParams.d_model, TrainTaskConfig.warmup_steps) ModelHyperParams.d_model, TrainTaskConfig.warmup_steps)
...@@ -682,8 +695,10 @@ def train(args): ...@@ -682,8 +695,10 @@ def train(args):
print("worker_endpoints:", worker_endpoints) print("worker_endpoints:", worker_endpoints)
print("current_endpoint:", current_endpoint) print("current_endpoint:", current_endpoint)
append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint) append_nccl2_prepare(trainer_id, worker_endpoints, current_endpoint)
train_loop(exe, fluid.default_main_program(), dev_count, sum_cost, avg_cost, train_loop(exe,
lr_scheduler, token_num, predict, trainers_num, trainer_id) fluid.default_main_program(), dev_count, sum_cost,
avg_cost, lr_scheduler, token_num, predict, trainers_num,
trainer_id)
return return
port = os.getenv("PADDLE_PORT", "6174") port = os.getenv("PADDLE_PORT", "6174")
...@@ -732,7 +747,6 @@ def train(args): ...@@ -732,7 +747,6 @@ def train(args):
elif training_role == "TRAINER": elif training_role == "TRAINER":
logging.info("distributed: trainer started") logging.info("distributed: trainer started")
trainer_prog = t.get_trainer_program() trainer_prog = t.get_trainer_program()
''' '''
print("trainer start:") print("trainer start:")
program_to_code(pserver_startup) program_to_code(pserver_startup)
...@@ -744,13 +758,15 @@ def train(args): ...@@ -744,13 +758,15 @@ def train(args):
train_loop(exe, train_prog, startup_prog, dev_count, sum_cost, train_loop(exe, train_prog, startup_prog, dev_count, sum_cost,
avg_cost, token_num, predict, pyreader) avg_cost, token_num, predict, pyreader)
else: else:
logging.critical("environment var TRAINER_ROLE should be TRAINER os PSERVER") logging.critical(
"environment var TRAINER_ROLE should be TRAINER os PSERVER")
exit(1) exit(1)
if __name__ == "__main__": if __name__ == "__main__":
LOG_FORMAT = "[%(asctime)s %(levelname)s %(filename)s:%(lineno)d] %(message)s" LOG_FORMAT = "[%(asctime)s %(levelname)s %(filename)s:%(lineno)d] %(message)s"
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, format=LOG_FORMAT) logging.basicConfig(
stream=sys.stdout, level=logging.DEBUG, format=LOG_FORMAT)
args = parse_args() args = parse_args()
train(args) train(args)
import sys
import re
import six
import unicodedata
# Regular expression for unescaping token strings.
# '\u' is converted to '_'
# '\\' is converted to '\'
# '\213;' is converted to unichr(213)
# Inverse of escaping.
_UNESCAPE_REGEX = re.compile(r"\\u|\\\\|\\([0-9]+);")
# This set contains all letter and number characters.
_ALPHANUMERIC_CHAR_SET = set(
six.unichr(i) for i in range(sys.maxunicode)
if (unicodedata.category(six.unichr(i)).startswith("L") or
unicodedata.category(six.unichr(i)).startswith("N")))
# Unicode utility functions that work with Python 2 and 3
def native_to_unicode(s):
return s if is_unicode(s) else to_unicode(s)
def unicode_to_native(s):
if six.PY2:
return s.encode("utf-8") if is_unicode(s) else s
else:
return s
def is_unicode(s):
if six.PY2:
if isinstance(s, unicode):
return True
else:
if isinstance(s, str):
return True
return False
def to_unicode(s, ignore_errors=False):
if is_unicode(s):
return s
error_mode = "ignore" if ignore_errors else "strict"
return s.decode("utf-8", errors=error_mode)
def unescape_token(escaped_token):
"""
Inverse of encoding escaping.
"""
def match(m):
if m.group(1) is None:
return u"_" if m.group(0) == u"\\u" else u"\\"
try:
return six.unichr(int(m.group(1)))
except (ValueError, OverflowError) as _:
return u"\u3013" # Unicode for undefined character.
trimmed = escaped_token[:-1] if escaped_token.endswith(
"_") else escaped_token
return _UNESCAPE_REGEX.sub(match, trimmed)
def subtoken_ids_to_str(subtoken_ids, vocabs):
"""
Convert a list of subtoken(word piece) ids to a native string.
Refer to SubwordTextEncoder in Tensor2Tensor.
"""
subtokens = [vocabs.get(subtoken_id, u"") for subtoken_id in subtoken_ids]
# Convert a list of subtokens to a list of tokens.
concatenated = "".join([native_to_unicode(t) for t in subtokens])
split = concatenated.split("_")
tokens = []
for t in split:
if t:
unescaped = unescape_token(t + "_")
if unescaped:
tokens.append(unescaped)
# Convert a list of tokens to a unicode string (by inserting spaces bewteen
# word tokens).
token_is_alnum = [t[0] in _ALPHANUMERIC_CHAR_SET for t in tokens]
ret = []
for i, token in enumerate(tokens):
if i > 0 and token_is_alnum[i - 1] and token_is_alnum[i]:
ret.append(u" ")
ret.append(token)
seq = "".join(ret)
return unicode_to_native(seq)
...@@ -20,3 +20,4 @@ data/pascalvoc/trainval.txt ...@@ -20,3 +20,4 @@ data/pascalvoc/trainval.txt
log* log*
*.log *.log
ssd_mobilenet_v1_pascalvoc*
...@@ -38,7 +38,8 @@ train_parameters = { ...@@ -38,7 +38,8 @@ train_parameters = {
"batch_size": 64, "batch_size": 64,
"lr": 0.001, "lr": 0.001,
"lr_epochs": [40, 60, 80, 100], "lr_epochs": [40, 60, 80, 100],
"lr_decay": [1, 0.5, 0.25, 0.1, 0.01] "lr_decay": [1, 0.5, 0.25, 0.1, 0.01],
"ap_version": '11point',
}, },
"coco2014": { "coco2014": {
"train_images": 82783, "train_images": 82783,
...@@ -47,7 +48,8 @@ train_parameters = { ...@@ -47,7 +48,8 @@ train_parameters = {
"batch_size": 64, "batch_size": 64,
"lr": 0.001, "lr": 0.001,
"lr_epochs": [12, 19], "lr_epochs": [12, 19],
"lr_decay": [1, 0.5, 0.25] "lr_decay": [1, 0.5, 0.25],
"ap_version": 'integral', # should use eval_coco_map.py to test model
}, },
"coco2017": { "coco2017": {
"train_images": 118287, "train_images": 118287,
...@@ -56,7 +58,8 @@ train_parameters = { ...@@ -56,7 +58,8 @@ train_parameters = {
"batch_size": 64, "batch_size": 64,
"lr": 0.001, "lr": 0.001,
"lr_epochs": [12, 19], "lr_epochs": [12, 19],
"lr_decay": [1, 0.5, 0.25] "lr_decay": [1, 0.5, 0.25],
"ap_version": 'integral', # should use eval_coco_map.py to test model
} }
} }
...@@ -77,6 +80,7 @@ def optimizer_setting(train_params): ...@@ -77,6 +80,7 @@ def optimizer_setting(train_params):
def build_program(main_prog, startup_prog, train_params, is_train): def build_program(main_prog, startup_prog, train_params, is_train):
image_shape = train_params['image_shape'] image_shape = train_params['image_shape']
class_num = train_params['class_num'] class_num = train_params['class_num']
ap_version = train_params['ap_version']
with fluid.program_guard(main_prog, startup_prog): with fluid.program_guard(main_prog, startup_prog):
py_reader = fluid.layers.py_reader( py_reader = fluid.layers.py_reader(
capacity=64, capacity=64,
...@@ -88,16 +92,16 @@ def build_program(main_prog, startup_prog, train_params, is_train): ...@@ -88,16 +92,16 @@ def build_program(main_prog, startup_prog, train_params, is_train):
image, gt_box, gt_label, difficult = fluid.layers.read_file(py_reader) image, gt_box, gt_label, difficult = fluid.layers.read_file(py_reader)
locs, confs, box, box_var = mobile_net(class_num, image, image_shape) locs, confs, box, box_var = mobile_net(class_num, image, image_shape)
if is_train: if is_train:
loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box, with fluid.unique_name.guard("train"):
box_var) loss = fluid.layers.ssd_loss(locs, confs, gt_box, gt_label, box,
loss = fluid.layers.reduce_sum(loss) box_var)
optimizer = optimizer_setting(train_params) loss = fluid.layers.reduce_sum(loss)
optimizer.minimize(loss) optimizer = optimizer_setting(train_params)
optimizer.minimize(loss)
else: else:
with fluid.unique_name.guard("inference"):
nmsed_out = fluid.layers.detection_output( nmsed_out = fluid.layers.detection_output(
locs, confs, box, box_var, nms_threshold=0.45) locs, confs, box, box_var, nms_threshold=0.45)
with fluid.program_guard(main_prog):
loss = fluid.evaluator.DetectionMAP( loss = fluid.evaluator.DetectionMAP(
nmsed_out, nmsed_out,
gt_label, gt_label,
...@@ -106,7 +110,7 @@ def build_program(main_prog, startup_prog, train_params, is_train): ...@@ -106,7 +110,7 @@ def build_program(main_prog, startup_prog, train_params, is_train):
class_num, class_num,
overlap_threshold=0.5, overlap_threshold=0.5,
evaluate_difficult=False, evaluate_difficult=False,
ap_version=args.ap_version) ap_version=ap_version)
return py_reader, loss return py_reader, loss
...@@ -230,7 +234,7 @@ def train(args, ...@@ -230,7 +234,7 @@ def train(args,
loss_v = np.mean(np.array(loss_v)) loss_v = np.mean(np.array(loss_v))
every_epoc_loss.append(loss_v) every_epoc_loss.append(loss_v)
if batch_id % 20 == 0: if batch_id % 20 == 0:
print("Epoc {0}, batch {1}, loss {2}, time {3}".format( print("Epoc {:d}, batch {:d}, loss {:.6f}, time {:.5f}".format(
epoc_id, batch_id, loss_v, start_time - prev_start_time)) epoc_id, batch_id, loss_v, start_time - prev_start_time))
end_time = time.time() end_time = time.time()
total_time += end_time - start_time total_time += end_time - start_time
......
...@@ -2,6 +2,7 @@ from __future__ import absolute_import ...@@ -2,6 +2,7 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import paddle.fluid as fluid import paddle.fluid as fluid
import six
decoder_size = 128 decoder_size = 128
word_vector_dim = 128 word_vector_dim = 128
...@@ -22,7 +23,7 @@ def conv_bn_pool(input, ...@@ -22,7 +23,7 @@ def conv_bn_pool(input,
pool=True, pool=True,
use_cudnn=True): use_cudnn=True):
tmp = input tmp = input
for i in xrange(group): for i in six.moves.xrange(group):
filter_size = 3 filter_size = 3
conv_std = (2.0 / (filter_size**2 * tmp.shape[1]))**0.5 conv_std = (2.0 / (filter_size**2 * tmp.shape[1]))**0.5
conv_param = fluid.ParamAttr( conv_param = fluid.ParamAttr(
......
import paddle.v2 as paddle
import paddle.fluid as fluid import paddle.fluid as fluid
from utility import add_arguments, print_arguments, to_lodtensor, get_ctc_feeder_data, get_attention_feeder_data from utility import add_arguments, print_arguments, to_lodtensor, get_ctc_feeder_data, get_attention_feeder_data
from attention_model import attention_eval from attention_model import attention_eval
......
from __future__ import print_function from __future__ import print_function
import paddle.v2 as paddle
import paddle.fluid as fluid import paddle.fluid as fluid
from utility import add_arguments, print_arguments, to_lodtensor, get_ctc_feeder_data, get_attention_feeder_for_infer from utility import add_arguments, print_arguments, to_lodtensor, get_ctc_feeder_data, get_attention_feeder_for_infer
import paddle.fluid.profiler as profiler import paddle.fluid.profiler as profiler
......
import numpy as np import numpy as np
import paddle.v2 as paddle
import paddle.fluid as fluid import paddle.fluid as fluid
# reproducible # reproducible
np.random.seed(1) np.random.seed(1)
......
# GRU4REC
以下是本例的简要目录结构及说明:
```text
.
├── README.md # 文档
├── train.py # 训练脚本
├── infer.py # 预测脚本
├── utils # 通用函数
├── convert_format.py # 转换数据格式
├── small_train.txt # 小样本训练集
└── small_test.txt # 小样本测试集
```
## 简介
GRU4REC模型的介绍可以参阅论文[Session-based Recommendations with Recurrent Neural Networks](https://arxiv.org/abs/1511.06939),在本例中,我们实现了GRU4REC的模型。
## RSC15 数据下载及预处理
运行命令 下载RSC15官网数据集
```
curl -Lo yoochoose-data.7z https://s3-eu-west-1.amazonaws.com/yc-rdata/yoochoose-data.7z
7z x yoochoose-data.7z
```
GRU4REC的数据过滤,下载脚本[https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py](https://github.com/hidasib/GRU4Rec/blob/master/examples/rsc15/preprocess.py)
注意修改文件路径
line12: PATH_TO_ORIGINAL_DATA = './'
line13:PATH_TO_PROCESSED_DATA = './'
注意使用python3 执行脚本
```
python preprocess.py
```
生成的数据格式如下
```
SessionId ItemId Time
1 214536502 1396839069.277
1 214536500 1396839249.868
1 214536506 1396839286.998
1 214577561 1396839420.306
2 214662742 1396850197.614
2 214662742 1396850239.373
2 214825110 1396850317.446
2 214757390 1396850390.71
2 214757407 1396850438.247
```
数据格式需要转换 运行脚本
```
python convert_format.py
```
模型的训练及测试数据如下,一行表示一个用户按照时间顺序的序列
```
214536502 214536500 214536506 214577561
214662742 214662742 214825110 214757390 214757407 214551617
214716935 214774687 214832672
214836765 214706482
214701242 214826623
214826835 214826715
214838855 214838855
214576500 214576500 214576500
214821275 214821275 214821371 214821371 214821371 214717089 214563337 214706462 214717436 214743335 214826837 214819762
214717867 214717867
```
## 训练
'--use_cuda 1' 表示使用gpu, 缺省表示使用cpu '--parallel 1' 表示使用多卡,缺省表示使用单卡
GPU 环境
运行命令 `CUDA_VISIBLE_DEVICES=0 python train.py train_file test_file --use_cuda 1` 开始训练模型。
```
CUDA_VISIBLE_DEVICES=0 python train.py small_train.txt small_test.txt --use_cuda 1
```
CPU 环境
运行命令 `python train.py train_file test_file` 开始训练模型。
```
python train.py small_train.txt small_test.txt
```
当前支持的参数可参见[train.py](./train.py) `train_net` 函数
```python
batch_size = 50 # batch大小 推荐500()
args = parse_args()
vocab, train_reader, test_reader = utils.prepare_data(
train_file, test_file,batch_size=batch_size * get_cards(args),\
buffer_size=1000, word_freq_threshold=0) # buffer_size 局部序列长度排序
train(
train_reader=train_reader,
vocab=vocab,
network=network,
hid_size=100, # embedding and hidden size
base_lr=0.01, # base learning rate
batch_size=batch_size,
pass_num=10, # the number of passed for training
use_cuda=use_cuda, # whether to use GPU card
parallel=parallel, # whether to be parallel
model_dir="model_recall20", # directory to save model
init_low_bound=-0.1, # uniform parameter initialization lower bound
init_high_bound=0.1) # uniform parameter initialization upper bound
```
## 自定义网络结构
可在[train.py](./train.py) `network` 函数中调整网络结构,当前的网络结构如下:
```python
emb = fluid.layers.embedding(
input=src,
size=[vocab_size, hid_size],
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=emb_lr_x),
is_sparse=True)
fc0 = fluid.layers.fc(input=emb,
size=hid_size * 3,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=gru_lr_x))
gru_h0 = fluid.layers.dynamic_gru(
input=fc0,
size=hid_size,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=gru_lr_x))
fc = fluid.layers.fc(input=gru_h0,
size=vocab_size,
act='softmax',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=fc_lr_x))
cost = fluid.layers.cross_entropy(input=fc, label=dst)
acc = fluid.layers.accuracy(input=fc, label=dst, k=20)
```
## 训练结果示例
我们在Tesla K40m单GPU卡上训练的日志如下所示
```text
epoch_1 start
step:100 ppl:441.468
step:200 ppl:311.043
step:300 ppl:218.952
step:400 ppl:186.172
step:500 ppl:188.600
step:600 ppl:131.213
step:700 ppl:165.770
step:800 ppl:164.414
step:900 ppl:156.470
step:1000 ppl:174.201
step:1100 ppl:118.619
step:1200 ppl:122.635
step:1300 ppl:118.220
step:1400 ppl:90.372
step:1500 ppl:135.018
step:1600 ppl:114.327
step:1700 ppl:141.806
step:1800 ppl:93.416
step:1900 ppl:92.897
step:2000 ppl:121.703
step:2100 ppl:96.288
step:2200 ppl:88.355
step:2300 ppl:101.737
step:2400 ppl:95.934
step:2500 ppl:86.158
step:2600 ppl:80.925
step:2700 ppl:202.219
step:2800 ppl:106.828
step:2900 ppl:91.458
step:3000 ppl:105.988
step:3100 ppl:87.067
step:3200 ppl:92.651
step:3300 ppl:101.145
step:3400 ppl:91.247
step:3500 ppl:107.656
step:3600 ppl:89.410
...
...
step:15700 ppl:76.819
step:15800 ppl:62.257
step:15900 ppl:81.735
epoch:1 num_steps:15907 time_cost(s):4154.096032
model saved in model_recall20/epoch_1
...
```
## 预测
运行命令 `CUDA_VISIBLE_DEVICES=0 python infer.py model_dir start_epoch last_epoch(inclusive) train_file test_file` 开始预测.其中,start_epoch指定开始预测的轮次,last_epoch指定结束的轮次,例如
```python
CUDA_VISIBLE_DEVICES=0 python infer.py model 1 10 small_train.txt small_test.txt
```
## 预测结果示例
```text
model:model_r@20/epoch_1 recall@20:0.613 time_cost(s):12.23
model:model_r@20/epoch_2 recall@20:0.647 time_cost(s):12.33
model:model_r@20/epoch_3 recall@20:0.662 time_cost(s):12.38
model:model_r@20/epoch_4 recall@20:0.669 time_cost(s):12.21
model:model_r@20/epoch_5 recall@20:0.673 time_cost(s):12.17
model:model_r@20/epoch_6 recall@20:0.675 time_cost(s):12.26
model:model_r@20/epoch_7 recall@20:0.677 time_cost(s):12.25
model:model_r@20/epoch_8 recall@20:0.679 time_cost(s):12.37
model:model_r@20/epoch_9 recall@20:0.680 time_cost(s):12.22
model:model_r@20/epoch_10 recall@20:0.681 time_cost(s):12.2
```
import sys
def convert_format(input, output):
with open(input) as rf:
with open(output, "w") as wf:
last_sess = -1
sign = 1
i = 0
for l in rf:
i = i + 1
if i == 1:
continue
if (i % 1000000 == 1):
print(i)
tokens = l.strip().split()
if (int(tokens[0]) != last_sess):
if (sign):
sign = 0
wf.write(tokens[1] + " ")
else:
wf.write("\n" + tokens[1] + " ")
last_sess = int(tokens[0])
else:
wf.write(tokens[1] + " ")
input = "rsc15_train_tr.txt"
output = "rsc15_train_tr_paddle.txt"
input2 = "rsc15_test.txt"
output2 = "rsc15_test_paddle.txt"
convert_format(input, output)
convert_format(input2, output2)
import sys
import time
import math
import unittest
import contextlib
import numpy as np
import six
import paddle.fluid as fluid
import paddle
import utils
def infer(test_reader, use_cuda, model_path):
""" inference function """
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
with fluid.scope_guard(fluid.core.Scope()):
infer_program, feed_target_names, fetch_vars = fluid.io.load_inference_model(
model_path, exe)
accum_num_recall = 0.0
accum_num_sum = 0.0
t0 = time.time()
step_id = 0
for data in test_reader():
step_id += 1
src_wordseq = utils.to_lodtensor([dat[0] for dat in data], place)
label_data = [dat[1] for dat in data]
dst_wordseq = utils.to_lodtensor(label_data, place)
para = exe.run(
infer_program,
feed={"src_wordseq": src_wordseq,
"dst_wordseq": dst_wordseq},
fetch_list=fetch_vars,
return_numpy=False)
acc_ = para[1]._get_float_element(0)
data_length = len(
np.concatenate(
label_data, axis=0).astype("int64"))
accum_num_sum += (data_length)
accum_num_recall += (data_length * acc_)
if step_id % 100 == 0:
print("step:%d " % (step_id), accum_num_recall / accum_num_sum)
t1 = time.time()
print("model:%s recall@20:%.3f time_cost(s):%.2f" %
(model_path, accum_num_recall / accum_num_sum, t1 - t0))
if __name__ == "__main__":
if len(sys.argv) != 6:
print(
"Usage: %s model_dir start_epoch last_epoch(inclusive) train_file test_file"
)
exit(0)
train_file = ""
test_file = ""
model_dir = sys.argv[1]
try:
start_index = int(sys.argv[2])
last_index = int(sys.argv[3])
train_file = sys.argv[4]
test_file = sys.argv[5]
except:
iprint(
"Usage: %s model_dir start_ipoch last_epoch(inclusive) train_file test_file"
)
exit(-1)
vocab, train_reader, test_reader = utils.prepare_data(
train_file,
test_file,
batch_size=5,
buffer_size=1000,
word_freq_threshold=0)
for epoch in xrange(start_index, last_index + 1):
epoch_path = model_dir + "/epoch_" + str(epoch)
infer(test_reader=test_reader, use_cuda=True, model_path=epoch_path)
214586805 214509260
214857547 214857268 214857260
214859848 214857787
214687963 214531502 214687963
214696532 214859034 214858850
214857570 214857810 214857568 214857787 214857182
214857562 214857570 214857562 214857568
214859132 214545928 214859132 214551913
214858843 214859859 214858912 214858691 214859900
214561888 214561888
214688430 214688435 214688430
214536302 214531376 214531659 214531440 214531466 214513382 214550996
214854930 214854930
214858856 214690775 214859306
214859872 214858912 214858689
214859310 214859338 214859338 214859942 214859293 214859889 214859338 214859889 214859075 214859338 214859338 214859889
214574906 214574906
214859342 214859342 214858777 214851155 214851152 214572433
214537127 214857257
214857570 214857570 214857568 214857562 214857015
214854352 214854352 214854354
214738466 214855010 214857605 214856552 214574906 214857765 214849299
214858365 214859900 214859126 214858689 214859126 214859126 214857759 214858850 214859895 214859300
214857260 214561481 214848995 214849052 214865212
214857596 214819412 214819412
214849342 214849342
214859902 214854845 214854845 214854825
214859306 214859126 214859126
214644962 214644960 214644958
214696432 214696434
214708372 214508287 214684093
214857015 214857015 214858847 214690130
214858787 214859855
214858847 214696532 214859304 214854845
214586805 214586805
214857568 214857570
214696532 214858850 214859034 214569238 214568120 214854165 214684785 214854262 214567327
214602729 214857568 214857596
214859122 214858687 214859122 214859872
214555607 214836225 214836225 214836223
214849299 214829724 214855010 214829801 214574906 214586722 214684307 214857570
214859872 214695525
214845947 214586722 214829801
214829312 214546123
214849055 214849052
214509260 214587932 214596435 214644960 214696432 214696434 214545928 214857030 214636329 214832604 214574906
214586805 214586805
214587932 214587932
214857568 214857549 214854894
214836819 214836819 214595855 214595855
214858787 214858787
214854860 214857701
214848750 214643908
214858847 214859872 214859038 214859855 214690130
214847780 214696817 214717305
214509260 214509260
214853122 214853122 214853122 214853323
214858847 214858631 214858691
214859859 214819807 214853072 214853072 214819730
214820450 214705115 214586805
214858787 214859036
214829842 214864967
214846033 214850949
214587932 214586805 214509260 214696432 214855110 214545928
214858856 214859081 214859306 214858854
214690839 214690839 214711277 214839607 214582942 214582942
214857030 214832604
214857570 214855046 214859870 214577475 214858687 214656380
214854845 214854845 214854684 214859893 214854845 214854778
214850630 214848159 214848159 214848159 214848159 214848159 214848159 214848159
214856248 214856248
214858365 214858905 214858905
214712274 214855046
214845947 214845947 214831946 214717511 214846014 214854729
214561462 214561462 214561481 214561481
214836819 214853250
214858854 214859915 214859306 214854300
214857660 214857787 214539307 214855010 214855046 214849299 214856981 214849055
214855046 214854877 214568102 214539523 214579762 214539347 214641127 214600995 214833733 214600995 214684633 214645121 214658040 214712276 214857660 214687895 214854313 214857517
214845962 214853165 214846119
214854146 214859034
214819412 214819412 214819412 214819412
214849747 214578350 214561991
214854341 214854341
214644855 214644857 214531153
214644960 214862167
214640490 214600918 214600922
214854710 214857759 214859306
214858843 214859297 214858631 214859117 214858689 214858912 214859902 214690127
214586805 214586805
214859306 214859306 214859126
214859034 214696532 214858850 214859126 214859859 214859034 214859859 214858850
214857782 214849048 214857787
214854148 214857787 214854877
214858631 214858631 214690127 214859034 214858850 214859117 214858631 214859300 214858843 214859859 214859859
214646036 214646036
214858847 214858631 214690127 214859297
214861603 214700002 214700000 214835117 214700000 214857830 214700000 214712235 214700000 214700002 214510700 214835713 214712235 214853321
214854855 214854815 214854815
214857185 214854637 214829765 214848384 214829765 214856546 214848596 214835167 214563335 214553837 214536185 214855982 214845515 214550844 214712006
214536502 214536500 214536506 214577561
214662742 214662742 214825110 214757390 214757407 214551617
214716935 214774687 214832672
214836765 214706482
214701242 214826623
214826835 214826715
214838855 214838855
214576500 214576500 214576500
214821275 214821275 214821371 214821371 214821371 214717089 214563337 214706462 214717436 214743335 214826837 214819762
214717867 214717867
214836761 214684513 214836761
214577732 214587013 214577732
214826897 214820441
214684093 214684093 214684093
214561790 214561790 214611457 214611457
214577732 214577732
214838503 214838503 214838503 214838503 214838503 214548744
214718203 214718203 214718203 214718203
214837485 214837485 214837485 214837487 214837487 214821315 214586711 214821305 214821307 214844357 214821341 214821309 214551617 214551617 214612920 214837487
214613743 214613743 214539110 214539110
214827028 214827017 214537796 214840762 214707930 214707930 214585652 214536197 214536195 214646169
214579288 214714790 214676070 214601407
214532036 214700432
214836789 214836789 214710804
214537967 214537967
214718246 214826835
214835257 214835265
214834865 214571188 214571188 214571188 214820225 214820225 214820225 214820225 214820225 214820225 214820225 214820225 214706441 214706441 214706441 214706441
214652878 214716737 214652878
214684721 214680356
214551594 214586970
214826769 214537967
214819745 214819745
214691587 214587915
214821277 214821277 214821277 214821277 214821277
214716932 214716932 214716932 214716932 214716932 214716932
214712235 214581489 214602605
214820441 214826897 214826702 214684513 214838100 214544357 214551626 214691484
214545935 214819438 214839907 214835917 214836210
214698491 214523692
214695307 214695305 214538317 214677448
214819468 214716977 214716977 214716977 214716977 214716939
214544355 214601212 214601212 214601212
214716982 214716984
214844248 214844248
214515834 214515830
214717318 214717318
214832557 214559660 214559660 214819520 214586540
214587797 214835775 214844109
214714794 214601407 214826619 214746427 214821300 214717562 214826927 214748334 214826908 214800262
214709645 214709645 214709645 214709645 214709645
214532072 214532070
214827022 214840419
214716984 214832657
214662975 214537779 214840762
214821277 214821277 214821277
214748300 214748293
214826955 214826606 214687642
214832559 214832559 214832559 214821017 214821017 214572234 214826715 214826715
214509135 214536853 214509133 214509135 214509135 214509135 214717877 214826615 214716982
214819472 214687685
214821285 214821285 214826801 214826801
214826705 214826705
214668590 214826872
214652220 214840483 214840483 214717286 214558807 214821300 214826908 214826908 214826908 214554637 214819430 214819430 214826837 214826837 214820392 214820392 214586694 214819376 214553844 214601229 214555500 214695127 214819760 214717850 214718385 214743369 214743369
214648475 214648340 214648438 214648455 214712936 214712887 214696149 214717097 214534352 214534352 214717097
214560099 214560099 214560099 214832750 214560099
214685621 214684093 214546097 214685623
214819685 214839907 214839905 214811752
214717007 214717003 214716928
214820842 214819490
214555869 214537185
214840599 214835735
214838100 214706216
214829737 214821315
214748293 214748293
214712272 214820450
214821380 214821380
214826799 214827005 214718390 214718396 214826627
214841060 214841060
214687768 214706445
214811752 214811754
214594678 214594680 214594680
214821369 214821369 214697771 214697512 214697413 214697409 214652409 214537127 214537127 214820237 214820237 214709645 214699213 214820237 214820237 214820237 214709645 214537127
214554358 214716950
214821275 214829741
214829741 214820842 214821279 214703790
214716954 214838366
214821022 214820814
214684721 214821369 214826833 214819472
214821315 214821305
214826702 214821275
214717847 214819719 214748336
214536440 214536437
214512416 214512416
214839313 214839313 214839313
214826705 214826705
214510044 214510044 214510044 214582387 214537535 214584812 214537535 214584810
214600989 214704180
214705693 214696824 214705682 214696817 214705691 214705693 214711710 214705691 214705691 214687539 214705687 214744796 214681648 214717307 214577750 214650382 214744796 214696817 214705682 214711710
import os
import sys
import time
import six
import numpy as np
import math
import argparse
import paddle.fluid as fluid
import paddle
import time
import utils
SEED = 102
def parse_args():
parser = argparse.ArgumentParser("gru4rec benchmark.")
parser.add_argument('train_file')
parser.add_argument('test_file')
parser.add_argument('--use_cuda', help='whether use gpu')
parser.add_argument('--parallel', help='whether parallel')
parser.add_argument(
'--enable_ce',
action='store_true',
help='If set, run \
the task with continuous evaluation logs.')
parser.add_argument(
'--num_devices', type=int, default=1, help='Number of GPU devices')
args = parser.parse_args()
return args
def network(src, dst, vocab_size, hid_size, init_low_bound, init_high_bound):
""" network definition """
emb_lr_x = 10.0
gru_lr_x = 1.0
fc_lr_x = 1.0
emb = fluid.layers.embedding(
input=src,
size=[vocab_size, hid_size],
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=emb_lr_x),
is_sparse=True)
fc0 = fluid.layers.fc(input=emb,
size=hid_size * 3,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=gru_lr_x))
gru_h0 = fluid.layers.dynamic_gru(
input=fc0,
size=hid_size,
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=gru_lr_x))
fc = fluid.layers.fc(input=gru_h0,
size=vocab_size,
act='softmax',
param_attr=fluid.ParamAttr(
initializer=fluid.initializer.Uniform(
low=init_low_bound, high=init_high_bound),
learning_rate=fc_lr_x))
cost = fluid.layers.cross_entropy(input=fc, label=dst)
acc = fluid.layers.accuracy(input=fc, label=dst, k=20)
return cost, acc
def train(train_reader,
vocab,
network,
hid_size,
base_lr,
batch_size,
pass_num,
use_cuda,
parallel,
model_dir,
init_low_bound=-0.04,
init_high_bound=0.04):
""" train network """
args = parse_args()
if args.enable_ce:
# random seed must set before configuring the network.
fluid.default_startup_program().random_seed = SEED
vocab_size = len(vocab)
# Input data
src_wordseq = fluid.layers.data(
name="src_wordseq", shape=[1], dtype="int64", lod_level=1)
dst_wordseq = fluid.layers.data(
name="dst_wordseq", shape=[1], dtype="int64", lod_level=1)
# Train program
avg_cost = None
cost, acc = network(src_wordseq, dst_wordseq, vocab_size, hid_size,
init_low_bound, init_high_bound)
avg_cost = fluid.layers.mean(x=cost)
# Optimization to minimize lost
sgd_optimizer = fluid.optimizer.Adagrad(learning_rate=base_lr)
sgd_optimizer.minimize(avg_cost)
# Initialize executor
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
if parallel:
train_exe = fluid.ParallelExecutor(
use_cuda=use_cuda, loss_name=avg_cost.name)
else:
train_exe = exe
total_time = 0.0
fetch_list = [avg_cost.name]
for pass_idx in six.moves.xrange(pass_num):
epoch_idx = pass_idx + 1
print("epoch_%d start" % epoch_idx)
t0 = time.time()
i = 0
newest_ppl = 0
for data in train_reader():
i += 1
lod_src_wordseq = utils.to_lodtensor([dat[0] for dat in data],
place)
lod_dst_wordseq = utils.to_lodtensor([dat[1] for dat in data],
place)
ret_avg_cost = train_exe.run(feed={
"src_wordseq": lod_src_wordseq,
"dst_wordseq": lod_dst_wordseq
},
fetch_list=fetch_list)
avg_ppl = np.exp(ret_avg_cost[0])
newest_ppl = np.mean(avg_ppl)
if i % 10 == 0:
print("step:%d ppl:%.3f" % (i, newest_ppl))
t1 = time.time()
total_time += t1 - t0
print("epoch:%d num_steps:%d time_cost(s):%f" %
(epoch_idx, i, total_time / epoch_idx))
if pass_idx == pass_num - 1 and args.enable_ce:
#Note: The following logs are special for CE monitoring.
#Other situations do not need to care about these logs.
gpu_num = get_cards(args.enable_ce)
if gpu_num == 1:
print("kpis rsc15_pass_duration %s" %
(total_time / epoch_idx))
print("kpis rsc15_avg_ppl %s" % newest_ppl)
else:
print("kpis rsc15_pass_duration_card%s %s" % \
(gpu_num, total_time / epoch_idx))
print("kpis rsc15_avg_ppl_card%s %s" %
(gpu_num, newest_ppl))
save_dir = "%s/epoch_%d" % (model_dir, epoch_idx)
feed_var_names = ["src_wordseq", "dst_wordseq"]
fetch_vars = [avg_cost, acc]
fluid.io.save_inference_model(save_dir, feed_var_names, fetch_vars, exe)
print("model saved in %s" % save_dir)
print("finish training")
def get_cards(args):
if args.enable_ce:
cards = os.environ.get('CUDA_VISIBLE_DEVICES')
num = len(cards.split(","))
return num
else:
return args.num_devices
def train_net():
""" do training """
args = parse_args()
train_file = args.train_file
test_file = args.test_file
use_cuda = True if args.use_cuda else False
parallel = True if args.parallel else False
print("use_cuda:", use_cuda, "parallel:", parallel)
batch_size = 50
vocab, train_reader, test_reader = utils.prepare_data(
train_file, test_file,batch_size=batch_size * get_cards(args),\
buffer_size=1000, word_freq_threshold=0)
train(
train_reader=train_reader,
vocab=vocab,
network=network,
hid_size=100,
base_lr=0.01,
batch_size=batch_size,
pass_num=10,
use_cuda=use_cuda,
parallel=parallel,
model_dir="model_recall20",
init_low_bound=-0.1,
init_high_bound=0.1)
if __name__ == "__main__":
train_net()
import sys
import collections
import six
import time
import numpy as np
import paddle.fluid as fluid
import paddle
def to_lodtensor(data, place):
""" convert to LODtensor """
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
def prepare_data(train_filename,
test_filename,
batch_size,
buffer_size=1000,
word_freq_threshold=0,
enable_ce=False):
""" prepare the English Pann Treebank (PTB) data """
print("start constuct word dict")
vocab = build_dict(word_freq_threshold, train_filename, test_filename)
print("construct word dict done\n")
if enable_ce:
train_reader = paddle.batch(
train(
train_filename, vocab, buffer_size, data_type=DataType.SEQ),
batch_size)
else:
train_reader = sort_batch(
paddle.reader.shuffle(
train(
train_filename, vocab, buffer_size, data_type=DataType.SEQ),
buf_size=buffer_size),
batch_size,
batch_size * 20)
test_reader = sort_batch(
test(
test_filename, vocab, buffer_size, data_type=DataType.SEQ),
batch_size,
batch_size * 20)
return vocab, train_reader, test_reader
def sort_batch(reader, batch_size, sort_group_size, drop_last=False):
"""
Create a batched reader.
:param reader: the data reader to read from.
:type reader: callable
:param batch_size: size of each mini-batch
:type batch_size: int
:param sort_group_size: size of partial sorted batch
:type sort_group_size: int
:param drop_last: drop the last batch, if the size of last batch is not equal to batch_size.
:type drop_last: bool
:return: the batched reader.
:rtype: callable
"""
def batch_reader():
r = reader()
b = []
for instance in r:
b.append(instance)
if len(b) == sort_group_size:
sortl = sorted(b, key=lambda x: len(x[0]), reverse=True)
b = []
c = []
for sort_i in sortl:
c.append(sort_i)
if (len(c) == batch_size):
yield c
c = []
if drop_last == False and len(b) != 0:
sortl = sorted(b, key=lambda x: len(x[0]), reverse=True)
c = []
for sort_i in sortl:
c.append(sort_i)
if (len(c) == batch_size):
yield c
c = []
# Batch size check
batch_size = int(batch_size)
if batch_size <= 0:
raise ValueError("batch_size should be a positive integeral value, "
"but got batch_size={}".format(batch_size))
return batch_reader
class DataType(object):
SEQ = 2
def word_count(input_file, word_freq=None):
"""
compute word count from corpus
"""
if word_freq is None:
word_freq = collections.defaultdict(int)
for l in input_file:
for w in l.strip().split():
word_freq[w] += 1
return word_freq
def build_dict(min_word_freq=50, train_filename="", test_filename=""):
"""
Build a word dictionary from the corpus, Keys of the dictionary are words,
and values are zero-based IDs of these words.
"""
with open(train_filename) as trainf:
with open(test_filename) as testf:
word_freq = word_count(testf, word_count(trainf))
word_freq = [x for x in six.iteritems(word_freq) if x[1] > min_word_freq]
word_freq_sorted = sorted(word_freq, key=lambda x: (-x[1], x[0]))
words, _ = list(zip(*word_freq_sorted))
word_idx = dict(list(zip(words, six.moves.range(len(words)))))
return word_idx
def reader_creator(filename, word_idx, n, data_type):
def reader():
with open(filename) as f:
for l in f:
if DataType.SEQ == data_type:
l = l.strip().split()
l = [word_idx.get(w) for w in l]
src_seq = l[:len(l) - 1]
trg_seq = l[1:]
if n > 0 and len(src_seq) > n: continue
yield src_seq, trg_seq
else:
assert False, 'error data type'
return reader
def train(filename, word_idx, n, data_type=DataType.SEQ):
return reader_creator(filename, word_idx, n, data_type)
def test(filename, word_idx, n, data_type=DataType.SEQ):
return reader_creator(filename, word_idx, n, data_type)
...@@ -89,7 +89,7 @@ def train(train_reader, ...@@ -89,7 +89,7 @@ def train(train_reader,
def train_net(): def train_net():
word_dict, train_reader, test_reader = utils.prepare_data( word_dict, train_reader, test_reader = utils.prepare_data(
"imdb", self_dict=False, batch_size=128, buf_size=50000) "imdb", self_dict=False, batch_size=4, buf_size=50000)
if sys.argv[1] == "bow": if sys.argv[1] == "bow":
train( train(
...@@ -101,7 +101,7 @@ def train_net(): ...@@ -101,7 +101,7 @@ def train_net():
save_dirname="bow_model", save_dirname="bow_model",
lr=0.002, lr=0.002,
pass_num=30, pass_num=30,
batch_size=128) batch_size=4)
elif sys.argv[1] == "cnn": elif sys.argv[1] == "cnn":
train( train(
train_reader, train_reader,
...@@ -134,7 +134,7 @@ def train_net(): ...@@ -134,7 +134,7 @@ def train_net():
save_dirname="gru_model", save_dirname="gru_model",
lr=0.05, lr=0.05,
pass_num=30, pass_num=30,
batch_size=128) batch_size=4)
else: else:
print("network name cannot be found!") print("network name cannot be found!")
sys.exit(1) sys.exit(1)
......
# Text matching on Quora qestion-answer pair dataset
## contents
* [Introduction](#introduction)
* [a brief review of the Quora Question Pair (QQP) Task](#a-brief-review-of-the-quora-question-pair-qqp-task)
* [Our Work](#our-work)
* [Environment Preparation](#environment-preparation)
* [Install Fluid release 1.0](#install-fluid-release-10)
* [cpu version](#cpu-version)
* [gpu version](#gpu-version)
* [Have I installed Fluid successfully?](#have-i-installed-fluid-successfully)
* [Prepare Data](#prepare-data)
* [Train and evaluate](#train-and-evaluate)
* [Models](#models)
* [Results](#results)
## Introduction
### a brief review of the Quora Question Pair (QQP) Task
The [Quora Question Pair](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) dataset contains 400,000 question pairs from [Quora](https://www.quora.com/), where people ask and answer questions related to specific areas. Each sample in the dataset consists of two questions (both English) and a label that represents whether the questions are duplicate. The dataset is well annotated by human.
Below are two samples from the dataset. The last column indicates whether the two questions are duplicate (1) or not (0).
|id | qid1 | qid2| question1| question2| is_duplicate
|:---:|:---:|:---:|:---:|:---:|:---:|
|0 |1 |2 |What is the step by step guide to invest in share market in india? |What is the step by step guide to invest in share market? |0|
|1 |3 |4 |What is the story of Kohinoor (Koh-i-Noor) Diamond? | What would happen if the Indian government stole the Kohinoor (Koh-i-Noor) diamond back? |0|
A [kaggle competition](https://www.kaggle.com/c/quora-question-pairs#description) was held based on this dataset in 2017. The kagglers were given a training dataset (with labels), and requested to make predictions on a test dataset (without labels). The predictions were evaluated by the log-likelihood loss on the test data.
The kaggle competition has inspired much effective work. However, most of these models are rule-based and difficult to be transferred to new tasks. Researchers are seeking for more general models that work well on this task and other natual language processing (NLP) tasks.
[Wang _et al._](https://arxiv.org/abs/1702.03814) proposed a bilateral multi-perspective matching (BIMPM) model based on the Quora Question Pair dataset. They splitted the original dataset to [3 parts](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing): _train.tsv_ (384,348 samples), _dev.tsv_ (10,000 samples) and _test.tsv_ (10,000 samples). The class distribution of _train.tsv_ is unbalanced (37% positive and 63% negative), while those of _dev.tsv_ and _test.tsv_ are balanced(50% positive and 50% negetive). We used the same splitting method in our experiments.
### Our Work
Based on the Quora Question Pair Dataset, we implemented some classic models in the area of neural language understanding (NLU). The accuracy of prediction results are evaluated on the _test.tsv_ from [Wang _et al._](https://arxiv.org/abs/1702.03814).
## Environment Preparation
### Install Fluid release 1.0
Please follow the [official document in English](http://www.paddlepaddle.org/documentation/docs/en/1.0/build_and_install/pip_install_en.html) or [official document in Chinese](http://www.paddlepaddle.org/documentation/docs/zh/1.0/beginners_guide/install/Start.html) to install the Fluid deep learning framework.
#### Have I installed Fluid successfully?
Run the following script from your command line:
```shell
python -c "import paddle"
```
If Fluid is installed successfully you should see no error message. Feel free to open issues under the [PaddlePaddle repository](https://github.com/PaddlePaddle/Paddle/issues) for support.
## Prepare Data
Please download the Quora dataset from [Google drive](https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing) and unzip to $HOME/.cache/paddle/dataset.
Then run _data/prepare_quora_data.sh_ to download the pre-trained _word2vec_ embedding file -- _glove.840B.300d.zip_:
```shell
sh data/prepare_quora_data.sh
```
At this point the dataset directory ($HOME/.cache/paddle/dataset) structure should be:
```shell
$HOME/.cache/paddle/dataset
|- Quora_question_pair_partition
|- train.tsv
|- test.tsv
|- dev.tsv
|- readme.txt
|- wordvec.txt
|- glove.840B.300d.txt
```
## Train and evaluate
We provide multiple models and configurations. Details are shown in `models` and `configs` directories. For a quick start, please run the _cdssmNet_ model with the corresponding configuration:
```shell
python train_and_evaluate.py \
--model_name=cdssmNet \
--config=cdssm_base
```
Logs will be output to the console. If everything works well, the logging information will have the same formats as the content in _cdssm_base.log_.
All configurations used in our experiments are as follows:
|Model|Config|command
|:----:|:----:|:----:|
|cdssmNet|cdssm_base|python train_and_evaluate.py --model_name=cdssmNet --config=cdssm_base
|DecAttNet|decatt_glove|python train_and_evaluate.py --model_name=DecAttNet --config=decatt_glove
|InferSentNet|infer_sent_v1|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v1
|InferSentNet|infer_sent_v2|python train_and_evaluate.py --model_name=InferSentNet --config=infer_sent_v2
|SSENet|sse_base|python train_and_evaluate.py --model_name=SSENet --config=sse_base
## Models
We implemeted 4 models for now: the convolutional deep-structured semantic model (CDSSM, CNN-based), the InferSent model (RNN-based), the shortcut-stacked encoder (SSE, RNN-based), and the decomposed attention model (DecAtt, attention-based).
|Model|features|Context Encoder|Match Layer|Classification Layer
|:----:|:----:|:----:|:----:|:----:|
|CDSSM|word|1 layer conv1d|concatenation|MLP
|DecAtt|word|Attention|concatenation|MLP
|InferSent|word|1 layer Bi-LSTM|concatenation/element-wise product/<br>absolute element-wise difference|MLP
|SSE|word|3 layer Bi-LSTM|concatenation/element-wise product/<br>absolute element-wise difference|MLP
### CDSSM
```
@inproceedings{shen2014learning,
title={Learning semantic representations using convolutional neural networks for web search},
author={Shen, Yelong and He, Xiaodong and Gao, Jianfeng and Deng, Li and Mesnil, Gr{\'e}goire},
booktitle={Proceedings of the 23rd International Conference on World Wide Web},
pages={373--374},
year={2014},
organization={ACM}
}
```
### InferSent
```
@article{conneau2017supervised,
title={Supervised learning of universal sentence representations from natural language inference data},
author={Conneau, Alexis and Kiela, Douwe and Schwenk, Holger and Barrault, Loic and Bordes, Antoine},
journal={arXiv preprint arXiv:1705.02364},
year={2017}
}
```
### SSE
```
@article{nie2017shortcut,
title={Shortcut-stacked sentence encoders for multi-domain inference},
author={Nie, Yixin and Bansal, Mohit},
journal={arXiv preprint arXiv:1708.02312},
year={2017}
}
```
### DecAtt
```
@article{tomar2017neural,
title={Neural paraphrase identification of questions with noisy pretraining},
author={Tomar, Gaurav Singh and Duque, Thyago and T{\"a}ckstr{\"o}m, Oscar and Uszkoreit, Jakob and Das, Dipanjan},
journal={arXiv preprint arXiv:1704.04565},
year={2017}
}
```
## Results
|Model|Config|dev accuracy| test accuracy
|:----:|:----:|:----:|:----:|
|cdssmNet|cdssm_base|83.56%|82.83%|
|DecAttNet|decatt_glove|86.31%|86.22%|
|InferSentNet|infer_sent_v1|87.15%|86.62%|
|InferSentNet|infer_sent_v2|88.55%|88.43%|
|SSENet|sse_base|88.35%|88.25%|
In our experiment, we found that LSTM-based models outperformed convolution-based models. The DecAtt model has fewer parameters than LSTM-based models, but is sensitive to hyper-parameters.
<p align="center">
<img src="imgs/models_test_acc.png" width = "500" alt="test_acc"/>
</p>
net_name: cdssmNet
config {'save_dirname': 'model_dir', 'optimizer_type': 'adam', 'duplicate_data': False, 'train_samples_num': 384348, 'droprate_fc': 0.1, 'fc_dim': 128, 'kernel_count': 300, 'mlp_hid_dim': [128, 128], 'OOV_fill': 'uniform', 'class_dim': 2, 'epoch_num': 50, 'lr_decay': 1, 'learning_rate': 0.001, 'batch_size': 128, 'use_lod_tensor': True, 'metric_type': ['accuracy'], 'embedding_norm': False, 'emb_dim': 300, 'droprate_conv': 0.1, 'use_pretrained_word_embedding': True, 'kernel_size': 5, 'dict_dim': 40000}
Generating word dict...
('Vocab size: ', 36057)
loading word2vec from /home/dongdaxiang/.cache/paddle/dataset/glove.840B.300d.txt
preparing pretrained word embedding ...
pretrained_word_embedding to be load: [[-0.086864 0.19161 0.10915 ... -0.01516 0.11108
0.2065 ]
[ 0.27204 -0.06203 -0.1884 ... 0.13015 -0.18317
0.1323 ]
[-0.20628 0.36716 -0.071933 ... 0.14271 0.50059
0.038025 ]
...
[ 0.03847164 0.01711482 0.01181574 ... 0.03926358 -0.04032813
-0.02135365]
[ 0.04201478 -0.02560226 -0.02281064 ... 0.00920258 0.04321
0.0227482 ]
[-0.04984529 -0.00176931 0.03022346 ... 0.0298265 0.02384543
0.00974313]]
param name: emb.w; param shape: (40000L, 300L)
param name: conv1d.w; param shape: (1500L, 300L)
param name: fc1.w; param shape: (300L, 128L)
param name: fc1.b; param shape: (128L,)
param name: fc_2.w_0; param shape: (256L, 128L)
param name: fc_2.b_0; param shape: (128L,)
param name: fc_3.w_0; param shape: (128L, 128L)
param name: fc_3.b_0; param shape: (128L,)
param name: fc_4.w_0; param shape: (128L, 2L)
param name: fc_4.b_0; param shape: (2L,)
loading pretrained word embedding to param
[Wed Oct 10 16:33:18 2018] epoch_id: -1, dev_cost: 0.693804, accuracy: 0.5109
[Wed Oct 10 16:33:18 2018] epoch_id: -1, test_cost: 0.693670, accuracy: 0.5096
[Wed Oct 10 16:33:18 2018] Start Training
[Wed Oct 10 16:33:27 2018] epoch_id: 0, batch_id: 0, cost: 0.699992, acc: 0.515625
[Wed Oct 10 16:33:30 2018] epoch_id: 0, batch_id: 100, cost: 0.557354, acc: 0.695312
[Wed Oct 10 16:33:33 2018] epoch_id: 0, batch_id: 200, cost: 0.548301, acc: 0.742188
[Wed Oct 10 16:33:35 2018] epoch_id: 0, batch_id: 300, cost: 0.528907, acc: 0.742188
[Wed Oct 10 16:33:39 2018] epoch_id: 0, batch_id: 400, cost: 0.482460, acc: 0.781250
[Wed Oct 10 16:33:41 2018] epoch_id: 0, batch_id: 500, cost: 0.494885, acc: 0.718750
[Wed Oct 10 16:33:44 2018] epoch_id: 0, batch_id: 600, cost: 0.600175, acc: 0.695312
[Wed Oct 10 16:33:46 2018] epoch_id: 0, batch_id: 700, cost: 0.477964, acc: 0.757812
[Wed Oct 10 16:33:49 2018] epoch_id: 0, batch_id: 800, cost: 0.468172, acc: 0.750000
[Wed Oct 10 16:33:51 2018] epoch_id: 0, batch_id: 900, cost: 0.394047, acc: 0.835938
[Wed Oct 10 16:33:54 2018] epoch_id: 0, batch_id: 1000, cost: 0.520142, acc: 0.734375
[Wed Oct 10 16:33:56 2018] epoch_id: 0, batch_id: 1100, cost: 0.471779, acc: 0.757812
[Wed Oct 10 16:33:59 2018] epoch_id: 0, batch_id: 1200, cost: 0.407287, acc: 0.789062
[Wed Oct 10 16:34:01 2018] epoch_id: 0, batch_id: 1300, cost: 0.430800, acc: 0.812500
[Wed Oct 10 16:34:03 2018] epoch_id: 0, batch_id: 1400, cost: 0.421967, acc: 0.796875
[Wed Oct 10 16:34:06 2018] epoch_id: 0, batch_id: 1500, cost: 0.388925, acc: 0.835938
[Wed Oct 10 16:34:08 2018] epoch_id: 0, batch_id: 1600, cost: 0.445022, acc: 0.796875
[Wed Oct 10 16:34:10 2018] epoch_id: 0, batch_id: 1700, cost: 0.439095, acc: 0.796875
[Wed Oct 10 16:34:13 2018] epoch_id: 0, batch_id: 1800, cost: 0.448246, acc: 0.765625
[Wed Oct 10 16:34:15 2018] epoch_id: 0, batch_id: 1900, cost: 0.377162, acc: 0.789062
[Wed Oct 10 16:34:17 2018] epoch_id: 0, batch_id: 2000, cost: 0.460397, acc: 0.820312
[Wed Oct 10 16:34:20 2018] epoch_id: 0, batch_id: 2100, cost: 0.416145, acc: 0.812500
[Wed Oct 10 16:34:22 2018] epoch_id: 0, batch_id: 2200, cost: 0.509166, acc: 0.710938
[Wed Oct 10 16:34:24 2018] epoch_id: 0, batch_id: 2300, cost: 0.450925, acc: 0.765625
[Wed Oct 10 16:34:26 2018] epoch_id: 0, batch_id: 2400, cost: 0.457177, acc: 0.796875
[Wed Oct 10 16:34:29 2018] epoch_id: 0, batch_id: 2500, cost: 0.454368, acc: 0.851562
[Wed Oct 10 16:34:31 2018] epoch_id: 0, batch_id: 2600, cost: 0.478799, acc: 0.750000
[Wed Oct 10 16:34:34 2018] epoch_id: 0, batch_id: 2700, cost: 0.521526, acc: 0.757812
[Wed Oct 10 16:34:36 2018] epoch_id: 0, batch_id: 2800, cost: 0.476336, acc: 0.789062
[Wed Oct 10 16:34:38 2018] epoch_id: 0, batch_id: 2900, cost: 0.407489, acc: 0.812500
[Wed Oct 10 16:34:41 2018] epoch_id: 0, batch_id: 3000, cost: 0.404804, acc: 0.820312
[Wed Oct 10 16:34:42 2018] epoch_id: 0, train_avg_cost: 0.456508, train_avg_acc: 0.779733
[Wed Oct 10 16:34:43 2018] epoch_id: 0, dev_cost: 0.469818, accuracy: 0.7691
[Wed Oct 10 16:34:44 2018] epoch_id: 0, test_cost: 0.462696, accuracy: 0.7734
[Wed Oct 10 16:34:53 2018] epoch_id: 1, batch_id: 0, cost: 0.381106, acc: 0.820312
[Wed Oct 10 16:34:56 2018] epoch_id: 1, batch_id: 100, cost: 0.325008, acc: 0.859375
[Wed Oct 10 16:34:58 2018] epoch_id: 1, batch_id: 200, cost: 0.318922, acc: 0.843750
[Wed Oct 10 16:35:00 2018] epoch_id: 1, batch_id: 300, cost: 0.359727, acc: 0.804688
[Wed Oct 10 16:35:03 2018] epoch_id: 1, batch_id: 400, cost: 0.308632, acc: 0.875000
[Wed Oct 10 16:35:05 2018] epoch_id: 1, batch_id: 500, cost: 0.326841, acc: 0.851562
[Wed Oct 10 16:35:09 2018] epoch_id: 1, batch_id: 600, cost: 0.398975, acc: 0.796875
[Wed Oct 10 16:35:12 2018] epoch_id: 1, batch_id: 700, cost: 0.296837, acc: 0.867188
[Wed Oct 10 16:35:14 2018] epoch_id: 1, batch_id: 800, cost: 0.289739, acc: 0.867188
[Wed Oct 10 16:35:17 2018] epoch_id: 1, batch_id: 900, cost: 0.315425, acc: 0.835938
[Wed Oct 10 16:35:19 2018] epoch_id: 1, batch_id: 1000, cost: 0.340806, acc: 0.828125
[Wed Oct 10 16:35:22 2018] epoch_id: 1, batch_id: 1100, cost: 0.383585, acc: 0.828125
[Wed Oct 10 16:35:24 2018] epoch_id: 1, batch_id: 1200, cost: 0.317520, acc: 0.843750
[Wed Oct 10 16:35:26 2018] epoch_id: 1, batch_id: 1300, cost: 0.308717, acc: 0.875000
[Wed Oct 10 16:35:29 2018] epoch_id: 1, batch_id: 1400, cost: 0.320688, acc: 0.828125
[Wed Oct 10 16:35:31 2018] epoch_id: 1, batch_id: 1500, cost: 0.353638, acc: 0.812500
[Wed Oct 10 16:35:34 2018] epoch_id: 1, batch_id: 1600, cost: 0.379113, acc: 0.804688
[Wed Oct 10 16:35:36 2018] epoch_id: 1, batch_id: 1700, cost: 0.309887, acc: 0.859375
[Wed Oct 10 16:35:38 2018] epoch_id: 1, batch_id: 1800, cost: 0.316372, acc: 0.859375
[Wed Oct 10 16:35:41 2018] epoch_id: 1, batch_id: 1900, cost: 0.405585, acc: 0.804688
[Wed Oct 10 16:35:43 2018] epoch_id: 1, batch_id: 2000, cost: 0.336917, acc: 0.851562
[Wed Oct 10 16:35:45 2018] epoch_id: 1, batch_id: 2100, cost: 0.347034, acc: 0.835938
[Wed Oct 10 16:35:48 2018] epoch_id: 1, batch_id: 2200, cost: 0.379728, acc: 0.835938
[Wed Oct 10 16:35:50 2018] epoch_id: 1, batch_id: 2300, cost: 0.395257, acc: 0.820312
[Wed Oct 10 16:35:53 2018] epoch_id: 1, batch_id: 2400, cost: 0.398583, acc: 0.812500
[Wed Oct 10 16:35:55 2018] epoch_id: 1, batch_id: 2500, cost: 0.356259, acc: 0.859375
[Wed Oct 10 16:35:57 2018] epoch_id: 1, batch_id: 2600, cost: 0.297765, acc: 0.835938
[Wed Oct 10 16:35:59 2018] epoch_id: 1, batch_id: 2700, cost: 0.353899, acc: 0.835938
[Wed Oct 10 16:36:02 2018] epoch_id: 1, batch_id: 2800, cost: 0.377699, acc: 0.820312
[Wed Oct 10 16:36:04 2018] epoch_id: 1, batch_id: 2900, cost: 0.388959, acc: 0.804688
[Wed Oct 10 16:36:06 2018] epoch_id: 1, batch_id: 3000, cost: 0.344840, acc: 0.835938
[Wed Oct 10 16:36:07 2018] epoch_id: 1, train_avg_cost: 0.346376, train_avg_acc: 0.842572
[Wed Oct 10 16:36:08 2018] epoch_id: 1, dev_cost: 0.402576, accuracy: 0.8094
[Wed Oct 10 16:36:09 2018] epoch_id: 1, test_cost: 0.397121, accuracy: 0.8185
[Wed Oct 10 16:36:18 2018] epoch_id: 2, batch_id: 0, cost: 0.280530, acc: 0.890625
[Wed Oct 10 16:36:20 2018] epoch_id: 2, batch_id: 100, cost: 0.233576, acc: 0.906250
[Wed Oct 10 16:36:22 2018] epoch_id: 2, batch_id: 200, cost: 0.245128, acc: 0.898438
[Wed Oct 10 16:36:25 2018] epoch_id: 2, batch_id: 300, cost: 0.183943, acc: 0.906250
[Wed Oct 10 16:36:27 2018] epoch_id: 2, batch_id: 400, cost: 0.270915, acc: 0.882812
[Wed Oct 10 16:36:30 2018] epoch_id: 2, batch_id: 500, cost: 0.248726, acc: 0.906250
[Wed Oct 10 16:36:32 2018] epoch_id: 2, batch_id: 600, cost: 0.243351, acc: 0.921875
[Wed Oct 10 16:36:35 2018] epoch_id: 2, batch_id: 700, cost: 0.314026, acc: 0.812500
[Wed Oct 10 16:36:38 2018] epoch_id: 2, batch_id: 800, cost: 0.336282, acc: 0.867188
[Wed Oct 10 16:36:41 2018] epoch_id: 2, batch_id: 900, cost: 0.290222, acc: 0.875000
[Wed Oct 10 16:36:43 2018] epoch_id: 2, batch_id: 1000, cost: 0.287339, acc: 0.859375
[Wed Oct 10 16:36:45 2018] epoch_id: 2, batch_id: 1100, cost: 0.225436, acc: 0.890625
[Wed Oct 10 16:36:48 2018] epoch_id: 2, batch_id: 1200, cost: 0.346974, acc: 0.859375
[Wed Oct 10 16:36:50 2018] epoch_id: 2, batch_id: 1300, cost: 0.283542, acc: 0.843750
[Wed Oct 10 16:36:53 2018] epoch_id: 2, batch_id: 1400, cost: 0.203151, acc: 0.921875
[Wed Oct 10 16:36:55 2018] epoch_id: 2, batch_id: 1500, cost: 0.255483, acc: 0.906250
[Wed Oct 10 16:36:58 2018] epoch_id: 2, batch_id: 1600, cost: 0.275010, acc: 0.898438
[Wed Oct 10 16:37:00 2018] epoch_id: 2, batch_id: 1700, cost: 0.264693, acc: 0.867188
[Wed Oct 10 16:37:03 2018] epoch_id: 2, batch_id: 1800, cost: 0.257360, acc: 0.890625
[Wed Oct 10 16:37:05 2018] epoch_id: 2, batch_id: 1900, cost: 0.150528, acc: 0.921875
[Wed Oct 10 16:37:08 2018] epoch_id: 2, batch_id: 2000, cost: 0.229797, acc: 0.906250
[Wed Oct 10 16:37:11 2018] epoch_id: 2, batch_id: 2100, cost: 0.261790, acc: 0.867188
[Wed Oct 10 16:37:14 2018] epoch_id: 2, batch_id: 2200, cost: 0.201237, acc: 0.914062
[Wed Oct 10 16:37:16 2018] epoch_id: 2, batch_id: 2300, cost: 0.296701, acc: 0.875000
[Wed Oct 10 16:37:19 2018] epoch_id: 2, batch_id: 2400, cost: 0.315291, acc: 0.875000
[Wed Oct 10 16:37:21 2018] epoch_id: 2, batch_id: 2500, cost: 0.282715, acc: 0.843750
[Wed Oct 10 16:37:24 2018] epoch_id: 2, batch_id: 2600, cost: 0.296843, acc: 0.843750
[Wed Oct 10 16:37:26 2018] epoch_id: 2, batch_id: 2700, cost: 0.363040, acc: 0.843750
[Wed Oct 10 16:37:29 2018] epoch_id: 2, batch_id: 2800, cost: 0.262465, acc: 0.867188
[Wed Oct 10 16:37:31 2018] epoch_id: 2, batch_id: 2900, cost: 0.208009, acc: 0.906250
[Wed Oct 10 16:37:34 2018] epoch_id: 2, batch_id: 3000, cost: 0.247068, acc: 0.867188
[Wed Oct 10 16:37:34 2018] epoch_id: 2, train_avg_cost: 0.267260, train_avg_acc: 0.884560
[Wed Oct 10 16:37:36 2018] epoch_id: 2, dev_cost: 0.434485, accuracy: 0.8153
[Wed Oct 10 16:37:37 2018] epoch_id: 2, test_cost: 0.425083, accuracy: 0.8243
[Wed Oct 10 16:37:46 2018] epoch_id: 3, batch_id: 0, cost: 0.130899, acc: 0.945312
[Wed Oct 10 16:37:49 2018] epoch_id: 3, batch_id: 100, cost: 0.174115, acc: 0.914062
[Wed Oct 10 16:37:52 2018] epoch_id: 3, batch_id: 200, cost: 0.162655, acc: 0.929688
[Wed Oct 10 16:37:54 2018] epoch_id: 3, batch_id: 300, cost: 0.156763, acc: 0.937500
[Wed Oct 10 16:37:56 2018] epoch_id: 3, batch_id: 400, cost: 0.171531, acc: 0.929688
[Wed Oct 10 16:37:59 2018] epoch_id: 3, batch_id: 500, cost: 0.124120, acc: 0.937500
[Wed Oct 10 16:38:02 2018] epoch_id: 3, batch_id: 600, cost: 0.172306, acc: 0.929688
[Wed Oct 10 16:38:04 2018] epoch_id: 3, batch_id: 700, cost: 0.352722, acc: 0.867188
[Wed Oct 10 16:38:06 2018] epoch_id: 3, batch_id: 800, cost: 0.179998, acc: 0.929688
[Wed Oct 10 16:38:09 2018] epoch_id: 3, batch_id: 900, cost: 0.197941, acc: 0.921875
[Wed Oct 10 16:38:11 2018] epoch_id: 3, batch_id: 1000, cost: 0.163592, acc: 0.937500
[Wed Oct 10 16:38:14 2018] epoch_id: 3, batch_id: 1100, cost: 0.196162, acc: 0.882812
[Wed Oct 10 16:38:16 2018] epoch_id: 3, batch_id: 1200, cost: 0.201064, acc: 0.929688
[Wed Oct 10 16:38:19 2018] epoch_id: 3, batch_id: 1300, cost: 0.162742, acc: 0.921875
[Wed Oct 10 16:38:21 2018] epoch_id: 3, batch_id: 1400, cost: 0.192062, acc: 0.890625
[Wed Oct 10 16:38:23 2018] epoch_id: 3, batch_id: 1500, cost: 0.215189, acc: 0.914062
[Wed Oct 10 16:38:26 2018] epoch_id: 3, batch_id: 1600, cost: 0.148390, acc: 0.945312
[Wed Oct 10 16:38:28 2018] epoch_id: 3, batch_id: 1700, cost: 0.148536, acc: 0.937500
[Wed Oct 10 16:38:32 2018] epoch_id: 3, batch_id: 1800, cost: 0.122290, acc: 0.960938
[Wed Oct 10 16:38:34 2018] epoch_id: 3, batch_id: 1900, cost: 0.152864, acc: 0.945312
[Wed Oct 10 16:38:37 2018] epoch_id: 3, batch_id: 2000, cost: 0.250165, acc: 0.914062
[Wed Oct 10 16:38:39 2018] epoch_id: 3, batch_id: 2100, cost: 0.197931, acc: 0.929688
[Wed Oct 10 16:38:42 2018] epoch_id: 3, batch_id: 2200, cost: 0.167291, acc: 0.937500
[Wed Oct 10 16:38:44 2018] epoch_id: 3, batch_id: 2300, cost: 0.243269, acc: 0.898438
[Wed Oct 10 16:38:47 2018] epoch_id: 3, batch_id: 2400, cost: 0.170633, acc: 0.921875
[Wed Oct 10 16:38:49 2018] epoch_id: 3, batch_id: 2500, cost: 0.182344, acc: 0.921875
[Wed Oct 10 16:38:52 2018] epoch_id: 3, batch_id: 2600, cost: 0.267497, acc: 0.921875
[Wed Oct 10 16:38:54 2018] epoch_id: 3, batch_id: 2700, cost: 0.170150, acc: 0.929688
[Wed Oct 10 16:38:56 2018] epoch_id: 3, batch_id: 2800, cost: 0.198175, acc: 0.890625
[Wed Oct 10 16:38:59 2018] epoch_id: 3, batch_id: 2900, cost: 0.231687, acc: 0.898438
[Wed Oct 10 16:39:01 2018] epoch_id: 3, batch_id: 3000, cost: 0.280869, acc: 0.882812
[Wed Oct 10 16:39:02 2018] epoch_id: 3, train_avg_cost: 0.203352, train_avg_acc: 0.915808
[Wed Oct 10 16:39:03 2018] epoch_id: 3, dev_cost: 0.413912, accuracy: 0.8304
[Wed Oct 10 16:39:04 2018] epoch_id: 3, test_cost: 0.409365, accuracy: 0.8341
[Wed Oct 10 16:39:13 2018] epoch_id: 4, batch_id: 0, cost: 0.208998, acc: 0.945312
[Wed Oct 10 16:39:16 2018] epoch_id: 4, batch_id: 100, cost: 0.148128, acc: 0.929688
[Wed Oct 10 16:39:18 2018] epoch_id: 4, batch_id: 200, cost: 0.079264, acc: 0.976562
[Wed Oct 10 16:39:21 2018] epoch_id: 4, batch_id: 300, cost: 0.125277, acc: 0.937500
[Wed Oct 10 16:39:23 2018] epoch_id: 4, batch_id: 400, cost: 0.105227, acc: 0.968750
[Wed Oct 10 16:39:25 2018] epoch_id: 4, batch_id: 500, cost: 0.063737, acc: 0.984375
[Wed Oct 10 16:39:28 2018] epoch_id: 4, batch_id: 600, cost: 0.148419, acc: 0.937500
[Wed Oct 10 16:39:30 2018] epoch_id: 4, batch_id: 700, cost: 0.118386, acc: 0.937500
[Wed Oct 10 16:39:33 2018] epoch_id: 4, batch_id: 800, cost: 0.236417, acc: 0.898438
[Wed Oct 10 16:39:35 2018] epoch_id: 4, batch_id: 900, cost: 0.131614, acc: 0.945312
[Wed Oct 10 16:39:38 2018] epoch_id: 4, batch_id: 1000, cost: 0.134897, acc: 0.953125
[Wed Oct 10 16:39:40 2018] epoch_id: 4, batch_id: 1100, cost: 0.152974, acc: 0.945312
[Wed Oct 10 16:39:43 2018] epoch_id: 4, batch_id: 1200, cost: 0.173617, acc: 0.937500
[Wed Oct 10 16:39:45 2018] epoch_id: 4, batch_id: 1300, cost: 0.128535, acc: 0.937500
[Wed Oct 10 16:39:48 2018] epoch_id: 4, batch_id: 1400, cost: 0.156204, acc: 0.945312
[Wed Oct 10 16:39:50 2018] epoch_id: 4, batch_id: 1500, cost: 0.130960, acc: 0.937500
[Wed Oct 10 16:39:53 2018] epoch_id: 4, batch_id: 1600, cost: 0.185379, acc: 0.914062
[Wed Oct 10 16:39:55 2018] epoch_id: 4, batch_id: 1700, cost: 0.092890, acc: 0.960938
[Wed Oct 10 16:39:58 2018] epoch_id: 4, batch_id: 1800, cost: 0.147196, acc: 0.929688
[Wed Oct 10 16:40:00 2018] epoch_id: 4, batch_id: 1900, cost: 0.153621, acc: 0.953125
[Wed Oct 10 16:40:03 2018] epoch_id: 4, batch_id: 2000, cost: 0.153048, acc: 0.921875
[Wed Oct 10 16:40:05 2018] epoch_id: 4, batch_id: 2100, cost: 0.205303, acc: 0.898438
[Wed Oct 10 16:40:07 2018] epoch_id: 4, batch_id: 2200, cost: 0.139906, acc: 0.960938
[Wed Oct 10 16:40:10 2018] epoch_id: 4, batch_id: 2300, cost: 0.254768, acc: 0.890625
[Wed Oct 10 16:40:12 2018] epoch_id: 4, batch_id: 2400, cost: 0.076761, acc: 0.968750
[Wed Oct 10 16:40:14 2018] epoch_id: 4, batch_id: 2500, cost: 0.199733, acc: 0.906250
[Wed Oct 10 16:40:16 2018] epoch_id: 4, batch_id: 2600, cost: 0.310914, acc: 0.882812
[Wed Oct 10 16:40:19 2018] epoch_id: 4, batch_id: 2700, cost: 0.148558, acc: 0.921875
[Wed Oct 10 16:40:21 2018] epoch_id: 4, batch_id: 2800, cost: 0.164562, acc: 0.921875
[Wed Oct 10 16:40:23 2018] epoch_id: 4, batch_id: 2900, cost: 0.177139, acc: 0.921875
[Wed Oct 10 16:40:26 2018] epoch_id: 4, batch_id: 3000, cost: 0.112299, acc: 0.968750
[Wed Oct 10 16:40:27 2018] epoch_id: 4, train_avg_cost: 0.156220, train_avg_acc: 0.937780
[Wed Oct 10 16:40:28 2018] epoch_id: 4, dev_cost: 0.468851, accuracy: 0.8348
[Wed Oct 10 16:40:29 2018] epoch_id: 4, test_cost: 0.468213, accuracy: 0.8368
[Wed Oct 10 16:40:38 2018] epoch_id: 5, batch_id: 0, cost: 0.084071, acc: 0.976562
[Wed Oct 10 16:40:41 2018] epoch_id: 5, batch_id: 100, cost: 0.052093, acc: 0.968750
[Wed Oct 10 16:40:43 2018] epoch_id: 5, batch_id: 200, cost: 0.193576, acc: 0.929688
[Wed Oct 10 16:40:46 2018] epoch_id: 5, batch_id: 300, cost: 0.075502, acc: 0.968750
[Wed Oct 10 16:40:48 2018] epoch_id: 5, batch_id: 400, cost: 0.079619, acc: 0.976562
[Wed Oct 10 16:40:51 2018] epoch_id: 5, batch_id: 500, cost: 0.124719, acc: 0.945312
[Wed Oct 10 16:40:53 2018] epoch_id: 5, batch_id: 600, cost: 0.157322, acc: 0.929688
[Wed Oct 10 16:40:56 2018] epoch_id: 5, batch_id: 700, cost: 0.100680, acc: 0.945312
[Wed Oct 10 16:40:58 2018] epoch_id: 5, batch_id: 800, cost: 0.164627, acc: 0.937500
[Wed Oct 10 16:41:00 2018] epoch_id: 5, batch_id: 900, cost: 0.113826, acc: 0.960938
[Wed Oct 10 16:41:03 2018] epoch_id: 5, batch_id: 1000, cost: 0.122406, acc: 0.953125
[Wed Oct 10 16:41:05 2018] epoch_id: 5, batch_id: 1100, cost: 0.098428, acc: 0.960938
[Wed Oct 10 16:41:08 2018] epoch_id: 5, batch_id: 1200, cost: 0.175987, acc: 0.914062
[Wed Oct 10 16:41:10 2018] epoch_id: 5, batch_id: 1300, cost: 0.161037, acc: 0.929688
[Wed Oct 10 16:41:12 2018] epoch_id: 5, batch_id: 1400, cost: 0.058083, acc: 0.976562
[Wed Oct 10 16:41:14 2018] epoch_id: 5, batch_id: 1500, cost: 0.099512, acc: 0.953125
[Wed Oct 10 16:41:17 2018] epoch_id: 5, batch_id: 1600, cost: 0.155458, acc: 0.929688
[Wed Oct 10 16:41:19 2018] epoch_id: 5, batch_id: 1700, cost: 0.149099, acc: 0.953125
[Wed Oct 10 16:41:21 2018] epoch_id: 5, batch_id: 1800, cost: 0.184663, acc: 0.945312
[Wed Oct 10 16:41:24 2018] epoch_id: 5, batch_id: 1900, cost: 0.153789, acc: 0.945312
[Wed Oct 10 16:41:26 2018] epoch_id: 5, batch_id: 2000, cost: 0.135054, acc: 0.945312
[Wed Oct 10 16:41:28 2018] epoch_id: 5, batch_id: 2100, cost: 0.091075, acc: 0.960938
[Wed Oct 10 16:41:30 2018] epoch_id: 5, batch_id: 2200, cost: 0.175665, acc: 0.937500
[Wed Oct 10 16:41:33 2018] epoch_id: 5, batch_id: 2300, cost: 0.092569, acc: 0.976562
[Wed Oct 10 16:41:35 2018] epoch_id: 5, batch_id: 2400, cost: 0.171366, acc: 0.929688
[Wed Oct 10 16:41:37 2018] epoch_id: 5, batch_id: 2500, cost: 0.077127, acc: 0.984375
[Wed Oct 10 16:41:39 2018] epoch_id: 5, batch_id: 2600, cost: 0.133260, acc: 0.960938
[Wed Oct 10 16:41:43 2018] epoch_id: 5, batch_id: 2700, cost: 0.130742, acc: 0.953125
[Wed Oct 10 16:41:45 2018] epoch_id: 5, batch_id: 2800, cost: 0.165412, acc: 0.945312
[Wed Oct 10 16:41:48 2018] epoch_id: 5, batch_id: 2900, cost: 0.099631, acc: 0.953125
[Wed Oct 10 16:41:50 2018] epoch_id: 5, batch_id: 3000, cost: 0.191953, acc: 0.929688
[Wed Oct 10 16:41:51 2018] epoch_id: 5, train_avg_cost: 0.122534, train_avg_acc: 0.952647
[Wed Oct 10 16:41:52 2018] epoch_id: 5, dev_cost: 0.517809, accuracy: 0.8338
[Wed Oct 10 16:41:53 2018] epoch_id: 5, test_cost: 0.516574, accuracy: 0.8379
[Wed Oct 10 16:42:02 2018] epoch_id: 6, batch_id: 0, cost: 0.108672, acc: 0.953125
[Wed Oct 10 16:42:04 2018] epoch_id: 6, batch_id: 100, cost: 0.055064, acc: 0.984375
[Wed Oct 10 16:42:07 2018] epoch_id: 6, batch_id: 200, cost: 0.070521, acc: 0.976562
[Wed Oct 10 16:42:09 2018] epoch_id: 6, batch_id: 300, cost: 0.044554, acc: 0.992188
[Wed Oct 10 16:42:12 2018] epoch_id: 6, batch_id: 400, cost: 0.140199, acc: 0.968750
[Wed Oct 10 16:42:14 2018] epoch_id: 6, batch_id: 500, cost: 0.074043, acc: 0.984375
[Wed Oct 10 16:42:17 2018] epoch_id: 6, batch_id: 600, cost: 0.072380, acc: 0.960938
[Wed Oct 10 16:42:19 2018] epoch_id: 6, batch_id: 700, cost: 0.089520, acc: 0.968750
[Wed Oct 10 16:42:21 2018] epoch_id: 6, batch_id: 800, cost: 0.154753, acc: 0.937500
[Wed Oct 10 16:42:24 2018] epoch_id: 6, batch_id: 900, cost: 0.137237, acc: 0.945312
[Wed Oct 10 16:42:26 2018] epoch_id: 6, batch_id: 1000, cost: 0.155418, acc: 0.953125
[Wed Oct 10 16:42:28 2018] epoch_id: 6, batch_id: 1100, cost: 0.102754, acc: 0.968750
[Wed Oct 10 16:42:31 2018] epoch_id: 6, batch_id: 1200, cost: 0.171521, acc: 0.929688
[Wed Oct 10 16:42:33 2018] epoch_id: 6, batch_id: 1300, cost: 0.089853, acc: 0.984375
[Wed Oct 10 16:42:36 2018] epoch_id: 6, batch_id: 1400, cost: 0.117480, acc: 0.953125
[Wed Oct 10 16:42:38 2018] epoch_id: 6, batch_id: 1500, cost: 0.144428, acc: 0.953125
[Wed Oct 10 16:42:40 2018] epoch_id: 6, batch_id: 1600, cost: 0.100815, acc: 0.945312
[Wed Oct 10 16:42:43 2018] epoch_id: 6, batch_id: 1700, cost: 0.096131, acc: 0.960938
[Wed Oct 10 16:42:45 2018] epoch_id: 6, batch_id: 1800, cost: 0.083034, acc: 0.968750
[Wed Oct 10 16:42:47 2018] epoch_id: 6, batch_id: 1900, cost: 0.144603, acc: 0.937500
[Wed Oct 10 16:42:50 2018] epoch_id: 6, batch_id: 2000, cost: 0.125068, acc: 0.960938
[Wed Oct 10 16:42:52 2018] epoch_id: 6, batch_id: 2100, cost: 0.096932, acc: 0.945312
[Wed Oct 10 16:42:54 2018] epoch_id: 6, batch_id: 2200, cost: 0.187626, acc: 0.906250
[Wed Oct 10 16:42:58 2018] epoch_id: 6, batch_id: 2300, cost: 0.086040, acc: 0.953125
[Wed Oct 10 16:43:00 2018] epoch_id: 6, batch_id: 2400, cost: 0.112231, acc: 0.960938
[Wed Oct 10 16:43:03 2018] epoch_id: 6, batch_id: 2500, cost: 0.086397, acc: 0.976562
[Wed Oct 10 16:43:05 2018] epoch_id: 6, batch_id: 2600, cost: 0.093871, acc: 0.960938
[Wed Oct 10 16:43:07 2018] epoch_id: 6, batch_id: 2700, cost: 0.143658, acc: 0.953125
[Wed Oct 10 16:43:10 2018] epoch_id: 6, batch_id: 2800, cost: 0.144744, acc: 0.945312
[Wed Oct 10 16:43:12 2018] epoch_id: 6, batch_id: 2900, cost: 0.127995, acc: 0.945312
[Wed Oct 10 16:43:14 2018] epoch_id: 6, batch_id: 3000, cost: 0.201635, acc: 0.929688
[Wed Oct 10 16:43:15 2018] epoch_id: 6, train_avg_cost: 0.100383, train_avg_acc: 0.961683
[Wed Oct 10 16:43:16 2018] epoch_id: 6, dev_cost: 0.622004, accuracy: 0.833
[Wed Oct 10 16:43:17 2018] epoch_id: 6, test_cost: 0.604546, accuracy: 0.836
[Wed Oct 10 16:43:25 2018] epoch_id: 7, batch_id: 0, cost: 0.092909, acc: 0.968750
[Wed Oct 10 16:43:28 2018] epoch_id: 7, batch_id: 100, cost: 0.048849, acc: 0.976562
[Wed Oct 10 16:43:31 2018] epoch_id: 7, batch_id: 200, cost: 0.123149, acc: 0.960938
[Wed Oct 10 16:43:33 2018] epoch_id: 7, batch_id: 300, cost: 0.043434, acc: 0.992188
[Wed Oct 10 16:43:35 2018] epoch_id: 7, batch_id: 400, cost: 0.057082, acc: 0.976562
[Wed Oct 10 16:43:38 2018] epoch_id: 7, batch_id: 500, cost: 0.043290, acc: 0.976562
[Wed Oct 10 16:43:40 2018] epoch_id: 7, batch_id: 600, cost: 0.061600, acc: 0.976562
[Wed Oct 10 16:43:42 2018] epoch_id: 7, batch_id: 700, cost: 0.077328, acc: 0.968750
[Wed Oct 10 16:43:45 2018] epoch_id: 7, batch_id: 800, cost: 0.139978, acc: 0.953125
[Wed Oct 10 16:43:48 2018] epoch_id: 7, batch_id: 900, cost: 0.099730, acc: 0.960938
[Wed Oct 10 16:43:51 2018] epoch_id: 7, batch_id: 1000, cost: 0.072699, acc: 0.976562
[Wed Oct 10 16:43:53 2018] epoch_id: 7, batch_id: 1100, cost: 0.031092, acc: 0.992188
[Wed Oct 10 16:43:55 2018] epoch_id: 7, batch_id: 1200, cost: 0.118547, acc: 0.960938
[Wed Oct 10 16:43:57 2018] epoch_id: 7, batch_id: 1300, cost: 0.061420, acc: 0.976562
[Wed Oct 10 16:44:00 2018] epoch_id: 7, batch_id: 1400, cost: 0.096040, acc: 0.968750
[Wed Oct 10 16:44:02 2018] epoch_id: 7, batch_id: 1500, cost: 0.052711, acc: 0.992188
[Wed Oct 10 16:44:04 2018] epoch_id: 7, batch_id: 1600, cost: 0.150460, acc: 0.929688
[Wed Oct 10 16:44:07 2018] epoch_id: 7, batch_id: 1700, cost: 0.097628, acc: 0.976562
[Wed Oct 10 16:44:09 2018] epoch_id: 7, batch_id: 1800, cost: 0.081382, acc: 0.976562
[Wed Oct 10 16:44:11 2018] epoch_id: 7, batch_id: 1900, cost: 0.089064, acc: 0.953125
[Wed Oct 10 16:44:14 2018] epoch_id: 7, batch_id: 2000, cost: 0.084270, acc: 0.968750
[Wed Oct 10 16:44:16 2018] epoch_id: 7, batch_id: 2100, cost: 0.097173, acc: 0.968750
[Wed Oct 10 16:44:18 2018] epoch_id: 7, batch_id: 2200, cost: 0.112953, acc: 0.960938
[Wed Oct 10 16:44:20 2018] epoch_id: 7, batch_id: 2300, cost: 0.116143, acc: 0.953125
[Wed Oct 10 16:44:23 2018] epoch_id: 7, batch_id: 2400, cost: 0.098675, acc: 0.968750
[Wed Oct 10 16:44:25 2018] epoch_id: 7, batch_id: 2500, cost: 0.150993, acc: 0.945312
[Wed Oct 10 16:44:27 2018] epoch_id: 7, batch_id: 2600, cost: 0.076421, acc: 0.968750
[Wed Oct 10 16:44:29 2018] epoch_id: 7, batch_id: 2700, cost: 0.088665, acc: 0.968750
[Wed Oct 10 16:44:32 2018] epoch_id: 7, batch_id: 2800, cost: 0.142891, acc: 0.937500
[Wed Oct 10 16:44:34 2018] epoch_id: 7, batch_id: 2900, cost: 0.088820, acc: 0.968750
[Wed Oct 10 16:44:36 2018] epoch_id: 7, batch_id: 3000, cost: 0.100579, acc: 0.968750
[Wed Oct 10 16:44:37 2018] epoch_id: 7, train_avg_cost: 0.084162, train_avg_acc: 0.968487
[Wed Oct 10 16:44:38 2018] epoch_id: 7, dev_cost: 0.655423, accuracy: 0.8369
[Wed Oct 10 16:44:39 2018] epoch_id: 7, test_cost: 0.663061, accuracy: 0.8352
[Wed Oct 10 16:44:47 2018] epoch_id: 8, batch_id: 0, cost: 0.037309, acc: 0.992188
[Wed Oct 10 16:44:50 2018] epoch_id: 8, batch_id: 100, cost: 0.043888, acc: 0.976562
[Wed Oct 10 16:44:52 2018] epoch_id: 8, batch_id: 200, cost: 0.099702, acc: 0.960938
[Wed Oct 10 16:44:54 2018] epoch_id: 8, batch_id: 300, cost: 0.080207, acc: 0.976562
[Wed Oct 10 16:44:56 2018] epoch_id: 8, batch_id: 400, cost: 0.049319, acc: 0.976562
[Wed Oct 10 16:44:59 2018] epoch_id: 8, batch_id: 500, cost: 0.041202, acc: 0.976562
[Wed Oct 10 16:45:01 2018] epoch_id: 8, batch_id: 600, cost: 0.061663, acc: 0.968750
[Wed Oct 10 16:45:03 2018] epoch_id: 8, batch_id: 700, cost: 0.065126, acc: 0.984375
[Wed Oct 10 16:45:05 2018] epoch_id: 8, batch_id: 800, cost: 0.057770, acc: 0.976562
[Wed Oct 10 16:45:07 2018] epoch_id: 8, batch_id: 900, cost: 0.136513, acc: 0.929688
[Wed Oct 10 16:45:10 2018] epoch_id: 8, batch_id: 1000, cost: 0.054884, acc: 0.968750
[Wed Oct 10 16:45:12 2018] epoch_id: 8, batch_id: 1100, cost: 0.046854, acc: 0.992188
[Wed Oct 10 16:45:14 2018] epoch_id: 8, batch_id: 1200, cost: 0.031739, acc: 1.000000
[Wed Oct 10 16:45:17 2018] epoch_id: 8, batch_id: 1300, cost: 0.127405, acc: 0.953125
[Wed Oct 10 16:45:19 2018] epoch_id: 8, batch_id: 1400, cost: 0.052842, acc: 0.976562
[Wed Oct 10 16:45:21 2018] epoch_id: 8, batch_id: 1500, cost: 0.117588, acc: 0.960938
[Wed Oct 10 16:45:23 2018] epoch_id: 8, batch_id: 1600, cost: 0.078688, acc: 0.968750
[Wed Oct 10 16:45:26 2018] epoch_id: 8, batch_id: 1700, cost: 0.069420, acc: 0.976562
[Wed Oct 10 16:45:28 2018] epoch_id: 8, batch_id: 1800, cost: 0.055502, acc: 0.976562
[Wed Oct 10 16:45:31 2018] epoch_id: 8, batch_id: 1900, cost: 0.161759, acc: 0.945312
[Wed Oct 10 16:45:34 2018] epoch_id: 8, batch_id: 2000, cost: 0.063610, acc: 0.984375
[Wed Oct 10 16:45:36 2018] epoch_id: 8, batch_id: 2100, cost: 0.103227, acc: 0.937500
[Wed Oct 10 16:45:38 2018] epoch_id: 8, batch_id: 2200, cost: 0.065949, acc: 0.976562
[Wed Oct 10 16:45:40 2018] epoch_id: 8, batch_id: 2300, cost: 0.060299, acc: 0.968750
[Wed Oct 10 16:45:43 2018] epoch_id: 8, batch_id: 2400, cost: 0.089557, acc: 0.976562
[Wed Oct 10 16:45:45 2018] epoch_id: 8, batch_id: 2500, cost: 0.095753, acc: 0.968750
[Wed Oct 10 16:45:47 2018] epoch_id: 8, batch_id: 2600, cost: 0.111113, acc: 0.968750
[Wed Oct 10 16:45:49 2018] epoch_id: 8, batch_id: 2700, cost: 0.074921, acc: 0.960938
[Wed Oct 10 16:45:52 2018] epoch_id: 8, batch_id: 2800, cost: 0.105058, acc: 0.945312
[Wed Oct 10 16:45:54 2018] epoch_id: 8, batch_id: 2900, cost: 0.173304, acc: 0.921875
[Wed Oct 10 16:45:56 2018] epoch_id: 8, batch_id: 3000, cost: 0.077586, acc: 0.984375
[Wed Oct 10 16:45:56 2018] epoch_id: 8, train_avg_cost: 0.072280, train_avg_acc: 0.973521
[Wed Oct 10 16:45:57 2018] epoch_id: 8, dev_cost: 0.629243, accuracy: 0.8373
[Wed Oct 10 16:45:58 2018] epoch_id: 8, test_cost: 0.661630, accuracy: 0.8352
[Wed Oct 10 16:46:07 2018] epoch_id: 9, batch_id: 0, cost: 0.044024, acc: 0.984375
[Wed Oct 10 16:46:09 2018] epoch_id: 9, batch_id: 100, cost: 0.033798, acc: 0.992188
[Wed Oct 10 16:46:11 2018] epoch_id: 9, batch_id: 200, cost: 0.077856, acc: 0.976562
[Wed Oct 10 16:46:14 2018] epoch_id: 9, batch_id: 300, cost: 0.119995, acc: 0.953125
[Wed Oct 10 16:46:16 2018] epoch_id: 9, batch_id: 400, cost: 0.006741, acc: 1.000000
[Wed Oct 10 16:46:18 2018] epoch_id: 9, batch_id: 500, cost: 0.097501, acc: 0.953125
[Wed Oct 10 16:46:20 2018] epoch_id: 9, batch_id: 600, cost: 0.097540, acc: 0.960938
[Wed Oct 10 16:46:22 2018] epoch_id: 9, batch_id: 700, cost: 0.085677, acc: 0.976562
[Wed Oct 10 16:46:25 2018] epoch_id: 9, batch_id: 800, cost: 0.131135, acc: 0.960938
[Wed Oct 10 16:46:27 2018] epoch_id: 9, batch_id: 900, cost: 0.058706, acc: 0.960938
[Wed Oct 10 16:46:29 2018] epoch_id: 9, batch_id: 1000, cost: 0.081857, acc: 0.968750
[Wed Oct 10 16:46:31 2018] epoch_id: 9, batch_id: 1100, cost: 0.035656, acc: 0.992188
[Wed Oct 10 16:46:34 2018] epoch_id: 9, batch_id: 1200, cost: 0.023980, acc: 0.992188
[Wed Oct 10 16:46:36 2018] epoch_id: 9, batch_id: 1300, cost: 0.104535, acc: 0.976562
[Wed Oct 10 16:46:38 2018] epoch_id: 9, batch_id: 1400, cost: 0.052738, acc: 0.960938
[Wed Oct 10 16:46:40 2018] epoch_id: 9, batch_id: 1500, cost: 0.049284, acc: 0.984375
[Wed Oct 10 16:46:43 2018] epoch_id: 9, batch_id: 1600, cost: 0.040960, acc: 0.976562
[Wed Oct 10 16:46:45 2018] epoch_id: 9, batch_id: 1700, cost: 0.054090, acc: 0.976562
[Wed Oct 10 16:46:47 2018] epoch_id: 9, batch_id: 1800, cost: 0.030307, acc: 0.992188
[Wed Oct 10 16:46:49 2018] epoch_id: 9, batch_id: 1900, cost: 0.152908, acc: 0.960938
[Wed Oct 10 16:46:52 2018] epoch_id: 9, batch_id: 2000, cost: 0.133532, acc: 0.945312
[Wed Oct 10 16:46:54 2018] epoch_id: 9, batch_id: 2100, cost: 0.162579, acc: 0.929688
[Wed Oct 10 16:46:56 2018] epoch_id: 9, batch_id: 2200, cost: 0.037171, acc: 0.984375
[Wed Oct 10 16:46:58 2018] epoch_id: 9, batch_id: 2300, cost: 0.036093, acc: 0.992188
[Wed Oct 10 16:47:00 2018] epoch_id: 9, batch_id: 2400, cost: 0.066371, acc: 0.976562
[Wed Oct 10 16:47:02 2018] epoch_id: 9, batch_id: 2500, cost: 0.047459, acc: 0.984375
[Wed Oct 10 16:47:04 2018] epoch_id: 9, batch_id: 2600, cost: 0.031411, acc: 0.992188
[Wed Oct 10 16:47:06 2018] epoch_id: 9, batch_id: 2700, cost: 0.107300, acc: 0.953125
[Wed Oct 10 16:47:09 2018] epoch_id: 9, batch_id: 2800, cost: 0.041434, acc: 0.984375
[Wed Oct 10 16:47:11 2018] epoch_id: 9, batch_id: 2900, cost: 0.081185, acc: 0.960938
[Wed Oct 10 16:47:13 2018] epoch_id: 9, batch_id: 3000, cost: 0.096274, acc: 0.960938
[Wed Oct 10 16:47:13 2018] epoch_id: 9, train_avg_cost: 0.063124, train_avg_acc: 0.976961
[Wed Oct 10 16:47:14 2018] epoch_id: 9, dev_cost: 0.678009, accuracy: 0.8403
[Wed Oct 10 16:47:15 2018] epoch_id: 9, test_cost: 0.707977, accuracy: 0.8359
[Wed Oct 10 16:47:24 2018] epoch_id: 10, batch_id: 0, cost: 0.053481, acc: 0.968750
[Wed Oct 10 16:47:26 2018] epoch_id: 10, batch_id: 100, cost: 0.024990, acc: 0.984375
[Wed Oct 10 16:47:29 2018] epoch_id: 10, batch_id: 200, cost: 0.025989, acc: 0.992188
[Wed Oct 10 16:47:31 2018] epoch_id: 10, batch_id: 300, cost: 0.016467, acc: 0.992188
[Wed Oct 10 16:47:33 2018] epoch_id: 10, batch_id: 400, cost: 0.013582, acc: 1.000000
[Wed Oct 10 16:47:35 2018] epoch_id: 10, batch_id: 500, cost: 0.062821, acc: 0.984375
[Wed Oct 10 16:47:38 2018] epoch_id: 10, batch_id: 600, cost: 0.018919, acc: 0.992188
[Wed Oct 10 16:47:40 2018] epoch_id: 10, batch_id: 700, cost: 0.113543, acc: 0.937500
[Wed Oct 10 16:47:43 2018] epoch_id: 10, batch_id: 800, cost: 0.042273, acc: 0.984375
[Wed Oct 10 16:47:45 2018] epoch_id: 10, batch_id: 900, cost: 0.040787, acc: 0.976562
[Wed Oct 10 16:47:47 2018] epoch_id: 10, batch_id: 1000, cost: 0.013215, acc: 1.000000
[Wed Oct 10 16:47:50 2018] epoch_id: 10, batch_id: 1100, cost: 0.056862, acc: 0.984375
[Wed Oct 10 16:47:52 2018] epoch_id: 10, batch_id: 1200, cost: 0.114343, acc: 0.960938
[Wed Oct 10 16:47:54 2018] epoch_id: 10, batch_id: 1300, cost: 0.068139, acc: 0.968750
[Wed Oct 10 16:47:57 2018] epoch_id: 10, batch_id: 1400, cost: 0.036262, acc: 0.984375
[Wed Oct 10 16:47:59 2018] epoch_id: 10, batch_id: 1500, cost: 0.031832, acc: 0.984375
[Wed Oct 10 16:48:01 2018] epoch_id: 10, batch_id: 1600, cost: 0.098699, acc: 0.953125
[Wed Oct 10 16:48:03 2018] epoch_id: 10, batch_id: 1700, cost: 0.073122, acc: 0.976562
[Wed Oct 10 16:48:06 2018] epoch_id: 10, batch_id: 1800, cost: 0.035890, acc: 0.984375
[Wed Oct 10 16:48:08 2018] epoch_id: 10, batch_id: 1900, cost: 0.036370, acc: 0.968750
[Wed Oct 10 16:48:10 2018] epoch_id: 10, batch_id: 2000, cost: 0.073071, acc: 0.976562
[Wed Oct 10 16:48:12 2018] epoch_id: 10, batch_id: 2100, cost: 0.017344, acc: 1.000000
[Wed Oct 10 16:48:15 2018] epoch_id: 10, batch_id: 2200, cost: 0.146855, acc: 0.953125
[Wed Oct 10 16:48:17 2018] epoch_id: 10, batch_id: 2300, cost: 0.068342, acc: 0.968750
[Wed Oct 10 16:48:19 2018] epoch_id: 10, batch_id: 2400, cost: 0.026733, acc: 0.992188
[Wed Oct 10 16:48:21 2018] epoch_id: 10, batch_id: 2500, cost: 0.085184, acc: 0.976562
[Wed Oct 10 16:48:23 2018] epoch_id: 10, batch_id: 2600, cost: 0.065530, acc: 0.984375
[Wed Oct 10 16:48:26 2018] epoch_id: 10, batch_id: 2700, cost: 0.111871, acc: 0.968750
[Wed Oct 10 16:48:29 2018] epoch_id: 10, batch_id: 2800, cost: 0.063721, acc: 0.968750
[Wed Oct 10 16:48:31 2018] epoch_id: 10, batch_id: 2900, cost: 0.026759, acc: 0.992188
[Wed Oct 10 16:48:34 2018] epoch_id: 10, batch_id: 3000, cost: 0.031338, acc: 0.992188
[Wed Oct 10 16:48:34 2018] epoch_id: 10, train_avg_cost: 0.055555, train_avg_acc: 0.979852
[Wed Oct 10 16:48:35 2018] epoch_id: 10, dev_cost: 0.782007, accuracy: 0.8366
[Wed Oct 10 16:48:36 2018] epoch_id: 10, test_cost: 0.795087, accuracy: 0.8369
[Wed Oct 10 16:48:44 2018] epoch_id: 11, batch_id: 0, cost: 0.032797, acc: 0.992188
[Wed Oct 10 16:48:47 2018] epoch_id: 11, batch_id: 100, cost: 0.011773, acc: 0.992188
[Wed Oct 10 16:48:49 2018] epoch_id: 11, batch_id: 200, cost: 0.012297, acc: 1.000000
[Wed Oct 10 16:48:51 2018] epoch_id: 11, batch_id: 300, cost: 0.032454, acc: 0.992188
[Wed Oct 10 16:48:53 2018] epoch_id: 11, batch_id: 400, cost: 0.100247, acc: 0.976562
[Wed Oct 10 16:48:55 2018] epoch_id: 11, batch_id: 500, cost: 0.035470, acc: 0.992188
[Wed Oct 10 16:48:58 2018] epoch_id: 11, batch_id: 600, cost: 0.032553, acc: 0.984375
[Wed Oct 10 16:49:00 2018] epoch_id: 11, batch_id: 700, cost: 0.035226, acc: 0.984375
[Wed Oct 10 16:49:02 2018] epoch_id: 11, batch_id: 800, cost: 0.010961, acc: 1.000000
[Wed Oct 10 16:49:04 2018] epoch_id: 11, batch_id: 900, cost: 0.033747, acc: 0.984375
[Wed Oct 10 16:49:07 2018] epoch_id: 11, batch_id: 1000, cost: 0.052710, acc: 0.976562
[Wed Oct 10 16:49:09 2018] epoch_id: 11, batch_id: 1100, cost: 0.021664, acc: 0.992188
[Wed Oct 10 16:49:11 2018] epoch_id: 11, batch_id: 1200, cost: 0.056635, acc: 0.984375
[Wed Oct 10 16:49:14 2018] epoch_id: 11, batch_id: 1300, cost: 0.007764, acc: 1.000000
[Wed Oct 10 16:49:16 2018] epoch_id: 11, batch_id: 1400, cost: 0.042336, acc: 0.976562
[Wed Oct 10 16:49:18 2018] epoch_id: 11, batch_id: 1500, cost: 0.077117, acc: 0.976562
[Wed Oct 10 16:49:20 2018] epoch_id: 11, batch_id: 1600, cost: 0.082522, acc: 0.976562
[Wed Oct 10 16:49:22 2018] epoch_id: 11, batch_id: 1700, cost: 0.022290, acc: 1.000000
[Wed Oct 10 16:49:25 2018] epoch_id: 11, batch_id: 1800, cost: 0.033992, acc: 0.984375
[Wed Oct 10 16:49:27 2018] epoch_id: 11, batch_id: 1900, cost: 0.027460, acc: 0.992188
[Wed Oct 10 16:49:29 2018] epoch_id: 11, batch_id: 2000, cost: 0.032003, acc: 0.992188
[Wed Oct 10 16:49:31 2018] epoch_id: 11, batch_id: 2100, cost: 0.070170, acc: 0.976562
[Wed Oct 10 16:49:33 2018] epoch_id: 11, batch_id: 2200, cost: 0.017124, acc: 0.992188
[Wed Oct 10 16:49:36 2018] epoch_id: 11, batch_id: 2300, cost: 0.037207, acc: 0.984375
[Wed Oct 10 16:49:39 2018] epoch_id: 11, batch_id: 2400, cost: 0.018202, acc: 1.000000
[Wed Oct 10 16:49:41 2018] epoch_id: 11, batch_id: 2500, cost: 0.059570, acc: 0.976562
[Wed Oct 10 16:49:43 2018] epoch_id: 11, batch_id: 2600, cost: 0.009950, acc: 1.000000
[Wed Oct 10 16:49:46 2018] epoch_id: 11, batch_id: 2700, cost: 0.015869, acc: 1.000000
[Wed Oct 10 16:49:48 2018] epoch_id: 11, batch_id: 2800, cost: 0.049429, acc: 0.984375
[Wed Oct 10 16:49:50 2018] epoch_id: 11, batch_id: 2900, cost: 0.061248, acc: 0.976562
[Wed Oct 10 16:49:52 2018] epoch_id: 11, batch_id: 3000, cost: 0.007281, acc: 1.000000
[Wed Oct 10 16:49:53 2018] epoch_id: 11, train_avg_cost: 0.049100, train_avg_acc: 0.982414
[Wed Oct 10 16:49:54 2018] epoch_id: 11, dev_cost: 0.919803, accuracy: 0.8392
[Wed Oct 10 16:49:55 2018] epoch_id: 11, test_cost: 0.963836, accuracy: 0.8354
[Wed Oct 10 16:50:03 2018] epoch_id: 12, batch_id: 0, cost: 0.021594, acc: 0.992188
[Wed Oct 10 16:50:05 2018] epoch_id: 12, batch_id: 100, cost: 0.003167, acc: 1.000000
[Wed Oct 10 16:50:08 2018] epoch_id: 12, batch_id: 200, cost: 0.034331, acc: 0.984375
[Wed Oct 10 16:50:10 2018] epoch_id: 12, batch_id: 300, cost: 0.044300, acc: 0.984375
[Wed Oct 10 16:50:12 2018] epoch_id: 12, batch_id: 400, cost: 0.010300, acc: 1.000000
[Wed Oct 10 16:50:15 2018] epoch_id: 12, batch_id: 500, cost: 0.071121, acc: 0.968750
[Wed Oct 10 16:50:17 2018] epoch_id: 12, batch_id: 600, cost: 0.027463, acc: 0.984375
[Wed Oct 10 16:50:19 2018] epoch_id: 12, batch_id: 700, cost: 0.023278, acc: 0.992188
[Wed Oct 10 16:50:22 2018] epoch_id: 12, batch_id: 800, cost: 0.024731, acc: 0.992188
[Wed Oct 10 16:50:25 2018] epoch_id: 12, batch_id: 900, cost: 0.033520, acc: 0.992188
[Wed Oct 10 16:50:27 2018] epoch_id: 12, batch_id: 1000, cost: 0.066168, acc: 0.984375
[Wed Oct 10 16:50:29 2018] epoch_id: 12, batch_id: 1100, cost: 0.086032, acc: 0.976562
[Wed Oct 10 16:50:32 2018] epoch_id: 12, batch_id: 1200, cost: 0.041718, acc: 0.968750
[Wed Oct 10 16:50:34 2018] epoch_id: 12, batch_id: 1300, cost: 0.085903, acc: 0.968750
[Wed Oct 10 16:50:36 2018] epoch_id: 12, batch_id: 1400, cost: 0.022963, acc: 0.992188
[Wed Oct 10 16:50:38 2018] epoch_id: 12, batch_id: 1500, cost: 0.008185, acc: 1.000000
[Wed Oct 10 16:50:41 2018] epoch_id: 12, batch_id: 1600, cost: 0.057872, acc: 0.968750
[Wed Oct 10 16:50:43 2018] epoch_id: 12, batch_id: 1700, cost: 0.011306, acc: 1.000000
[Wed Oct 10 16:50:45 2018] epoch_id: 12, batch_id: 1800, cost: 0.030697, acc: 0.984375
[Wed Oct 10 16:50:47 2018] epoch_id: 12, batch_id: 1900, cost: 0.049713, acc: 0.984375
[Wed Oct 10 16:50:50 2018] epoch_id: 12, batch_id: 2000, cost: 0.050341, acc: 0.976562
[Wed Oct 10 16:50:52 2018] epoch_id: 12, batch_id: 2100, cost: 0.024994, acc: 0.992188
[Wed Oct 10 16:50:54 2018] epoch_id: 12, batch_id: 2200, cost: 0.046852, acc: 0.968750
[Wed Oct 10 16:50:56 2018] epoch_id: 12, batch_id: 2300, cost: 0.055520, acc: 0.976562
[Wed Oct 10 16:50:59 2018] epoch_id: 12, batch_id: 2400, cost: 0.085991, acc: 0.968750
[Wed Oct 10 16:51:01 2018] epoch_id: 12, batch_id: 2500, cost: 0.044263, acc: 0.984375
[Wed Oct 10 16:51:03 2018] epoch_id: 12, batch_id: 2600, cost: 0.071548, acc: 0.976562
[Wed Oct 10 16:51:05 2018] epoch_id: 12, batch_id: 2700, cost: 0.039594, acc: 0.976562
[Wed Oct 10 16:51:08 2018] epoch_id: 12, batch_id: 2800, cost: 0.058939, acc: 0.984375
[Wed Oct 10 16:51:10 2018] epoch_id: 12, batch_id: 2900, cost: 0.070956, acc: 0.968750
[Wed Oct 10 16:51:12 2018] epoch_id: 12, batch_id: 3000, cost: 0.059941, acc: 0.960938
[Wed Oct 10 16:51:13 2018] epoch_id: 12, train_avg_cost: 0.044984, train_avg_acc: 0.983741
[Wed Oct 10 16:51:14 2018] epoch_id: 12, dev_cost: 0.742705, accuracy: 0.8364
[Wed Oct 10 16:51:14 2018] epoch_id: 12, test_cost: 0.765290, accuracy: 0.8355
[Wed Oct 10 16:51:23 2018] epoch_id: 13, batch_id: 0, cost: 0.054822, acc: 0.968750
[Wed Oct 10 16:51:25 2018] epoch_id: 13, batch_id: 100, cost: 0.066483, acc: 0.976562
[Wed Oct 10 16:51:28 2018] epoch_id: 13, batch_id: 200, cost: 0.007064, acc: 1.000000
[Wed Oct 10 16:51:30 2018] epoch_id: 13, batch_id: 300, cost: 0.050190, acc: 0.984375
[Wed Oct 10 16:51:32 2018] epoch_id: 13, batch_id: 400, cost: 0.044636, acc: 0.984375
[Wed Oct 10 16:51:34 2018] epoch_id: 13, batch_id: 500, cost: 0.040963, acc: 0.984375
[Wed Oct 10 16:51:37 2018] epoch_id: 13, batch_id: 600, cost: 0.029529, acc: 0.992188
[Wed Oct 10 16:51:39 2018] epoch_id: 13, batch_id: 700, cost: 0.011587, acc: 1.000000
[Wed Oct 10 16:51:41 2018] epoch_id: 13, batch_id: 800, cost: 0.039673, acc: 0.984375
[Wed Oct 10 16:51:43 2018] epoch_id: 13, batch_id: 900, cost: 0.028793, acc: 0.984375
[Wed Oct 10 16:51:46 2018] epoch_id: 13, batch_id: 1000, cost: 0.055973, acc: 0.968750
[Wed Oct 10 16:51:48 2018] epoch_id: 13, batch_id: 1100, cost: 0.016087, acc: 0.992188
[Wed Oct 10 16:51:50 2018] epoch_id: 13, batch_id: 1200, cost: 0.096423, acc: 0.960938
[Wed Oct 10 16:51:52 2018] epoch_id: 13, batch_id: 1300, cost: 0.019652, acc: 0.992188
[Wed Oct 10 16:51:55 2018] epoch_id: 13, batch_id: 1400, cost: 0.018604, acc: 0.992188
[Wed Oct 10 16:51:57 2018] epoch_id: 13, batch_id: 1500, cost: 0.060169, acc: 0.960938
[Wed Oct 10 16:51:59 2018] epoch_id: 13, batch_id: 1600, cost: 0.014124, acc: 0.992188
[Wed Oct 10 16:52:01 2018] epoch_id: 13, batch_id: 1700, cost: 0.029843, acc: 0.984375
[Wed Oct 10 16:52:05 2018] epoch_id: 13, batch_id: 1800, cost: 0.063125, acc: 0.976562
[Wed Oct 10 16:52:07 2018] epoch_id: 13, batch_id: 1900, cost: 0.070910, acc: 0.953125
[Wed Oct 10 16:52:09 2018] epoch_id: 13, batch_id: 2000, cost: 0.042864, acc: 0.984375
[Wed Oct 10 16:52:11 2018] epoch_id: 13, batch_id: 2100, cost: 0.014658, acc: 0.992188
[Wed Oct 10 16:52:14 2018] epoch_id: 13, batch_id: 2200, cost: 0.075003, acc: 0.968750
[Wed Oct 10 16:52:16 2018] epoch_id: 13, batch_id: 2300, cost: 0.034856, acc: 0.976562
[Wed Oct 10 16:52:18 2018] epoch_id: 13, batch_id: 2400, cost: 0.040518, acc: 0.976562
[Wed Oct 10 16:52:20 2018] epoch_id: 13, batch_id: 2500, cost: 0.040826, acc: 0.976562
[Wed Oct 10 16:52:23 2018] epoch_id: 13, batch_id: 2600, cost: 0.043420, acc: 0.968750
[Wed Oct 10 16:52:25 2018] epoch_id: 13, batch_id: 2700, cost: 0.027364, acc: 0.984375
[Wed Oct 10 16:52:27 2018] epoch_id: 13, batch_id: 2800, cost: 0.030051, acc: 0.984375
[Wed Oct 10 16:52:30 2018] epoch_id: 13, batch_id: 2900, cost: 0.040024, acc: 0.984375
[Wed Oct 10 16:52:32 2018] epoch_id: 13, batch_id: 3000, cost: 0.054583, acc: 0.968750
[Wed Oct 10 16:52:32 2018] epoch_id: 13, train_avg_cost: 0.041237, train_avg_acc: 0.985349
[Wed Oct 10 16:52:33 2018] epoch_id: 13, dev_cost: 1.078762, accuracy: 0.8411
[Wed Oct 10 16:52:34 2018] epoch_id: 13, test_cost: 1.111191, accuracy: 0.8358
[Wed Oct 10 16:52:43 2018] epoch_id: 14, batch_id: 0, cost: 0.003011, acc: 1.000000
[Wed Oct 10 16:52:45 2018] epoch_id: 14, batch_id: 100, cost: 0.006236, acc: 1.000000
[Wed Oct 10 16:52:48 2018] epoch_id: 14, batch_id: 200, cost: 0.017501, acc: 0.992188
[Wed Oct 10 16:52:50 2018] epoch_id: 14, batch_id: 300, cost: 0.062686, acc: 0.976562
[Wed Oct 10 16:52:52 2018] epoch_id: 14, batch_id: 400, cost: 0.008696, acc: 1.000000
[Wed Oct 10 16:52:54 2018] epoch_id: 14, batch_id: 500, cost: 0.033238, acc: 0.984375
[Wed Oct 10 16:52:57 2018] epoch_id: 14, batch_id: 600, cost: 0.086478, acc: 0.976562
[Wed Oct 10 16:52:59 2018] epoch_id: 14, batch_id: 700, cost: 0.009820, acc: 0.992188
[Wed Oct 10 16:53:01 2018] epoch_id: 14, batch_id: 800, cost: 0.066287, acc: 0.992188
[Wed Oct 10 16:53:03 2018] epoch_id: 14, batch_id: 900, cost: 0.004043, acc: 1.000000
[Wed Oct 10 16:53:05 2018] epoch_id: 14, batch_id: 1000, cost: 0.007859, acc: 1.000000
[Wed Oct 10 16:53:08 2018] epoch_id: 14, batch_id: 1100, cost: 0.040856, acc: 0.976562
[Wed Oct 10 16:53:10 2018] epoch_id: 14, batch_id: 1200, cost: 0.038995, acc: 0.984375
[Wed Oct 10 16:53:12 2018] epoch_id: 14, batch_id: 1300, cost: 0.026738, acc: 0.992188
[Wed Oct 10 16:53:14 2018] epoch_id: 14, batch_id: 1400, cost: 0.048141, acc: 0.968750
[Wed Oct 10 16:53:16 2018] epoch_id: 14, batch_id: 1500, cost: 0.081051, acc: 0.976562
[Wed Oct 10 16:53:19 2018] epoch_id: 14, batch_id: 1600, cost: 0.017602, acc: 0.992188
[Wed Oct 10 16:53:21 2018] epoch_id: 14, batch_id: 1700, cost: 0.018175, acc: 0.992188
[Wed Oct 10 16:53:23 2018] epoch_id: 14, batch_id: 1800, cost: 0.076890, acc: 0.968750
[Wed Oct 10 16:53:25 2018] epoch_id: 14, batch_id: 1900, cost: 0.060768, acc: 0.976562
[Wed Oct 10 16:53:28 2018] epoch_id: 14, batch_id: 2000, cost: 0.020131, acc: 0.984375
[Wed Oct 10 16:53:30 2018] epoch_id: 14, batch_id: 2100, cost: 0.077612, acc: 0.976562
[Wed Oct 10 16:53:32 2018] epoch_id: 14, batch_id: 2200, cost: 0.101997, acc: 0.960938
[Wed Oct 10 16:53:34 2018] epoch_id: 14, batch_id: 2300, cost: 0.061213, acc: 0.976562
[Wed Oct 10 16:53:37 2018] epoch_id: 14, batch_id: 2400, cost: 0.048987, acc: 0.976562
[Wed Oct 10 16:53:39 2018] epoch_id: 14, batch_id: 2500, cost: 0.037741, acc: 0.984375
[Wed Oct 10 16:53:41 2018] epoch_id: 14, batch_id: 2600, cost: 0.011101, acc: 1.000000
[Wed Oct 10 16:53:43 2018] epoch_id: 14, batch_id: 2700, cost: 0.019846, acc: 0.992188
[Wed Oct 10 16:53:45 2018] epoch_id: 14, batch_id: 2800, cost: 0.026633, acc: 1.000000
[Wed Oct 10 16:53:48 2018] epoch_id: 14, batch_id: 2900, cost: 0.048637, acc: 0.976562
[Wed Oct 10 16:53:50 2018] epoch_id: 14, batch_id: 3000, cost: 0.056658, acc: 0.992188
[Wed Oct 10 16:53:50 2018] epoch_id: 14, train_avg_cost: 0.037520, train_avg_acc: 0.986595
[Wed Oct 10 16:53:51 2018] epoch_id: 14, dev_cost: 0.958707, accuracy: 0.8367
[Wed Oct 10 16:53:52 2018] epoch_id: 14, test_cost: 0.974553, accuracy: 0.8382
[Wed Oct 10 16:54:01 2018] epoch_id: 15, batch_id: 0, cost: 0.015232, acc: 1.000000
[Wed Oct 10 16:54:04 2018] epoch_id: 15, batch_id: 100, cost: 0.007195, acc: 1.000000
[Wed Oct 10 16:54:06 2018] epoch_id: 15, batch_id: 200, cost: 0.017140, acc: 0.992188
[Wed Oct 10 16:54:08 2018] epoch_id: 15, batch_id: 300, cost: 0.003196, acc: 1.000000
[Wed Oct 10 16:54:10 2018] epoch_id: 15, batch_id: 400, cost: 0.046839, acc: 0.976562
[Wed Oct 10 16:54:13 2018] epoch_id: 15, batch_id: 500, cost: 0.038533, acc: 0.992188
[Wed Oct 10 16:54:15 2018] epoch_id: 15, batch_id: 600, cost: 0.016502, acc: 0.992188
[Wed Oct 10 16:54:17 2018] epoch_id: 15, batch_id: 700, cost: 0.041825, acc: 0.976562
[Wed Oct 10 16:54:20 2018] epoch_id: 15, batch_id: 800, cost: 0.083583, acc: 0.968750
[Wed Oct 10 16:54:22 2018] epoch_id: 15, batch_id: 900, cost: 0.013552, acc: 0.992188
[Wed Oct 10 16:54:24 2018] epoch_id: 15, batch_id: 1000, cost: 0.015114, acc: 1.000000
[Wed Oct 10 16:54:26 2018] epoch_id: 15, batch_id: 1100, cost: 0.020185, acc: 0.992188
[Wed Oct 10 16:54:29 2018] epoch_id: 15, batch_id: 1200, cost: 0.023274, acc: 0.984375
[Wed Oct 10 16:54:31 2018] epoch_id: 15, batch_id: 1300, cost: 0.013836, acc: 1.000000
[Wed Oct 10 16:54:33 2018] epoch_id: 15, batch_id: 1400, cost: 0.091024, acc: 0.984375
[Wed Oct 10 16:54:36 2018] epoch_id: 15, batch_id: 1500, cost: 0.047340, acc: 0.976562
[Wed Oct 10 16:54:38 2018] epoch_id: 15, batch_id: 1600, cost: 0.030423, acc: 0.992188
[Wed Oct 10 16:54:40 2018] epoch_id: 15, batch_id: 1700, cost: 0.014750, acc: 0.992188
[Wed Oct 10 16:54:42 2018] epoch_id: 15, batch_id: 1800, cost: 0.090613, acc: 0.968750
[Wed Oct 10 16:54:45 2018] epoch_id: 15, batch_id: 1900, cost: 0.030791, acc: 0.984375
[Wed Oct 10 16:54:47 2018] epoch_id: 15, batch_id: 2000, cost: 0.046719, acc: 0.976562
[Wed Oct 10 16:54:49 2018] epoch_id: 15, batch_id: 2100, cost: 0.043871, acc: 0.984375
[Wed Oct 10 16:54:51 2018] epoch_id: 15, batch_id: 2200, cost: 0.078455, acc: 0.968750
[Wed Oct 10 16:54:53 2018] epoch_id: 15, batch_id: 2300, cost: 0.029536, acc: 0.976562
[Wed Oct 10 16:54:56 2018] epoch_id: 15, batch_id: 2400, cost: 0.028696, acc: 0.984375
[Wed Oct 10 16:54:58 2018] epoch_id: 15, batch_id: 2500, cost: 0.007129, acc: 0.992188
[Wed Oct 10 16:55:00 2018] epoch_id: 15, batch_id: 2600, cost: 0.049990, acc: 0.976562
[Wed Oct 10 16:55:03 2018] epoch_id: 15, batch_id: 2700, cost: 0.040309, acc: 0.984375
[Wed Oct 10 16:55:06 2018] epoch_id: 15, batch_id: 2800, cost: 0.098748, acc: 0.976562
[Wed Oct 10 16:55:08 2018] epoch_id: 15, batch_id: 2900, cost: 0.005371, acc: 1.000000
[Wed Oct 10 16:55:10 2018] epoch_id: 15, batch_id: 3000, cost: 0.060264, acc: 0.960938
[Wed Oct 10 16:55:11 2018] epoch_id: 15, train_avg_cost: 0.034637, train_avg_acc: 0.987582
[Wed Oct 10 16:55:12 2018] epoch_id: 15, dev_cost: 0.858216, accuracy: 0.8365
[Wed Oct 10 16:55:13 2018] epoch_id: 15, test_cost: 0.874420, accuracy: 0.8411
[Wed Oct 10 16:55:21 2018] epoch_id: 16, batch_id: 0, cost: 0.013283, acc: 1.000000
[Wed Oct 10 16:55:23 2018] epoch_id: 16, batch_id: 100, cost: 0.038128, acc: 0.984375
[Wed Oct 10 16:55:25 2018] epoch_id: 16, batch_id: 200, cost: 0.031110, acc: 0.976562
[Wed Oct 10 16:55:28 2018] epoch_id: 16, batch_id: 300, cost: 0.005346, acc: 1.000000
[Wed Oct 10 16:55:30 2018] epoch_id: 16, batch_id: 400, cost: 0.027634, acc: 0.984375
[Wed Oct 10 16:55:32 2018] epoch_id: 16, batch_id: 500, cost: 0.065929, acc: 0.976562
[Wed Oct 10 16:55:35 2018] epoch_id: 16, batch_id: 600, cost: 0.012638, acc: 0.992188
[Wed Oct 10 16:55:37 2018] epoch_id: 16, batch_id: 700, cost: 0.057962, acc: 0.984375
[Wed Oct 10 16:55:39 2018] epoch_id: 16, batch_id: 800, cost: 0.064390, acc: 0.976562
[Wed Oct 10 16:55:42 2018] epoch_id: 16, batch_id: 900, cost: 0.018866, acc: 0.992188
[Wed Oct 10 16:55:44 2018] epoch_id: 16, batch_id: 1000, cost: 0.004791, acc: 1.000000
[Wed Oct 10 16:55:46 2018] epoch_id: 16, batch_id: 1100, cost: 0.012691, acc: 0.992188
[Wed Oct 10 16:55:49 2018] epoch_id: 16, batch_id: 1200, cost: 0.033199, acc: 0.992188
[Wed Oct 10 16:55:51 2018] epoch_id: 16, batch_id: 1300, cost: 0.007757, acc: 1.000000
[Wed Oct 10 16:55:53 2018] epoch_id: 16, batch_id: 1400, cost: 0.016653, acc: 0.992188
[Wed Oct 10 16:55:55 2018] epoch_id: 16, batch_id: 1500, cost: 0.034653, acc: 0.968750
[Wed Oct 10 16:55:58 2018] epoch_id: 16, batch_id: 1600, cost: 0.051049, acc: 0.976562
[Wed Oct 10 16:56:00 2018] epoch_id: 16, batch_id: 1700, cost: 0.001466, acc: 1.000000
[Wed Oct 10 16:56:02 2018] epoch_id: 16, batch_id: 1800, cost: 0.035508, acc: 0.992188
[Wed Oct 10 16:56:05 2018] epoch_id: 16, batch_id: 1900, cost: 0.022919, acc: 0.984375
[Wed Oct 10 16:56:07 2018] epoch_id: 16, batch_id: 2000, cost: 0.102175, acc: 0.976562
[Wed Oct 10 16:56:09 2018] epoch_id: 16, batch_id: 2100, cost: 0.012663, acc: 1.000000
[Wed Oct 10 16:56:11 2018] epoch_id: 16, batch_id: 2200, cost: 0.026142, acc: 0.984375
[Wed Oct 10 16:56:15 2018] epoch_id: 16, batch_id: 2300, cost: 0.007566, acc: 1.000000
[Wed Oct 10 16:56:17 2018] epoch_id: 16, batch_id: 2400, cost: 0.043235, acc: 0.976562
[Wed Oct 10 16:56:20 2018] epoch_id: 16, batch_id: 2500, cost: 0.039383, acc: 0.984375
[Wed Oct 10 16:56:22 2018] epoch_id: 16, batch_id: 2600, cost: 0.009917, acc: 1.000000
[Wed Oct 10 16:56:24 2018] epoch_id: 16, batch_id: 2700, cost: 0.036917, acc: 0.984375
[Wed Oct 10 16:56:26 2018] epoch_id: 16, batch_id: 2800, cost: 0.012813, acc: 1.000000
[Wed Oct 10 16:56:29 2018] epoch_id: 16, batch_id: 2900, cost: 0.033933, acc: 0.984375
[Wed Oct 10 16:56:31 2018] epoch_id: 16, batch_id: 3000, cost: 0.007463, acc: 1.000000
[Wed Oct 10 16:56:32 2018] epoch_id: 16, train_avg_cost: 0.031971, train_avg_acc: 0.988555
[Wed Oct 10 16:56:33 2018] epoch_id: 16, dev_cost: 0.955907, accuracy: 0.8389
[Wed Oct 10 16:56:34 2018] epoch_id: 16, test_cost: 0.953062, accuracy: 0.8389
[Wed Oct 10 16:56:42 2018] epoch_id: 17, batch_id: 0, cost: 0.031323, acc: 0.992188
[Wed Oct 10 16:56:44 2018] epoch_id: 17, batch_id: 100, cost: 0.010965, acc: 1.000000
[Wed Oct 10 16:56:46 2018] epoch_id: 17, batch_id: 200, cost: 0.056771, acc: 0.976562
[Wed Oct 10 16:56:49 2018] epoch_id: 17, batch_id: 300, cost: 0.026509, acc: 0.992188
[Wed Oct 10 16:56:51 2018] epoch_id: 17, batch_id: 400, cost: 0.039409, acc: 0.992188
[Wed Oct 10 16:56:53 2018] epoch_id: 17, batch_id: 500, cost: 0.063554, acc: 0.976562
[Wed Oct 10 16:56:55 2018] epoch_id: 17, batch_id: 600, cost: 0.035896, acc: 0.976562
[Wed Oct 10 16:56:58 2018] epoch_id: 17, batch_id: 700, cost: 0.022053, acc: 0.992188
[Wed Oct 10 16:57:00 2018] epoch_id: 17, batch_id: 800, cost: 0.024150, acc: 0.976562
[Wed Oct 10 16:57:03 2018] epoch_id: 17, batch_id: 900, cost: 0.009064, acc: 0.992188
[Wed Oct 10 16:57:05 2018] epoch_id: 17, batch_id: 1000, cost: 0.037311, acc: 0.976562
[Wed Oct 10 16:57:08 2018] epoch_id: 17, batch_id: 1100, cost: 0.036577, acc: 0.976562
[Wed Oct 10 16:57:10 2018] epoch_id: 17, batch_id: 1200, cost: 0.020783, acc: 0.992188
[Wed Oct 10 16:57:12 2018] epoch_id: 17, batch_id: 1300, cost: 0.017610, acc: 0.992188
[Wed Oct 10 16:57:14 2018] epoch_id: 17, batch_id: 1400, cost: 0.027604, acc: 0.976562
[Wed Oct 10 16:57:17 2018] epoch_id: 17, batch_id: 1500, cost: 0.040730, acc: 0.992188
[Wed Oct 10 16:57:19 2018] epoch_id: 17, batch_id: 1600, cost: 0.077946, acc: 0.984375
[Wed Oct 10 16:57:21 2018] epoch_id: 17, batch_id: 1700, cost: 0.021349, acc: 0.984375
[Wed Oct 10 16:57:24 2018] epoch_id: 17, batch_id: 1800, cost: 0.016132, acc: 0.992188
[Wed Oct 10 16:57:26 2018] epoch_id: 17, batch_id: 1900, cost: 0.018797, acc: 0.984375
[Wed Oct 10 16:57:28 2018] epoch_id: 17, batch_id: 2000, cost: 0.009052, acc: 1.000000
[Wed Oct 10 16:57:30 2018] epoch_id: 17, batch_id: 2100, cost: 0.028399, acc: 0.992188
[Wed Oct 10 16:57:33 2018] epoch_id: 17, batch_id: 2200, cost: 0.009593, acc: 1.000000
[Wed Oct 10 16:57:35 2018] epoch_id: 17, batch_id: 2300, cost: 0.018474, acc: 0.992188
[Wed Oct 10 16:57:37 2018] epoch_id: 17, batch_id: 2400, cost: 0.007873, acc: 1.000000
[Wed Oct 10 16:57:40 2018] epoch_id: 17, batch_id: 2500, cost: 0.054923, acc: 0.976562
[Wed Oct 10 16:57:42 2018] epoch_id: 17, batch_id: 2600, cost: 0.019036, acc: 0.992188
[Wed Oct 10 16:57:44 2018] epoch_id: 17, batch_id: 2700, cost: 0.017081, acc: 1.000000
[Wed Oct 10 16:57:46 2018] epoch_id: 17, batch_id: 2800, cost: 0.045522, acc: 0.976562
[Wed Oct 10 16:57:49 2018] epoch_id: 17, batch_id: 2900, cost: 0.034922, acc: 0.984375
[Wed Oct 10 16:57:51 2018] epoch_id: 17, batch_id: 3000, cost: 0.039566, acc: 0.984375
[Wed Oct 10 16:57:51 2018] epoch_id: 17, train_avg_cost: 0.030061, train_avg_acc: 0.989478
[Wed Oct 10 16:57:52 2018] epoch_id: 17, dev_cost: 1.184997, accuracy: 0.8406
[Wed Oct 10 16:57:53 2018] epoch_id: 17, test_cost: 1.175792, accuracy: 0.8372
[Wed Oct 10 16:58:02 2018] epoch_id: 18, batch_id: 0, cost: 0.015059, acc: 0.992188
[Wed Oct 10 16:58:04 2018] epoch_id: 18, batch_id: 100, cost: 0.023421, acc: 0.992188
[Wed Oct 10 16:58:06 2018] epoch_id: 18, batch_id: 200, cost: 0.007234, acc: 1.000000
[Wed Oct 10 16:58:08 2018] epoch_id: 18, batch_id: 300, cost: 0.007139, acc: 1.000000
[Wed Oct 10 16:58:10 2018] epoch_id: 18, batch_id: 400, cost: 0.007934, acc: 1.000000
[Wed Oct 10 16:58:13 2018] epoch_id: 18, batch_id: 500, cost: 0.004312, acc: 1.000000
[Wed Oct 10 16:58:15 2018] epoch_id: 18, batch_id: 600, cost: 0.001806, acc: 1.000000
[Wed Oct 10 16:58:17 2018] epoch_id: 18, batch_id: 700, cost: 0.004790, acc: 1.000000
[Wed Oct 10 16:58:20 2018] epoch_id: 18, batch_id: 800, cost: 0.048477, acc: 0.992188
[Wed Oct 10 16:58:22 2018] epoch_id: 18, batch_id: 900, cost: 0.066390, acc: 0.992188
[Wed Oct 10 16:58:24 2018] epoch_id: 18, batch_id: 1000, cost: 0.014440, acc: 0.992188
[Wed Oct 10 16:58:26 2018] epoch_id: 18, batch_id: 1100, cost: 0.020435, acc: 0.992188
[Wed Oct 10 16:58:29 2018] epoch_id: 18, batch_id: 1200, cost: 0.007474, acc: 0.992188
[Wed Oct 10 16:58:31 2018] epoch_id: 18, batch_id: 1300, cost: 0.036209, acc: 0.984375
[Wed Oct 10 16:58:33 2018] epoch_id: 18, batch_id: 1400, cost: 0.026540, acc: 0.984375
[Wed Oct 10 16:58:35 2018] epoch_id: 18, batch_id: 1500, cost: 0.019448, acc: 0.992188
[Wed Oct 10 16:58:38 2018] epoch_id: 18, batch_id: 1600, cost: 0.052421, acc: 0.968750
[Wed Oct 10 16:58:40 2018] epoch_id: 18, batch_id: 1700, cost: 0.022365, acc: 0.992188
[Wed Oct 10 16:58:42 2018] epoch_id: 18, batch_id: 1800, cost: 0.135754, acc: 0.984375
[Wed Oct 10 16:58:45 2018] epoch_id: 18, batch_id: 1900, cost: 0.037197, acc: 0.992188
[Wed Oct 10 16:58:48 2018] epoch_id: 18, batch_id: 2000, cost: 0.010672, acc: 0.992188
[Wed Oct 10 16:58:50 2018] epoch_id: 18, batch_id: 2100, cost: 0.012909, acc: 1.000000
[Wed Oct 10 16:58:52 2018] epoch_id: 18, batch_id: 2200, cost: 0.061615, acc: 0.976562
[Wed Oct 10 16:58:55 2018] epoch_id: 18, batch_id: 2300, cost: 0.081252, acc: 0.960938
[Wed Oct 10 16:58:57 2018] epoch_id: 18, batch_id: 2400, cost: 0.009792, acc: 1.000000
[Wed Oct 10 16:58:59 2018] epoch_id: 18, batch_id: 2500, cost: 0.039835, acc: 0.984375
[Wed Oct 10 16:59:02 2018] epoch_id: 18, batch_id: 2600, cost: 0.002643, acc: 1.000000
[Wed Oct 10 16:59:04 2018] epoch_id: 18, batch_id: 2700, cost: 0.017633, acc: 0.992188
[Wed Oct 10 16:59:06 2018] epoch_id: 18, batch_id: 2800, cost: 0.050407, acc: 0.976562
[Wed Oct 10 16:59:08 2018] epoch_id: 18, batch_id: 2900, cost: 0.066672, acc: 0.960938
[Wed Oct 10 16:59:11 2018] epoch_id: 18, batch_id: 3000, cost: 0.023438, acc: 0.984375
[Wed Oct 10 16:59:11 2018] epoch_id: 18, train_avg_cost: 0.028777, train_avg_acc: 0.989884
[Wed Oct 10 16:59:12 2018] epoch_id: 18, dev_cost: 1.191979, accuracy: 0.8346
[Wed Oct 10 16:59:13 2018] epoch_id: 18, test_cost: 1.159855, accuracy: 0.8344
[Wed Oct 10 16:59:22 2018] epoch_id: 19, batch_id: 0, cost: 0.023233, acc: 0.992188
[Wed Oct 10 16:59:24 2018] epoch_id: 19, batch_id: 100, cost: 0.006624, acc: 1.000000
[Wed Oct 10 16:59:26 2018] epoch_id: 19, batch_id: 200, cost: 0.018784, acc: 0.992188
[Wed Oct 10 16:59:28 2018] epoch_id: 19, batch_id: 300, cost: 0.012745, acc: 0.992188
[Wed Oct 10 16:59:31 2018] epoch_id: 19, batch_id: 400, cost: 0.010857, acc: 1.000000
[Wed Oct 10 16:59:33 2018] epoch_id: 19, batch_id: 500, cost: 0.006066, acc: 1.000000
[Wed Oct 10 16:59:35 2018] epoch_id: 19, batch_id: 600, cost: 0.014349, acc: 0.992188
[Wed Oct 10 16:59:38 2018] epoch_id: 19, batch_id: 700, cost: 0.016725, acc: 0.992188
[Wed Oct 10 16:59:40 2018] epoch_id: 19, batch_id: 800, cost: 0.069121, acc: 0.984375
[Wed Oct 10 16:59:42 2018] epoch_id: 19, batch_id: 900, cost: 0.018849, acc: 0.984375
[Wed Oct 10 16:59:44 2018] epoch_id: 19, batch_id: 1000, cost: 0.031679, acc: 0.984375
[Wed Oct 10 16:59:47 2018] epoch_id: 19, batch_id: 1100, cost: 0.010815, acc: 0.992188
[Wed Oct 10 16:59:49 2018] epoch_id: 19, batch_id: 1200, cost: 0.015778, acc: 0.992188
[Wed Oct 10 16:59:51 2018] epoch_id: 19, batch_id: 1300, cost: 0.055160, acc: 0.984375
[Wed Oct 10 16:59:53 2018] epoch_id: 19, batch_id: 1400, cost: 0.009311, acc: 0.992188
[Wed Oct 10 16:59:55 2018] epoch_id: 19, batch_id: 1500, cost: 0.014874, acc: 0.992188
[Wed Oct 10 16:59:58 2018] epoch_id: 19, batch_id: 1600, cost: 0.038188, acc: 0.992188
[Wed Oct 10 17:00:00 2018] epoch_id: 19, batch_id: 1700, cost: 0.001565, acc: 1.000000
[Wed Oct 10 17:00:02 2018] epoch_id: 19, batch_id: 1800, cost: 0.013963, acc: 0.992188
[Wed Oct 10 17:00:04 2018] epoch_id: 19, batch_id: 1900, cost: 0.028362, acc: 0.992188
[Wed Oct 10 17:00:06 2018] epoch_id: 19, batch_id: 2000, cost: 0.006552, acc: 1.000000
[Wed Oct 10 17:00:09 2018] epoch_id: 19, batch_id: 2100, cost: 0.045230, acc: 0.992188
[Wed Oct 10 17:00:11 2018] epoch_id: 19, batch_id: 2200, cost: 0.029525, acc: 0.984375
[Wed Oct 10 17:00:13 2018] epoch_id: 19, batch_id: 2300, cost: 0.009774, acc: 0.992188
[Wed Oct 10 17:00:15 2018] epoch_id: 19, batch_id: 2400, cost: 0.003385, acc: 1.000000
[Wed Oct 10 17:00:18 2018] epoch_id: 19, batch_id: 2500, cost: 0.030629, acc: 0.984375
[Wed Oct 10 17:00:20 2018] epoch_id: 19, batch_id: 2600, cost: 0.039615, acc: 0.992188
[Wed Oct 10 17:00:22 2018] epoch_id: 19, batch_id: 2700, cost: 0.016678, acc: 0.992188
[Wed Oct 10 17:00:24 2018] epoch_id: 19, batch_id: 2800, cost: 0.004723, acc: 1.000000
[Wed Oct 10 17:00:26 2018] epoch_id: 19, batch_id: 2900, cost: 0.018062, acc: 0.992188
[Wed Oct 10 17:00:29 2018] epoch_id: 19, batch_id: 3000, cost: 0.032904, acc: 0.984375
[Wed Oct 10 17:00:29 2018] epoch_id: 19, train_avg_cost: 0.026175, train_avg_acc: 0.991055
[Wed Oct 10 17:00:30 2018] epoch_id: 19, dev_cost: 1.013367, accuracy: 0.8388
[Wed Oct 10 17:00:31 2018] epoch_id: 19, test_cost: 1.016906, accuracy: 0.8335
[Wed Oct 10 17:00:40 2018] epoch_id: 20, batch_id: 0, cost: 0.019038, acc: 0.992188
[Wed Oct 10 17:00:42 2018] epoch_id: 20, batch_id: 100, cost: 0.001216, acc: 1.000000
[Wed Oct 10 17:00:44 2018] epoch_id: 20, batch_id: 200, cost: 0.006635, acc: 1.000000
[Wed Oct 10 17:00:47 2018] epoch_id: 20, batch_id: 300, cost: 0.051503, acc: 0.984375
[Wed Oct 10 17:00:49 2018] epoch_id: 20, batch_id: 400, cost: 0.044815, acc: 0.992188
[Wed Oct 10 17:00:51 2018] epoch_id: 20, batch_id: 500, cost: 0.041529, acc: 0.992188
[Wed Oct 10 17:00:53 2018] epoch_id: 20, batch_id: 600, cost: 0.010035, acc: 1.000000
[Wed Oct 10 17:00:56 2018] epoch_id: 20, batch_id: 700, cost: 0.019799, acc: 0.992188
[Wed Oct 10 17:00:58 2018] epoch_id: 20, batch_id: 800, cost: 0.062296, acc: 0.984375
[Wed Oct 10 17:01:00 2018] epoch_id: 20, batch_id: 900, cost: 0.015680, acc: 0.992188
[Wed Oct 10 17:01:03 2018] epoch_id: 20, batch_id: 1000, cost: 0.051963, acc: 0.984375
[Wed Oct 10 17:01:05 2018] epoch_id: 20, batch_id: 1100, cost: 0.023968, acc: 0.984375
[Wed Oct 10 17:01:07 2018] epoch_id: 20, batch_id: 1200, cost: 0.079527, acc: 0.984375
[Wed Oct 10 17:01:09 2018] epoch_id: 20, batch_id: 1300, cost: 0.039612, acc: 0.992188
[Wed Oct 10 17:01:12 2018] epoch_id: 20, batch_id: 1400, cost: 0.010211, acc: 1.000000
[Wed Oct 10 17:01:14 2018] epoch_id: 20, batch_id: 1500, cost: 0.012661, acc: 0.992188
[Wed Oct 10 17:01:16 2018] epoch_id: 20, batch_id: 1600, cost: 0.051475, acc: 0.984375
[Wed Oct 10 17:01:18 2018] epoch_id: 20, batch_id: 1700, cost: 0.013513, acc: 1.000000
[Wed Oct 10 17:01:21 2018] epoch_id: 20, batch_id: 1800, cost: 0.006646, acc: 1.000000
[Wed Oct 10 17:01:23 2018] epoch_id: 20, batch_id: 1900, cost: 0.013369, acc: 0.992188
[Wed Oct 10 17:01:25 2018] epoch_id: 20, batch_id: 2000, cost: 0.030614, acc: 0.984375
[Wed Oct 10 17:01:27 2018] epoch_id: 20, batch_id: 2100, cost: 0.003242, acc: 1.000000
[Wed Oct 10 17:01:30 2018] epoch_id: 20, batch_id: 2200, cost: 0.051409, acc: 0.984375
[Wed Oct 10 17:01:32 2018] epoch_id: 20, batch_id: 2300, cost: 0.005996, acc: 1.000000
[Wed Oct 10 17:01:34 2018] epoch_id: 20, batch_id: 2400, cost: 0.049493, acc: 0.976562
[Wed Oct 10 17:01:36 2018] epoch_id: 20, batch_id: 2500, cost: 0.013635, acc: 0.992188
[Wed Oct 10 17:01:38 2018] epoch_id: 20, batch_id: 2600, cost: 0.019265, acc: 1.000000
[Wed Oct 10 17:01:41 2018] epoch_id: 20, batch_id: 2700, cost: 0.040467, acc: 0.976562
[Wed Oct 10 17:01:44 2018] epoch_id: 20, batch_id: 2800, cost: 0.029407, acc: 0.992188
[Wed Oct 10 17:01:46 2018] epoch_id: 20, batch_id: 2900, cost: 0.036886, acc: 0.976562
[Wed Oct 10 17:01:49 2018] epoch_id: 20, batch_id: 3000, cost: 0.018317, acc: 0.992188
[Wed Oct 10 17:01:49 2018] epoch_id: 20, train_avg_cost: 0.025258, train_avg_acc: 0.991367
[Wed Oct 10 17:01:50 2018] epoch_id: 20, dev_cost: 1.125290, accuracy: 0.8358
[Wed Oct 10 17:01:51 2018] epoch_id: 20, test_cost: 1.148761, accuracy: 0.832
[Wed Oct 10 17:01:59 2018] epoch_id: 21, batch_id: 0, cost: 0.020581, acc: 0.992188
[Wed Oct 10 17:02:02 2018] epoch_id: 21, batch_id: 100, cost: 0.021132, acc: 0.992188
[Wed Oct 10 17:02:04 2018] epoch_id: 21, batch_id: 200, cost: 0.040257, acc: 0.976562
[Wed Oct 10 17:02:06 2018] epoch_id: 21, batch_id: 300, cost: 0.013450, acc: 1.000000
[Wed Oct 10 17:02:08 2018] epoch_id: 21, batch_id: 400, cost: 0.027469, acc: 0.992188
[Wed Oct 10 17:02:11 2018] epoch_id: 21, batch_id: 500, cost: 0.007088, acc: 0.992188
[Wed Oct 10 17:02:13 2018] epoch_id: 21, batch_id: 600, cost: 0.028169, acc: 0.992188
[Wed Oct 10 17:02:15 2018] epoch_id: 21, batch_id: 700, cost: 0.067799, acc: 0.984375
[Wed Oct 10 17:02:17 2018] epoch_id: 21, batch_id: 800, cost: 0.003184, acc: 1.000000
[Wed Oct 10 17:02:20 2018] epoch_id: 21, batch_id: 900, cost: 0.011056, acc: 0.992188
[Wed Oct 10 17:02:22 2018] epoch_id: 21, batch_id: 1000, cost: 0.012187, acc: 1.000000
[Wed Oct 10 17:02:24 2018] epoch_id: 21, batch_id: 1100, cost: 0.009409, acc: 0.992188
[Wed Oct 10 17:02:26 2018] epoch_id: 21, batch_id: 1200, cost: 0.000739, acc: 1.000000
[Wed Oct 10 17:02:29 2018] epoch_id: 21, batch_id: 1300, cost: 0.002971, acc: 1.000000
[Wed Oct 10 17:02:31 2018] epoch_id: 21, batch_id: 1400, cost: 0.031287, acc: 0.984375
[Wed Oct 10 17:02:33 2018] epoch_id: 21, batch_id: 1500, cost: 0.023455, acc: 0.992188
[Wed Oct 10 17:02:36 2018] epoch_id: 21, batch_id: 1600, cost: 0.007438, acc: 1.000000
[Wed Oct 10 17:02:38 2018] epoch_id: 21, batch_id: 1700, cost: 0.035499, acc: 0.968750
[Wed Oct 10 17:02:40 2018] epoch_id: 21, batch_id: 1800, cost: 0.012515, acc: 1.000000
[Wed Oct 10 17:02:42 2018] epoch_id: 21, batch_id: 1900, cost: 0.008550, acc: 1.000000
[Wed Oct 10 17:02:45 2018] epoch_id: 21, batch_id: 2000, cost: 0.051551, acc: 0.992188
[Wed Oct 10 17:02:47 2018] epoch_id: 21, batch_id: 2100, cost: 0.004980, acc: 1.000000
[Wed Oct 10 17:02:49 2018] epoch_id: 21, batch_id: 2200, cost: 0.006854, acc: 1.000000
[Wed Oct 10 17:02:51 2018] epoch_id: 21, batch_id: 2300, cost: 0.071025, acc: 0.968750
[Wed Oct 10 17:02:55 2018] epoch_id: 21, batch_id: 2400, cost: 0.013599, acc: 1.000000
[Wed Oct 10 17:02:57 2018] epoch_id: 21, batch_id: 2500, cost: 0.025085, acc: 0.992188
[Wed Oct 10 17:02:59 2018] epoch_id: 21, batch_id: 2600, cost: 0.018276, acc: 0.984375
[Wed Oct 10 17:03:01 2018] epoch_id: 21, batch_id: 2700, cost: 0.040565, acc: 0.984375
[Wed Oct 10 17:03:04 2018] epoch_id: 21, batch_id: 2800, cost: 0.099454, acc: 0.968750
[Wed Oct 10 17:03:06 2018] epoch_id: 21, batch_id: 2900, cost: 0.017812, acc: 0.992188
[Wed Oct 10 17:03:08 2018] epoch_id: 21, batch_id: 3000, cost: 0.019825, acc: 0.992188
[Wed Oct 10 17:03:09 2018] epoch_id: 21, train_avg_cost: 0.024180, train_avg_acc: 0.991505
[Wed Oct 10 17:03:10 2018] epoch_id: 21, dev_cost: 1.413867, accuracy: 0.836
[Wed Oct 10 17:03:11 2018] epoch_id: 21, test_cost: 1.380237, accuracy: 0.8353
[Wed Oct 10 17:03:19 2018] epoch_id: 22, batch_id: 0, cost: 0.001493, acc: 1.000000
[Wed Oct 10 17:03:21 2018] epoch_id: 22, batch_id: 100, cost: 0.017211, acc: 0.984375
[Wed Oct 10 17:03:23 2018] epoch_id: 22, batch_id: 200, cost: 0.015626, acc: 0.992188
[Wed Oct 10 17:03:25 2018] epoch_id: 22, batch_id: 300, cost: 0.002411, acc: 1.000000
[Wed Oct 10 17:03:28 2018] epoch_id: 22, batch_id: 400, cost: 0.098118, acc: 0.984375
[Wed Oct 10 17:03:30 2018] epoch_id: 22, batch_id: 500, cost: 0.031192, acc: 0.992188
[Wed Oct 10 17:03:32 2018] epoch_id: 22, batch_id: 600, cost: 0.002122, acc: 1.000000
[Wed Oct 10 17:03:34 2018] epoch_id: 22, batch_id: 700, cost: 0.006148, acc: 1.000000
[Wed Oct 10 17:03:38 2018] epoch_id: 22, batch_id: 800, cost: 0.007830, acc: 1.000000
[Wed Oct 10 17:03:40 2018] epoch_id: 22, batch_id: 900, cost: 0.009371, acc: 1.000000
[Wed Oct 10 17:03:43 2018] epoch_id: 22, batch_id: 1000, cost: 0.024280, acc: 0.984375
[Wed Oct 10 17:03:45 2018] epoch_id: 22, batch_id: 1100, cost: 0.067847, acc: 0.984375
[Wed Oct 10 17:03:47 2018] epoch_id: 22, batch_id: 1200, cost: 0.024875, acc: 0.984375
[Wed Oct 10 17:03:50 2018] epoch_id: 22, batch_id: 1300, cost: 0.004252, acc: 1.000000
[Wed Oct 10 17:03:52 2018] epoch_id: 22, batch_id: 1400, cost: 0.014934, acc: 0.992188
[Wed Oct 10 17:03:54 2018] epoch_id: 22, batch_id: 1500, cost: 0.008299, acc: 1.000000
[Wed Oct 10 17:03:56 2018] epoch_id: 22, batch_id: 1600, cost: 0.007932, acc: 1.000000
[Wed Oct 10 17:03:59 2018] epoch_id: 22, batch_id: 1700, cost: 0.007008, acc: 1.000000
[Wed Oct 10 17:04:01 2018] epoch_id: 22, batch_id: 1800, cost: 0.028636, acc: 0.984375
[Wed Oct 10 17:04:03 2018] epoch_id: 22, batch_id: 1900, cost: 0.012712, acc: 0.992188
[Wed Oct 10 17:04:05 2018] epoch_id: 22, batch_id: 2000, cost: 0.027561, acc: 0.992188
[Wed Oct 10 17:04:08 2018] epoch_id: 22, batch_id: 2100, cost: 0.017589, acc: 0.992188
[Wed Oct 10 17:04:10 2018] epoch_id: 22, batch_id: 2200, cost: 0.016391, acc: 0.992188
[Wed Oct 10 17:04:12 2018] epoch_id: 22, batch_id: 2300, cost: 0.042172, acc: 0.984375
[Wed Oct 10 17:04:14 2018] epoch_id: 22, batch_id: 2400, cost: 0.024060, acc: 0.984375
[Wed Oct 10 17:04:17 2018] epoch_id: 22, batch_id: 2500, cost: 0.014206, acc: 1.000000
[Wed Oct 10 17:04:19 2018] epoch_id: 22, batch_id: 2600, cost: 0.028562, acc: 0.992188
[Wed Oct 10 17:04:21 2018] epoch_id: 22, batch_id: 2700, cost: 0.013936, acc: 0.992188
[Wed Oct 10 17:04:23 2018] epoch_id: 22, batch_id: 2800, cost: 0.023205, acc: 0.984375
[Wed Oct 10 17:04:26 2018] epoch_id: 22, batch_id: 2900, cost: 0.031024, acc: 0.984375
[Wed Oct 10 17:04:28 2018] epoch_id: 22, batch_id: 3000, cost: 0.004115, acc: 1.000000
[Wed Oct 10 17:04:29 2018] epoch_id: 22, train_avg_cost: 0.022458, train_avg_acc: 0.992184
[Wed Oct 10 17:04:30 2018] epoch_id: 22, dev_cost: 1.388674, accuracy: 0.8329
[Wed Oct 10 17:04:31 2018] epoch_id: 22, test_cost: 1.366122, accuracy: 0.8359
[Wed Oct 10 17:04:39 2018] epoch_id: 23, batch_id: 0, cost: 0.012273, acc: 0.992188
[Wed Oct 10 17:04:41 2018] epoch_id: 23, batch_id: 100, cost: 0.010904, acc: 0.992188
[Wed Oct 10 17:04:44 2018] epoch_id: 23, batch_id: 200, cost: 0.001967, acc: 1.000000
[Wed Oct 10 17:04:46 2018] epoch_id: 23, batch_id: 300, cost: 0.006554, acc: 1.000000
[Wed Oct 10 17:04:48 2018] epoch_id: 23, batch_id: 400, cost: 0.005179, acc: 1.000000
[Wed Oct 10 17:04:50 2018] epoch_id: 23, batch_id: 500, cost: 0.014761, acc: 0.992188
[Wed Oct 10 17:04:53 2018] epoch_id: 23, batch_id: 600, cost: 0.015971, acc: 0.992188
[Wed Oct 10 17:04:55 2018] epoch_id: 23, batch_id: 700, cost: 0.058416, acc: 0.984375
[Wed Oct 10 17:04:57 2018] epoch_id: 23, batch_id: 800, cost: 0.005064, acc: 1.000000
[Wed Oct 10 17:04:59 2018] epoch_id: 23, batch_id: 900, cost: 0.003761, acc: 1.000000
[Wed Oct 10 17:05:02 2018] epoch_id: 23, batch_id: 1000, cost: 0.002844, acc: 1.000000
[Wed Oct 10 17:05:04 2018] epoch_id: 23, batch_id: 1100, cost: 0.010259, acc: 1.000000
[Wed Oct 10 17:05:06 2018] epoch_id: 23, batch_id: 1200, cost: 0.005445, acc: 1.000000
[Wed Oct 10 17:05:09 2018] epoch_id: 23, batch_id: 1300, cost: 0.018197, acc: 0.992188
[Wed Oct 10 17:05:11 2018] epoch_id: 23, batch_id: 1400, cost: 0.016600, acc: 0.992188
[Wed Oct 10 17:05:13 2018] epoch_id: 23, batch_id: 1500, cost: 0.047691, acc: 0.992188
[Wed Oct 10 17:05:15 2018] epoch_id: 23, batch_id: 1600, cost: 0.084442, acc: 0.984375
[Wed Oct 10 17:05:18 2018] epoch_id: 23, batch_id: 1700, cost: 0.044283, acc: 0.992188
[Wed Oct 10 17:05:21 2018] epoch_id: 23, batch_id: 1800, cost: 0.120200, acc: 0.984375
[Wed Oct 10 17:05:23 2018] epoch_id: 23, batch_id: 1900, cost: 0.013874, acc: 0.992188
[Wed Oct 10 17:05:26 2018] epoch_id: 23, batch_id: 2000, cost: 0.027709, acc: 0.984375
[Wed Oct 10 17:05:28 2018] epoch_id: 23, batch_id: 2100, cost: 0.017088, acc: 0.992188
[Wed Oct 10 17:05:30 2018] epoch_id: 23, batch_id: 2200, cost: 0.049081, acc: 0.976562
[Wed Oct 10 17:05:32 2018] epoch_id: 23, batch_id: 2300, cost: 0.013016, acc: 0.992188
[Wed Oct 10 17:05:35 2018] epoch_id: 23, batch_id: 2400, cost: 0.015467, acc: 0.992188
[Wed Oct 10 17:05:37 2018] epoch_id: 23, batch_id: 2500, cost: 0.002745, acc: 1.000000
[Wed Oct 10 17:05:39 2018] epoch_id: 23, batch_id: 2600, cost: 0.002618, acc: 1.000000
[Wed Oct 10 17:05:42 2018] epoch_id: 23, batch_id: 2700, cost: 0.010789, acc: 1.000000
[Wed Oct 10 17:05:44 2018] epoch_id: 23, batch_id: 2800, cost: 0.026513, acc: 0.984375
[Wed Oct 10 17:05:46 2018] epoch_id: 23, batch_id: 2900, cost: 0.056513, acc: 0.984375
[Wed Oct 10 17:05:49 2018] epoch_id: 23, batch_id: 3000, cost: 0.007607, acc: 1.000000
[Wed Oct 10 17:05:49 2018] epoch_id: 23, train_avg_cost: 0.021786, train_avg_acc: 0.992707
[Wed Oct 10 17:05:50 2018] epoch_id: 23, dev_cost: 1.181561, accuracy: 0.8368
[Wed Oct 10 17:05:51 2018] epoch_id: 23, test_cost: 1.209735, accuracy: 0.8339
[Wed Oct 10 17:06:00 2018] epoch_id: 24, batch_id: 0, cost: 0.005431, acc: 1.000000
[Wed Oct 10 17:06:02 2018] epoch_id: 24, batch_id: 100, cost: 0.017588, acc: 0.984375
[Wed Oct 10 17:06:04 2018] epoch_id: 24, batch_id: 200, cost: 0.078571, acc: 0.976562
[Wed Oct 10 17:06:06 2018] epoch_id: 24, batch_id: 300, cost: 0.003192, acc: 1.000000
[Wed Oct 10 17:06:09 2018] epoch_id: 24, batch_id: 400, cost: 0.008610, acc: 1.000000
[Wed Oct 10 17:06:11 2018] epoch_id: 24, batch_id: 500, cost: 0.010603, acc: 0.992188
[Wed Oct 10 17:06:13 2018] epoch_id: 24, batch_id: 600, cost: 0.068159, acc: 0.984375
[Wed Oct 10 17:06:15 2018] epoch_id: 24, batch_id: 700, cost: 0.031611, acc: 0.992188
[Wed Oct 10 17:06:18 2018] epoch_id: 24, batch_id: 800, cost: 0.005276, acc: 1.000000
[Wed Oct 10 17:06:20 2018] epoch_id: 24, batch_id: 900, cost: 0.019978, acc: 0.992188
[Wed Oct 10 17:06:22 2018] epoch_id: 24, batch_id: 1000, cost: 0.061957, acc: 0.992188
[Wed Oct 10 17:06:25 2018] epoch_id: 24, batch_id: 1100, cost: 0.015165, acc: 0.992188
[Wed Oct 10 17:06:27 2018] epoch_id: 24, batch_id: 1200, cost: 0.052448, acc: 0.976562
[Wed Oct 10 17:06:29 2018] epoch_id: 24, batch_id: 1300, cost: 0.003287, acc: 1.000000
[Wed Oct 10 17:06:31 2018] epoch_id: 24, batch_id: 1400, cost: 0.027564, acc: 0.992188
[Wed Oct 10 17:06:34 2018] epoch_id: 24, batch_id: 1500, cost: 0.002861, acc: 1.000000
[Wed Oct 10 17:06:36 2018] epoch_id: 24, batch_id: 1600, cost: 0.022500, acc: 0.992188
[Wed Oct 10 17:06:38 2018] epoch_id: 24, batch_id: 1700, cost: 0.041690, acc: 0.984375
[Wed Oct 10 17:06:40 2018] epoch_id: 24, batch_id: 1800, cost: 0.016889, acc: 0.992188
[Wed Oct 10 17:06:43 2018] epoch_id: 24, batch_id: 1900, cost: 0.026357, acc: 0.992188
[Wed Oct 10 17:06:45 2018] epoch_id: 24, batch_id: 2000, cost: 0.035357, acc: 0.984375
[Wed Oct 10 17:06:47 2018] epoch_id: 24, batch_id: 2100, cost: 0.070517, acc: 0.960938
[Wed Oct 10 17:06:49 2018] epoch_id: 24, batch_id: 2200, cost: 0.021093, acc: 0.984375
[Wed Oct 10 17:06:52 2018] epoch_id: 24, batch_id: 2300, cost: 0.003296, acc: 1.000000
[Wed Oct 10 17:06:54 2018] epoch_id: 24, batch_id: 2400, cost: 0.002669, acc: 1.000000
[Wed Oct 10 17:06:56 2018] epoch_id: 24, batch_id: 2500, cost: 0.047008, acc: 0.976562
[Wed Oct 10 17:06:58 2018] epoch_id: 24, batch_id: 2600, cost: 0.015561, acc: 0.992188
[Wed Oct 10 17:07:00 2018] epoch_id: 24, batch_id: 2700, cost: 0.074711, acc: 0.984375
[Wed Oct 10 17:07:03 2018] epoch_id: 24, batch_id: 2800, cost: 0.021376, acc: 0.992188
[Wed Oct 10 17:07:05 2018] epoch_id: 24, batch_id: 2900, cost: 0.013928, acc: 1.000000
[Wed Oct 10 17:07:07 2018] epoch_id: 24, batch_id: 3000, cost: 0.019474, acc: 0.992188
[Wed Oct 10 17:07:07 2018] epoch_id: 24, train_avg_cost: 0.020611, train_avg_acc: 0.992913
[Wed Oct 10 17:07:08 2018] epoch_id: 24, dev_cost: 1.249092, accuracy: 0.8329
[Wed Oct 10 17:07:09 2018] epoch_id: 24, test_cost: 1.206091, accuracy: 0.8348
[Wed Oct 10 17:07:18 2018] epoch_id: 25, batch_id: 0, cost: 0.009832, acc: 1.000000
[Wed Oct 10 17:07:21 2018] epoch_id: 25, batch_id: 100, cost: 0.007028, acc: 1.000000
[Wed Oct 10 17:07:23 2018] epoch_id: 25, batch_id: 200, cost: 0.029548, acc: 0.984375
[Wed Oct 10 17:07:25 2018] epoch_id: 25, batch_id: 300, cost: 0.001753, acc: 1.000000
[Wed Oct 10 17:07:28 2018] epoch_id: 25, batch_id: 400, cost: 0.001457, acc: 1.000000
[Wed Oct 10 17:07:30 2018] epoch_id: 25, batch_id: 500, cost: 0.004209, acc: 1.000000
[Wed Oct 10 17:07:32 2018] epoch_id: 25, batch_id: 600, cost: 0.002758, acc: 1.000000
[Wed Oct 10 17:07:35 2018] epoch_id: 25, batch_id: 700, cost: 0.039204, acc: 0.984375
[Wed Oct 10 17:07:37 2018] epoch_id: 25, batch_id: 800, cost: 0.004454, acc: 1.000000
[Wed Oct 10 17:07:39 2018] epoch_id: 25, batch_id: 900, cost: 0.005273, acc: 1.000000
[Wed Oct 10 17:07:41 2018] epoch_id: 25, batch_id: 1000, cost: 0.008021, acc: 0.992188
[Wed Oct 10 17:07:44 2018] epoch_id: 25, batch_id: 1100, cost: 0.037441, acc: 0.976562
[Wed Oct 10 17:07:46 2018] epoch_id: 25, batch_id: 1200, cost: 0.011153, acc: 1.000000
[Wed Oct 10 17:07:48 2018] epoch_id: 25, batch_id: 1300, cost: 0.064342, acc: 0.992188
[Wed Oct 10 17:07:50 2018] epoch_id: 25, batch_id: 1400, cost: 0.036600, acc: 0.992188
[Wed Oct 10 17:07:53 2018] epoch_id: 25, batch_id: 1500, cost: 0.046661, acc: 0.992188
[Wed Oct 10 17:07:55 2018] epoch_id: 25, batch_id: 1600, cost: 0.015580, acc: 1.000000
[Wed Oct 10 17:07:57 2018] epoch_id: 25, batch_id: 1700, cost: 0.008311, acc: 1.000000
[Wed Oct 10 17:07:59 2018] epoch_id: 25, batch_id: 1800, cost: 0.004560, acc: 1.000000
[Wed Oct 10 17:08:02 2018] epoch_id: 25, batch_id: 1900, cost: 0.012200, acc: 1.000000
[Wed Oct 10 17:08:04 2018] epoch_id: 25, batch_id: 2000, cost: 0.006555, acc: 1.000000
[Wed Oct 10 17:08:06 2018] epoch_id: 25, batch_id: 2100, cost: 0.028259, acc: 0.992188
[Wed Oct 10 17:08:08 2018] epoch_id: 25, batch_id: 2200, cost: 0.003801, acc: 1.000000
[Wed Oct 10 17:08:11 2018] epoch_id: 25, batch_id: 2300, cost: 0.004532, acc: 1.000000
[Wed Oct 10 17:08:13 2018] epoch_id: 25, batch_id: 2400, cost: 0.008551, acc: 1.000000
[Wed Oct 10 17:08:15 2018] epoch_id: 25, batch_id: 2500, cost: 0.013781, acc: 0.992188
[Wed Oct 10 17:08:17 2018] epoch_id: 25, batch_id: 2600, cost: 0.024098, acc: 0.992188
[Wed Oct 10 17:08:21 2018] epoch_id: 25, batch_id: 2700, cost: 0.009117, acc: 0.992188
[Wed Oct 10 17:08:23 2018] epoch_id: 25, batch_id: 2800, cost: 0.032231, acc: 0.984375
[Wed Oct 10 17:08:25 2018] epoch_id: 25, batch_id: 2900, cost: 0.004502, acc: 1.000000
[Wed Oct 10 17:08:28 2018] epoch_id: 25, batch_id: 3000, cost: 0.006727, acc: 1.000000
[Wed Oct 10 17:08:28 2018] epoch_id: 25, train_avg_cost: 0.020529, train_avg_acc: 0.993019
[Wed Oct 10 17:08:29 2018] epoch_id: 25, dev_cost: 1.238637, accuracy: 0.8323
[Wed Oct 10 17:08:30 2018] epoch_id: 25, test_cost: 1.213099, accuracy: 0.8345
[Wed Oct 10 17:08:38 2018] epoch_id: 26, batch_id: 0, cost: 0.040923, acc: 0.992188
[Wed Oct 10 17:08:40 2018] epoch_id: 26, batch_id: 100, cost: 0.003892, acc: 1.000000
[Wed Oct 10 17:08:43 2018] epoch_id: 26, batch_id: 200, cost: 0.005719, acc: 1.000000
[Wed Oct 10 17:08:45 2018] epoch_id: 26, batch_id: 300, cost: 0.011791, acc: 1.000000
[Wed Oct 10 17:08:47 2018] epoch_id: 26, batch_id: 400, cost: 0.015297, acc: 0.992188
[Wed Oct 10 17:08:49 2018] epoch_id: 26, batch_id: 500, cost: 0.067796, acc: 0.984375
[Wed Oct 10 17:08:52 2018] epoch_id: 26, batch_id: 600, cost: 0.041215, acc: 0.992188
[Wed Oct 10 17:08:54 2018] epoch_id: 26, batch_id: 700, cost: 0.017786, acc: 0.984375
[Wed Oct 10 17:08:56 2018] epoch_id: 26, batch_id: 800, cost: 0.033173, acc: 0.992188
[Wed Oct 10 17:08:59 2018] epoch_id: 26, batch_id: 900, cost: 0.007282, acc: 0.992188
[Wed Oct 10 17:09:01 2018] epoch_id: 26, batch_id: 1000, cost: 0.028577, acc: 0.992188
[Wed Oct 10 17:09:03 2018] epoch_id: 26, batch_id: 1100, cost: 0.017994, acc: 0.992188
[Wed Oct 10 17:09:05 2018] epoch_id: 26, batch_id: 1200, cost: 0.005319, acc: 1.000000
[Wed Oct 10 17:09:08 2018] epoch_id: 26, batch_id: 1300, cost: 0.030209, acc: 0.992188
[Wed Oct 10 17:09:10 2018] epoch_id: 26, batch_id: 1400, cost: 0.012992, acc: 0.992188
[Wed Oct 10 17:09:12 2018] epoch_id: 26, batch_id: 1500, cost: 0.014228, acc: 0.992188
[Wed Oct 10 17:09:15 2018] epoch_id: 26, batch_id: 1600, cost: 0.008148, acc: 1.000000
[Wed Oct 10 17:09:17 2018] epoch_id: 26, batch_id: 1700, cost: 0.003299, acc: 1.000000
[Wed Oct 10 17:09:19 2018] epoch_id: 26, batch_id: 1800, cost: 0.026134, acc: 0.992188
[Wed Oct 10 17:09:22 2018] epoch_id: 26, batch_id: 1900, cost: 0.016610, acc: 1.000000
[Wed Oct 10 17:09:24 2018] epoch_id: 26, batch_id: 2000, cost: 0.019105, acc: 0.992188
[Wed Oct 10 17:09:26 2018] epoch_id: 26, batch_id: 2100, cost: 0.004593, acc: 1.000000
[Wed Oct 10 17:09:28 2018] epoch_id: 26, batch_id: 2200, cost: 0.036595, acc: 0.992188
[Wed Oct 10 17:09:32 2018] epoch_id: 26, batch_id: 2300, cost: 0.003857, acc: 1.000000
[Wed Oct 10 17:09:34 2018] epoch_id: 26, batch_id: 2400, cost: 0.002700, acc: 1.000000
[Wed Oct 10 17:09:36 2018] epoch_id: 26, batch_id: 2500, cost: 0.002269, acc: 1.000000
[Wed Oct 10 17:09:38 2018] epoch_id: 26, batch_id: 2600, cost: 0.022186, acc: 0.992188
[Wed Oct 10 17:09:41 2018] epoch_id: 26, batch_id: 2700, cost: 0.035991, acc: 0.976562
[Wed Oct 10 17:09:43 2018] epoch_id: 26, batch_id: 2800, cost: 0.005430, acc: 1.000000
[Wed Oct 10 17:09:45 2018] epoch_id: 26, batch_id: 2900, cost: 0.017578, acc: 0.992188
[Wed Oct 10 17:09:47 2018] epoch_id: 26, batch_id: 3000, cost: 0.030596, acc: 0.984375
[Wed Oct 10 17:09:48 2018] epoch_id: 26, train_avg_cost: 0.019528, train_avg_acc: 0.993425
[Wed Oct 10 17:09:49 2018] epoch_id: 26, dev_cost: 1.452644, accuracy: 0.8334
[Wed Oct 10 17:09:50 2018] epoch_id: 26, test_cost: 1.449995, accuracy: 0.8329
[Wed Oct 10 17:09:58 2018] epoch_id: 27, batch_id: 0, cost: 0.006640, acc: 1.000000
[Wed Oct 10 17:10:00 2018] epoch_id: 27, batch_id: 100, cost: 0.001101, acc: 1.000000
[Wed Oct 10 17:10:02 2018] epoch_id: 27, batch_id: 200, cost: 0.019329, acc: 0.992188
[Wed Oct 10 17:10:05 2018] epoch_id: 27, batch_id: 300, cost: 0.002996, acc: 1.000000
[Wed Oct 10 17:10:07 2018] epoch_id: 27, batch_id: 400, cost: 0.002077, acc: 1.000000
[Wed Oct 10 17:10:09 2018] epoch_id: 27, batch_id: 500, cost: 0.007058, acc: 1.000000
[Wed Oct 10 17:10:11 2018] epoch_id: 27, batch_id: 600, cost: 0.002119, acc: 1.000000
[Wed Oct 10 17:10:14 2018] epoch_id: 27, batch_id: 700, cost: 0.039876, acc: 0.984375
[Wed Oct 10 17:10:16 2018] epoch_id: 27, batch_id: 800, cost: 0.010680, acc: 1.000000
[Wed Oct 10 17:10:19 2018] epoch_id: 27, batch_id: 900, cost: 0.004508, acc: 1.000000
[Wed Oct 10 17:10:21 2018] epoch_id: 27, batch_id: 1000, cost: 0.029683, acc: 0.984375
[Wed Oct 10 17:10:24 2018] epoch_id: 27, batch_id: 1100, cost: 0.011985, acc: 1.000000
[Wed Oct 10 17:10:26 2018] epoch_id: 27, batch_id: 1200, cost: 0.004091, acc: 1.000000
[Wed Oct 10 17:10:28 2018] epoch_id: 27, batch_id: 1300, cost: 0.028585, acc: 0.984375
[Wed Oct 10 17:10:30 2018] epoch_id: 27, batch_id: 1400, cost: 0.001462, acc: 1.000000
[Wed Oct 10 17:10:33 2018] epoch_id: 27, batch_id: 1500, cost: 0.033079, acc: 0.992188
[Wed Oct 10 17:10:35 2018] epoch_id: 27, batch_id: 1600, cost: 0.017679, acc: 0.992188
[Wed Oct 10 17:10:37 2018] epoch_id: 27, batch_id: 1700, cost: 0.000921, acc: 1.000000
[Wed Oct 10 17:10:39 2018] epoch_id: 27, batch_id: 1800, cost: 0.029850, acc: 0.984375
[Wed Oct 10 17:10:42 2018] epoch_id: 27, batch_id: 1900, cost: 0.005679, acc: 1.000000
[Wed Oct 10 17:10:44 2018] epoch_id: 27, batch_id: 2000, cost: 0.007635, acc: 0.992188
[Wed Oct 10 17:10:46 2018] epoch_id: 27, batch_id: 2100, cost: 0.056935, acc: 0.984375
[Wed Oct 10 17:10:48 2018] epoch_id: 27, batch_id: 2200, cost: 0.014361, acc: 1.000000
[Wed Oct 10 17:10:51 2018] epoch_id: 27, batch_id: 2300, cost: 0.040282, acc: 0.984375
[Wed Oct 10 17:10:53 2018] epoch_id: 27, batch_id: 2400, cost: 0.004073, acc: 1.000000
[Wed Oct 10 17:10:55 2018] epoch_id: 27, batch_id: 2500, cost: 0.013922, acc: 0.984375
[Wed Oct 10 17:10:57 2018] epoch_id: 27, batch_id: 2600, cost: 0.018309, acc: 0.992188
[Wed Oct 10 17:10:59 2018] epoch_id: 27, batch_id: 2700, cost: 0.011584, acc: 0.992188
[Wed Oct 10 17:11:02 2018] epoch_id: 27, batch_id: 2800, cost: 0.018637, acc: 0.992188
[Wed Oct 10 17:11:04 2018] epoch_id: 27, batch_id: 2900, cost: 0.013617, acc: 0.992188
[Wed Oct 10 17:11:06 2018] epoch_id: 27, batch_id: 3000, cost: 0.079333, acc: 0.976562
[Wed Oct 10 17:11:07 2018] epoch_id: 27, train_avg_cost: 0.018039, train_avg_acc: 0.993701
[Wed Oct 10 17:11:08 2018] epoch_id: 27, dev_cost: 1.463991, accuracy: 0.8333
[Wed Oct 10 17:11:09 2018] epoch_id: 27, test_cost: 1.450415, accuracy: 0.8334
[Wed Oct 10 17:11:17 2018] epoch_id: 28, batch_id: 0, cost: 0.023539, acc: 0.984375
[Wed Oct 10 17:11:20 2018] epoch_id: 28, batch_id: 100, cost: 0.005577, acc: 1.000000
[Wed Oct 10 17:11:22 2018] epoch_id: 28, batch_id: 200, cost: 0.001478, acc: 1.000000
[Wed Oct 10 17:11:24 2018] epoch_id: 28, batch_id: 300, cost: 0.005870, acc: 1.000000
[Wed Oct 10 17:11:26 2018] epoch_id: 28, batch_id: 400, cost: 0.021292, acc: 0.992188
[Wed Oct 10 17:11:29 2018] epoch_id: 28, batch_id: 500, cost: 0.032081, acc: 0.984375
[Wed Oct 10 17:11:31 2018] epoch_id: 28, batch_id: 600, cost: 0.004568, acc: 1.000000
[Wed Oct 10 17:11:33 2018] epoch_id: 28, batch_id: 700, cost: 0.006552, acc: 1.000000
[Wed Oct 10 17:11:35 2018] epoch_id: 28, batch_id: 800, cost: 0.012579, acc: 0.992188
[Wed Oct 10 17:11:38 2018] epoch_id: 28, batch_id: 900, cost: 0.004214, acc: 1.000000
[Wed Oct 10 17:11:40 2018] epoch_id: 28, batch_id: 1000, cost: 0.023843, acc: 0.984375
[Wed Oct 10 17:11:42 2018] epoch_id: 28, batch_id: 1100, cost: 0.017869, acc: 0.992188
[Wed Oct 10 17:11:44 2018] epoch_id: 28, batch_id: 1200, cost: 0.045617, acc: 0.984375
[Wed Oct 10 17:11:46 2018] epoch_id: 28, batch_id: 1300, cost: 0.012739, acc: 0.992188
[Wed Oct 10 17:11:49 2018] epoch_id: 28, batch_id: 1400, cost: 0.020053, acc: 0.992188
[Wed Oct 10 17:11:51 2018] epoch_id: 28, batch_id: 1500, cost: 0.006956, acc: 1.000000
[Wed Oct 10 17:11:53 2018] epoch_id: 28, batch_id: 1600, cost: 0.022830, acc: 0.984375
[Wed Oct 10 17:11:55 2018] epoch_id: 28, batch_id: 1700, cost: 0.008924, acc: 1.000000
[Wed Oct 10 17:11:58 2018] epoch_id: 28, batch_id: 1800, cost: 0.013902, acc: 0.992188
[Wed Oct 10 17:12:01 2018] epoch_id: 28, batch_id: 1900, cost: 0.026418, acc: 0.984375
[Wed Oct 10 17:12:03 2018] epoch_id: 28, batch_id: 2000, cost: 0.006809, acc: 1.000000
[Wed Oct 10 17:12:05 2018] epoch_id: 28, batch_id: 2100, cost: 0.041039, acc: 0.984375
[Wed Oct 10 17:12:08 2018] epoch_id: 28, batch_id: 2200, cost: 0.023235, acc: 0.992188
[Wed Oct 10 17:12:10 2018] epoch_id: 28, batch_id: 2300, cost: 0.057685, acc: 0.976562
[Wed Oct 10 17:12:12 2018] epoch_id: 28, batch_id: 2400, cost: 0.012688, acc: 1.000000
[Wed Oct 10 17:12:14 2018] epoch_id: 28, batch_id: 2500, cost: 0.010697, acc: 0.992188
[Wed Oct 10 17:12:16 2018] epoch_id: 28, batch_id: 2600, cost: 0.025213, acc: 0.992188
[Wed Oct 10 17:12:19 2018] epoch_id: 28, batch_id: 2700, cost: 0.011269, acc: 0.992188
[Wed Oct 10 17:12:21 2018] epoch_id: 28, batch_id: 2800, cost: 0.001141, acc: 1.000000
[Wed Oct 10 17:12:23 2018] epoch_id: 28, batch_id: 2900, cost: 0.049410, acc: 0.984375
[Wed Oct 10 17:12:25 2018] epoch_id: 28, batch_id: 3000, cost: 0.019739, acc: 0.992188
[Wed Oct 10 17:12:26 2018] epoch_id: 28, train_avg_cost: 0.018105, train_avg_acc: 0.993756
[Wed Oct 10 17:12:27 2018] epoch_id: 28, dev_cost: 1.200318, accuracy: 0.8345
[Wed Oct 10 17:12:28 2018] epoch_id: 28, test_cost: 1.228304, accuracy: 0.8308
[Wed Oct 10 17:12:36 2018] epoch_id: 29, batch_id: 0, cost: 0.004694, acc: 1.000000
[Wed Oct 10 17:12:39 2018] epoch_id: 29, batch_id: 100, cost: 0.008528, acc: 0.992188
[Wed Oct 10 17:12:41 2018] epoch_id: 29, batch_id: 200, cost: 0.006778, acc: 0.992188
[Wed Oct 10 17:12:43 2018] epoch_id: 29, batch_id: 300, cost: 0.026610, acc: 0.992188
[Wed Oct 10 17:12:45 2018] epoch_id: 29, batch_id: 400, cost: 0.008479, acc: 1.000000
[Wed Oct 10 17:12:47 2018] epoch_id: 29, batch_id: 500, cost: 0.021705, acc: 0.984375
[Wed Oct 10 17:12:50 2018] epoch_id: 29, batch_id: 600, cost: 0.010583, acc: 0.992188
[Wed Oct 10 17:12:52 2018] epoch_id: 29, batch_id: 700, cost: 0.056105, acc: 0.992188
[Wed Oct 10 17:12:54 2018] epoch_id: 29, batch_id: 800, cost: 0.000675, acc: 1.000000
[Wed Oct 10 17:12:56 2018] epoch_id: 29, batch_id: 900, cost: 0.011277, acc: 1.000000
[Wed Oct 10 17:12:58 2018] epoch_id: 29, batch_id: 1000, cost: 0.006004, acc: 1.000000
[Wed Oct 10 17:13:01 2018] epoch_id: 29, batch_id: 1100, cost: 0.000914, acc: 1.000000
[Wed Oct 10 17:13:03 2018] epoch_id: 29, batch_id: 1200, cost: 0.001097, acc: 1.000000
[Wed Oct 10 17:13:05 2018] epoch_id: 29, batch_id: 1300, cost: 0.002556, acc: 1.000000
[Wed Oct 10 17:13:07 2018] epoch_id: 29, batch_id: 1400, cost: 0.005061, acc: 1.000000
[Wed Oct 10 17:13:10 2018] epoch_id: 29, batch_id: 1500, cost: 0.002417, acc: 1.000000
[Wed Oct 10 17:13:12 2018] epoch_id: 29, batch_id: 1600, cost: 0.001037, acc: 1.000000
[Wed Oct 10 17:13:14 2018] epoch_id: 29, batch_id: 1700, cost: 0.003415, acc: 1.000000
[Wed Oct 10 17:13:16 2018] epoch_id: 29, batch_id: 1800, cost: 0.033230, acc: 0.984375
[Wed Oct 10 17:13:19 2018] epoch_id: 29, batch_id: 1900, cost: 0.002914, acc: 1.000000
[Wed Oct 10 17:13:21 2018] epoch_id: 29, batch_id: 2000, cost: 0.036463, acc: 0.984375
[Wed Oct 10 17:13:23 2018] epoch_id: 29, batch_id: 2100, cost: 0.067978, acc: 0.976562
[Wed Oct 10 17:13:25 2018] epoch_id: 29, batch_id: 2200, cost: 0.028088, acc: 0.992188
[Wed Oct 10 17:13:28 2018] epoch_id: 29, batch_id: 2300, cost: 0.013688, acc: 0.992188
[Wed Oct 10 17:13:30 2018] epoch_id: 29, batch_id: 2400, cost: 0.000238, acc: 1.000000
[Wed Oct 10 17:13:32 2018] epoch_id: 29, batch_id: 2500, cost: 0.006287, acc: 1.000000
[Wed Oct 10 17:13:35 2018] epoch_id: 29, batch_id: 2600, cost: 0.058838, acc: 0.992188
[Wed Oct 10 17:13:37 2018] epoch_id: 29, batch_id: 2700, cost: 0.013440, acc: 0.992188
[Wed Oct 10 17:13:39 2018] epoch_id: 29, batch_id: 2800, cost: 0.002577, acc: 1.000000
[Wed Oct 10 17:13:41 2018] epoch_id: 29, batch_id: 2900, cost: 0.020076, acc: 0.992188
[Wed Oct 10 17:13:43 2018] epoch_id: 29, batch_id: 3000, cost: 0.025126, acc: 0.992188
[Wed Oct 10 17:13:44 2018] epoch_id: 29, train_avg_cost: 0.017397, train_avg_acc: 0.994107
[Wed Oct 10 17:13:45 2018] epoch_id: 29, dev_cost: 1.314838, accuracy: 0.8304
[Wed Oct 10 17:13:46 2018] epoch_id: 29, test_cost: 1.349980, accuracy: 0.8298
[Wed Oct 10 17:13:55 2018] epoch_id: 30, batch_id: 0, cost: 0.063661, acc: 0.984375
[Wed Oct 10 17:13:57 2018] epoch_id: 30, batch_id: 100, cost: 0.005445, acc: 1.000000
[Wed Oct 10 17:13:59 2018] epoch_id: 30, batch_id: 200, cost: 0.025451, acc: 0.984375
[Wed Oct 10 17:14:01 2018] epoch_id: 30, batch_id: 300, cost: 0.019455, acc: 0.992188
[Wed Oct 10 17:14:04 2018] epoch_id: 30, batch_id: 400, cost: 0.000182, acc: 1.000000
[Wed Oct 10 17:14:06 2018] epoch_id: 30, batch_id: 500, cost: 0.036089, acc: 0.984375
[Wed Oct 10 17:14:08 2018] epoch_id: 30, batch_id: 600, cost: 0.003895, acc: 1.000000
[Wed Oct 10 17:14:10 2018] epoch_id: 30, batch_id: 700, cost: 0.012125, acc: 0.992188
[Wed Oct 10 17:14:13 2018] epoch_id: 30, batch_id: 800, cost: 0.007463, acc: 1.000000
[Wed Oct 10 17:14:15 2018] epoch_id: 30, batch_id: 900, cost: 0.043093, acc: 0.992188
[Wed Oct 10 17:14:17 2018] epoch_id: 30, batch_id: 1000, cost: 0.023025, acc: 0.992188
[Wed Oct 10 17:14:20 2018] epoch_id: 30, batch_id: 1100, cost: 0.008640, acc: 0.992188
[Wed Oct 10 17:14:22 2018] epoch_id: 30, batch_id: 1200, cost: 0.023361, acc: 0.984375
[Wed Oct 10 17:14:24 2018] epoch_id: 30, batch_id: 1300, cost: 0.003226, acc: 1.000000
[Wed Oct 10 17:14:27 2018] epoch_id: 30, batch_id: 1400, cost: 0.010225, acc: 0.992188
[Wed Oct 10 17:14:29 2018] epoch_id: 30, batch_id: 1500, cost: 0.009733, acc: 1.000000
[Wed Oct 10 17:14:31 2018] epoch_id: 30, batch_id: 1600, cost: 0.014048, acc: 0.992188
[Wed Oct 10 17:14:34 2018] epoch_id: 30, batch_id: 1700, cost: 0.008200, acc: 1.000000
[Wed Oct 10 17:14:36 2018] epoch_id: 30, batch_id: 1800, cost: 0.035217, acc: 0.992188
[Wed Oct 10 17:14:38 2018] epoch_id: 30, batch_id: 1900, cost: 0.002707, acc: 1.000000
[Wed Oct 10 17:14:40 2018] epoch_id: 30, batch_id: 2000, cost: 0.028292, acc: 0.984375
[Wed Oct 10 17:14:43 2018] epoch_id: 30, batch_id: 2100, cost: 0.003164, acc: 1.000000
[Wed Oct 10 17:14:45 2018] epoch_id: 30, batch_id: 2200, cost: 0.014421, acc: 0.992188
[Wed Oct 10 17:14:47 2018] epoch_id: 30, batch_id: 2300, cost: 0.001986, acc: 1.000000
[Wed Oct 10 17:14:49 2018] epoch_id: 30, batch_id: 2400, cost: 0.038462, acc: 0.992188
[Wed Oct 10 17:14:52 2018] epoch_id: 30, batch_id: 2500, cost: 0.003580, acc: 1.000000
[Wed Oct 10 17:14:54 2018] epoch_id: 30, batch_id: 2600, cost: 0.061259, acc: 0.984375
[Wed Oct 10 17:14:56 2018] epoch_id: 30, batch_id: 2700, cost: 0.042758, acc: 0.992188
[Wed Oct 10 17:14:59 2018] epoch_id: 30, batch_id: 2800, cost: 0.012991, acc: 0.992188
[Wed Oct 10 17:15:02 2018] epoch_id: 30, batch_id: 2900, cost: 0.021263, acc: 0.992188
[Wed Oct 10 17:15:04 2018] epoch_id: 30, batch_id: 3000, cost: 0.046058, acc: 0.992188
[Wed Oct 10 17:15:05 2018] epoch_id: 30, train_avg_cost: 0.016908, train_avg_acc: 0.994391
[Wed Oct 10 17:15:06 2018] epoch_id: 30, dev_cost: 1.214737, accuracy: 0.8343
[Wed Oct 10 17:15:07 2018] epoch_id: 30, test_cost: 1.247275, accuracy: 0.828
[Wed Oct 10 17:15:15 2018] epoch_id: 31, batch_id: 0, cost: 0.019613, acc: 0.992188
[Wed Oct 10 17:15:17 2018] epoch_id: 31, batch_id: 100, cost: 0.048000, acc: 0.984375
[Wed Oct 10 17:15:19 2018] epoch_id: 31, batch_id: 200, cost: 0.038604, acc: 0.992188
[Wed Oct 10 17:15:21 2018] epoch_id: 31, batch_id: 300, cost: 0.003548, acc: 1.000000
[Wed Oct 10 17:15:24 2018] epoch_id: 31, batch_id: 400, cost: 0.001539, acc: 1.000000
[Wed Oct 10 17:15:26 2018] epoch_id: 31, batch_id: 500, cost: 0.034219, acc: 0.992188
[Wed Oct 10 17:15:28 2018] epoch_id: 31, batch_id: 600, cost: 0.005696, acc: 1.000000
[Wed Oct 10 17:15:31 2018] epoch_id: 31, batch_id: 700, cost: 0.012590, acc: 0.992188
[Wed Oct 10 17:15:33 2018] epoch_id: 31, batch_id: 800, cost: 0.010021, acc: 0.992188
[Wed Oct 10 17:15:35 2018] epoch_id: 31, batch_id: 900, cost: 0.004838, acc: 1.000000
[Wed Oct 10 17:15:38 2018] epoch_id: 31, batch_id: 1000, cost: 0.006327, acc: 1.000000
[Wed Oct 10 17:15:40 2018] epoch_id: 31, batch_id: 1100, cost: 0.019881, acc: 0.992188
[Wed Oct 10 17:15:42 2018] epoch_id: 31, batch_id: 1200, cost: 0.006641, acc: 1.000000
[Wed Oct 10 17:15:44 2018] epoch_id: 31, batch_id: 1300, cost: 0.014323, acc: 0.992188
[Wed Oct 10 17:15:47 2018] epoch_id: 31, batch_id: 1400, cost: 0.008565, acc: 1.000000
[Wed Oct 10 17:15:49 2018] epoch_id: 31, batch_id: 1500, cost: 0.003106, acc: 1.000000
[Wed Oct 10 17:15:51 2018] epoch_id: 31, batch_id: 1600, cost: 0.023656, acc: 0.992188
[Wed Oct 10 17:15:53 2018] epoch_id: 31, batch_id: 1700, cost: 0.014398, acc: 1.000000
[Wed Oct 10 17:15:56 2018] epoch_id: 31, batch_id: 1800, cost: 0.005019, acc: 1.000000
[Wed Oct 10 17:15:58 2018] epoch_id: 31, batch_id: 1900, cost: 0.042051, acc: 0.984375
[Wed Oct 10 17:16:00 2018] epoch_id: 31, batch_id: 2000, cost: 0.005070, acc: 1.000000
[Wed Oct 10 17:16:03 2018] epoch_id: 31, batch_id: 2100, cost: 0.071147, acc: 0.984375
[Wed Oct 10 17:16:05 2018] epoch_id: 31, batch_id: 2200, cost: 0.004077, acc: 1.000000
[Wed Oct 10 17:16:07 2018] epoch_id: 31, batch_id: 2300, cost: 0.000753, acc: 1.000000
[Wed Oct 10 17:16:11 2018] epoch_id: 31, batch_id: 2400, cost: 0.007293, acc: 1.000000
[Wed Oct 10 17:16:13 2018] epoch_id: 31, batch_id: 2500, cost: 0.020403, acc: 0.992188
[Wed Oct 10 17:16:15 2018] epoch_id: 31, batch_id: 2600, cost: 0.002491, acc: 1.000000
[Wed Oct 10 17:16:17 2018] epoch_id: 31, batch_id: 2700, cost: 0.001376, acc: 1.000000
[Wed Oct 10 17:16:20 2018] epoch_id: 31, batch_id: 2800, cost: 0.006589, acc: 1.000000
[Wed Oct 10 17:16:22 2018] epoch_id: 31, batch_id: 2900, cost: 0.009986, acc: 1.000000
[Wed Oct 10 17:16:24 2018] epoch_id: 31, batch_id: 3000, cost: 0.004628, acc: 1.000000
[Wed Oct 10 17:16:25 2018] epoch_id: 31, train_avg_cost: 0.016863, train_avg_acc: 0.994502
[Wed Oct 10 17:16:26 2018] epoch_id: 31, dev_cost: 1.237226, accuracy: 0.8348
[Wed Oct 10 17:16:27 2018] epoch_id: 31, test_cost: 1.256692, accuracy: 0.8327
[Wed Oct 10 17:16:35 2018] epoch_id: 32, batch_id: 0, cost: 0.001936, acc: 1.000000
[Wed Oct 10 17:16:37 2018] epoch_id: 32, batch_id: 100, cost: 0.002628, acc: 1.000000
[Wed Oct 10 17:16:40 2018] epoch_id: 32, batch_id: 200, cost: 0.006948, acc: 1.000000
[Wed Oct 10 17:16:42 2018] epoch_id: 32, batch_id: 300, cost: 0.001289, acc: 1.000000
[Wed Oct 10 17:16:44 2018] epoch_id: 32, batch_id: 400, cost: 0.016850, acc: 1.000000
[Wed Oct 10 17:16:46 2018] epoch_id: 32, batch_id: 500, cost: 0.001709, acc: 1.000000
[Wed Oct 10 17:16:49 2018] epoch_id: 32, batch_id: 600, cost: 0.000500, acc: 1.000000
[Wed Oct 10 17:16:51 2018] epoch_id: 32, batch_id: 700, cost: 0.026876, acc: 0.992188
[Wed Oct 10 17:16:54 2018] epoch_id: 32, batch_id: 800, cost: 0.032499, acc: 0.992188
[Wed Oct 10 17:16:56 2018] epoch_id: 32, batch_id: 900, cost: 0.008563, acc: 1.000000
[Wed Oct 10 17:16:59 2018] epoch_id: 32, batch_id: 1000, cost: 0.033638, acc: 0.992188
[Wed Oct 10 17:17:01 2018] epoch_id: 32, batch_id: 1100, cost: 0.021626, acc: 0.992188
[Wed Oct 10 17:17:03 2018] epoch_id: 32, batch_id: 1200, cost: 0.035490, acc: 0.984375
[Wed Oct 10 17:17:05 2018] epoch_id: 32, batch_id: 1300, cost: 0.064303, acc: 0.992188
[Wed Oct 10 17:17:08 2018] epoch_id: 32, batch_id: 1400, cost: 0.000839, acc: 1.000000
[Wed Oct 10 17:17:10 2018] epoch_id: 32, batch_id: 1500, cost: 0.014770, acc: 0.992188
[Wed Oct 10 17:17:12 2018] epoch_id: 32, batch_id: 1600, cost: 0.067803, acc: 0.992188
[Wed Oct 10 17:17:14 2018] epoch_id: 32, batch_id: 1700, cost: 0.001507, acc: 1.000000
[Wed Oct 10 17:17:17 2018] epoch_id: 32, batch_id: 1800, cost: 0.039594, acc: 0.984375
[Wed Oct 10 17:17:19 2018] epoch_id: 32, batch_id: 1900, cost: 0.016198, acc: 0.992188
[Wed Oct 10 17:17:21 2018] epoch_id: 32, batch_id: 2000, cost: 0.027783, acc: 0.984375
[Wed Oct 10 17:17:24 2018] epoch_id: 32, batch_id: 2100, cost: 0.010040, acc: 0.992188
[Wed Oct 10 17:17:26 2018] epoch_id: 32, batch_id: 2200, cost: 0.043833, acc: 0.992188
[Wed Oct 10 17:17:28 2018] epoch_id: 32, batch_id: 2300, cost: 0.012850, acc: 0.992188
[Wed Oct 10 17:17:31 2018] epoch_id: 32, batch_id: 2400, cost: 0.010643, acc: 1.000000
[Wed Oct 10 17:17:33 2018] epoch_id: 32, batch_id: 2500, cost: 0.013513, acc: 0.992188
[Wed Oct 10 17:17:35 2018] epoch_id: 32, batch_id: 2600, cost: 0.021498, acc: 0.984375
[Wed Oct 10 17:17:38 2018] epoch_id: 32, batch_id: 2700, cost: 0.048091, acc: 0.984375
[Wed Oct 10 17:17:40 2018] epoch_id: 32, batch_id: 2800, cost: 0.054710, acc: 0.984375
[Wed Oct 10 17:17:42 2018] epoch_id: 32, batch_id: 2900, cost: 0.028200, acc: 0.992188
[Wed Oct 10 17:17:44 2018] epoch_id: 32, batch_id: 3000, cost: 0.052160, acc: 0.992188
[Wed Oct 10 17:17:45 2018] epoch_id: 32, train_avg_cost: 0.016115, train_avg_acc: 0.994599
[Wed Oct 10 17:17:46 2018] epoch_id: 32, dev_cost: 1.182178, accuracy: 0.8359
[Wed Oct 10 17:17:47 2018] epoch_id: 32, test_cost: 1.183695, accuracy: 0.8297
[Wed Oct 10 17:17:55 2018] epoch_id: 33, batch_id: 0, cost: 0.002170, acc: 1.000000
[Wed Oct 10 17:17:58 2018] epoch_id: 33, batch_id: 100, cost: 0.000724, acc: 1.000000
[Wed Oct 10 17:18:00 2018] epoch_id: 33, batch_id: 200, cost: 0.102036, acc: 0.968750
[Wed Oct 10 17:18:02 2018] epoch_id: 33, batch_id: 300, cost: 0.006967, acc: 1.000000
[Wed Oct 10 17:18:04 2018] epoch_id: 33, batch_id: 400, cost: 0.004401, acc: 1.000000
[Wed Oct 10 17:18:07 2018] epoch_id: 33, batch_id: 500, cost: 0.006693, acc: 1.000000
[Wed Oct 10 17:18:09 2018] epoch_id: 33, batch_id: 600, cost: 0.002759, acc: 1.000000
[Wed Oct 10 17:18:11 2018] epoch_id: 33, batch_id: 700, cost: 0.000587, acc: 1.000000
[Wed Oct 10 17:18:13 2018] epoch_id: 33, batch_id: 800, cost: 0.006432, acc: 1.000000
[Wed Oct 10 17:18:16 2018] epoch_id: 33, batch_id: 900, cost: 0.043751, acc: 0.984375
[Wed Oct 10 17:18:18 2018] epoch_id: 33, batch_id: 1000, cost: 0.006652, acc: 1.000000
[Wed Oct 10 17:18:20 2018] epoch_id: 33, batch_id: 1100, cost: 0.008419, acc: 1.000000
[Wed Oct 10 17:18:23 2018] epoch_id: 33, batch_id: 1200, cost: 0.012309, acc: 0.992188
[Wed Oct 10 17:18:25 2018] epoch_id: 33, batch_id: 1300, cost: 0.023884, acc: 0.984375
[Wed Oct 10 17:18:27 2018] epoch_id: 33, batch_id: 1400, cost: 0.011711, acc: 0.992188
[Wed Oct 10 17:18:29 2018] epoch_id: 33, batch_id: 1500, cost: 0.005948, acc: 1.000000
[Wed Oct 10 17:18:32 2018] epoch_id: 33, batch_id: 1600, cost: 0.014363, acc: 0.992188
[Wed Oct 10 17:18:34 2018] epoch_id: 33, batch_id: 1700, cost: 0.000291, acc: 1.000000
[Wed Oct 10 17:18:37 2018] epoch_id: 33, batch_id: 1800, cost: 0.005694, acc: 1.000000
[Wed Oct 10 17:18:40 2018] epoch_id: 33, batch_id: 1900, cost: 0.170195, acc: 0.984375
[Wed Oct 10 17:18:42 2018] epoch_id: 33, batch_id: 2000, cost: 0.001044, acc: 1.000000
[Wed Oct 10 17:18:44 2018] epoch_id: 33, batch_id: 2100, cost: 0.004921, acc: 1.000000
[Wed Oct 10 17:18:46 2018] epoch_id: 33, batch_id: 2200, cost: 0.006203, acc: 1.000000
[Wed Oct 10 17:18:48 2018] epoch_id: 33, batch_id: 2300, cost: 0.038624, acc: 0.984375
[Wed Oct 10 17:18:51 2018] epoch_id: 33, batch_id: 2400, cost: 0.067313, acc: 0.976562
[Wed Oct 10 17:18:53 2018] epoch_id: 33, batch_id: 2500, cost: 0.040853, acc: 0.992188
[Wed Oct 10 17:18:55 2018] epoch_id: 33, batch_id: 2600, cost: 0.039087, acc: 0.984375
[Wed Oct 10 17:18:57 2018] epoch_id: 33, batch_id: 2700, cost: 0.004672, acc: 1.000000
[Wed Oct 10 17:19:00 2018] epoch_id: 33, batch_id: 2800, cost: 0.021997, acc: 0.984375
[Wed Oct 10 17:19:02 2018] epoch_id: 33, batch_id: 2900, cost: 0.013635, acc: 1.000000
[Wed Oct 10 17:19:04 2018] epoch_id: 33, batch_id: 3000, cost: 0.009055, acc: 0.992188
[Wed Oct 10 17:19:05 2018] epoch_id: 33, train_avg_cost: 0.014972, train_avg_acc: 0.995145
[Wed Oct 10 17:19:06 2018] epoch_id: 33, dev_cost: 1.819085, accuracy: 0.8352
[Wed Oct 10 17:19:07 2018] epoch_id: 33, test_cost: 1.859041, accuracy: 0.8314
[Wed Oct 10 17:19:15 2018] epoch_id: 34, batch_id: 0, cost: 0.026821, acc: 0.992188
[Wed Oct 10 17:19:17 2018] epoch_id: 34, batch_id: 100, cost: 0.001463, acc: 1.000000
[Wed Oct 10 17:19:20 2018] epoch_id: 34, batch_id: 200, cost: 0.000579, acc: 1.000000
[Wed Oct 10 17:19:22 2018] epoch_id: 34, batch_id: 300, cost: 0.000492, acc: 1.000000
[Wed Oct 10 17:19:24 2018] epoch_id: 34, batch_id: 400, cost: 0.000671, acc: 1.000000
[Wed Oct 10 17:19:26 2018] epoch_id: 34, batch_id: 500, cost: 0.007763, acc: 1.000000
[Wed Oct 10 17:19:29 2018] epoch_id: 34, batch_id: 600, cost: 0.018827, acc: 0.992188
[Wed Oct 10 17:19:31 2018] epoch_id: 34, batch_id: 700, cost: 0.004606, acc: 1.000000
[Wed Oct 10 17:19:33 2018] epoch_id: 34, batch_id: 800, cost: 0.004697, acc: 1.000000
[Wed Oct 10 17:19:35 2018] epoch_id: 34, batch_id: 900, cost: 0.003752, acc: 1.000000
[Wed Oct 10 17:19:38 2018] epoch_id: 34, batch_id: 1000, cost: 0.003546, acc: 1.000000
[Wed Oct 10 17:19:40 2018] epoch_id: 34, batch_id: 1100, cost: 0.003848, acc: 1.000000
[Wed Oct 10 17:19:42 2018] epoch_id: 34, batch_id: 1200, cost: 0.010363, acc: 1.000000
[Wed Oct 10 17:19:44 2018] epoch_id: 34, batch_id: 1300, cost: 0.013875, acc: 0.992188
[Wed Oct 10 17:19:47 2018] epoch_id: 34, batch_id: 1400, cost: 0.009212, acc: 0.992188
[Wed Oct 10 17:19:49 2018] epoch_id: 34, batch_id: 1500, cost: 0.047909, acc: 0.992188
[Wed Oct 10 17:19:51 2018] epoch_id: 34, batch_id: 1600, cost: 0.012809, acc: 0.992188
[Wed Oct 10 17:19:53 2018] epoch_id: 34, batch_id: 1700, cost: 0.009717, acc: 1.000000
[Wed Oct 10 17:19:56 2018] epoch_id: 34, batch_id: 1800, cost: 0.026330, acc: 0.984375
[Wed Oct 10 17:19:58 2018] epoch_id: 34, batch_id: 1900, cost: 0.016982, acc: 0.992188
[Wed Oct 10 17:20:00 2018] epoch_id: 34, batch_id: 2000, cost: 0.021416, acc: 0.992188
[Wed Oct 10 17:20:03 2018] epoch_id: 34, batch_id: 2100, cost: 0.001120, acc: 1.000000
[Wed Oct 10 17:20:05 2018] epoch_id: 34, batch_id: 2200, cost: 0.011436, acc: 1.000000
[Wed Oct 10 17:20:07 2018] epoch_id: 34, batch_id: 2300, cost: 0.007605, acc: 0.992188
[Wed Oct 10 17:20:10 2018] epoch_id: 34, batch_id: 2400, cost: 0.026308, acc: 0.992188
[Wed Oct 10 17:20:12 2018] epoch_id: 34, batch_id: 2500, cost: 0.006798, acc: 1.000000
[Wed Oct 10 17:20:14 2018] epoch_id: 34, batch_id: 2600, cost: 0.017334, acc: 0.992188
[Wed Oct 10 17:20:16 2018] epoch_id: 34, batch_id: 2700, cost: 0.030094, acc: 0.992188
[Wed Oct 10 17:20:18 2018] epoch_id: 34, batch_id: 2800, cost: 0.053259, acc: 0.992188
[Wed Oct 10 17:20:21 2018] epoch_id: 34, batch_id: 2900, cost: 0.061547, acc: 0.968750
[Wed Oct 10 17:20:23 2018] epoch_id: 34, batch_id: 3000, cost: 0.002864, acc: 1.000000
[Wed Oct 10 17:20:24 2018] epoch_id: 34, train_avg_cost: 0.014813, train_avg_acc: 0.995064
[Wed Oct 10 17:20:25 2018] epoch_id: 34, dev_cost: 1.697732, accuracy: 0.8346
[Wed Oct 10 17:20:26 2018] epoch_id: 34, test_cost: 1.721137, accuracy: 0.8341
[Wed Oct 10 17:20:34 2018] epoch_id: 35, batch_id: 0, cost: 0.000268, acc: 1.000000
[Wed Oct 10 17:20:37 2018] epoch_id: 35, batch_id: 100, cost: 0.001389, acc: 1.000000
[Wed Oct 10 17:20:39 2018] epoch_id: 35, batch_id: 200, cost: 0.003275, acc: 1.000000
[Wed Oct 10 17:20:41 2018] epoch_id: 35, batch_id: 300, cost: 0.006535, acc: 1.000000
[Wed Oct 10 17:20:43 2018] epoch_id: 35, batch_id: 400, cost: 0.005316, acc: 1.000000
[Wed Oct 10 17:20:45 2018] epoch_id: 35, batch_id: 500, cost: 0.017976, acc: 0.992188
[Wed Oct 10 17:20:48 2018] epoch_id: 35, batch_id: 600, cost: 0.060320, acc: 0.992188
[Wed Oct 10 17:20:50 2018] epoch_id: 35, batch_id: 700, cost: 0.004358, acc: 1.000000
[Wed Oct 10 17:20:52 2018] epoch_id: 35, batch_id: 800, cost: 0.003560, acc: 1.000000
[Wed Oct 10 17:20:55 2018] epoch_id: 35, batch_id: 900, cost: 0.017978, acc: 0.992188
[Wed Oct 10 17:20:57 2018] epoch_id: 35, batch_id: 1000, cost: 0.007025, acc: 1.000000
[Wed Oct 10 17:20:59 2018] epoch_id: 35, batch_id: 1100, cost: 0.008777, acc: 0.992188
[Wed Oct 10 17:21:01 2018] epoch_id: 35, batch_id: 1200, cost: 0.006591, acc: 1.000000
[Wed Oct 10 17:21:04 2018] epoch_id: 35, batch_id: 1300, cost: 0.008911, acc: 0.992188
[Wed Oct 10 17:21:06 2018] epoch_id: 35, batch_id: 1400, cost: 0.038343, acc: 0.984375
[Wed Oct 10 17:21:08 2018] epoch_id: 35, batch_id: 1500, cost: 0.001654, acc: 1.000000
[Wed Oct 10 17:21:10 2018] epoch_id: 35, batch_id: 1600, cost: 0.002577, acc: 1.000000
[Wed Oct 10 17:21:13 2018] epoch_id: 35, batch_id: 1700, cost: 0.026908, acc: 0.992188
[Wed Oct 10 17:21:15 2018] epoch_id: 35, batch_id: 1800, cost: 0.024004, acc: 0.992188
[Wed Oct 10 17:21:17 2018] epoch_id: 35, batch_id: 1900, cost: 0.013134, acc: 0.992188
[Wed Oct 10 17:21:19 2018] epoch_id: 35, batch_id: 2000, cost: 0.003633, acc: 1.000000
[Wed Oct 10 17:21:21 2018] epoch_id: 35, batch_id: 2100, cost: 0.011727, acc: 0.992188
[Wed Oct 10 17:21:24 2018] epoch_id: 35, batch_id: 2200, cost: 0.019991, acc: 0.992188
[Wed Oct 10 17:21:26 2018] epoch_id: 35, batch_id: 2300, cost: 0.004771, acc: 1.000000
[Wed Oct 10 17:21:28 2018] epoch_id: 35, batch_id: 2400, cost: 0.013732, acc: 0.992188
[Wed Oct 10 17:21:30 2018] epoch_id: 35, batch_id: 2500, cost: 0.096741, acc: 0.984375
[Wed Oct 10 17:21:33 2018] epoch_id: 35, batch_id: 2600, cost: 0.006102, acc: 1.000000
[Wed Oct 10 17:21:36 2018] epoch_id: 35, batch_id: 2700, cost: 0.007046, acc: 0.992188
[Wed Oct 10 17:21:38 2018] epoch_id: 35, batch_id: 2800, cost: 0.028777, acc: 0.984375
[Wed Oct 10 17:21:41 2018] epoch_id: 35, batch_id: 2900, cost: 0.116960, acc: 0.976562
[Wed Oct 10 17:21:43 2018] epoch_id: 35, batch_id: 3000, cost: 0.039752, acc: 0.968750
[Wed Oct 10 17:21:44 2018] epoch_id: 35, train_avg_cost: 0.014921, train_avg_acc: 0.995075
[Wed Oct 10 17:21:45 2018] epoch_id: 35, dev_cost: 1.203598, accuracy: 0.8348
[Wed Oct 10 17:21:45 2018] epoch_id: 35, test_cost: 1.205202, accuracy: 0.8347
[Wed Oct 10 17:21:54 2018] epoch_id: 36, batch_id: 0, cost: 0.009331, acc: 1.000000
[Wed Oct 10 17:21:56 2018] epoch_id: 36, batch_id: 100, cost: 0.004473, acc: 1.000000
[Wed Oct 10 17:21:58 2018] epoch_id: 36, batch_id: 200, cost: 0.001097, acc: 1.000000
[Wed Oct 10 17:22:00 2018] epoch_id: 36, batch_id: 300, cost: 0.001914, acc: 1.000000
[Wed Oct 10 17:22:03 2018] epoch_id: 36, batch_id: 400, cost: 0.003967, acc: 1.000000
[Wed Oct 10 17:22:05 2018] epoch_id: 36, batch_id: 500, cost: 0.008101, acc: 1.000000
[Wed Oct 10 17:22:07 2018] epoch_id: 36, batch_id: 600, cost: 0.037581, acc: 0.976562
[Wed Oct 10 17:22:09 2018] epoch_id: 36, batch_id: 700, cost: 0.031872, acc: 0.992188
[Wed Oct 10 17:22:11 2018] epoch_id: 36, batch_id: 800, cost: 0.002586, acc: 1.000000
[Wed Oct 10 17:22:14 2018] epoch_id: 36, batch_id: 900, cost: 0.025838, acc: 0.984375
[Wed Oct 10 17:22:16 2018] epoch_id: 36, batch_id: 1000, cost: 0.012382, acc: 0.992188
[Wed Oct 10 17:22:18 2018] epoch_id: 36, batch_id: 1100, cost: 0.006482, acc: 1.000000
[Wed Oct 10 17:22:20 2018] epoch_id: 36, batch_id: 1200, cost: 0.006437, acc: 1.000000
[Wed Oct 10 17:22:23 2018] epoch_id: 36, batch_id: 1300, cost: 0.026039, acc: 0.992188
[Wed Oct 10 17:22:25 2018] epoch_id: 36, batch_id: 1400, cost: 0.017908, acc: 0.992188
[Wed Oct 10 17:22:27 2018] epoch_id: 36, batch_id: 1500, cost: 0.025722, acc: 0.984375
[Wed Oct 10 17:22:29 2018] epoch_id: 36, batch_id: 1600, cost: 0.031398, acc: 0.992188
[Wed Oct 10 17:22:32 2018] epoch_id: 36, batch_id: 1700, cost: 0.034194, acc: 0.984375
[Wed Oct 10 17:22:34 2018] epoch_id: 36, batch_id: 1800, cost: 0.001353, acc: 1.000000
[Wed Oct 10 17:22:36 2018] epoch_id: 36, batch_id: 1900, cost: 0.000942, acc: 1.000000
[Wed Oct 10 17:22:38 2018] epoch_id: 36, batch_id: 2000, cost: 0.004051, acc: 1.000000
[Wed Oct 10 17:22:40 2018] epoch_id: 36, batch_id: 2100, cost: 0.016359, acc: 0.992188
[Wed Oct 10 17:22:43 2018] epoch_id: 36, batch_id: 2200, cost: 0.010324, acc: 1.000000
[Wed Oct 10 17:22:46 2018] epoch_id: 36, batch_id: 2300, cost: 0.015250, acc: 1.000000
[Wed Oct 10 17:22:48 2018] epoch_id: 36, batch_id: 2400, cost: 0.053711, acc: 0.976562
[Wed Oct 10 17:22:51 2018] epoch_id: 36, batch_id: 2500, cost: 0.059409, acc: 0.984375
[Wed Oct 10 17:22:53 2018] epoch_id: 36, batch_id: 2600, cost: 0.009707, acc: 1.000000
[Wed Oct 10 17:22:55 2018] epoch_id: 36, batch_id: 2700, cost: 0.003367, acc: 1.000000
[Wed Oct 10 17:22:58 2018] epoch_id: 36, batch_id: 2800, cost: 0.001207, acc: 1.000000
[Wed Oct 10 17:23:00 2018] epoch_id: 36, batch_id: 2900, cost: 0.009538, acc: 0.992188
[Wed Oct 10 17:23:02 2018] epoch_id: 36, batch_id: 3000, cost: 0.013745, acc: 0.992188
[Wed Oct 10 17:23:03 2018] epoch_id: 36, train_avg_cost: 0.014009, train_avg_acc: 0.995522
[Wed Oct 10 17:23:04 2018] epoch_id: 36, dev_cost: 1.647745, accuracy: 0.8324
[Wed Oct 10 17:23:05 2018] epoch_id: 36, test_cost: 1.662931, accuracy: 0.8368
[Wed Oct 10 17:23:13 2018] epoch_id: 37, batch_id: 0, cost: 0.009128, acc: 1.000000
[Wed Oct 10 17:23:15 2018] epoch_id: 37, batch_id: 100, cost: 0.000989, acc: 1.000000
[Wed Oct 10 17:23:17 2018] epoch_id: 37, batch_id: 200, cost: 0.031867, acc: 0.992188
[Wed Oct 10 17:23:20 2018] epoch_id: 37, batch_id: 300, cost: 0.016197, acc: 0.984375
[Wed Oct 10 17:23:22 2018] epoch_id: 37, batch_id: 400, cost: 0.004157, acc: 1.000000
[Wed Oct 10 17:23:24 2018] epoch_id: 37, batch_id: 500, cost: 0.004215, acc: 1.000000
[Wed Oct 10 17:23:26 2018] epoch_id: 37, batch_id: 600, cost: 0.000303, acc: 1.000000
[Wed Oct 10 17:23:29 2018] epoch_id: 37, batch_id: 700, cost: 0.005056, acc: 1.000000
[Wed Oct 10 17:23:31 2018] epoch_id: 37, batch_id: 800, cost: 0.016816, acc: 0.992188
[Wed Oct 10 17:23:34 2018] epoch_id: 37, batch_id: 900, cost: 0.036067, acc: 0.984375
[Wed Oct 10 17:23:37 2018] epoch_id: 37, batch_id: 1000, cost: 0.002430, acc: 1.000000
[Wed Oct 10 17:23:39 2018] epoch_id: 37, batch_id: 1100, cost: 0.001621, acc: 1.000000
[Wed Oct 10 17:23:41 2018] epoch_id: 37, batch_id: 1200, cost: 0.034505, acc: 0.992188
[Wed Oct 10 17:23:43 2018] epoch_id: 37, batch_id: 1300, cost: 0.008605, acc: 0.992188
[Wed Oct 10 17:23:45 2018] epoch_id: 37, batch_id: 1400, cost: 0.039387, acc: 0.984375
[Wed Oct 10 17:23:48 2018] epoch_id: 37, batch_id: 1500, cost: 0.005761, acc: 1.000000
[Wed Oct 10 17:23:50 2018] epoch_id: 37, batch_id: 1600, cost: 0.002905, acc: 1.000000
[Wed Oct 10 17:23:52 2018] epoch_id: 37, batch_id: 1700, cost: 0.009640, acc: 1.000000
[Wed Oct 10 17:23:55 2018] epoch_id: 37, batch_id: 1800, cost: 0.004734, acc: 1.000000
[Wed Oct 10 17:23:57 2018] epoch_id: 37, batch_id: 1900, cost: 0.029191, acc: 0.992188
[Wed Oct 10 17:23:59 2018] epoch_id: 37, batch_id: 2000, cost: 0.000724, acc: 1.000000
[Wed Oct 10 17:24:01 2018] epoch_id: 37, batch_id: 2100, cost: 0.014325, acc: 0.992188
[Wed Oct 10 17:24:04 2018] epoch_id: 37, batch_id: 2200, cost: 0.004239, acc: 1.000000
[Wed Oct 10 17:24:06 2018] epoch_id: 37, batch_id: 2300, cost: 0.000597, acc: 1.000000
[Wed Oct 10 17:24:08 2018] epoch_id: 37, batch_id: 2400, cost: 0.008226, acc: 1.000000
[Wed Oct 10 17:24:10 2018] epoch_id: 37, batch_id: 2500, cost: 0.001601, acc: 1.000000
[Wed Oct 10 17:24:12 2018] epoch_id: 37, batch_id: 2600, cost: 0.014527, acc: 0.992188
[Wed Oct 10 17:24:15 2018] epoch_id: 37, batch_id: 2700, cost: 0.010813, acc: 0.992188
[Wed Oct 10 17:24:17 2018] epoch_id: 37, batch_id: 2800, cost: 0.015832, acc: 0.992188
[Wed Oct 10 17:24:19 2018] epoch_id: 37, batch_id: 2900, cost: 0.063636, acc: 0.976562
[Wed Oct 10 17:24:22 2018] epoch_id: 37, batch_id: 3000, cost: 0.003993, acc: 1.000000
[Wed Oct 10 17:24:22 2018] epoch_id: 37, train_avg_cost: 0.014056, train_avg_acc: 0.995431
[Wed Oct 10 17:24:23 2018] epoch_id: 37, dev_cost: 1.500988, accuracy: 0.8334
[Wed Oct 10 17:24:24 2018] epoch_id: 37, test_cost: 1.491400, accuracy: 0.8327
[Wed Oct 10 17:24:33 2018] epoch_id: 38, batch_id: 0, cost: 0.016895, acc: 0.992188
[Wed Oct 10 17:24:35 2018] epoch_id: 38, batch_id: 100, cost: 0.001690, acc: 1.000000
[Wed Oct 10 17:24:38 2018] epoch_id: 38, batch_id: 200, cost: 0.009989, acc: 1.000000
[Wed Oct 10 17:24:40 2018] epoch_id: 38, batch_id: 300, cost: 0.023480, acc: 0.984375
[Wed Oct 10 17:24:42 2018] epoch_id: 38, batch_id: 400, cost: 0.004687, acc: 1.000000
[Wed Oct 10 17:24:45 2018] epoch_id: 38, batch_id: 500, cost: 0.020183, acc: 0.992188
[Wed Oct 10 17:24:47 2018] epoch_id: 38, batch_id: 600, cost: 0.028614, acc: 0.992188
[Wed Oct 10 17:24:49 2018] epoch_id: 38, batch_id: 700, cost: 0.000448, acc: 1.000000
[Wed Oct 10 17:24:51 2018] epoch_id: 38, batch_id: 800, cost: 0.000913, acc: 1.000000
[Wed Oct 10 17:24:54 2018] epoch_id: 38, batch_id: 900, cost: 0.022090, acc: 0.992188
[Wed Oct 10 17:24:56 2018] epoch_id: 38, batch_id: 1000, cost: 0.006918, acc: 0.992188
[Wed Oct 10 17:24:58 2018] epoch_id: 38, batch_id: 1100, cost: 0.028611, acc: 0.984375
[Wed Oct 10 17:25:00 2018] epoch_id: 38, batch_id: 1200, cost: 0.013097, acc: 0.992188
[Wed Oct 10 17:25:03 2018] epoch_id: 38, batch_id: 1300, cost: 0.014227, acc: 0.992188
[Wed Oct 10 17:25:05 2018] epoch_id: 38, batch_id: 1400, cost: 0.033064, acc: 0.992188
[Wed Oct 10 17:25:07 2018] epoch_id: 38, batch_id: 1500, cost: 0.004276, acc: 1.000000
[Wed Oct 10 17:25:09 2018] epoch_id: 38, batch_id: 1600, cost: 0.016516, acc: 0.992188
[Wed Oct 10 17:25:12 2018] epoch_id: 38, batch_id: 1700, cost: 0.004443, acc: 1.000000
[Wed Oct 10 17:25:14 2018] epoch_id: 38, batch_id: 1800, cost: 0.001648, acc: 1.000000
[Wed Oct 10 17:25:17 2018] epoch_id: 38, batch_id: 1900, cost: 0.026780, acc: 0.992188
[Wed Oct 10 17:25:20 2018] epoch_id: 38, batch_id: 2000, cost: 0.006375, acc: 0.992188
[Wed Oct 10 17:25:22 2018] epoch_id: 38, batch_id: 2100, cost: 0.013131, acc: 0.992188
[Wed Oct 10 17:25:24 2018] epoch_id: 38, batch_id: 2200, cost: 0.012666, acc: 1.000000
[Wed Oct 10 17:25:26 2018] epoch_id: 38, batch_id: 2300, cost: 0.001973, acc: 1.000000
[Wed Oct 10 17:25:29 2018] epoch_id: 38, batch_id: 2400, cost: 0.005966, acc: 1.000000
[Wed Oct 10 17:25:31 2018] epoch_id: 38, batch_id: 2500, cost: 0.011249, acc: 0.992188
[Wed Oct 10 17:25:33 2018] epoch_id: 38, batch_id: 2600, cost: 0.022209, acc: 0.992188
[Wed Oct 10 17:25:36 2018] epoch_id: 38, batch_id: 2700, cost: 0.003999, acc: 1.000000
[Wed Oct 10 17:25:38 2018] epoch_id: 38, batch_id: 2800, cost: 0.010264, acc: 0.992188
[Wed Oct 10 17:25:40 2018] epoch_id: 38, batch_id: 2900, cost: 0.003841, acc: 1.000000
[Wed Oct 10 17:25:42 2018] epoch_id: 38, batch_id: 3000, cost: 0.075514, acc: 0.992188
[Wed Oct 10 17:25:43 2018] epoch_id: 38, train_avg_cost: 0.013573, train_avg_acc: 0.995548
[Wed Oct 10 17:25:44 2018] epoch_id: 38, dev_cost: 1.577028, accuracy: 0.8317
[Wed Oct 10 17:25:45 2018] epoch_id: 38, test_cost: 1.546861, accuracy: 0.8363
[Wed Oct 10 17:25:54 2018] epoch_id: 39, batch_id: 0, cost: 0.000487, acc: 1.000000
[Wed Oct 10 17:25:56 2018] epoch_id: 39, batch_id: 100, cost: 0.003988, acc: 1.000000
[Wed Oct 10 17:25:58 2018] epoch_id: 39, batch_id: 200, cost: 0.069709, acc: 0.984375
[Wed Oct 10 17:26:00 2018] epoch_id: 39, batch_id: 300, cost: 0.031796, acc: 0.992188
[Wed Oct 10 17:26:03 2018] epoch_id: 39, batch_id: 400, cost: 0.007788, acc: 1.000000
[Wed Oct 10 17:26:05 2018] epoch_id: 39, batch_id: 500, cost: 0.014854, acc: 0.992188
[Wed Oct 10 17:26:07 2018] epoch_id: 39, batch_id: 600, cost: 0.017382, acc: 0.992188
[Wed Oct 10 17:26:09 2018] epoch_id: 39, batch_id: 700, cost: 0.003342, acc: 1.000000
[Wed Oct 10 17:26:12 2018] epoch_id: 39, batch_id: 800, cost: 0.003279, acc: 1.000000
[Wed Oct 10 17:26:14 2018] epoch_id: 39, batch_id: 900, cost: 0.018283, acc: 0.992188
[Wed Oct 10 17:26:16 2018] epoch_id: 39, batch_id: 1000, cost: 0.000697, acc: 1.000000
[Wed Oct 10 17:26:18 2018] epoch_id: 39, batch_id: 1100, cost: 0.003188, acc: 1.000000
[Wed Oct 10 17:26:21 2018] epoch_id: 39, batch_id: 1200, cost: 0.002884, acc: 1.000000
[Wed Oct 10 17:26:23 2018] epoch_id: 39, batch_id: 1300, cost: 0.016443, acc: 0.992188
[Wed Oct 10 17:26:25 2018] epoch_id: 39, batch_id: 1400, cost: 0.036063, acc: 0.992188
[Wed Oct 10 17:26:28 2018] epoch_id: 39, batch_id: 1500, cost: 0.010849, acc: 0.992188
[Wed Oct 10 17:26:30 2018] epoch_id: 39, batch_id: 1600, cost: 0.002218, acc: 1.000000
[Wed Oct 10 17:26:32 2018] epoch_id: 39, batch_id: 1700, cost: 0.011184, acc: 1.000000
[Wed Oct 10 17:26:34 2018] epoch_id: 39, batch_id: 1800, cost: 0.002410, acc: 1.000000
[Wed Oct 10 17:26:37 2018] epoch_id: 39, batch_id: 1900, cost: 0.010422, acc: 0.992188
[Wed Oct 10 17:26:39 2018] epoch_id: 39, batch_id: 2000, cost: 0.012162, acc: 0.992188
[Wed Oct 10 17:26:41 2018] epoch_id: 39, batch_id: 2100, cost: 0.042420, acc: 0.984375
[Wed Oct 10 17:26:43 2018] epoch_id: 39, batch_id: 2200, cost: 0.006210, acc: 1.000000
[Wed Oct 10 17:26:46 2018] epoch_id: 39, batch_id: 2300, cost: 0.002905, acc: 1.000000
[Wed Oct 10 17:26:48 2018] epoch_id: 39, batch_id: 2400, cost: 0.067472, acc: 0.992188
[Wed Oct 10 17:26:50 2018] epoch_id: 39, batch_id: 2500, cost: 0.030382, acc: 0.992188
[Wed Oct 10 17:26:52 2018] epoch_id: 39, batch_id: 2600, cost: 0.049727, acc: 0.992188
[Wed Oct 10 17:26:54 2018] epoch_id: 39, batch_id: 2700, cost: 0.024157, acc: 0.984375
[Wed Oct 10 17:26:57 2018] epoch_id: 39, batch_id: 2800, cost: 0.021991, acc: 0.992188
[Wed Oct 10 17:26:59 2018] epoch_id: 39, batch_id: 2900, cost: 0.001997, acc: 1.000000
[Wed Oct 10 17:27:01 2018] epoch_id: 39, batch_id: 3000, cost: 0.001907, acc: 1.000000
[Wed Oct 10 17:27:02 2018] epoch_id: 39, train_avg_cost: 0.012756, train_avg_acc: 0.995835
[Wed Oct 10 17:27:03 2018] epoch_id: 39, dev_cost: 1.650582, accuracy: 0.8342
[Wed Oct 10 17:27:04 2018] epoch_id: 39, test_cost: 1.662477, accuracy: 0.8325
[Wed Oct 10 17:27:12 2018] epoch_id: 40, batch_id: 0, cost: 0.000858, acc: 1.000000
[Wed Oct 10 17:27:15 2018] epoch_id: 40, batch_id: 100, cost: 0.000849, acc: 1.000000
[Wed Oct 10 17:27:17 2018] epoch_id: 40, batch_id: 200, cost: 0.016273, acc: 0.992188
[Wed Oct 10 17:27:19 2018] epoch_id: 40, batch_id: 300, cost: 0.042659, acc: 0.992188
[Wed Oct 10 17:27:21 2018] epoch_id: 40, batch_id: 400, cost: 0.010672, acc: 0.992188
[Wed Oct 10 17:27:24 2018] epoch_id: 40, batch_id: 500, cost: 0.000544, acc: 1.000000
[Wed Oct 10 17:27:26 2018] epoch_id: 40, batch_id: 600, cost: 0.005578, acc: 1.000000
[Wed Oct 10 17:27:28 2018] epoch_id: 40, batch_id: 700, cost: 0.039266, acc: 0.992188
[Wed Oct 10 17:27:31 2018] epoch_id: 40, batch_id: 800, cost: 0.013144, acc: 0.992188
[Wed Oct 10 17:27:33 2018] epoch_id: 40, batch_id: 900, cost: 0.000740, acc: 1.000000
[Wed Oct 10 17:27:35 2018] epoch_id: 40, batch_id: 1000, cost: 0.003259, acc: 1.000000
[Wed Oct 10 17:27:37 2018] epoch_id: 40, batch_id: 1100, cost: 0.002126, acc: 1.000000
[Wed Oct 10 17:27:40 2018] epoch_id: 40, batch_id: 1200, cost: 0.003089, acc: 1.000000
[Wed Oct 10 17:27:42 2018] epoch_id: 40, batch_id: 1300, cost: 0.000690, acc: 1.000000
[Wed Oct 10 17:27:44 2018] epoch_id: 40, batch_id: 1400, cost: 0.000283, acc: 1.000000
[Wed Oct 10 17:27:46 2018] epoch_id: 40, batch_id: 1500, cost: 0.013878, acc: 0.984375
[Wed Oct 10 17:27:49 2018] epoch_id: 40, batch_id: 1600, cost: 0.005389, acc: 1.000000
[Wed Oct 10 17:27:51 2018] epoch_id: 40, batch_id: 1700, cost: 0.024631, acc: 0.992188
[Wed Oct 10 17:27:53 2018] epoch_id: 40, batch_id: 1800, cost: 0.003978, acc: 1.000000
[Wed Oct 10 17:27:55 2018] epoch_id: 40, batch_id: 1900, cost: 0.004993, acc: 1.000000
[Wed Oct 10 17:27:58 2018] epoch_id: 40, batch_id: 2000, cost: 0.014580, acc: 0.984375
[Wed Oct 10 17:28:00 2018] epoch_id: 40, batch_id: 2100, cost: 0.003148, acc: 1.000000
[Wed Oct 10 17:28:02 2018] epoch_id: 40, batch_id: 2200, cost: 0.000848, acc: 1.000000
[Wed Oct 10 17:28:04 2018] epoch_id: 40, batch_id: 2300, cost: 0.009250, acc: 1.000000
[Wed Oct 10 17:28:06 2018] epoch_id: 40, batch_id: 2400, cost: 0.006138, acc: 1.000000
[Wed Oct 10 17:28:09 2018] epoch_id: 40, batch_id: 2500, cost: 0.050052, acc: 0.984375
[Wed Oct 10 17:28:11 2018] epoch_id: 40, batch_id: 2600, cost: 0.005259, acc: 1.000000
[Wed Oct 10 17:28:13 2018] epoch_id: 40, batch_id: 2700, cost: 0.027375, acc: 0.984375
[Wed Oct 10 17:28:17 2018] epoch_id: 40, batch_id: 2800, cost: 0.010132, acc: 0.992188
[Wed Oct 10 17:28:19 2018] epoch_id: 40, batch_id: 2900, cost: 0.003442, acc: 1.000000
[Wed Oct 10 17:28:21 2018] epoch_id: 40, batch_id: 3000, cost: 0.005328, acc: 1.000000
[Wed Oct 10 17:28:22 2018] epoch_id: 40, train_avg_cost: 0.013034, train_avg_acc: 0.995832
[Wed Oct 10 17:28:23 2018] epoch_id: 40, dev_cost: 1.424795, accuracy: 0.8311
[Wed Oct 10 17:28:24 2018] epoch_id: 40, test_cost: 1.404285, accuracy: 0.8345
[Wed Oct 10 17:28:32 2018] epoch_id: 41, batch_id: 0, cost: 0.023169, acc: 0.992188
[Wed Oct 10 17:28:34 2018] epoch_id: 41, batch_id: 100, cost: 0.008356, acc: 0.992188
[Wed Oct 10 17:28:36 2018] epoch_id: 41, batch_id: 200, cost: 0.034033, acc: 0.992188
[Wed Oct 10 17:28:39 2018] epoch_id: 41, batch_id: 300, cost: 0.003154, acc: 1.000000
[Wed Oct 10 17:28:41 2018] epoch_id: 41, batch_id: 400, cost: 0.000178, acc: 1.000000
[Wed Oct 10 17:28:43 2018] epoch_id: 41, batch_id: 500, cost: 0.001488, acc: 1.000000
[Wed Oct 10 17:28:45 2018] epoch_id: 41, batch_id: 600, cost: 0.034724, acc: 0.992188
[Wed Oct 10 17:28:48 2018] epoch_id: 41, batch_id: 700, cost: 0.011531, acc: 0.992188
[Wed Oct 10 17:28:50 2018] epoch_id: 41, batch_id: 800, cost: 0.003504, acc: 1.000000
[Wed Oct 10 17:28:52 2018] epoch_id: 41, batch_id: 900, cost: 0.010360, acc: 0.992188
[Wed Oct 10 17:28:54 2018] epoch_id: 41, batch_id: 1000, cost: 0.014474, acc: 0.992188
[Wed Oct 10 17:28:57 2018] epoch_id: 41, batch_id: 1100, cost: 0.005857, acc: 1.000000
[Wed Oct 10 17:28:59 2018] epoch_id: 41, batch_id: 1200, cost: 0.007621, acc: 0.992188
[Wed Oct 10 17:29:01 2018] epoch_id: 41, batch_id: 1300, cost: 0.013386, acc: 0.992188
[Wed Oct 10 17:29:03 2018] epoch_id: 41, batch_id: 1400, cost: 0.004675, acc: 1.000000
[Wed Oct 10 17:29:05 2018] epoch_id: 41, batch_id: 1500, cost: 0.023563, acc: 0.984375
[Wed Oct 10 17:29:07 2018] epoch_id: 41, batch_id: 1600, cost: 0.001719, acc: 1.000000
[Wed Oct 10 17:29:10 2018] epoch_id: 41, batch_id: 1700, cost: 0.000334, acc: 1.000000
[Wed Oct 10 17:29:12 2018] epoch_id: 41, batch_id: 1800, cost: 0.001468, acc: 1.000000
[Wed Oct 10 17:29:14 2018] epoch_id: 41, batch_id: 1900, cost: 0.002295, acc: 1.000000
[Wed Oct 10 17:29:16 2018] epoch_id: 41, batch_id: 2000, cost: 0.021738, acc: 0.984375
[Wed Oct 10 17:29:19 2018] epoch_id: 41, batch_id: 2100, cost: 0.023329, acc: 0.984375
[Wed Oct 10 17:29:21 2018] epoch_id: 41, batch_id: 2200, cost: 0.005678, acc: 1.000000
[Wed Oct 10 17:29:23 2018] epoch_id: 41, batch_id: 2300, cost: 0.004800, acc: 1.000000
[Wed Oct 10 17:29:27 2018] epoch_id: 41, batch_id: 2400, cost: 0.007035, acc: 1.000000
[Wed Oct 10 17:29:29 2018] epoch_id: 41, batch_id: 2500, cost: 0.041456, acc: 0.976562
[Wed Oct 10 17:29:31 2018] epoch_id: 41, batch_id: 2600, cost: 0.011735, acc: 0.992188
[Wed Oct 10 17:29:33 2018] epoch_id: 41, batch_id: 2700, cost: 0.016611, acc: 0.992188
[Wed Oct 10 17:29:36 2018] epoch_id: 41, batch_id: 2800, cost: 0.004084, acc: 1.000000
[Wed Oct 10 17:29:38 2018] epoch_id: 41, batch_id: 2900, cost: 0.001111, acc: 1.000000
[Wed Oct 10 17:29:40 2018] epoch_id: 41, batch_id: 3000, cost: 0.015571, acc: 0.992188
[Wed Oct 10 17:29:41 2018] epoch_id: 41, train_avg_cost: 0.012473, train_avg_acc: 0.995959
[Wed Oct 10 17:29:42 2018] epoch_id: 41, dev_cost: 1.301212, accuracy: 0.8313
[Wed Oct 10 17:29:43 2018] epoch_id: 41, test_cost: 1.292132, accuracy: 0.8314
[Wed Oct 10 17:29:51 2018] epoch_id: 42, batch_id: 0, cost: 0.006710, acc: 1.000000
[Wed Oct 10 17:29:53 2018] epoch_id: 42, batch_id: 100, cost: 0.003760, acc: 1.000000
[Wed Oct 10 17:29:56 2018] epoch_id: 42, batch_id: 200, cost: 0.007728, acc: 1.000000
[Wed Oct 10 17:29:58 2018] epoch_id: 42, batch_id: 300, cost: 0.010997, acc: 1.000000
[Wed Oct 10 17:30:00 2018] epoch_id: 42, batch_id: 400, cost: 0.015313, acc: 0.984375
[Wed Oct 10 17:30:02 2018] epoch_id: 42, batch_id: 500, cost: 0.000985, acc: 1.000000
[Wed Oct 10 17:30:05 2018] epoch_id: 42, batch_id: 600, cost: 0.001277, acc: 1.000000
[Wed Oct 10 17:30:07 2018] epoch_id: 42, batch_id: 700, cost: 0.002231, acc: 1.000000
[Wed Oct 10 17:30:10 2018] epoch_id: 42, batch_id: 800, cost: 0.002233, acc: 1.000000
[Wed Oct 10 17:30:12 2018] epoch_id: 42, batch_id: 900, cost: 0.002083, acc: 1.000000
[Wed Oct 10 17:30:15 2018] epoch_id: 42, batch_id: 1000, cost: 0.004574, acc: 1.000000
[Wed Oct 10 17:30:17 2018] epoch_id: 42, batch_id: 1100, cost: 0.004339, acc: 1.000000
[Wed Oct 10 17:30:19 2018] epoch_id: 42, batch_id: 1200, cost: 0.006596, acc: 1.000000
[Wed Oct 10 17:30:22 2018] epoch_id: 42, batch_id: 1300, cost: 0.000877, acc: 1.000000
[Wed Oct 10 17:30:24 2018] epoch_id: 42, batch_id: 1400, cost: 0.001873, acc: 1.000000
[Wed Oct 10 17:30:26 2018] epoch_id: 42, batch_id: 1500, cost: 0.000632, acc: 1.000000
[Wed Oct 10 17:30:29 2018] epoch_id: 42, batch_id: 1600, cost: 0.002006, acc: 1.000000
[Wed Oct 10 17:30:31 2018] epoch_id: 42, batch_id: 1700, cost: 0.002035, acc: 1.000000
[Wed Oct 10 17:30:33 2018] epoch_id: 42, batch_id: 1800, cost: 0.010094, acc: 1.000000
[Wed Oct 10 17:30:35 2018] epoch_id: 42, batch_id: 1900, cost: 0.002634, acc: 1.000000
[Wed Oct 10 17:30:38 2018] epoch_id: 42, batch_id: 2000, cost: 0.045660, acc: 0.984375
[Wed Oct 10 17:30:40 2018] epoch_id: 42, batch_id: 2100, cost: 0.034275, acc: 0.984375
[Wed Oct 10 17:30:42 2018] epoch_id: 42, batch_id: 2200, cost: 0.001633, acc: 1.000000
[Wed Oct 10 17:30:44 2018] epoch_id: 42, batch_id: 2300, cost: 0.001030, acc: 1.000000
[Wed Oct 10 17:30:47 2018] epoch_id: 42, batch_id: 2400, cost: 0.002235, acc: 1.000000
[Wed Oct 10 17:30:49 2018] epoch_id: 42, batch_id: 2500, cost: 0.017729, acc: 0.992188
[Wed Oct 10 17:30:51 2018] epoch_id: 42, batch_id: 2600, cost: 0.004357, acc: 1.000000
[Wed Oct 10 17:30:53 2018] epoch_id: 42, batch_id: 2700, cost: 0.000981, acc: 1.000000
[Wed Oct 10 17:30:56 2018] epoch_id: 42, batch_id: 2800, cost: 0.000964, acc: 1.000000
[Wed Oct 10 17:30:58 2018] epoch_id: 42, batch_id: 2900, cost: 0.018888, acc: 0.992188
[Wed Oct 10 17:31:00 2018] epoch_id: 42, batch_id: 3000, cost: 0.032965, acc: 0.984375
[Wed Oct 10 17:31:01 2018] epoch_id: 42, train_avg_cost: 0.013007, train_avg_acc: 0.995946
[Wed Oct 10 17:31:02 2018] epoch_id: 42, dev_cost: 1.701511, accuracy: 0.8335
[Wed Oct 10 17:31:03 2018] epoch_id: 42, test_cost: 1.704458, accuracy: 0.8312
[Wed Oct 10 17:31:12 2018] epoch_id: 43, batch_id: 0, cost: 0.002044, acc: 1.000000
[Wed Oct 10 17:31:14 2018] epoch_id: 43, batch_id: 100, cost: 0.018454, acc: 0.992188
[Wed Oct 10 17:31:16 2018] epoch_id: 43, batch_id: 200, cost: 0.002746, acc: 1.000000
[Wed Oct 10 17:31:18 2018] epoch_id: 43, batch_id: 300, cost: 0.008316, acc: 0.992188
[Wed Oct 10 17:31:21 2018] epoch_id: 43, batch_id: 400, cost: 0.009446, acc: 1.000000
[Wed Oct 10 17:31:23 2018] epoch_id: 43, batch_id: 500, cost: 0.000336, acc: 1.000000
[Wed Oct 10 17:31:25 2018] epoch_id: 43, batch_id: 600, cost: 0.000436, acc: 1.000000
[Wed Oct 10 17:31:27 2018] epoch_id: 43, batch_id: 700, cost: 0.000142, acc: 1.000000
[Wed Oct 10 17:31:30 2018] epoch_id: 43, batch_id: 800, cost: 0.001449, acc: 1.000000
[Wed Oct 10 17:31:32 2018] epoch_id: 43, batch_id: 900, cost: 0.040274, acc: 0.992188
[Wed Oct 10 17:31:34 2018] epoch_id: 43, batch_id: 1000, cost: 0.002314, acc: 1.000000
[Wed Oct 10 17:31:36 2018] epoch_id: 43, batch_id: 1100, cost: 0.008140, acc: 0.992188
[Wed Oct 10 17:31:39 2018] epoch_id: 43, batch_id: 1200, cost: 0.001320, acc: 1.000000
[Wed Oct 10 17:31:41 2018] epoch_id: 43, batch_id: 1300, cost: 0.000427, acc: 1.000000
[Wed Oct 10 17:31:43 2018] epoch_id: 43, batch_id: 1400, cost: 0.004985, acc: 1.000000
[Wed Oct 10 17:31:46 2018] epoch_id: 43, batch_id: 1500, cost: 0.005165, acc: 1.000000
[Wed Oct 10 17:31:48 2018] epoch_id: 43, batch_id: 1600, cost: 0.006397, acc: 1.000000
[Wed Oct 10 17:31:50 2018] epoch_id: 43, batch_id: 1700, cost: 0.026334, acc: 0.984375
[Wed Oct 10 17:31:54 2018] epoch_id: 43, batch_id: 1800, cost: 0.003058, acc: 1.000000
[Wed Oct 10 17:31:56 2018] epoch_id: 43, batch_id: 1900, cost: 0.009215, acc: 1.000000
[Wed Oct 10 17:31:58 2018] epoch_id: 43, batch_id: 2000, cost: 0.005750, acc: 1.000000
[Wed Oct 10 17:32:01 2018] epoch_id: 43, batch_id: 2100, cost: 0.006973, acc: 1.000000
[Wed Oct 10 17:32:03 2018] epoch_id: 43, batch_id: 2200, cost: 0.040183, acc: 0.984375
[Wed Oct 10 17:32:05 2018] epoch_id: 43, batch_id: 2300, cost: 0.007980, acc: 0.992188
[Wed Oct 10 17:32:07 2018] epoch_id: 43, batch_id: 2400, cost: 0.018794, acc: 0.992188
[Wed Oct 10 17:32:10 2018] epoch_id: 43, batch_id: 2500, cost: 0.031288, acc: 0.984375
[Wed Oct 10 17:32:12 2018] epoch_id: 43, batch_id: 2600, cost: 0.010219, acc: 0.992188
[Wed Oct 10 17:32:14 2018] epoch_id: 43, batch_id: 2700, cost: 0.021514, acc: 0.984375
[Wed Oct 10 17:32:17 2018] epoch_id: 43, batch_id: 2800, cost: 0.005614, acc: 1.000000
[Wed Oct 10 17:32:19 2018] epoch_id: 43, batch_id: 2900, cost: 0.065875, acc: 0.984375
[Wed Oct 10 17:32:21 2018] epoch_id: 43, batch_id: 3000, cost: 0.013279, acc: 0.992188
[Wed Oct 10 17:32:22 2018] epoch_id: 43, train_avg_cost: 0.011822, train_avg_acc: 0.996238
[Wed Oct 10 17:32:23 2018] epoch_id: 43, dev_cost: 1.703876, accuracy: 0.8322
[Wed Oct 10 17:32:24 2018] epoch_id: 43, test_cost: 1.724094, accuracy: 0.8315
[Wed Oct 10 17:32:32 2018] epoch_id: 44, batch_id: 0, cost: 0.003358, acc: 1.000000
[Wed Oct 10 17:32:34 2018] epoch_id: 44, batch_id: 100, cost: 0.003024, acc: 1.000000
[Wed Oct 10 17:32:37 2018] epoch_id: 44, batch_id: 200, cost: 0.038726, acc: 0.992188
[Wed Oct 10 17:32:39 2018] epoch_id: 44, batch_id: 300, cost: 0.001766, acc: 1.000000
[Wed Oct 10 17:32:41 2018] epoch_id: 44, batch_id: 400, cost: 0.005300, acc: 1.000000
[Wed Oct 10 17:32:43 2018] epoch_id: 44, batch_id: 500, cost: 0.023175, acc: 0.992188
[Wed Oct 10 17:32:46 2018] epoch_id: 44, batch_id: 600, cost: 0.002893, acc: 1.000000
[Wed Oct 10 17:32:48 2018] epoch_id: 44, batch_id: 700, cost: 0.025870, acc: 0.976562
[Wed Oct 10 17:32:50 2018] epoch_id: 44, batch_id: 800, cost: 0.019898, acc: 0.992188
[Wed Oct 10 17:32:52 2018] epoch_id: 44, batch_id: 900, cost: 0.001718, acc: 1.000000
[Wed Oct 10 17:32:55 2018] epoch_id: 44, batch_id: 1000, cost: 0.000221, acc: 1.000000
[Wed Oct 10 17:32:57 2018] epoch_id: 44, batch_id: 1100, cost: 0.002172, acc: 1.000000
[Wed Oct 10 17:32:59 2018] epoch_id: 44, batch_id: 1200, cost: 0.001158, acc: 1.000000
[Wed Oct 10 17:33:02 2018] epoch_id: 44, batch_id: 1300, cost: 0.004667, acc: 1.000000
[Wed Oct 10 17:33:04 2018] epoch_id: 44, batch_id: 1400, cost: 0.000685, acc: 1.000000
[Wed Oct 10 17:33:06 2018] epoch_id: 44, batch_id: 1500, cost: 0.007730, acc: 1.000000
[Wed Oct 10 17:33:08 2018] epoch_id: 44, batch_id: 1600, cost: 0.006694, acc: 1.000000
[Wed Oct 10 17:33:11 2018] epoch_id: 44, batch_id: 1700, cost: 0.009508, acc: 0.992188
[Wed Oct 10 17:33:13 2018] epoch_id: 44, batch_id: 1800, cost: 0.018037, acc: 0.992188
[Wed Oct 10 17:33:15 2018] epoch_id: 44, batch_id: 1900, cost: 0.020902, acc: 0.976562
[Wed Oct 10 17:33:18 2018] epoch_id: 44, batch_id: 2000, cost: 0.006977, acc: 0.992188
[Wed Oct 10 17:33:20 2018] epoch_id: 44, batch_id: 2100, cost: 0.004821, acc: 1.000000
[Wed Oct 10 17:33:22 2018] epoch_id: 44, batch_id: 2200, cost: 0.000209, acc: 1.000000
[Wed Oct 10 17:33:25 2018] epoch_id: 44, batch_id: 2300, cost: 0.008764, acc: 0.992188
[Wed Oct 10 17:33:27 2018] epoch_id: 44, batch_id: 2400, cost: 0.029171, acc: 0.992188
[Wed Oct 10 17:33:29 2018] epoch_id: 44, batch_id: 2500, cost: 0.015028, acc: 0.992188
[Wed Oct 10 17:33:31 2018] epoch_id: 44, batch_id: 2600, cost: 0.007096, acc: 1.000000
[Wed Oct 10 17:33:33 2018] epoch_id: 44, batch_id: 2700, cost: 0.000547, acc: 1.000000
[Wed Oct 10 17:33:36 2018] epoch_id: 44, batch_id: 2800, cost: 0.004024, acc: 1.000000
[Wed Oct 10 17:33:38 2018] epoch_id: 44, batch_id: 2900, cost: 0.002191, acc: 1.000000
[Wed Oct 10 17:33:40 2018] epoch_id: 44, batch_id: 3000, cost: 0.008875, acc: 1.000000
[Wed Oct 10 17:33:41 2018] epoch_id: 44, train_avg_cost: 0.012328, train_avg_acc: 0.996076
[Wed Oct 10 17:33:42 2018] epoch_id: 44, dev_cost: 1.575702, accuracy: 0.8331
[Wed Oct 10 17:33:43 2018] epoch_id: 44, test_cost: 1.573283, accuracy: 0.8313
[Wed Oct 10 17:33:52 2018] epoch_id: 45, batch_id: 0, cost: 0.002271, acc: 1.000000
[Wed Oct 10 17:33:54 2018] epoch_id: 45, batch_id: 100, cost: 0.005500, acc: 1.000000
[Wed Oct 10 17:33:56 2018] epoch_id: 45, batch_id: 200, cost: 0.001735, acc: 1.000000
[Wed Oct 10 17:33:58 2018] epoch_id: 45, batch_id: 300, cost: 0.008910, acc: 1.000000
[Wed Oct 10 17:34:01 2018] epoch_id: 45, batch_id: 400, cost: 0.010551, acc: 0.992188
[Wed Oct 10 17:34:03 2018] epoch_id: 45, batch_id: 500, cost: 0.005958, acc: 1.000000
[Wed Oct 10 17:34:05 2018] epoch_id: 45, batch_id: 600, cost: 0.012035, acc: 0.992188
[Wed Oct 10 17:34:07 2018] epoch_id: 45, batch_id: 700, cost: 0.002110, acc: 1.000000
[Wed Oct 10 17:34:10 2018] epoch_id: 45, batch_id: 800, cost: 0.014834, acc: 0.992188
[Wed Oct 10 17:34:12 2018] epoch_id: 45, batch_id: 900, cost: 0.010944, acc: 0.992188
[Wed Oct 10 17:34:14 2018] epoch_id: 45, batch_id: 1000, cost: 0.017574, acc: 0.992188
[Wed Oct 10 17:34:16 2018] epoch_id: 45, batch_id: 1100, cost: 0.006877, acc: 1.000000
[Wed Oct 10 17:34:19 2018] epoch_id: 45, batch_id: 1200, cost: 0.001731, acc: 1.000000
[Wed Oct 10 17:34:21 2018] epoch_id: 45, batch_id: 1300, cost: 0.002963, acc: 1.000000
[Wed Oct 10 17:34:23 2018] epoch_id: 45, batch_id: 1400, cost: 0.009798, acc: 1.000000
[Wed Oct 10 17:34:25 2018] epoch_id: 45, batch_id: 1500, cost: 0.003309, acc: 1.000000
[Wed Oct 10 17:34:28 2018] epoch_id: 45, batch_id: 1600, cost: 0.022402, acc: 0.984375
[Wed Oct 10 17:34:30 2018] epoch_id: 45, batch_id: 1700, cost: 0.003854, acc: 1.000000
[Wed Oct 10 17:34:32 2018] epoch_id: 45, batch_id: 1800, cost: 0.000418, acc: 1.000000
[Wed Oct 10 17:34:35 2018] epoch_id: 45, batch_id: 1900, cost: 0.014512, acc: 0.992188
[Wed Oct 10 17:34:37 2018] epoch_id: 45, batch_id: 2000, cost: 0.031922, acc: 0.992188
[Wed Oct 10 17:34:39 2018] epoch_id: 45, batch_id: 2100, cost: 0.002671, acc: 1.000000
[Wed Oct 10 17:34:42 2018] epoch_id: 45, batch_id: 2200, cost: 0.042934, acc: 0.984375
[Wed Oct 10 17:34:44 2018] epoch_id: 45, batch_id: 2300, cost: 0.008559, acc: 1.000000
[Wed Oct 10 17:34:46 2018] epoch_id: 45, batch_id: 2400, cost: 0.050518, acc: 0.984375
[Wed Oct 10 17:34:48 2018] epoch_id: 45, batch_id: 2500, cost: 0.001887, acc: 1.000000
[Wed Oct 10 17:34:50 2018] epoch_id: 45, batch_id: 2600, cost: 0.002196, acc: 1.000000
[Wed Oct 10 17:34:54 2018] epoch_id: 45, batch_id: 2700, cost: 0.002765, acc: 1.000000
[Wed Oct 10 17:34:56 2018] epoch_id: 45, batch_id: 2800, cost: 0.024691, acc: 0.992188
[Wed Oct 10 17:34:59 2018] epoch_id: 45, batch_id: 2900, cost: 0.003790, acc: 1.000000
[Wed Oct 10 17:35:01 2018] epoch_id: 45, batch_id: 3000, cost: 0.001317, acc: 1.000000
[Wed Oct 10 17:35:01 2018] epoch_id: 45, train_avg_cost: 0.012084, train_avg_acc: 0.996298
[Wed Oct 10 17:35:02 2018] epoch_id: 45, dev_cost: 1.603634, accuracy: 0.8321
[Wed Oct 10 17:35:03 2018] epoch_id: 45, test_cost: 1.609678, accuracy: 0.8291
[Wed Oct 10 17:35:12 2018] epoch_id: 46, batch_id: 0, cost: 0.002291, acc: 1.000000
[Wed Oct 10 17:35:14 2018] epoch_id: 46, batch_id: 100, cost: 0.018703, acc: 0.992188
[Wed Oct 10 17:35:16 2018] epoch_id: 46, batch_id: 200, cost: 0.004407, acc: 1.000000
[Wed Oct 10 17:35:18 2018] epoch_id: 46, batch_id: 300, cost: 0.000953, acc: 1.000000
[Wed Oct 10 17:35:21 2018] epoch_id: 46, batch_id: 400, cost: 0.000732, acc: 1.000000
[Wed Oct 10 17:35:23 2018] epoch_id: 46, batch_id: 500, cost: 0.011275, acc: 0.992188
[Wed Oct 10 17:35:25 2018] epoch_id: 46, batch_id: 600, cost: 0.009521, acc: 1.000000
[Wed Oct 10 17:35:27 2018] epoch_id: 46, batch_id: 700, cost: 0.000671, acc: 1.000000
[Wed Oct 10 17:35:30 2018] epoch_id: 46, batch_id: 800, cost: 0.000768, acc: 1.000000
[Wed Oct 10 17:35:32 2018] epoch_id: 46, batch_id: 900, cost: 0.001357, acc: 1.000000
[Wed Oct 10 17:35:34 2018] epoch_id: 46, batch_id: 1000, cost: 0.001384, acc: 1.000000
[Wed Oct 10 17:35:37 2018] epoch_id: 46, batch_id: 1100, cost: 0.010220, acc: 0.992188
[Wed Oct 10 17:35:39 2018] epoch_id: 46, batch_id: 1200, cost: 0.006540, acc: 1.000000
[Wed Oct 10 17:35:41 2018] epoch_id: 46, batch_id: 1300, cost: 0.002771, acc: 1.000000
[Wed Oct 10 17:35:44 2018] epoch_id: 46, batch_id: 1400, cost: 0.010623, acc: 0.992188
[Wed Oct 10 17:35:46 2018] epoch_id: 46, batch_id: 1500, cost: 0.000798, acc: 1.000000
[Wed Oct 10 17:35:48 2018] epoch_id: 46, batch_id: 1600, cost: 0.004519, acc: 1.000000
[Wed Oct 10 17:35:50 2018] epoch_id: 46, batch_id: 1700, cost: 0.010096, acc: 1.000000
[Wed Oct 10 17:35:53 2018] epoch_id: 46, batch_id: 1800, cost: 0.001868, acc: 1.000000
[Wed Oct 10 17:35:55 2018] epoch_id: 46, batch_id: 1900, cost: 0.039460, acc: 0.984375
[Wed Oct 10 17:35:57 2018] epoch_id: 46, batch_id: 2000, cost: 0.008906, acc: 1.000000
[Wed Oct 10 17:35:59 2018] epoch_id: 46, batch_id: 2100, cost: 0.008440, acc: 0.992188
[Wed Oct 10 17:36:02 2018] epoch_id: 46, batch_id: 2200, cost: 0.014774, acc: 0.992188
[Wed Oct 10 17:36:05 2018] epoch_id: 46, batch_id: 2300, cost: 0.016775, acc: 0.992188
[Wed Oct 10 17:36:07 2018] epoch_id: 46, batch_id: 2400, cost: 0.008999, acc: 0.992188
[Wed Oct 10 17:36:10 2018] epoch_id: 46, batch_id: 2500, cost: 0.001394, acc: 1.000000
[Wed Oct 10 17:36:12 2018] epoch_id: 46, batch_id: 2600, cost: 0.005627, acc: 1.000000
[Wed Oct 10 17:36:14 2018] epoch_id: 46, batch_id: 2700, cost: 0.003667, acc: 1.000000
[Wed Oct 10 17:36:16 2018] epoch_id: 46, batch_id: 2800, cost: 0.016338, acc: 0.992188
[Wed Oct 10 17:36:19 2018] epoch_id: 46, batch_id: 2900, cost: 0.005622, acc: 1.000000
[Wed Oct 10 17:36:21 2018] epoch_id: 46, batch_id: 3000, cost: 0.003068, acc: 1.000000
[Wed Oct 10 17:36:21 2018] epoch_id: 46, train_avg_cost: 0.011216, train_avg_acc: 0.996266
[Wed Oct 10 17:36:22 2018] epoch_id: 46, dev_cost: 1.772260, accuracy: 0.8309
[Wed Oct 10 17:36:23 2018] epoch_id: 46, test_cost: 1.783967, accuracy: 0.8284
[Wed Oct 10 17:36:32 2018] epoch_id: 47, batch_id: 0, cost: 0.001193, acc: 1.000000
[Wed Oct 10 17:36:34 2018] epoch_id: 47, batch_id: 100, cost: 0.000584, acc: 1.000000
[Wed Oct 10 17:36:36 2018] epoch_id: 47, batch_id: 200, cost: 0.001534, acc: 1.000000
[Wed Oct 10 17:36:38 2018] epoch_id: 47, batch_id: 300, cost: 0.014105, acc: 0.992188
[Wed Oct 10 17:36:41 2018] epoch_id: 47, batch_id: 400, cost: 0.000929, acc: 1.000000
[Wed Oct 10 17:36:43 2018] epoch_id: 47, batch_id: 500, cost: 0.007649, acc: 0.992188
[Wed Oct 10 17:36:45 2018] epoch_id: 47, batch_id: 600, cost: 0.009973, acc: 0.992188
[Wed Oct 10 17:36:47 2018] epoch_id: 47, batch_id: 700, cost: 0.006471, acc: 1.000000
[Wed Oct 10 17:36:49 2018] epoch_id: 47, batch_id: 800, cost: 0.002720, acc: 1.000000
[Wed Oct 10 17:36:53 2018] epoch_id: 47, batch_id: 900, cost: 0.001402, acc: 1.000000
[Wed Oct 10 17:36:55 2018] epoch_id: 47, batch_id: 1000, cost: 0.000697, acc: 1.000000
[Wed Oct 10 17:36:57 2018] epoch_id: 47, batch_id: 1100, cost: 0.001998, acc: 1.000000
[Wed Oct 10 17:36:59 2018] epoch_id: 47, batch_id: 1200, cost: 0.009035, acc: 0.992188
[Wed Oct 10 17:37:02 2018] epoch_id: 47, batch_id: 1300, cost: 0.006139, acc: 1.000000
[Wed Oct 10 17:37:04 2018] epoch_id: 47, batch_id: 1400, cost: 0.007283, acc: 1.000000
[Wed Oct 10 17:37:06 2018] epoch_id: 47, batch_id: 1500, cost: 0.016960, acc: 0.992188
[Wed Oct 10 17:37:08 2018] epoch_id: 47, batch_id: 1600, cost: 0.001158, acc: 1.000000
[Wed Oct 10 17:37:11 2018] epoch_id: 47, batch_id: 1700, cost: 0.001425, acc: 1.000000
[Wed Oct 10 17:37:13 2018] epoch_id: 47, batch_id: 1800, cost: 0.001285, acc: 1.000000
[Wed Oct 10 17:37:15 2018] epoch_id: 47, batch_id: 1900, cost: 0.002734, acc: 1.000000
[Wed Oct 10 17:37:17 2018] epoch_id: 47, batch_id: 2000, cost: 0.000576, acc: 1.000000
[Wed Oct 10 17:37:20 2018] epoch_id: 47, batch_id: 2100, cost: 0.001285, acc: 1.000000
[Wed Oct 10 17:37:22 2018] epoch_id: 47, batch_id: 2200, cost: 0.000798, acc: 1.000000
[Wed Oct 10 17:37:24 2018] epoch_id: 47, batch_id: 2300, cost: 0.059468, acc: 0.984375
[Wed Oct 10 17:37:26 2018] epoch_id: 47, batch_id: 2400, cost: 0.004177, acc: 1.000000
[Wed Oct 10 17:37:29 2018] epoch_id: 47, batch_id: 2500, cost: 0.001915, acc: 1.000000
[Wed Oct 10 17:37:31 2018] epoch_id: 47, batch_id: 2600, cost: 0.000491, acc: 1.000000
[Wed Oct 10 17:37:33 2018] epoch_id: 47, batch_id: 2700, cost: 0.001129, acc: 1.000000
[Wed Oct 10 17:37:35 2018] epoch_id: 47, batch_id: 2800, cost: 0.000988, acc: 1.000000
[Wed Oct 10 17:37:38 2018] epoch_id: 47, batch_id: 2900, cost: 0.024258, acc: 0.992188
[Wed Oct 10 17:37:40 2018] epoch_id: 47, batch_id: 3000, cost: 0.000902, acc: 1.000000
[Wed Oct 10 17:37:41 2018] epoch_id: 47, train_avg_cost: 0.011536, train_avg_acc: 0.996225
[Wed Oct 10 17:37:42 2018] epoch_id: 47, dev_cost: 2.235156, accuracy: 0.8326
[Wed Oct 10 17:37:43 2018] epoch_id: 47, test_cost: 2.289617, accuracy: 0.8318
[Wed Oct 10 17:37:51 2018] epoch_id: 48, batch_id: 0, cost: 0.007975, acc: 1.000000
[Wed Oct 10 17:37:54 2018] epoch_id: 48, batch_id: 100, cost: 0.000491, acc: 1.000000
[Wed Oct 10 17:37:56 2018] epoch_id: 48, batch_id: 200, cost: 0.015161, acc: 0.992188
[Wed Oct 10 17:37:58 2018] epoch_id: 48, batch_id: 300, cost: 0.030692, acc: 0.992188
[Wed Oct 10 17:38:00 2018] epoch_id: 48, batch_id: 400, cost: 0.016749, acc: 0.992188
[Wed Oct 10 17:38:03 2018] epoch_id: 48, batch_id: 500, cost: 0.005637, acc: 1.000000
[Wed Oct 10 17:38:05 2018] epoch_id: 48, batch_id: 600, cost: 0.014267, acc: 0.992188
[Wed Oct 10 17:38:07 2018] epoch_id: 48, batch_id: 700, cost: 0.002352, acc: 1.000000
[Wed Oct 10 17:38:10 2018] epoch_id: 48, batch_id: 800, cost: 0.002758, acc: 1.000000
[Wed Oct 10 17:38:12 2018] epoch_id: 48, batch_id: 900, cost: 0.000367, acc: 1.000000
[Wed Oct 10 17:38:14 2018] epoch_id: 48, batch_id: 1000, cost: 0.003479, acc: 1.000000
[Wed Oct 10 17:38:16 2018] epoch_id: 48, batch_id: 1100, cost: 0.006107, acc: 1.000000
[Wed Oct 10 17:38:19 2018] epoch_id: 48, batch_id: 1200, cost: 0.000989, acc: 1.000000
[Wed Oct 10 17:38:21 2018] epoch_id: 48, batch_id: 1300, cost: 0.000442, acc: 1.000000
[Wed Oct 10 17:38:23 2018] epoch_id: 48, batch_id: 1400, cost: 0.002006, acc: 1.000000
[Wed Oct 10 17:38:25 2018] epoch_id: 48, batch_id: 1500, cost: 0.022174, acc: 0.992188
[Wed Oct 10 17:38:28 2018] epoch_id: 48, batch_id: 1600, cost: 0.004670, acc: 1.000000
[Wed Oct 10 17:38:30 2018] epoch_id: 48, batch_id: 1700, cost: 0.014862, acc: 0.992188
[Wed Oct 10 17:38:32 2018] epoch_id: 48, batch_id: 1800, cost: 0.004648, acc: 1.000000
[Wed Oct 10 17:38:36 2018] epoch_id: 48, batch_id: 1900, cost: 0.035342, acc: 0.992188
[Wed Oct 10 17:38:38 2018] epoch_id: 48, batch_id: 2000, cost: 0.018578, acc: 0.992188
[Wed Oct 10 17:38:40 2018] epoch_id: 48, batch_id: 2100, cost: 0.003790, acc: 1.000000
[Wed Oct 10 17:38:42 2018] epoch_id: 48, batch_id: 2200, cost: 0.026731, acc: 0.984375
[Wed Oct 10 17:38:45 2018] epoch_id: 48, batch_id: 2300, cost: 0.003608, acc: 1.000000
[Wed Oct 10 17:38:47 2018] epoch_id: 48, batch_id: 2400, cost: 0.005601, acc: 1.000000
[Wed Oct 10 17:38:49 2018] epoch_id: 48, batch_id: 2500, cost: 0.000833, acc: 1.000000
[Wed Oct 10 17:38:52 2018] epoch_id: 48, batch_id: 2600, cost: 0.004157, acc: 1.000000
[Wed Oct 10 17:38:54 2018] epoch_id: 48, batch_id: 2700, cost: 0.010146, acc: 0.992188
[Wed Oct 10 17:38:56 2018] epoch_id: 48, batch_id: 2800, cost: 0.001127, acc: 1.000000
[Wed Oct 10 17:38:58 2018] epoch_id: 48, batch_id: 2900, cost: 0.004332, acc: 1.000000
[Wed Oct 10 17:39:01 2018] epoch_id: 48, batch_id: 3000, cost: 0.004895, acc: 1.000000
[Wed Oct 10 17:39:01 2018] epoch_id: 48, train_avg_cost: 0.010959, train_avg_acc: 0.996475
[Wed Oct 10 17:39:02 2018] epoch_id: 48, dev_cost: 1.764490, accuracy: 0.8343
[Wed Oct 10 17:39:03 2018] epoch_id: 48, test_cost: 1.826369, accuracy: 0.8296
[Wed Oct 10 17:39:12 2018] epoch_id: 49, batch_id: 0, cost: 0.004527, acc: 1.000000
[Wed Oct 10 17:39:14 2018] epoch_id: 49, batch_id: 100, cost: 0.003537, acc: 1.000000
[Wed Oct 10 17:39:16 2018] epoch_id: 49, batch_id: 200, cost: 0.034318, acc: 0.992188
[Wed Oct 10 17:39:19 2018] epoch_id: 49, batch_id: 300, cost: 0.024897, acc: 0.992188
[Wed Oct 10 17:39:21 2018] epoch_id: 49, batch_id: 400, cost: 0.002212, acc: 1.000000
[Wed Oct 10 17:39:23 2018] epoch_id: 49, batch_id: 500, cost: 0.012678, acc: 0.992188
[Wed Oct 10 17:39:25 2018] epoch_id: 49, batch_id: 600, cost: 0.006081, acc: 1.000000
[Wed Oct 10 17:39:28 2018] epoch_id: 49, batch_id: 700, cost: 0.004294, acc: 1.000000
[Wed Oct 10 17:39:30 2018] epoch_id: 49, batch_id: 800, cost: 0.000339, acc: 1.000000
[Wed Oct 10 17:39:32 2018] epoch_id: 49, batch_id: 900, cost: 0.006350, acc: 0.992188
[Wed Oct 10 17:39:35 2018] epoch_id: 49, batch_id: 1000, cost: 0.002183, acc: 1.000000
[Wed Oct 10 17:39:37 2018] epoch_id: 49, batch_id: 1100, cost: 0.006977, acc: 1.000000
[Wed Oct 10 17:39:39 2018] epoch_id: 49, batch_id: 1200, cost: 0.003140, acc: 1.000000
[Wed Oct 10 17:39:41 2018] epoch_id: 49, batch_id: 1300, cost: 0.003361, acc: 1.000000
[Wed Oct 10 17:39:44 2018] epoch_id: 49, batch_id: 1400, cost: 0.002039, acc: 1.000000
[Wed Oct 10 17:39:46 2018] epoch_id: 49, batch_id: 1500, cost: 0.001850, acc: 1.000000
[Wed Oct 10 17:39:48 2018] epoch_id: 49, batch_id: 1600, cost: 0.045419, acc: 0.992188
[Wed Oct 10 17:39:50 2018] epoch_id: 49, batch_id: 1700, cost: 0.000883, acc: 1.000000
[Wed Oct 10 17:39:53 2018] epoch_id: 49, batch_id: 1800, cost: 0.002086, acc: 1.000000
[Wed Oct 10 17:39:55 2018] epoch_id: 49, batch_id: 1900, cost: 0.014964, acc: 0.992188
[Wed Oct 10 17:39:57 2018] epoch_id: 49, batch_id: 2000, cost: 0.002001, acc: 1.000000
[Wed Oct 10 17:39:59 2018] epoch_id: 49, batch_id: 2100, cost: 0.013663, acc: 0.984375
[Wed Oct 10 17:40:02 2018] epoch_id: 49, batch_id: 2200, cost: 0.013116, acc: 0.992188
[Wed Oct 10 17:40:04 2018] epoch_id: 49, batch_id: 2300, cost: 0.002713, acc: 1.000000
[Wed Oct 10 17:40:06 2018] epoch_id: 49, batch_id: 2400, cost: 0.004193, acc: 1.000000
[Wed Oct 10 17:40:08 2018] epoch_id: 49, batch_id: 2500, cost: 0.001507, acc: 1.000000
[Wed Oct 10 17:40:11 2018] epoch_id: 49, batch_id: 2600, cost: 0.034837, acc: 0.992188
[Wed Oct 10 17:40:13 2018] epoch_id: 49, batch_id: 2700, cost: 0.006245, acc: 1.000000
[Wed Oct 10 17:40:15 2018] epoch_id: 49, batch_id: 2800, cost: 0.003659, acc: 1.000000
[Wed Oct 10 17:40:17 2018] epoch_id: 49, batch_id: 2900, cost: 0.002175, acc: 1.000000
[Wed Oct 10 17:40:19 2018] epoch_id: 49, batch_id: 3000, cost: 0.000767, acc: 1.000000
[Wed Oct 10 17:40:20 2018] epoch_id: 49, train_avg_cost: 0.011233, train_avg_acc: 0.996326
[Wed Oct 10 17:40:21 2018] epoch_id: 49, dev_cost: 1.652680, accuracy: 0.8353
[Wed Oct 10 17:40:22 2018] epoch_id: 49, test_cost: 1.685406, accuracy: 0.8324
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .cdssm import cdssm_base
from .dec_att import decatt_glove
from .sse import sse_base
from .infer_sent import infer_sent_v1
from .infer_sent import infer_sent_v2
from __future__ import print_function
class config(object):
def __init__(self):
self.batch_size = 128
self.epoch_num = 50
self.optimizer_type = 'adam' # sgd, adagrad
# pretrained word embedding
self.use_pretrained_word_embedding = True
# when employing pretrained word embedding,
# out of vocabulary words' embedding is initialized with uniform or normal numbers
self.OOV_fill = 'uniform'
self.embedding_norm = False
# or else, use padding and masks for sequence data
self.use_lod_tensor = True
# lr = lr * lr_decay after each epoch
self.lr_decay = 1
self.learning_rate = 0.001
self.save_dirname = 'model_dir'
self.train_samples_num = 384348
self.duplicate_data = False
self.metric_type = ['accuracy']
def list_config(self):
print("config", self.__dict__)
def has_member(self, var_name):
return var_name in self.__dict__
if __name__ == "__main__":
basic = config()
basic.list_config()
basic.ahh = 2
basic.list_config()
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def cdssm_base():
"""
set configs
"""
config = basic_config.config()
config.learning_rate = 0.001
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
# net config
config.emb_dim = 300
config.kernel_size = 5
config.kernel_count = 300
config.fc_dim = 128
config.mlp_hid_dim = [128, 128]
config.droprate_conv = 0.1
config.droprate_fc = 0.1
config.class_dim = 2
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def decatt_glove():
"""
use config 'decAtt_glove' in the paper 'Neural Paraphrase Identification of Questions with Noisy Pretraining'
"""
config = basic_config.config()
config.learning_rate = 0.05
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.metric_type = ['accuracy', 'accuracy_with_threshold']
config.optimizer_type = 'sgd'
config.lr_decay = 1
config.use_lod_tensor = False
config.embedding_norm = False
config.OOV_fill = 'uniform'
config.duplicate_data = False
# net config
config.emb_dim = 300
config.proj_emb_dim = 200 #TODO: has project?
config.num_units = [400, 200]
config.word_embedding_trainable = True
config.droprate = 0.1
config.share_wight_btw_seq = True
config.class_dim = 2
return config
def decatt_word():
"""
use config 'decAtt_glove' in the paper 'Neural Paraphrase Identification of Questions with Noisy Pretraining'
"""
config = basic_config.config()
config.learning_rate = 0.05
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = False
config.dict_dim = 40000 # approx_vocab_size
config.metric_type = ['accuracy', 'accuracy_with_threshold']
config.optimizer_type = 'sgd'
config.lr_decay = 1
config.use_lod_tensor = False
config.embedding_norm = False
config.OOV_fill = 'uniform'
config.duplicate_data = False
# net config
config.emb_dim = 300
config.proj_emb_dim = 200 #TODO: has project?
config.num_units = [400, 200]
config.word_embedding_trainable = True
config.droprate = 0.1
config.share_wight_btw_seq = True
config.class_dim = 2
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def infer_sent_v1():
"""
set configs
"""
config = basic_config.config()
config.learning_rate = 0.1
config.lr_decay = 0.99
config.optimizer_type = 'sgd'
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.class_dim = 2
# net config
config.emb_dim = 300
config.droprate_lstm = 0.0
config.droprate_fc = 0.0
config.word_embedding_trainable = False
config.rnn_hid_dim = 2048
config.mlp_non_linear = False
return config
def infer_sent_v2():
"""
use our own config
"""
config = basic_config.config()
config.learning_rate = 0.0002
config.lr_decay = 0.99
config.optimizer_type = 'adam'
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.class_dim = 2
# net config
config.emb_dim = 300
config.droprate_lstm = 0.0
config.droprate_fc = 0.2
config.word_embedding_trainable = False
config.rnn_hid_dim = 2048
config.mlp_non_linear = True
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from . import basic_config
def sse_base():
"""
use config in the paper 'Shortcut-Stacked Sentence Encoders for Multi-Domain Inference'
"""
config = basic_config.config()
config.learning_rate = 0.0002
config.lr_decay = 0.7
config.save_dirname = "model_dir"
config.use_pretrained_word_embedding = True
config.dict_dim = 40000 # approx_vocab_size
config.metric_type = ['accuracy']
config.optimizer_type = 'adam'
config.use_lod_tensor = True
config.embedding_norm = False
config.OOV_fill = 'uniform'
config.duplicate_data = False
# net config
config.emb_dim = 300
config.rnn_hid_dim = [512, 1024, 2048]
config.fc_dim = [1600, 1600]
config.droprate_lstm = 0.0
config.droprate_fc = 0.1
config.class_dim = 2
return config
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Please download the Quora dataset firstly from https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view?usp=sharing
# to the ROOT_DIR: $HOME/.cache/paddle/dataset
DATA_DIR=$HOME/.cache/paddle/dataset
wget --directory-prefix=$DATA_DIR http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip $DATA_DIR/glove.840B.300d.zip
# The finally dataset dir should be like
# $HOME/.cache/paddle/dataset
# |- Quora_question_pair_partition
# |- train.tsv
# |- test.tsv
# |- dev.tsv
# |- readme.txt
# |- wordvec.txt
# |- glove.840B.300d.txt
Image files for this model: text_matching_on_quora
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
"""
This Module defines evaluate metrics for classification tasks
"""
def accuracy(y_pred, label):
"""
define correct: the top 1 class in y_pred is the same as y_true
"""
y_pred = np.squeeze(y_pred)
y_pred_idx = np.argmax(y_pred, axis=1)
return 1.0 * np.sum(y_pred_idx == label) / label.shape[0]
def accuracy_with_threshold(y_pred, label, threshold=0.5):
"""
define correct: the y_true class's prob in y_pred is bigger than threshold
when threshold is 0.5, This fuction is equal to accuracy
"""
y_pred = np.squeeze(y_pred)
y_pred_idx = (y_pred[:, 1] > threshold).astype(int)
return 1.0 * np.sum(y_pred_idx == label) / label.shape[0]
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .cdssm import cdssmNet
from .dec_att import DecAttNet
from .sse import SSENet
from .infer_sent import InferSentNet
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
class cdssmNet():
"""cdssm net"""
def __init__(self, config):
self._config = config
def __call__(self, seq1, seq2, label):
return self.body(seq1, seq2, label, self._config)
def body(self, seq1, seq2, label, config):
"""Body function"""
def conv_model(seq):
embed = fluid.layers.embedding(input=seq, size=[config.dict_dim, config.emb_dim], param_attr='emb.w')
conv = fluid.layers.sequence_conv(embed,
num_filters=config.kernel_count,
filter_size=config.kernel_size,
filter_stride=1,
padding=True, # TODO: what is padding
bias_attr=False,
param_attr='conv1d.w',
act='relu')
#print paddle.parameters.get('conv1d.w').shape
conv = fluid.layers.dropout(conv, dropout_prob = config.droprate_conv)
pool = fluid.layers.sequence_pool(conv, pool_type="max")
fc = fluid.layers.fc(pool,
size=config.fc_dim,
param_attr='fc1.w',
bias_attr='fc1.b',
act='relu')
return fc
def MLP(vec):
for dim in config.mlp_hid_dim:
vec = fluid.layers.fc(vec, size=dim, act='relu')
vec = fluid.layers.dropout(vec, dropout_prob=config.droprate_fc)
return vec
seq1_fc = conv_model(seq1)
seq2_fc = conv_model(seq2)
concated_seq = fluid.layers.concat(input=[seq1_fc, seq2_fc], axis=1)
mlp_res = MLP(concated_seq)
prediction = fluid.layers.fc(mlp_res, size=config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
class DecAttNet():
"""decompose attention net"""
def __init__(self, config):
self._config = config
self.initializer = fluid.initializer.Xavier(uniform=False)
def __call__(self, seq1, seq2, mask1, mask2, label):
return self.body(seq1, seq2, mask1, mask2, label)
def body(self, seq1, seq2, mask1, mask2, label):
"""Body function"""
transformed_q1 = self.transformation(seq1)
transformed_q2 = self.transformation(seq2)
masked_q1 = self.apply_mask(transformed_q1, mask1)
masked_q2 = self.apply_mask(transformed_q2, mask2)
alpha, beta = self.attend(masked_q1, masked_q2)
if self._config.share_wight_btw_seq:
seq1_compare = self.compare(masked_q1, beta, param_prefix='compare')
seq2_compare = self.compare(masked_q2, alpha, param_prefix='compare')
else:
seq1_compare = self.compare(masked_q1, beta, param_prefix='compare_1')
seq2_compare = self.compare(masked_q2, alpha, param_prefix='compare_2')
aggregate_res = self.aggregate(seq1_compare, seq2_compare)
prediction = fluid.layers.fc(aggregate_res, size=self._config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
def apply_mask(self, seq, mask):
"""
apply mask on seq
Input: seq in shape [batch_size, seq_len, embedding_size]
Input: mask in shape [batch_size, seq_len]
Output: masked seq in shape [batch_size, seq_len, embedding_size]
"""
return fluid.layers.elementwise_mul(x=seq, y=mask, axis=0)
def feed_forward_2d(self, vec, param_prefix):
"""
Input: vec in shape [batch_size, seq_len, vec_dim]
Output: fc2 in shape [batch_size, seq_len, num_units[1]]
"""
fc1 = fluid.layers.fc(vec, size=self._config.num_units[0], num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc1.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc1.b', act='relu')
fc1 = fluid.layers.dropout(fc1, dropout_prob = self._config.droprate)
fc2 = fluid.layers.fc(fc1, size=self._config.num_units[1], num_flatten_dims=2,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc2.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc2.b', act='relu')
fc2 = fluid.layers.dropout(fc2, dropout_prob = self._config.droprate)
return fc2
def feed_forward(self, vec, param_prefix):
"""
Input: vec in shape [batch_size, vec_dim]
Output: fc2 in shape [batch_size, num_units[1]]
"""
fc1 = fluid.layers.fc(vec, size=self._config.num_units[0], num_flatten_dims=1,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc1.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc1.b', act='relu')
fc1 = fluid.layers.dropout(fc1, dropout_prob = self._config.droprate)
fc2 = fluid.layers.fc(fc1, size=self._config.num_units[1], num_flatten_dims=1,
param_attr=fluid.ParamAttr(name=param_prefix+'_fc2.w',
initializer=self.initializer),
bias_attr=param_prefix + '_fc2.b', act='relu')
fc2 = fluid.layers.dropout(fc2, dropout_prob = self._config.droprate)
return fc2
def transformation(self, seq):
embed = fluid.layers.embedding(input=seq, size=[self._config.dict_dim, self._config.emb_dim],
param_attr=fluid.ParamAttr(name='emb.w', trainable=self._config.word_embedding_trainable))
if self._config.proj_emb_dim is not None:
return fluid.layers.fc(embed, size=self._config.proj_emb_dim, num_flatten_dims=2,
param_attr=fluid.ParamAttr(name='project' + '_fc1.w',
initializer=self.initializer),
bias_attr=False,
act=None)
return embed
def attend(self, seq1, seq2):
"""
Input: seq1, shape [batch_size, seq_len1, embed_size]
Input: seq2, shape [batch_size, seq_len2, embed_size]
Output: alpha, shape [batch_size, seq_len1, embed_size]
Output: beta, shape [batch_size, seq_len2, embed_size]
"""
if self._config.share_wight_btw_seq:
seq1 = self.feed_forward_2d(seq1, param_prefix="attend")
seq2 = self.feed_forward_2d(seq2, param_prefix="attend")
else:
seq1 = self.feed_forward_2d(seq1, param_prefix="attend_1")
seq2 = self.feed_forward_2d(seq2, param_prefix="attend_2")
attention_weight = fluid.layers.matmul(seq1, seq2, transpose_y=True)
normalized_attention_weight = fluid.layers.softmax(attention_weight)
beta = fluid.layers.matmul(normalized_attention_weight, seq2)
attention_weight_t = fluid.layers.transpose(attention_weight, perm=[0, 2, 1])
normalized_attention_weight_t = fluid.layers.softmax(attention_weight_t)
alpha = fluid.layers.matmul(normalized_attention_weight_t, seq1)
return alpha, beta
def compare(self, seq, soft_alignment, param_prefix):
concat_seq = fluid.layers.concat(input=[seq, soft_alignment], axis=2)
return self.feed_forward_2d(concat_seq, param_prefix="compare")
def aggregate(self, vec1, vec2):
vec1 = fluid.layers.reduce_sum(vec1, dim=1)
vec2 = fluid.layers.reduce_sum(vec2, dim=1)
concat_vec = fluid.layers.concat(input=[vec1, vec2], axis=1)
return self.feed_forward(concat_vec, param_prefix='aggregate')
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from .my_layers import bi_lstm_layer
from .match_layers import ElementwiseMatching
class InferSentNet():
"""
Base on the paper: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data:
https://arxiv.org/abs/1705.02364
"""
def __init__(self, config):
self._config = config
def __call__(self, seq1, seq2, label):
return self.body(seq1, seq2, label, self._config)
def body(self, seq1, seq2, label, config):
"""Body function"""
seq1_rnn = self.encoder(seq1)
seq2_rnn = self.encoder(seq2)
seq_match = ElementwiseMatching(seq1_rnn, seq2_rnn)
mlp_res = self.MLP(seq_match)
prediction = fluid.layers.fc(mlp_res, size=self._config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
def encoder(self, seq):
"""encoder"""
embed = fluid.layers.embedding(
input=seq,
size=[self._config.dict_dim, self._config.emb_dim],
param_attr=fluid.ParamAttr(name='emb.w', trainable=self._config.word_embedding_trainable))
bi_lstm_h = bi_lstm_layer(
embed,
rnn_hid_dim = self._config.rnn_hid_dim,
name='encoder')
bi_lstm_h = fluid.layers.dropout(bi_lstm_h, dropout_prob=self._config.droprate_lstm)
pool = fluid.layers.sequence_pool(input=bi_lstm_h, pool_type='max')
return pool
def MLP(self, vec):
if self._config.mlp_non_linear:
drop1 = fluid.layers.dropout(vec, dropout_prob=self._config.droprate_fc)
fc1 = fluid.layers.fc(drop1, size=512, act='tanh')
drop2 = fluid.layers.dropout(fc1, dropout_prob=self._config.droprate_fc)
fc2 = fluid.layers.fc(drop2, size=512, act='tanh')
res = fluid.layers.dropout(fc2, dropout_prob=self._config.droprate_fc)
else:
fc1 = fluid.layers.fc(vec, size=512, act=None)
res = fluid.layers.fc(fc1, size=512, act=None)
return res
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This Module provide different kinds of Match layers
"""
import paddle.fluid as fluid
def MultiPerspectiveMatching(vec1, vec2, perspective_num):
"""
MultiPerspectiveMatching
"""
sim_res = None
for i in range(perspective_num):
vec1_res = fluid.layers.elementwise_add_with_weight(
vec1,
param_attr="elementwise_add_with_weight." + str(i))
vec2_res = fluid.layers.elementwise_add_with_weight(
vec2,
param_attr="elementwise_add_with_weight." + str(i))
m = fluid.layers.cos_sim(vec1_res, vec2_res)
if sim_res is None:
sim_res = m
else:
sim_res = fluid.layers.concat(input=[sim_res, m], axis=1)
return sim_res
def ConcateMatching(vec1, vec2):
"""
ConcateMatching
"""
#TODO: assert shape
return fluid.layers.concat(input=[vec1, vec2], axis=1)
def ElementwiseMatching(vec1, vec2):
"""
reference: [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://arxiv.org/abs/1705.02364)
"""
elementwise_mul = fluid.layers.elementwise_mul(x=vec1, y=vec2)
elementwise_sub = fluid.layers.elementwise_sub(x=vec1, y=vec2)
elementwise_abs_sub = fluid.layers.abs(elementwise_sub)
return fluid.layers.concat(input=[vec1, vec2, elementwise_mul, elementwise_abs_sub], axis=1)
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module defines some Frequently-used DNN layers
"""
import paddle.fluid as fluid
def bi_lstm_layer(input, rnn_hid_dim, name):
"""
This is a Bi-directional LSTM(long short term memory) Module
"""
fc0 = fluid.layers.fc(input=input, # fc for lstm
size=rnn_hid_dim * 4,
param_attr=name + '.fc0.w',
bias_attr=False,
act=None)
lstm_h, c = fluid.layers.dynamic_lstm(
input=fc0,
size=rnn_hid_dim * 4,
is_reverse=False,
param_attr=name + '.lstm_w',
bias_attr=name + '.lstm_b')
reversed_lstm_h, reversed_c = fluid.layers.dynamic_lstm(
input=fc0,
size=rnn_hid_dim * 4,
is_reverse=True,
param_attr=name + '.reversed_lstm_w',
bias_attr=name + '.reversed_lstm_b')
return fluid.layers.concat(input=[lstm_h, reversed_lstm_h], axis=1)
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle.fluid as fluid
from .my_layers import bi_lstm_layer
from .match_layers import ElementwiseMatching
class SSENet():
"""
SSE net: Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
https://arxiv.org/abs/1708.02312
"""
def __init__(self, config):
self._config = config
def __call__(self, seq1, seq2, label):
return self.body(seq1, seq2, label, self._config)
def body(self, seq1, seq2, label, config):
"""Body function"""
def stacked_bi_rnn_model(seq):
embed = fluid.layers.embedding(input=seq, size=[self._config.dict_dim, self._config.emb_dim], param_attr='emb.w')
stacked_lstm_out = [embed]
for i in range(len(self._config.rnn_hid_dim)):
if i == 0:
feature = embed
else:
feature = fluid.layers.concat(input = stacked_lstm_out, axis=1)
bi_lstm_h = bi_lstm_layer(feature,
rnn_hid_dim=self._config.rnn_hid_dim[i],
name="lstm_" + str(i))
# add dropout except for the last stacked lstm layer
if i != len(self._config.rnn_hid_dim) - 1:
bi_lstm_h = fluid.layers.dropout(bi_lstm_h, dropout_prob=self._config.droprate_lstm)
stacked_lstm_out.append(bi_lstm_h)
pool = fluid.layers.sequence_pool(input=bi_lstm_h, pool_type='max')
return pool
def MLP(vec):
for i in range(len(self._config.fc_dim)):
vec = fluid.layers.fc(vec, size=self._config.fc_dim[i], act='relu')
# add dropout after every layer of MLP
vec = fluid.layers.dropout(vec, dropout_prob=self._config.droprate_fc)
return vec
seq1_rnn = stacked_bi_rnn_model(seq1)
seq2_rnn = stacked_bi_rnn_model(seq2)
seq_match = ElementwiseMatching(seq1_rnn, seq2_rnn)
mlp_res = MLP(seq_match)
prediction = fluid.layers.fc(mlp_res, size=self._config.class_dim, act='softmax')
loss = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=loss)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This Module provide pretrained word-embeddings
"""
from __future__ import print_function, unicode_literals
import numpy as np
import time, datetime
import os, sys
def Glove840B_300D(filepath, keys=None):
"""
input: the "glove.840B.300d.txt" file path
return: a dict, key: word (unicode), value: a numpy array with shape [300]
"""
if keys is not None:
assert(isinstance(keys, set))
print("loading word2vec from ", filepath)
print("please wait for a minute.")
start = time.time()
word2vec = {}
with open(filepath, "r") as f:
for line in f:
if sys.version_info <= (3, 0): # for python2
line = line.decode('utf-8')
info = line.strip("\n").split(" ")
word = info[0]
if (keys is not None) and (word not in keys):
continue
vector = info[1:]
assert(len(vector) == 300)
word2vec[word] = np.asarray(vector, dtype='float32')
end = time.time()
print("Spent ", str(datetime.timedelta(seconds=end-start)), " on loading word2vec.")
return word2vec
if __name__ == '__main__':
from os.path import expanduser
home = expanduser("~")
embed_dict = Glove840B_300D(os.path.join(home, "./.cache/paddle/dataset/glove.840B.300d.txt"))
exit(0)
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
"""
import paddle.dataset.common
import collections
import tarfile
import re
import string
import random
import os, sys
import nltk
from os.path import expanduser
__all__ = ['word_dict', 'train', 'dev', 'test']
URL = "https://drive.google.com/file/d/0B0PlTAo--BnaQWlsZl9FZ3l1c28/view"
DATA_HOME = os.path.expanduser('~/.cache/paddle/dataset')
DATA_DIR = "Quora_question_pair_partition"
QUORA_TRAIN_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'train.tsv')
QUORA_DEV_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'dev.tsv')
QUORA_TEST_FILE_NAME = os.path.join(DATA_HOME, DATA_DIR, 'test.tsv')
# punctuation or nltk or space
TOKENIZE_METHOD='space'
COLUMN_COUNT = 4
def tokenize(s):
if sys.version_info <= (3, 0): # for python2
s = s.decode('utf-8')
if TOKENIZE_METHOD == "nltk":
return nltk.tokenize.word_tokenize(s)
elif TOKENIZE_METHOD == "punctuation":
return s.translate({ord(char): None for char in string.punctuation}).lower().split()
elif TOKENIZE_METHOD == "space":
return s.split()
else:
raise RuntimeError("Invalid tokenize method")
def maybe_open(file_name):
if not os.path.isfile(file_name):
msg = "file not exist: %s\nPlease download the dataset firstly from: %s\n\n" % (file_name, URL) + \
("# The finally dataset dir should be like\n\n"
"$HOME/.cache/paddle/dataset\n"
" |- Quora_question_pair_partition\n"
" |- train.tsv\n"
" |- test.tsv\n"
" |- dev.tsv\n"
" |- readme.txt\n"
" |- wordvec.txt\n")
raise RuntimeError(msg)
return open(file_name, 'r')
def tokenized_question_pairs(file_name):
"""
"""
with maybe_open(file_name) as f:
questions = {}
lines = f.readlines()
for line in lines:
info = line.strip().split('\t')
if len(info) != COLUMN_COUNT:
# formatting error
continue
(label, question1, question2, id) = info
question1 = tokenize(question1)
question2 = tokenize(question2)
yield question1, question2, int(label)
def tokenized_questions(file_name):
"""
"""
with maybe_open(file_name) as f:
lines = f.readlines()
for line in lines:
info = line.strip().split('\t')
if len(info) != COLUMN_COUNT:
# formatting error
continue
(label, question1, question2, id) = info
yield tokenize(question1)
yield tokenize(question2)
def build_dict(file_name, cutoff):
"""
Build a word dictionary from the corpus. Keys of the dictionary are words,
and values are zero-based IDs of these words.
"""
word_freq = collections.defaultdict(int)
for doc in tokenized_questions(file_name):
for word in doc:
word_freq[word] += 1
word_freq = filter(lambda x: x[1] > cutoff, word_freq.items())
dictionary = sorted(word_freq, key=lambda x: (-x[1], x[0]))
words, _ = list(zip(*dictionary))
word_idx = dict(zip(words, range(len(words))))
word_idx['<unk>'] = len(words)
word_idx['<pad>'] = len(words) + 1
return word_idx
def reader_creator(file_name, word_idx):
UNK_ID = word_idx['<unk>']
def reader():
for (q1, q2, label) in tokenized_question_pairs(file_name):
q1_ids = [word_idx.get(w, UNK_ID) for w in q1]
q2_ids = [word_idx.get(w, UNK_ID) for w in q2]
if q1_ids != [] and q2_ids != []: # [] is not allowed in fluid
assert(label in [0, 1])
yield q1_ids, q2_ids, label
return reader
def train(word_idx):
"""
Quora training set creator.
It returns a reader creator, each sample in the reader is two zero-based ID
list and label in [0, 1].
:param word_idx: word dictionary
:type word_idx: dict
:return: Training reader creator
:rtype: callable
"""
return reader_creator(QUORA_TRAIN_FILE_NAME, word_idx)
def dev(word_idx):
"""
Quora develop set creator.
It returns a reader creator, each sample in the reader is two zero-based ID
list and label in [0, 1].
:param word_idx: word dictionary
:type word_idx: dict
:return: develop reader creator
:rtype: callable
"""
return reader_creator(QUORA_DEV_FILE_NAME, word_idx)
def test(word_idx):
"""
Quora test set creator.
It returns a reader creator, each sample in the reader is two zero-based ID
list and label in [0, 1].
:param word_idx: word dictionary
:type word_idx: dict
:return: Test reader creator
:rtype: callable
"""
return reader_creator(QUORA_TEST_FILE_NAME, word_idx)
def word_dict():
"""
Build a word dictionary from the corpus.
:return: Word dictionary
:rtype: dict
"""
return build_dict(file_name=QUORA_TRAIN_FILE_NAME, cutoff=4)
#Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import os
import sys
import time
import argparse
import unittest
import contextlib
import numpy as np
import paddle.fluid as fluid
import utils, metric, configs
import models
from pretrained_word2vec import Glove840B_300D
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument('--model_name', type=str, default='cdssmNet', help="Which model to train")
parser.add_argument('--config', type=str, default='cdssm_base', help="The global config setting")
DATA_DIR = os.path.join(os.path.expanduser('~'), '.cache/paddle/dataset')
def evaluate(epoch_id, exe, inference_program, dev_reader, test_reader, fetch_list, feeder, metric_type):
"""
evaluate on test/dev dataset
"""
def infer(test_reader):
"""
do inference function
"""
total_cost = 0.0
total_count = 0
preds, labels = [], []
for data in test_reader():
avg_cost, avg_acc, batch_prediction = exe.run(inference_program,
feed=feeder.feed(data),
fetch_list=fetch_list,
return_numpy=True)
total_cost += avg_cost * len(data)
total_count += len(data)
preds.append(batch_prediction)
labels.append(np.asarray([x[-1] for x in data], dtype=np.int64))
y_pred = np.concatenate(preds)
y_label = np.concatenate(labels)
metric_res = []
for metric_name in metric_type:
if metric_name == 'accuracy_with_threshold':
metric_res.append((metric_name, metric.accuracy_with_threshold(y_pred, y_label, threshold=0.3)))
elif metric_name == 'accuracy':
metric_res.append((metric_name, metric.accuracy(y_pred, y_label)))
else:
print("Unknown metric type: ", metric_name)
exit()
return total_cost / (total_count * 1.0), metric_res
dev_cost, dev_metric_res = infer(dev_reader)
print("[%s] epoch_id: %d, dev_cost: %f, " % (
time.asctime( time.localtime(time.time()) ),
epoch_id,
dev_cost)
+ ', '.join([str(x[0]) + ": " + str(x[1]) for x in dev_metric_res]))
test_cost, test_metric_res = infer(test_reader)
print("[%s] epoch_id: %d, test_cost: %f, " % (
time.asctime( time.localtime(time.time()) ),
epoch_id,
test_cost)
+ ', '.join([str(x[0]) + ": " + str(x[1]) for x in test_metric_res]))
print("")
def train_and_evaluate(train_reader,
test_reader,
dev_reader,
network,
optimizer,
global_config,
pretrained_word_embedding,
use_cuda,
parallel):
"""
train network
"""
# define the net
if global_config.use_lod_tensor:
# automatic add batch dim
q1 = fluid.layers.data(
name="question1", shape=[1], dtype="int64", lod_level=1)
q2 = fluid.layers.data(
name="question2", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
cost, acc, prediction = network(q1, q2, label)
else:
# shape: [batch_size, max_seq_len_in_batch, 1]
q1 = fluid.layers.data(
name="question1", shape=[-1, -1, 1], dtype="int64")
q2 = fluid.layers.data(
name="question2", shape=[-1, -1, 1], dtype="int64")
# shape: [batch_size, max_seq_len_in_batch]
mask1 = fluid.layers.data(name="mask1", shape=[-1, -1], dtype="float32")
mask2 = fluid.layers.data(name="mask2", shape=[-1, -1], dtype="float32")
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
cost, acc, prediction = network(q1, q2, mask1, mask2, label)
if parallel:
# TODO: Paarallel Training
print("Parallel Training is not supported for now.")
sys.exit(1)
#optimizer.minimize(cost)
if use_cuda:
print("Using GPU")
place = fluid.CUDAPlace(0)
else:
print("Using CPU")
place = fluid.CPUPlace()
exe = fluid.Executor(place)
if global_config.use_lod_tensor:
feeder = fluid.DataFeeder(feed_list=[q1, q2, label], place=place)
else:
feeder = fluid.DataFeeder(feed_list=[q1, q2, mask1, mask2, label], place=place)
# logging param info
for param in fluid.default_main_program().global_block().all_parameters():
print("param name: %s; param shape: %s" % (param.name, param.shape))
# define inference_program
inference_program = fluid.default_main_program().clone(for_test=True)
optimizer.minimize(cost)
exe.run(fluid.default_startup_program())
# load emb from a numpy erray
if pretrained_word_embedding is not None:
print("loading pretrained word embedding to param")
embedding_name = "emb.w"
embedding_param = fluid.global_scope().find_var(embedding_name).get_tensor()
embedding_param.set(pretrained_word_embedding, place)
evaluate(-1,
exe,
inference_program,
dev_reader,
test_reader,
fetch_list=[cost, acc, prediction],
feeder=feeder,
metric_type=global_config.metric_type)
# start training
print("[%s] Start Training" % time.asctime(time.localtime(time.time())))
for epoch_id in range(global_config.epoch_num):
data_size, data_count, total_acc, total_cost = 0, 0, 0.0, 0.0
batch_id = 0
for data in train_reader():
avg_cost_np, avg_acc_np = exe.run(fluid.default_main_program(),
feed=feeder.feed(data),
fetch_list=[cost, acc])
data_size = len(data)
total_acc += data_size * avg_acc_np
total_cost += data_size * avg_cost_np
data_count += data_size
if batch_id % 100 == 0:
print("[%s] epoch_id: %d, batch_id: %d, cost: %f, acc: %f" % (
time.asctime(time.localtime(time.time())),
epoch_id,
batch_id,
avg_cost_np,
avg_acc_np))
batch_id += 1
avg_cost = total_cost / data_count
avg_acc = total_acc / data_count
print("")
print("[%s] epoch_id: %d, train_avg_cost: %f, train_avg_acc: %f" % (
time.asctime( time.localtime(time.time()) ), epoch_id, avg_cost, avg_acc))
epoch_model = global_config.save_dirname + "/" + "epoch" + str(epoch_id)
fluid.io.save_inference_model(epoch_model, ["question1", "question2", "label"], acc, exe)
evaluate(epoch_id,
exe,
inference_program,
dev_reader,
test_reader,
fetch_list=[cost, acc, prediction],
feeder=feeder,
metric_type=global_config.metric_type)
def main():
"""
This function will parse argments, prepare data and prepare pretrained embedding
"""
args = parser.parse_args()
global_config = configs.__dict__[args.config]()
print("net_name: ", args.model_name)
net = models.__dict__[args.model_name](global_config)
# get word_dict
word_dict = utils.getDict(data_type="quora_question_pairs")
# get reader
train_reader, dev_reader, test_reader = utils.prepare_data(
"quora_question_pairs",
word_dict=word_dict,
batch_size = global_config.batch_size,
buf_size=800000,
duplicate_data=global_config.duplicate_data,
use_pad=(not global_config.use_lod_tensor))
# load pretrained_word_embedding
if global_config.use_pretrained_word_embedding:
word2vec = Glove840B_300D(filepath=os.path.join(DATA_DIR, "glove.840B.300d.txt"),
keys=set(word_dict.keys()))
pretrained_word_embedding = utils.get_pretrained_word_embedding(
word2vec=word2vec,
word2id=word_dict,
config=global_config)
print("pretrained_word_embedding to be load:", pretrained_word_embedding)
else:
pretrained_word_embedding = None
# define optimizer
optimizer = utils.getOptimizer(global_config)
# use cuda or not
if not global_config.has_member('use_cuda'):
global_config.use_cuda = 'CUDA_VISIBLE_DEVICES' in os.environ
global_config.list_config()
train_and_evaluate(
train_reader,
dev_reader,
test_reader,
net,
optimizer,
global_config,
pretrained_word_embedding,
use_cuda=global_config.use_cuda,
parallel=False)
if __name__ == "__main__":
main()
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This module provides utilities for data generator and optimizer definition
"""
import sys
import time
import numpy as np
import paddle.fluid as fluid
import paddle
import quora_question_pairs
def to_lodtensor(data, place):
"""
convert to LODtensor
"""
seq_lens = [len(seq) for seq in data]
cur_len = 0
lod = [cur_len]
for l in seq_lens:
cur_len += l
lod.append(cur_len)
flattened_data = np.concatenate(data, axis=0).astype("int64")
flattened_data = flattened_data.reshape([len(flattened_data), 1])
res = fluid.LoDTensor()
res.set(flattened_data, place)
res.set_lod([lod])
return res
def getOptimizer(global_config):
"""
get Optimizer by config
"""
if global_config.optimizer_type == "adam":
optimizer = fluid.optimizer.Adam(learning_rate=fluid.layers.exponential_decay(
learning_rate=global_config.learning_rate,
decay_steps=global_config.train_samples_num // global_config.batch_size,
decay_rate=global_config.lr_decay))
elif global_config.optimizer_type == "sgd":
optimizer = fluid.optimizer.SGD(learning_rate=fluid.layers.exponential_decay(
learning_rate=global_config.learning_rate,
decay_steps=global_config.train_samples_num // global_config.batch_size,
decay_rate=global_config.lr_decay))
elif global_config.optimizer_type == "adagrad":
optimizer = fluid.optimizer.Adagrad(learning_rate=fluid.layers.exponential_decay(
learning_rate=global_config.learning_rate,
decay_steps=global_config.train_samples_num // global_config.batch_size,
decay_rate=global_config.lr_decay))
return optimizer
def get_pretrained_word_embedding(word2vec, word2id, config):
"""get pretrained embedding in shape [config.dict_dim, config.emb_dim]"""
print("preparing pretrained word embedding ...")
assert(config.dict_dim >= len(word2id))
word2id = sorted(word2id.items(), key = lambda x : x[1])
words = [x[0] for x in word2id]
words = words + ['<not-a-real-words>'] * (config.dict_dim - len(words))
pretrained_emb = []
for _, word in enumerate(words):
if word in word2vec:
assert(len(word2vec[word] == config.emb_dim))
if config.embedding_norm:
pretrained_emb.append(word2vec[word] / np.linalg.norm(word2vec[word]))
else:
pretrained_emb.append(word2vec[word])
elif config.OOV_fill == 'uniform':
pretrained_emb.append(np.random.uniform(-0.05, 0.05, size=[config.emb_dim]).astype(np.float32))
elif config.OOV_fill == 'normal':
pretrained_emb.append(np.random.normal(loc=0.0, scale=0.1, size=[config.emb_dim]).astype(np.float32))
else:
print("Unkown OOV fill method: ", OOV_fill)
exit()
word_embedding = np.stack(pretrained_emb)
return word_embedding
def getDict(data_type="quora_question_pairs"):
"""
get word2id dict from quora dataset
"""
print("Generating word dict...")
if data_type == "quora_question_pairs":
word_dict = quora_question_pairs.word_dict()
else:
raise RuntimeError("No such dataset")
print("Vocab size: ", len(word_dict))
return word_dict
def duplicate(reader):
"""
duplicate the quora qestion pairs since there are 2 questions in a sample
Input: reader, which yield (question1, question2, label)
Output: reader, which yield (question1, question2, label) and yield (question2, question1, label)
"""
def duplicated_reader():
for data in reader():
(q1, q2, label) = data
yield (q1, q2, label)
yield (q2, q1, label)
return duplicated_reader
def pad(reader, PAD_ID):
"""
Input: reader, yield batches of [(question1, question2, label), ... ]
Output: padded_reader, yield batches of [(padded_question1, padded_question2, mask1, mask2, label), ... ]
"""
assert(isinstance(PAD_ID, int))
def padded_reader():
for batch in reader():
max_len1 = max([len(data[0]) for data in batch])
max_len2 = max([len(data[1]) for data in batch])
padded_batch = []
for data in batch:
question1, question2, label = data
seq_len1 = len(question1)
seq_len2 = len(question2)
mask1 = [1] * seq_len1 + [0] * (max_len1 - seq_len1)
mask2 = [1] * seq_len2 + [0] * (max_len2 - seq_len2)
padded_question1 = question1 + [PAD_ID] * (max_len1 - seq_len1)
padded_question2 = question2 + [PAD_ID] * (max_len2 - seq_len2)
padded_question1 = [[x] for x in padded_question1] # last dim of questions must be 1, according to fluid's request
padded_question2 = [[x] for x in padded_question2]
assert(len(mask1) == max_len1)
assert(len(mask2) == max_len2)
assert(len(padded_question1) == max_len1)
assert(len(padded_question2) == max_len2)
padded_batch.append((padded_question1, padded_question2, mask1, mask2, label))
yield padded_batch
return padded_reader
def prepare_data(data_type,
word_dict,
batch_size,
buf_size=50000,
duplicate_data=False,
use_pad=False):
"""
prepare data
"""
PAD_ID=word_dict['<pad>']
if data_type == "quora_question_pairs":
# train/dev/test reader are batched iters which yield a batch of (question1, question2, label) each time
# qestion1 and question2 are lists of word ID
# label is 0 or 1
# for example: ([1, 3, 2], [7, 5, 4, 99], 1)
def prepare_reader(reader):
if duplicate_data:
reader = duplicate(reader)
reader = paddle.batch(
paddle.reader.shuffle(reader, buf_size=buf_size),
batch_size=batch_size,
drop_last=False)
if use_pad:
reader = pad(reader, PAD_ID=PAD_ID)
return reader
train_reader = prepare_reader(quora_question_pairs.train(word_dict))
dev_reader = prepare_reader(quora_question_pairs.dev(word_dict))
test_reader = prepare_reader(quora_question_pairs.test(word_dict))
else:
raise RuntimeError("no such dataset")
return train_reader, dev_reader, test_reader
...@@ -2,7 +2,7 @@ import os ...@@ -2,7 +2,7 @@ import os
import numpy as np import numpy as np
import time import time
import sys import sys
import paddle.v2 as paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
from resnet import TSN_ResNet from resnet import TSN_ResNet
import reader import reader
......
...@@ -2,7 +2,7 @@ import os ...@@ -2,7 +2,7 @@ import os
import numpy as np import numpy as np
import time import time
import sys import sys
import paddle.v2 as paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
from resnet import TSN_ResNet from resnet import TSN_ResNet
import reader import reader
......
import os import os
import sys
import math import math
import random import random
import functools import functools
import cPickle try:
from cStringIO import StringIO import cPickle as pickle
from cStringIO import StringIO
except ImportError:
import pickle
from io import BytesIO
import numpy as np import numpy as np
import paddle.v2 as paddle import paddle
from PIL import Image, ImageEnhance from PIL import Image, ImageEnhance
random.seed(0) random.seed(0)
DATA_DIM = 224
THREAD = 8 THREAD = 8
BUF_SIZE = 1024 BUF_SIZE = 1024
...@@ -22,17 +25,13 @@ INFER_LIST = 'data/test.list' ...@@ -22,17 +25,13 @@ INFER_LIST = 'data/test.list'
img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)) img_mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1)) img_std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
python_ver = sys.version_info
def imageloader(buf): def imageloader(buf):
if isinstance(buf, str): if isinstance(buf, str):
tempbuff = StringIO()
tempbuff.write(buf)
tempbuff.seek(0)
img = Image.open(tempbuff)
elif isinstance(buf, collections.Sequence):
img = Image.open(StringIO(buf[-1]))
else:
img = Image.open(StringIO(buf)) img = Image.open(StringIO(buf))
else:
img = Image.open(BytesIO(buf))
return img.convert('RGB') return img.convert('RGB')
...@@ -98,7 +97,7 @@ def group_center_crop(img_group, target_size): ...@@ -98,7 +97,7 @@ def group_center_crop(img_group, target_size):
def video_loader(frames, nsample, mode): def video_loader(frames, nsample, mode):
videolen = len(frames) videolen = len(frames)
average_dur = videolen / nsample average_dur = videolen // nsample
imgs = [] imgs = []
for i in range(nsample): for i in range(nsample):
...@@ -111,12 +110,12 @@ def video_loader(frames, nsample, mode): ...@@ -111,12 +110,12 @@ def video_loader(frames, nsample, mode):
idx = i idx = i
else: else:
if average_dur >= 1: if average_dur >= 1:
idx = (average_dur - 1) / 2 idx = (average_dur - 1) // 2
idx += i * average_dur idx += i * average_dur
else: else:
idx = i idx = i
imgbuf = frames[idx % videolen] imgbuf = frames[int(idx % videolen)]
img = imageloader(imgbuf) img = imageloader(imgbuf)
imgs.append(img) imgs.append(img)
...@@ -125,7 +124,10 @@ def video_loader(frames, nsample, mode): ...@@ -125,7 +124,10 @@ def video_loader(frames, nsample, mode):
def decode_pickle(sample, mode, seg_num, short_size, target_size): def decode_pickle(sample, mode, seg_num, short_size, target_size):
pickle_path = sample[0] pickle_path = sample[0]
data_loaded = cPickle.load(open(pickle_path)) if python_ver < (3, 0):
data_loaded = pickle.load(open(pickle_path, 'rb'))
else:
data_loaded = pickle.load(open(pickle_path, 'rb'), encoding='bytes')
vid, label, frames = data_loaded vid, label, frames = data_loaded
imgs = video_loader(frames, seg_num, mode) imgs = video_loader(frames, seg_num, mode)
......
...@@ -22,7 +22,7 @@ class TSN_ResNet(): ...@@ -22,7 +22,7 @@ class TSN_ResNet():
num_filters=num_filters, num_filters=num_filters,
filter_size=filter_size, filter_size=filter_size,
stride=stride, stride=stride,
padding=(filter_size - 1) / 2, padding=(filter_size - 1) // 2,
groups=groups, groups=groups,
act=None, act=None,
bias_attr=False) bias_attr=False)
......
...@@ -2,7 +2,7 @@ import os ...@@ -2,7 +2,7 @@ import os
import numpy as np import numpy as np
import time import time
import sys import sys
import paddle.v2 as paddle import paddle
import paddle.fluid as fluid import paddle.fluid as fluid
from resnet import TSN_ResNet from resnet import TSN_ResNet
import reader import reader
...@@ -91,7 +91,7 @@ def train(args): ...@@ -91,7 +91,7 @@ def train(args):
fluid.io.load_vars(exe, pretrained_model, vars=vars) fluid.io.load_vars(exe, pretrained_model, vars=vars)
# reader # reader
train_reader = paddle.batch(reader.train(seg_num), batch_size=batch_size) train_reader = paddle.batch(reader.train(seg_num), batch_size=batch_size, drop_last=True)
# test in single GPU # test in single GPU
test_reader = paddle.batch(reader.test(seg_num), batch_size=batch_size / 16) test_reader = paddle.batch(reader.test(seg_num), batch_size=batch_size / 16)
feeder = fluid.DataFeeder(place=place, feed_list=[image, label]) feeder = fluid.DataFeeder(place=place, feed_list=[image, label])
......
...@@ -17,6 +17,7 @@ from __future__ import division ...@@ -17,6 +17,7 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
import distutils.util import distutils.util
import numpy as np import numpy as np
import six
from paddle.fluid import core from paddle.fluid import core
...@@ -36,7 +37,7 @@ def print_arguments(args): ...@@ -36,7 +37,7 @@ def print_arguments(args):
:type args: argparse.Namespace :type args: argparse.Namespace
""" """
print("----------- Configuration Arguments -----------") print("----------- Configuration Arguments -----------")
for arg, value in sorted(vars(args).iteritems()): for arg, value in sorted(six.iteritems(vars(args))):
print("%s: %s" % (arg, value)) print("%s: %s" % (arg, value))
print("------------------------------------------------") print("------------------------------------------------")
......
...@@ -12,23 +12,23 @@ The word embedding expresses words with a real vector. Each dimension of the vec ...@@ -12,23 +12,23 @@ The word embedding expresses words with a real vector. Each dimension of the vec
In the example of word vectors, we show how to use Hierarchical-Sigmoid and Noise Contrastive Estimation (NCE) to accelerate word-vector learning. In the example of word vectors, we show how to use Hierarchical-Sigmoid and Noise Contrastive Estimation (NCE) to accelerate word-vector learning.
- 1.1 [Hsigmoid Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/v2/hsigmoid) - 1.1 [Hsigmoid Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/hsigmoid)
- 1.2 [Noise Contrastive Estimation Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/v2/nce_cost) - 1.2 [Noise Contrastive Estimation Accelerated Word Vector Training](https://github.com/PaddlePaddle/models/tree/develop/legacy/nce_cost)
## 2. RNN language model ## 2. RNN language model
The language model is important in the field of natural language processing. In addition to getting the word vector (a by-product of language model training), it can also help us to generate text. Given a number of words, the language model can help us predict the next most likely word. In the example of using the language model to generate text, we focus on the recurrent neural network language model. We can use the instructions in the document quickly adapt to their training corpus, complete automatic writing poetry, automatic writing prose and other interesting models. The language model is important in the field of natural language processing. In addition to getting the word vector (a by-product of language model training), it can also help us to generate text. Given a number of words, the language model can help us predict the next most likely word. In the example of using the language model to generate text, we focus on the recurrent neural network language model. We can use the instructions in the document quickly adapt to their training corpus, complete automatic writing poetry, automatic writing prose and other interesting models.
- 2.1 [Generate text using the RNN language model](https://github.com/PaddlePaddle/models/tree/develop/v2/generate_sequence_by_rnn_lm) - 2.1 [Generate text using the RNN language model](https://github.com/PaddlePaddle/models/tree/develop/legacy/generate_sequence_by_rnn_lm)
## 3. Click-Through Rate prediction ## 3. Click-Through Rate prediction
The click-through rate model predicts the probability that a user will click on an ad. This is widely used for advertising technology. Logistic Regression has a good learning performance for large-scale sparse features in the early stages of the development of click-through rate prediction. In recent years, DNN model because of its strong learning ability to gradually take the banner rate of the task of the banner. The click-through rate model predicts the probability that a user will click on an ad. This is widely used for advertising technology. Logistic Regression has a good learning performance for large-scale sparse features in the early stages of the development of click-through rate prediction. In recent years, DNN model because of its strong learning ability to gradually take the banner rate of the task of the banner.
In the example of click-through rate estimates, we first give the Google's Wide & Deep model. This model combines the advantages of DNN and the applicable logistic regression model for DNN and large-scale sparse features. Then we provide the deep factorization machine for click-through rate prediction. The deep factorization machine combines the factorization machine and deep neural networks to model both low order and high order interactions of input features. In the example of click-through rate estimates, we first give the Google's Wide & Deep model. This model combines the advantages of DNN and the applicable logistic regression model for DNN and large-scale sparse features. Then we provide the deep factorization machine for click-through rate prediction. The deep factorization machine combines the factorization machine and deep neural networks to model both low order and high order interactions of input features.
- 3.1 [Click-Through Rate Model](https://github.com/PaddlePaddle/models/tree/develop/v2/ctr) - 3.1 [Click-Through Rate Model](https://github.com/PaddlePaddle/models/tree/develop/legacy/ctr)
- 3.2 [Deep Factorization Machine for Click-Through Rate prediction](https://github.com/PaddlePaddle/models/tree/develop/v2/deep_fm) - 3.2 [Deep Factorization Machine for Click-Through Rate prediction](https://github.com/PaddlePaddle/models/tree/develop/legacy/deep_fm)
## 4. Text classification ## 4. Text classification
...@@ -36,7 +36,7 @@ Text classification is one of the most basic tasks in natural language processin ...@@ -36,7 +36,7 @@ Text classification is one of the most basic tasks in natural language processin
For text classification, we provide a non-sequential text classification model based on DNN and CNN. (For LSTM-based model, please refer to PaddleBook [Sentiment Analysis](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.html)). For text classification, we provide a non-sequential text classification model based on DNN and CNN. (For LSTM-based model, please refer to PaddleBook [Sentiment Analysis](http://www.paddlepaddle.org/docs/develop/book/06.understand_sentiment/index.html)).
- 4.1 [Sentiment analysis based on DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/v2/text_classification) - 4.1 [Sentiment analysis based on DNN / CNN](https://github.com/PaddlePaddle/models/tree/develop/legacy/text_classification)
## 5. Learning to rank ## 5. Learning to rank
...@@ -45,14 +45,14 @@ The depth neural network can be used to model the fractional function to form va ...@@ -45,14 +45,14 @@ The depth neural network can be used to model the fractional function to form va
The algorithms for learning to rank are usually categorized into three groups by their input representation and the loss function. These are pointwise, pairwise and listwise approaches. Here we demonstrate RankLoss loss function method (pairwise approach), and LambdaRank loss function method (listwise approach). (For Pointwise approaches, please refer to [Recommended System](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.html)). The algorithms for learning to rank are usually categorized into three groups by their input representation and the loss function. These are pointwise, pairwise and listwise approaches. Here we demonstrate RankLoss loss function method (pairwise approach), and LambdaRank loss function method (listwise approach). (For Pointwise approaches, please refer to [Recommended System](http://www.paddlepaddle.org/docs/develop/book/05.recommender_system/index.html)).
- 5.1 [Learning to rank based on Pairwise and Listwise approches](https://github.com/PaddlePaddle/models/tree/develop/v2/ltr) - 5.1 [Learning to rank based on Pairwise and Listwise approches](https://github.com/PaddlePaddle/models/tree/develop/legacy/ltr)
## 6. Semantic model ## 6. Semantic model
The deep structured semantic model uses the DNN model to learn the vector representation of the low latitude in a continuous semantic space, finally models the semantic similarity between the two sentences. The deep structured semantic model uses the DNN model to learn the vector representation of the low latitude in a continuous semantic space, finally models the semantic similarity between the two sentences.
In this example, we demonstrate how to use PaddlePaddle to implement a generic deep structured semantic model to model the semantic similarity between two strings. The model supports different network structures such as CNN (Convolutional Network), FC (Fully Connected Network), RNN (Recurrent Neural Network), and different loss functions such as classification, regression, and sequencing. In this example, we demonstrate how to use PaddlePaddle to implement a generic deep structured semantic model to model the semantic similarity between two strings. The model supports different network structures such as CNN (Convolutional Network), FC (Fully Connected Network), RNN (Recurrent Neural Network), and different loss functions such as classification, regression, and sequencing.
- 6.1 [Deep structured semantic model](https://github.com/PaddlePaddle/models/tree/develop/v2/dssm) - 6.1 [Deep structured semantic model](https://github.com/PaddlePaddle/models/tree/develop/legacy/dssm)
## 7. Sequence tagging ## 7. Sequence tagging
...@@ -60,7 +60,7 @@ Given the input sequence, the sequence tagging model is one of the most basic ta ...@@ -60,7 +60,7 @@ Given the input sequence, the sequence tagging model is one of the most basic ta
In the example of the sequence tagging, we describe how to train an end-to-end sequence tagging model with the Named Entity Recognition (NER) task as an example. In the example of the sequence tagging, we describe how to train an end-to-end sequence tagging model with the Named Entity Recognition (NER) task as an example.
- 7.1 [Name Entity Recognition](https://github.com/PaddlePaddle/models/tree/develop/v2/sequence_tagging_for_ner) - 7.1 [Name Entity Recognition](https://github.com/PaddlePaddle/models/tree/develop/legacy/sequence_tagging_for_ner)
## 8. Sequence to sequence learning ## 8. Sequence to sequence learning
...@@ -68,19 +68,19 @@ Sequence-to-sequence model has a wide range of applications. This includes machi ...@@ -68,19 +68,19 @@ Sequence-to-sequence model has a wide range of applications. This includes machi
As an example for sequence-to-sequence learning, we take the machine translation task. We demonstrate the sequence-to-sequence mapping model without attention mechanism, which is the basis for all sequence-to-sequence learning models. We will use scheduled sampling to improve the problem of error accumulation in the RNN model, and machine translation with external memory mechanism. As an example for sequence-to-sequence learning, we take the machine translation task. We demonstrate the sequence-to-sequence mapping model without attention mechanism, which is the basis for all sequence-to-sequence learning models. We will use scheduled sampling to improve the problem of error accumulation in the RNN model, and machine translation with external memory mechanism.
- 8.1 [Basic Sequence-to-sequence model](https://github.com/PaddlePaddle/models/tree/develop/v2/nmt_without_attention) - 8.1 [Basic Sequence-to-sequence model](https://github.com/PaddlePaddle/models/tree/develop/legacy/nmt_without_attention)
## 9. Image classification ## 9. Image classification
For the example of image classification, we show you how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and Xception models in PaddlePaddle. It also provides model conversion tools that convert Caffe or TensorFlow trained model files into PaddlePaddle model files. For the example of image classification, we show you how to train AlexNet, VGG, GoogLeNet, ResNet, Inception-v4, Inception-Resnet-V2 and Xception models in PaddlePaddle. It also provides model conversion tools that convert Caffe or TensorFlow trained model files into PaddlePaddle model files.
- 9.1 [convert Caffe model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification/caffe2paddle) - 9.1 [convert Caffe model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/caffe2paddle)
- 9.2 [convert TensorFlow model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification/tf2paddle) - 9.2 [convert TensorFlow model file to PaddlePaddle model file](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification/tf2paddle)
- 9.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification) - 9.3 [AlexNet](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification) - 9.4 [VGG](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification) - 9.5 [Residual Network](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification) - 9.6 [Inception-v4](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification) - 9.7 [Inception-Resnet-V2](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
- 9.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/v2/image_classification) - 9.8 [Xception](https://github.com/PaddlePaddle/models/tree/develop/legacy/image_classification)
This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE). This tutorial is contributed by [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) and licensed under the [Apache-2.0 license](LICENSE).
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册