diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/.run_ce.sh b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/.run_ce.sh
deleted file mode 100755
index 6be159cb5268ae215998e7a19045f7aa0d620f63..0000000000000000000000000000000000000000
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/.run_ce.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-###!/bin/bash
-####This file is only used for continuous evaluation.
-
-model_file='train.py'
-python $model_file --pass_num 1 --learning_rate 0.001 --save_interval 10 --enable_ce | python _ce.py
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/README.md b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/README.md
index 556ea6f5fc481a120bcca67e4c1c7b9c28856b7f..991ee9cc0ad11672967e7dafb8b63914e6dd5935 100644
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/README.md
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/README.md
@@ -10,8 +10,11 @@
├── args.py            # training, inference, and model hyper-parameters
├── train.py           # main training script
├── infer.py           # main inference script
+├── run.sh             # launch script with the default training configuration
+├── infer.sh           # launch script with the default decoding configuration
├── attention_model.py # translation model with the attention mechanism
-└── no_attention_model.py # 无注意力机制的翻译模型配置
+└── base_model.py      # translation model without the attention mechanism
+
```
## Introduction
@@ -19,116 +22,93 @@
In recent years, advances in deep learning have continually brought new breakthroughs to machine translation. Models that map the source language directly to the target language with a neural network, i.e. end-to-end neural machine translation (End-to-End NMT), have gradually become mainstream and are usually referred to simply as NMT models.
-本目录包含一个经典的机器翻译模型[RNN Search](https://arxiv.org/pdf/1409.0473.pdf)的Paddle Fluid实现。事实上,RNN search是一个较为传统的NMT模型,在现阶段,其表现已被很多新模型(如[Transformer](https://arxiv.org/abs/1706.03762))超越。但除机器翻译外,该模型是许多序列到序列(sequence to sequence, 以下简称Seq2Seq)类模型的基础,很多解决其他NLP问题的模型均以此模型为基础;因此其在NLP领域具有重要意义,并被广泛用作Baseline.
+This directory contains two classic machine translation models: a base model (without the attention mechanism) and a translation model with attention. By now their performance has been surpassed by many newer models (such as the [Transformer](https://arxiv.org/abs/1706.03762)). Beyond machine translation, however, these models form the basis of many sequence-to-sequence (Seq2Seq) models, and many models for other NLP problems build on them; they therefore remain important in NLP and are widely used as baselines.
The example implementation in this directory is intended to show how to use Paddle Fluid to build an RNN model with an attention mechanism for Seq2Seq problems, and how to use a decoder with beam search. If you simply need a model with good translation quality, we recommend the [Paddle Fluid implementation of the Transformer](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer).
## Model Overview
The RNN Search model uses the classic encoder-decoder framework to solve Seq2Seq problems: an encoder first encodes the source sequence into a vector, and a decoder then decodes that vector into the target sequence. This mimics how humans translate: first parse the source sentence and understand its meaning, then write a target-language sentence that expresses that meaning. Both the encoder and the decoder are usually implemented with RNNs. For the underlying theory and equations, see [Deep Learning 101](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html).
-本模型中,在编码器方面,我们的实现使用了双向循环神经网络(Bi-directional Recurrent Neural Network);在解码器方面,我们使用了带注意力(Attention)机制的RNN解码器,并同时提供了一个不带注意力机制的解码器实现作为对比;而在预测方面我们使用柱搜索(beam search)算法来生成翻译的目标语句。以下将分别介绍用到的这些方法。
-
-### 双向循环神经网络
-这里介绍Bengio团队在论文\[[2](#参考文献),[4](#参考文献)\]中提出的一种双向循环网络结构。该结构的目的是输入一个序列,得到其在每个时刻的特征表示,即输出的每个时刻都用定长向量表示到该时刻的上下文语义信息。
-具体来说,该双向循环神经网络分别在时间维以顺序和逆序——即前向(forward)和后向(backward)——依次处理输入序列,并将每个时间步RNN的输出拼接成为最终的输出层。这样每个时间步的输出节点,都包含了输入序列中当前时刻完整的过去和未来的上下文信息。下图展示的是一个按时间步展开的双向循环神经网络。该网络包含一个前向和一个后向RNN,其中有六个权重矩阵:输入到前向隐层和后向隐层的权重矩阵($W_1, W_3$),隐层到隐层自己的权重矩阵($W_2,W_5$),前向隐层和后向隐层到输出层的权重矩阵($W_4, W_6$)。注意,该网络的前向隐层和后向隐层之间没有连接。
-
-
-
-图1. 按时间步展开的双向循环神经网络
-
-
-
-
-图2. 使用双向LSTM的编码器
-
-
-### 注意力机制
-如果编码阶段的输出是一个固定维度的向量,会带来以下两个问题:1)不论源语言序列的长度是5个词还是50个词,如果都用固定维度的向量去编码其中的语义和句法结构信息,对模型来说是一个非常高的要求,特别是对长句子序列而言;2)直觉上,当人类翻译一句话时,会对与当前译文更相关的源语言片段上给予更多关注,且关注点会随着翻译的进行而改变。而固定维度的向量则相当于,任何时刻都对源语言所有信息给予了同等程度的关注,这是不合理的。因此,Bahdanau等人\[[4](#参考文献)\]引入注意力(attention)机制,可以对编码后的上下文片段进行解码,以此来解决长句子的特征学习问题。下面介绍在注意力机制下的解码器结构。
-
-与简单的解码器不同,这里$z_i$的计算公式为 (由于Github原生不支持LaTeX公式,请您移步[这里](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html)查看):
-
-$$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$
-
-可见,源语言句子的编码向量表示为第$i$个词的上下文片段$c_i$,即针对每一个目标语言中的词$u_i$,都有一个特定的$c_i$与之对应。$c_i$的计算公式如下:
-
-$$c_i=\sum _{j=1}^{T}a_{ij}h_j, a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$
+In this model, the encoder is a multi-layer LSTM-based encoder, and the decoder is an RNN decoder with an attention mechanism; a decoder without attention is also provided for comparison. At prediction time, beam search is used to generate the target sentence.
-从公式中可以看出,注意力机制是通过对编码器中各时刻的RNN状态$h_j$进行加权平均实现的。权重$a_{ij}$表示目标语言中第$i$个词对源语言中第$j$个词的注意力大小,$a_{ij}$的计算公式如下:
-
-$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$
-$$e_{ij} = {align(z_i, h_j)}$$
-
-其中,$align$可以看作是一个对齐模型,用来衡量目标语言中第$i$个词和源语言中第$j$个词的匹配程度。具体而言,这个程度是通过解码RNN的第$i$个隐层状态$z_i$和源语言句子的第$j$个上下文片段$h_j$计算得到的。传统的对齐模型中,目标语言的每个词明确对应源语言的一个或多个词(hard alignment);而在注意力模型中采用的是soft alignment,即任何两个目标语言和源语言词间均存在一定的关联,且这个关联强度是由模型计算得到的实数,因此可以融入整个NMT框架,并通过反向传播算法进行训练。
-
-
-
-图3. 基于注意力机制的解码器
-
+## Data
-### 柱搜索算法
+This tutorial uses the English-to-Vietnamese portion of the [IWSLT'15 English-Vietnamese dataset](https://nlp.stanford.edu/projects/nmt/): the training split as the training corpus, tst2012 as the dev set, and tst2013 as the test set.
-柱搜索([beam search](http://en.wikipedia.org/wiki/Beam_search))是一种启发式图搜索算法,用于在图或树中搜索有限集合中的最优扩展节点,通常用在解空间非常大的系统(如机器翻译、语音识别)中,原因是内存无法装下图或树中所有展开的解。如在机器翻译任务中希望翻译“`你好`”,就算目标语言字典中只有3个词(``, ``, `hello`),也可能生成无限句话(`hello`循环出现的次数不定),为了找到其中较好的翻译结果,我们可采用柱搜索算法。
+### Downloading the data
+```sh
+cd data && sh download_en-vi.sh
+```
-柱搜索算法使用广度优先策略建立搜索树,在树的每一层,按照启发代价(heuristic cost)(本教程中,为生成词的log概率之和)对节点进行排序,然后仅留下预先确定的个数(文献中通常称为beam width、beam size、柱宽度等)的节点。只有这些节点会在下一层继续扩展,其他节点就被剪掉了,也就是说保留了质量较高的节点,剪枝了质量较差的节点。因此,搜索所占用的空间和时间大幅减少,但缺点是无法保证一定获得最优解。
-使用柱搜索算法的解码阶段,目标是最大化生成序列的概率。思路是:
+## Training
-1. 每一个时刻,根据源语言句子的编码信息$c$、生成的第$i$个目标语言序列单词$u_i$和$i$时刻RNN的隐层状态$z_i$,计算出下一个隐层状态$z_{i+1}$。
-2. 将$z_{i+1}$通过`softmax`归一化,得到目标语言序列的第$i+1$个单词的概率分布$p_{i+1}$。
-3. 根据$p_{i+1}$采样出单词$u_{i+1}$。
-4. 重复步骤1~3,直到获得句子结束标记``或超过句子的最大生成长度为止。
+`run.sh` launches the training program. To start training with the default configuration, simply run:
+```sh
+sh run.sh
+```
+`run.sh` essentially invokes `train.py` with the following arguments:
-注意:$z_{i+1}$和$p_{i+1}$的计算公式同解码器中的一样。且由于生成时的每一步都是通过贪心法实现的,因此并不能保证得到全局最优解。
+```sh
+ python train.py \
+ --src_lang en --tar_lang vi \
+ --attention True \
+ --num_layers 2 \
+ --hidden_size 512 \
+ --src_vocab_size 17191 \
+ --tar_vocab_size 7709 \
+ --batch_size 128 \
+ --dropout 0.2 \
+ --init_scale 0.1 \
+ --max_grad_norm 5.0 \
+ --train_data_prefix data/en-vi/train \
+ --eval_data_prefix data/en-vi/tst2012 \
+ --test_data_prefix data/en-vi/tst2013 \
+ --vocab_prefix data/en-vi/vocab \
+ --use_gpu True
-## 数据介绍
+```
-本教程使用[WMT-14](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/)数据集中的[bitexts(after selection)](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/bitexts.tgz)作为训练集,[dev+test data](http://www-lium.univ-lemans.fr/~schwenk/cslm_joint_paper/data/dev+test.tgz)作为测试集和生成集。
-### 数据预处理
+The training program saves a model checkpoint at the end of every epoch.
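+
+As a minimal sketch of reloading one of these checkpoints (assuming they are written with `fluid.io.save_params` into `<model_path>/epoch_<N>/` directories, as the `--reload_model model_new/epoch_10/` example below suggests):
+```python
+import paddle.fluid as fluid
+
+exe = fluid.Executor(fluid.CPUPlace())
+# Build the same network/program first, then restore the epoch-10 parameters.
+fluid.io.load_params(exe, "model_new/epoch_10/")
+```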
-我们的预处理流程包括两步:
-- 将每个源语言到目标语言的平行语料库文件合并为一个文件:
- - 合并每个`XXX.src`和`XXX.trg`文件为`XXX`。
- - `XXX`中的第$i$行内容为`XXX.src`中的第$i$行和`XXX.trg`中的第$i$行连接,用'\t'分隔。
-- 创建训练数据的“源字典”和“目标字典”。每个字典都有**DICTSIZE**个单词,包括:语料中词频最高的(DICTSIZE - 3)个单词,和3个特殊符号``(序列的开始)、``(序列的结束)和``(未登录词)。
+After training completes, the `infer.py` script can be used for prediction. By default it decodes the test set with beam search, loading the model saved after the 10th epoch:
+```sh
+sh infer.sh
+```
+To decode a different data file, simply change the `--infer_file` argument in the command below:
-### 示例数据
+```sh
+ python infer.py \
+ --src_lang en --tar_lang vi \
+ --num_layers 2 \
+ --hidden_size 512 \
+ --src_vocab_size 17191 \
+ --tar_vocab_size 7709 \
+ --batch_size 128 \
+ --dropout 0.2 \
+ --init_scale 0.1 \
+ --max_grad_norm 5.0 \
+ --vocab_prefix data/en-vi/vocab \
+ --infer_file data/en-vi/tst2013.en \
+ --reload_model model_new/epoch_10/ \
+ --use_gpu True
-因为完整的数据集数据量较大,为了验证训练流程,PaddlePaddle接口paddle.dataset.wmt14中默认提供了一个经过预处理的[较小规模的数据集](http://paddlepaddle.bj.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz)。
+```
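+
+To score decoded output against the reference translations, a corpus-level BLEU can be computed, for example with NLTK. This is only a rough sketch: the exact tokenization and scoring script behind the numbers below is not specified here, and the file names assume the default `--infer_output_file` and the en-vi data layout.
+```python
+# Hedged sketch: corpus BLEU over whitespace-tokenized hypothesis/reference files.
+from nltk.translate.bleu_score import corpus_bleu
+
+with open("infer_output") as f_hyp, open("data/en-vi/tst2013.vi") as f_ref:
+    hyps = [line.split() for line in f_hyp]
+    refs = [[line.split()] for line in f_ref]
+print("BLEU: %.2f" % (100 * corpus_bleu(refs, hyps)))
+```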
-该数据集有193319条训练数据,6003条测试数据,词典长度为30000。因为数据规模限制,使用该数据集训练出来的模型效果无法保证。
+## Results
-## 训练模型
+Single model, beam_size = 10:
-`train.py`包含训练程序的主函数,要使用默认参数开始训练,只需要简单地执行:
```
-python train.py
-```
-您可以使用命令行参数来设置模型训练时的参数。要显示所有可用的命令行参数,执行:
-```sh
-python train.py -h
-```
-这样会显示所有的命令行参数的描述,以及其默认值。默认的模型是带有注意力机制的。您也可以尝试运行无注意力机制的模型,命令如下:
-```sh
-python train.py --no_attention
-```
-训练好的模型默认会被保存到`./models`路径下。您可以用命令行参数`--save_dir`来指定模型的保存路径。默认每个pass结束时会保存一个模型。
+no attention
-## 生成预测结果
+tst2012 BLEU: 11.58
+tst2013 BLEU: 12.20
-在模型训练好后,可以用`infer.py`来生成预测结果。同样的,使用默认参数,只需要执行:
-```sh
-python infer.py
-```
-您也可以同样用命令行来指定各参数。注意,预测时的参数设置必须与训练时完全一致,否则载入模型会失败。您可以用`--pass_num`参数来选择读取哪个pass结束时保存的模型。同时您可以使用`--beam_width`参数来选择beam search宽度。
-## 参考文献
-1. Koehn P. [Statistical machine translation](https://books.google.com.hk/books?id=4v_Cx1wIMLkC&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false)[M]. Cambridge University Press, 2009.
-2. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://www.aclweb.org/anthology/D/D14/D14-1179.pdf)[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1724-1734.
-3. Chung J, Gulcehre C, Cho K H, et al. [Empirical evaluation of gated recurrent neural networks on sequence modeling](https://arxiv.org/abs/1412.3555)[J]. arXiv preprint arXiv:1412.3555, 2014.
-4. Bahdanau D, Cho K, Bengio Y. [Neural machine translation by jointly learning to align and translate](https://arxiv.org/abs/1409.0473)[C]//Proceedings of ICLR 2015, 2015.
-5. Papineni K, Roukos S, Ward T, et al. [BLEU: a method for automatic evaluation of machine translation](http://dl.acm.org/citation.cfm?id=1073135)[C]//Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, 2002: 311-318.
+with attention
+tst2012 BLEU: 22.21
+tst2013 BLEU: 25.30
+```
-
-
This tutorial was created by PaddlePaddle and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/_ce.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/_ce.py
deleted file mode 100644
index e00ac49273ba4bf489e9b837d65d448eaa2aea43..0000000000000000000000000000000000000000
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/_ce.py
+++ /dev/null
@@ -1,63 +0,0 @@
-####this file is only used for continuous evaluation test!
-
-import os
-import sys
-sys.path.append(os.environ['ceroot'])
-from kpi import CostKpi, DurationKpi, AccKpi
-
-#### NOTE kpi.py should shared in models in some way!!!!
-
-train_cost_kpi = CostKpi('train_cost', 0.02, 0, actived=False)
-test_cost_kpi = CostKpi('test_cost', 0.005, 0, actived=False)
-train_duration_kpi = DurationKpi('train_duration', 0.06, 0, actived=False)
-
-tracking_kpis = [
- train_cost_kpi,
- test_cost_kpi,
- train_duration_kpi,
-]
-
-
-def parse_log(log):
- '''
- This method should be implemented by model developers.
-
- The suggestion:
-
- each line in the log should be key, value, for example:
-
- "
- train_cost\t1.0
- test_cost\t1.0
- train_cost\t1.0
- train_cost\t1.0
- train_acc\t1.2
- "
- '''
- for line in log.split('\n'):
- fs = line.strip().split('\t')
- print(fs)
- if len(fs) == 3 and fs[0] == 'kpis':
- print("-----%s" % fs)
- kpi_name = fs[1]
- kpi_value = float(fs[2])
- yield kpi_name, kpi_value
-
-
-def log_to_ce(log):
- kpi_tracker = {}
- for kpi in tracking_kpis:
- kpi_tracker[kpi.name] = kpi
-
- for (kpi_name, kpi_value) in parse_log(log):
- print(kpi_name, kpi_value)
- kpi_tracker[kpi_name].add_record(kpi_value)
- kpi_tracker[kpi_name].persist()
-
-
-if __name__ == '__main__':
- log = sys.stdin.read()
- print("*****")
- print(log)
- print("****")
- log_to_ce(log)
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/args.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/args.py
index 16f97488d8b976a6eff7dfa38ccddf93fadcbf18..494289a7ace2506d52e4e6a7ff050ceff0fdf4d9 100644
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/args.py
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/args.py
@@ -23,76 +23,95 @@ import distutils.util
def parse_args():
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
- "--embedding_dim",
- type=int,
- default=512,
- help="The dimension of embedding table. (default: %(default)d)")
+ "--train_data_prefix", type=str, help="file prefix for train data")
parser.add_argument(
- "--encoder_size",
- type=int,
- default=512,
- help="The size of encoder bi-rnn unit. (default: %(default)d)")
+ "--eval_data_prefix", type=str, help="file prefix for eval data")
parser.add_argument(
- "--decoder_size",
- type=int,
- default=512,
- help="The size of decoder rnn unit. (default: %(default)d)")
+ "--test_data_prefix", type=str, help="file prefix for test data")
parser.add_argument(
- "--batch_size",
- type=int,
- default=32,
- help="The sequence number of a mini-batch data. (default: %(default)d)")
+ "--vocab_prefix", type=str, help="file prefix for vocab")
+ parser.add_argument("--src_lang", type=str, help="source language suffix")
+ parser.add_argument("--tar_lang", type=str, help="target language suffix")
+
parser.add_argument(
- "--dict_size",
- type=int,
- default=30000,
- help="The dictionary capacity. Dictionaries of source sequence and "
- "target dictionary have same capacity. (default: %(default)d)")
+        "--attention",
+        type=distutils.util.strtobool,  # argparse's type=bool treats any non-empty string, even "False", as True
+        default=False,
+        help="Whether to use the attention model")
+
parser.add_argument(
- "--pass_num",
- type=int,
- default=5,
- help="The pass number to train. In inference mode, load the saved model"
- " at the end of given pass.(default: %(default)d)")
+ "--optimizer",
+ type=str,
+ default='adam',
+        help="optimizer to use; only [sgd|adam] are supported")
+
parser.add_argument(
"--learning_rate",
type=float,
- default=0.01,
- help="Learning rate used to train the model. (default: %(default)f)")
+ default=0.001,
+ help="learning rate for optimizer")
+
parser.add_argument(
- "--no_attention",
- action='store_true',
- help="If set, run no attention model instead of attention model.")
+ "--num_layers",
+ type=int,
+ default=1,
+        help="number of layers in the encoder and decoder")
parser.add_argument(
- "--beam_size",
+ "--hidden_size",
type=int,
- default=3,
- help="The width for beam search. (default: %(default)d)")
+ default=100,
+ help="hidden size of encoder and decoder")
+ parser.add_argument("--src_vocab_size", type=int, help="source vocab size")
+ parser.add_argument("--tar_vocab_size", type=int, help="target vocab size")
+
+ parser.add_argument(
+ "--batch_size", type=int, help="batch size of each step")
+
parser.add_argument(
- "--use_gpu",
- type=distutils.util.strtobool,
- default=True,
- help="Whether to use gpu or not. (default: %(default)d)")
+ "--max_epoch", type=int, default=12, help="max epoch for the training")
+
parser.add_argument(
- "--max_length",
+ "--max_len",
type=int,
default=50,
- help="The maximum sequence length for translation result."
- "(default: %(default)d)")
+ help="max length for source and target sentence")
parser.add_argument(
- "--save_dir",
+        "--dropout", type=float, default=0.0, help="dropout probability")
+ parser.add_argument(
+ "--init_scale",
+ type=float,
+ default=0.0,
+ help="init scale for parameter")
+ parser.add_argument(
+ "--max_grad_norm",
+ type=float,
+ default=5.0,
+ help="max grad norm for global norm clip")
+
+ parser.add_argument(
+ "--model_path",
type=str,
- default="model",
- help="Specify the path to save trained models.")
+ default='./model',
+        help="directory in which to save the trained model")
+
parser.add_argument(
- "--save_interval",
- type=int,
- default=1,
- help="Save the trained model every n passes."
- "(default: %(default)d)")
+        "--reload_model", type=str, help="model path to reload for inference")
+
+ parser.add_argument(
+ "--infer_file", type=str, help="file name for inference")
+ parser.add_argument(
+ "--infer_output_file",
+ type=str,
+ default='./infer_output',
+ help="file name for inference output")
+ parser.add_argument(
+        "--beam_size", type=int, default=10, help="beam size for beam search")
+
parser.add_argument(
- "--enable_ce",
- action='store_true',
- help="If set, run the task with continuous evaluation logs.")
+        '--use_gpu',
+        type=distutils.util.strtobool,  # avoid the type=bool pitfall where "False" parses as True
+        default=False,
+        help='Whether to use GPU [True|False]')
+
args = parser.parse_args()
return args
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/attention_model.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/attention_model.py
index 0c72697786819179dabce477a9c8d1be760dca28..eba1d5f36c09d1314de716902234ae41c9536a15 100644
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/attention_model.py
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/attention_model.py
@@ -1,220 +1,471 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
+import paddle.fluid.layers as layers
import paddle.fluid as fluid
-from paddle.fluid.contrib.decoder.beam_search_decoder import *
-
-
-def lstm_step(x_t, hidden_t_prev, cell_t_prev, size):
- def linear(inputs):
- return fluid.layers.fc(input=inputs, size=size, bias_attr=True)
-
- forget_gate = fluid.layers.sigmoid(x=linear([hidden_t_prev, x_t]))
- input_gate = fluid.layers.sigmoid(x=linear([hidden_t_prev, x_t]))
- output_gate = fluid.layers.sigmoid(x=linear([hidden_t_prev, x_t]))
- cell_tilde = fluid.layers.tanh(x=linear([hidden_t_prev, x_t]))
-
- cell_t = fluid.layers.sums(input=[
- fluid.layers.elementwise_mul(
- x=forget_gate, y=cell_t_prev), fluid.layers.elementwise_mul(
- x=input_gate, y=cell_tilde)
- ])
-
- hidden_t = fluid.layers.elementwise_mul(
- x=output_gate, y=fluid.layers.tanh(x=cell_t))
-
- return hidden_t, cell_t
-
-
-def seq_to_seq_net(embedding_dim, encoder_size, decoder_size, source_dict_dim,
- target_dict_dim, is_generating, beam_size, max_length):
- """Construct a seq2seq network."""
-
- def bi_lstm_encoder(input_seq, gate_size):
- # A bi-directional lstm encoder implementation.
- # Linear transformation part for input gate, output gate, forget gate
- # and cell activation vectors need be done outside of dynamic_lstm.
- # So the output size is 4 times of gate_size.
- input_forward_proj = fluid.layers.fc(input=input_seq,
- size=gate_size * 4,
- act='tanh',
- bias_attr=False)
- forward, _ = fluid.layers.dynamic_lstm(
- input=input_forward_proj, size=gate_size * 4, use_peepholes=False)
- input_reversed_proj = fluid.layers.fc(input=input_seq,
- size=gate_size * 4,
- act='tanh',
- bias_attr=False)
- reversed, _ = fluid.layers.dynamic_lstm(
- input=input_reversed_proj,
- size=gate_size * 4,
- is_reverse=True,
- use_peepholes=False)
- return forward, reversed
-
- # The encoding process. Encodes the input words into tensors.
- src_word_idx = fluid.layers.data(
- name='source_sequence', shape=[1], dtype='int64', lod_level=1)
-
- src_embedding = fluid.layers.embedding(
- input=src_word_idx,
- size=[source_dict_dim, embedding_dim],
- dtype='float32')
-
- src_forward, src_reversed = bi_lstm_encoder(
- input_seq=src_embedding, gate_size=encoder_size)
-
- encoded_vector = fluid.layers.concat(
- input=[src_forward, src_reversed], axis=1)
-
- encoded_proj = fluid.layers.fc(input=encoded_vector,
- size=decoder_size,
- bias_attr=False)
-
- backward_first = fluid.layers.sequence_pool(
- input=src_reversed, pool_type='first')
-
- decoder_boot = fluid.layers.fc(input=backward_first,
- size=decoder_size,
- bias_attr=False,
- act='tanh')
-
- cell_init = fluid.layers.fill_constant_batch_size_like(
- input=decoder_boot,
- value=0.0,
- shape=[-1, decoder_size],
- dtype='float32')
- cell_init.stop_gradient = False
-
- # Create a RNN state cell by providing the input and hidden states, and
- # specifies the hidden state as output.
- h = InitState(init=decoder_boot, need_reorder=True)
- c = InitState(init=cell_init)
-
- state_cell = StateCell(
- inputs={'x': None,
- 'encoder_vec': None,
- 'encoder_proj': None},
- states={'h': h,
- 'c': c},
- out_state='h')
-
- def simple_attention(encoder_vec, encoder_proj, decoder_state):
- # The implementation of simple attention model
- decoder_state_proj = fluid.layers.fc(input=decoder_state,
- size=decoder_size,
- bias_attr=False)
- decoder_state_expand = fluid.layers.sequence_expand(
- x=decoder_state_proj, y=encoder_proj)
- # concated lod should inherit from encoder_proj
- mixed_state = encoder_proj + decoder_state_expand
- attention_weights = fluid.layers.fc(input=mixed_state,
- size=1,
- bias_attr=False)
- attention_weights = fluid.layers.sequence_softmax(
- input=attention_weights)
- weigths_reshape = fluid.layers.reshape(x=attention_weights, shape=[-1])
- scaled = fluid.layers.elementwise_mul(
- x=encoder_vec, y=weigths_reshape, axis=0)
- context = fluid.layers.sequence_pool(input=scaled, pool_type='sum')
- return context
-
- @state_cell.state_updater
- def state_updater(state_cell):
- # Define the updater of RNN state cell
- current_word = state_cell.get_input('x')
- encoder_vec = state_cell.get_input('encoder_vec')
- encoder_proj = state_cell.get_input('encoder_proj')
- prev_h = state_cell.get_state('h')
- prev_c = state_cell.get_state('c')
- context = simple_attention(encoder_vec, encoder_proj, prev_h)
- decoder_inputs = fluid.layers.concat(
- input=[context, current_word], axis=1)
- h, c = lstm_step(decoder_inputs, prev_h, prev_c, decoder_size)
- state_cell.set_state('h', h)
- state_cell.set_state('c', c)
-
- # Define the decoding process
- if not is_generating:
- # Training process
- trg_word_idx = fluid.layers.data(
- name='target_sequence', shape=[1], dtype='int64', lod_level=1)
-
- trg_embedding = fluid.layers.embedding(
- input=trg_word_idx,
- size=[target_dict_dim, embedding_dim],
- dtype='float32')
-
- # A decoder for training
- decoder = TrainingDecoder(state_cell)
-
- with decoder.block():
- current_word = decoder.step_input(trg_embedding)
- encoder_vec = decoder.static_input(encoded_vector)
- encoder_proj = decoder.static_input(encoded_proj)
- decoder.state_cell.compute_state(inputs={
- 'x': current_word,
- 'encoder_vec': encoder_vec,
- 'encoder_proj': encoder_proj
- })
- h = decoder.state_cell.get_state('h')
- decoder.state_cell.update_states()
- out = fluid.layers.fc(input=h,
- size=target_dict_dim,
- bias_attr=True,
- act='softmax')
- decoder.output(out)
-
- label = fluid.layers.data(
- name='label_sequence', shape=[1], dtype='int64', lod_level=1)
- cost = fluid.layers.cross_entropy(input=decoder(), label=label)
- avg_cost = fluid.layers.mean(x=cost)
- feeding_list = ["source_sequence", "target_sequence", "label_sequence"]
- return avg_cost, feeding_list
-
- else:
- # Inference
- init_ids = fluid.layers.data(
- name="init_ids", shape=[1], dtype="int64", lod_level=2)
- init_scores = fluid.layers.data(
- name="init_scores", shape=[1], dtype="float32", lod_level=2)
-
- # A beam search decoder
- decoder = BeamSearchDecoder(
- state_cell=state_cell,
- init_ids=init_ids,
- init_scores=init_scores,
- target_dict_dim=target_dict_dim,
- word_dim=embedding_dim,
- input_var_dict={
- 'encoder_vec': encoded_vector,
- 'encoder_proj': encoded_proj
- },
- topk_size=50,
- sparse_emb=True,
- max_len=max_length,
- beam_size=beam_size,
- end_id=1,
- name=None)
-
- decoder.decode()
-
- translation_ids, translation_scores = decoder()
- feeding_list = ["source_sequence"]
-
- return translation_ids, translation_scores, feeding_list
+from paddle.fluid.layers.control_flow import StaticRNN
+import numpy as np
+from paddle.fluid import ParamAttr
+from paddle.fluid.contrib.layers import basic_lstm, BasicLSTMUnit
+from base_model import BaseModel
+
+INF = 1. * 1e5
+alpha = 0.6
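+# INF is a large finite stand-in for infinity, used to mask out finished or
+# invalid beam entries; alpha is the GNMT-style length-penalty exponent:
+# lp(Y) = ((5 + |Y|) / 6) ** alpha.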
+
+
+class AttentionModel(BaseModel):
+ def __init__(self,
+ hidden_size,
+ src_vocab_size,
+ tar_vocab_size,
+ batch_size,
+ num_layers=1,
+ init_scale=0.1,
+ dropout=None,
+ batch_first=True):
+ super(AttentionModel, self).__init__(
+ hidden_size,
+ src_vocab_size,
+ tar_vocab_size,
+ batch_size,
+ num_layers=num_layers,
+ init_scale=init_scale,
+ dropout=dropout,
+ batch_first=batch_first)
+
+ def _build_decoder(self,
+ enc_last_hidden,
+ enc_last_cell,
+ mode='train',
+ beam_size=10):
+
+ dec_input = layers.transpose(self.tar_emb, [1, 0, 2])
+ dec_unit_list = []
+ for i in range(self.num_layers):
+ new_name = "dec_layers_" + str(i)
+ dec_unit_list.append(
+ BasicLSTMUnit(
+ new_name,
+ self.hidden_size,
+ ParamAttr(initializer=fluid.initializer.UniformInitializer(
+ low=-self.init_scale, high=self.init_scale)),
+ ParamAttr(initializer=fluid.initializer.Constant(0.0)), ))
+
+
+ attention_weight = layers.create_parameter([self.hidden_size * 2, self.hidden_size], dtype="float32", name="attention_weight", \
+ default_initializer=fluid.initializer.UniformInitializer(low=-self.init_scale, high=self.init_scale))
+
+ memory_weight = layers.create_parameter([self.hidden_size, self.hidden_size], dtype="float32", name="memory_weight", \
+ default_initializer=fluid.initializer.UniformInitializer(low=-self.init_scale, high=self.init_scale))
+
+ def dot_attention(query, memory, mask=None):
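+            # Dot-product attention: scores = query x memory^T. The optional
+            # mask holds 0 for real tokens and -1 for padding (src_mask - 1.0),
+            # so adding mask * 1e9 pushes padded positions to -1e9 before the
+            # softmax.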
+ attn = layers.matmul(query, memory, transpose_y=True)
+
+ if mask:
+ attn = layers.transpose(attn, [1, 0, 2])
+ attn = layers.elementwise_add(attn, mask * 1000000000, -1)
+ attn = layers.transpose(attn, [1, 0, 2])
+ weight = layers.softmax(attn)
+ weight_memory = layers.matmul(weight, memory)
+
+ return weight_memory, weight
+
+ max_src_seq_len = layers.shape(self.src)[1]
+ src_mask = layers.sequence_mask(
+ self.src_sequence_length, maxlen=max_src_seq_len, dtype='float32')
+
+ softmax_weight = layers.create_parameter([self.hidden_size, self.tar_vocab_size], dtype="float32", name="softmax_weight", \
+ default_initializer=fluid.initializer.UniformInitializer(low=-self.init_scale, high=self.init_scale))
+
+        def decoder_step(current_in, pre_feed, pre_hidden_array,
+ pre_cell_array, enc_memory):
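+            # One decoder step with input feeding: the previous attentional
+            # output (pre_feed) is concatenated with the current input before
+            # the stacked LSTM cells run; the top hidden state then attends
+            # over the encoder memory to produce the next attentional output.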
+ new_hidden_array = []
+ new_cell_array = []
+
+            step_input = layers.concat([current_in, pre_feed], 1)
+
+ for i in range(self.num_layers):
+ pre_hidden = pre_hidden_array[i]
+ pre_cell = pre_cell_array[i]
+
+ new_hidden, new_cell = dec_unit_list[i](step_input, pre_hidden,
+ pre_cell)
+
+ new_hidden_array.append(new_hidden)
+ new_cell_array.append(new_cell)
+
+ step_input = new_hidden
+
+ memory_mask = src_mask - 1.0
+ enc_memory = layers.matmul(enc_memory, memory_weight)
+ att_in = layers.unsqueeze(step_input, [1])
+ dec_att, _ = dot_attention(att_in, enc_memory)
+ dec_att = layers.squeeze(dec_att, [1])
+ concat_att_out = layers.concat([dec_att, step_input], 1)
+ concat_att_out = layers.matmul(concat_att_out, attention_weight)
+
+ return concat_att_out, new_hidden_array, new_cell_array
+
+ if mode == "train":
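+            # Teacher-forced training: unroll the target sequence with a
+            # StaticRNN, feeding each step's attentional output back in as
+            # input_feed at the next step.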
+ dec_rnn = StaticRNN()
+ with dec_rnn.step():
+ step_input = dec_rnn.step_input(dec_input)
+ input_feed = dec_rnn.memory(
+ batch_ref=dec_input, shape=[-1, self.hidden_size])
+ step_input = layers.concat([step_input, input_feed], 1)
+
+ for i in range(self.num_layers):
+ pre_hidden = dec_rnn.memory(init=enc_last_hidden[i])
+ pre_cell = dec_rnn.memory(init=enc_last_cell[i])
+
+ new_hidden, new_cell = dec_unit_list[i](
+ step_input, pre_hidden, pre_cell)
+
+ dec_rnn.update_memory(pre_hidden, new_hidden)
+ dec_rnn.update_memory(pre_cell, new_cell)
+
+ step_input = new_hidden
+
+                if self.dropout is not None and self.dropout > 0.0:
+ print("using dropout", self.dropout)
+ step_input = fluid.layers.dropout(
+ step_input,
+ dropout_prob=self.dropout,
+ dropout_implementation='upscale_in_train')
+ memory_mask = src_mask - 1.0
+ enc_memory = layers.matmul(self.enc_output, memory_weight)
+ att_in = layers.unsqueeze(step_input, [1])
+ dec_att, _ = dot_attention(att_in, enc_memory, memory_mask)
+ dec_att = layers.squeeze(dec_att, [1])
+ concat_att_out = layers.concat([dec_att, step_input], 1)
+ concat_att_out = layers.matmul(concat_att_out, attention_weight)
+ #concat_att_out = layers.tanh( concat_att_out )
+
+ dec_rnn.update_memory(input_feed, concat_att_out)
+
+ dec_rnn.step_output(concat_att_out)
+
+ dec_rnn_out = dec_rnn()
+ dec_output = layers.transpose(dec_rnn_out, [1, 0, 2])
+
+ dec_output = layers.matmul(dec_output, softmax_weight)
+
+ return dec_output
+ elif mode == 'beam_search':
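+            # Beam search in the tensor2tensor style: keep beam_size "alive"
+            # hypotheses plus a separate set of finished ones; decoding stops
+            # when beam_size hypotheses have finished or the length cap
+            # (2x the source length) is reached.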
+
+ max_length = max_src_seq_len * 2
+ #max_length = layers.fill_constant( [1], dtype='int32', value = 10)
+ pre_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+ full_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+
+ score = layers.fill_constant([1], dtype='float32', value=0.0)
+
+ #eos_ids = layers.fill_constant( [1, 1], dtype='int64', value=2)
+
+ pre_hidden_array = []
+ pre_cell_array = []
+ pre_feed = layers.fill_constant(
+ [beam_size, self.hidden_size], dtype='float32', value=0)
+ for i in range(self.num_layers):
+ pre_hidden_array.append(
+ layers.expand(enc_last_hidden[i], [beam_size, 1]))
+ pre_cell_array.append(
+ layers.expand(enc_last_cell[i], [beam_size, 1]))
+
+ eos_ids = layers.fill_constant([beam_size], dtype='int64', value=2)
+ init_score = np.zeros((beam_size)).astype('float32')
+ init_score[1:] = -INF
+ pre_score = layers.assign(init_score)
+ #pre_score = layers.fill_constant( [1,], dtype='float32', value= 0.0)
+ tokens = layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=1)
+
+ enc_memory = layers.expand(self.enc_output, [beam_size, 1, 1])
+
+ pre_tokens = layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=1)
+
+ finished_seq = layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=0)
+ finished_scores = layers.fill_constant(
+ [beam_size], dtype='float32', value=-INF)
+ finished_flag = layers.fill_constant(
+ [beam_size], dtype='float32', value=0.0)
+
+ step_idx = layers.fill_constant(shape=[1], dtype='int32', value=0)
+ cond = layers.less_than(
+ x=step_idx, y=max_length) # default force_cpu=True
+
+ parent_idx = layers.fill_constant([1], dtype='int32', value=0)
+ while_op = layers.While(cond)
+
+ def compute_topk_scores_and_seq(sequences,
+ scores,
+ scores_to_gather,
+ flags,
+ beam_size,
+ select_beam=None,
+ generate_id=None):
+ scores = layers.reshape(scores, shape=[1, -1])
+ _, topk_indexs = layers.topk(scores, k=beam_size)
+
+ topk_indexs = layers.reshape(topk_indexs, shape=[-1])
+
+ # gather result
+
+ top_seq = layers.gather(sequences, topk_indexs)
+ topk_flags = layers.gather(flags, topk_indexs)
+ topk_gather_scores = layers.gather(scores_to_gather,
+ topk_indexs)
+
+ if select_beam:
+ topk_beam = layers.gather(select_beam, topk_indexs)
+ else:
+ topk_beam = select_beam
+
+ if generate_id:
+ topk_id = layers.gather(generate_id, topk_indexs)
+ else:
+ topk_id = generate_id
+ return top_seq, topk_gather_scores, topk_flags, topk_beam, topk_id
+
+ def grow_alive(curr_seq, curr_scores, curr_log_probs, curr_finished,
+ select_beam, generate_id):
+ curr_scores += curr_finished * -INF
+ return compute_topk_scores_and_seq(
+ curr_seq,
+ curr_scores,
+ curr_log_probs,
+ curr_finished,
+ beam_size,
+ select_beam,
+ generate_id=generate_id)
+
+ def grow_finished(finished_seq, finished_scores, finished_flag,
+ curr_seq, curr_scores, curr_finished):
+ finished_seq = layers.concat(
+ [
+ finished_seq, layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=1)
+ ],
+ axis=1)
+ curr_scores += (1.0 - curr_finished) * -INF
+ #layers.Print( curr_scores, message="curr scores")
+ curr_finished_seq = layers.concat(
+ [finished_seq, curr_seq], axis=0)
+ curr_finished_scores = layers.concat(
+ [finished_scores, curr_scores], axis=0)
+ curr_finished_flags = layers.concat(
+ [finished_flag, curr_finished], axis=0)
+
+ return compute_topk_scores_and_seq(
+ curr_finished_seq, curr_finished_scores,
+ curr_finished_scores, curr_finished_flags, beam_size)
+
+ def is_finished(alive_log_prob, finished_scores,
+ finished_in_finished):
+
+ max_out_len = 200
+ max_length_penalty = layers.pow(layers.fill_constant(
+ [1], dtype='float32', value=((5.0 + max_out_len) / 6.0)),
+ alpha)
+
+ lower_bound_alive_score = layers.slice(
+ alive_log_prob, starts=[0], ends=[1],
+ axes=[0]) / max_length_penalty
+
+                lowest_score_of_finished_in_finished = finished_scores * finished_in_finished
+                lowest_score_of_finished_in_finished += (
+                    1.0 - finished_in_finished) * -INF
+                lowest_score_of_finished_in_finished = layers.reduce_min(
+                    lowest_score_of_finished_in_finished)
+
+                met = layers.less_than(lower_bound_alive_score,
+                                       lowest_score_of_finished_in_finished)
+ met = layers.cast(met, 'float32')
+ bound_is_met = layers.reduce_sum(met)
+
+ finished_eos_num = layers.reduce_sum(finished_in_finished)
+
+ finish_cond = layers.less_than(
+ finished_eos_num,
+ layers.fill_constant(
+ [1], dtype='float32', value=beam_size))
+
+ return finish_cond
+
+            def grow_top_k(step_idx, alive_seq, alive_log_prob, parent_idx):
+ pre_ids = alive_seq
+
+ dec_step_emb = layers.embedding(
+ input=pre_ids,
+ size=[self.tar_vocab_size, self.hidden_size],
+ dtype='float32',
+ is_sparse=False,
+ param_attr=fluid.ParamAttr(
+ name='target_embedding',
+ initializer=fluid.initializer.UniformInitializer(
+ low=-self.init_scale, high=self.init_scale)))
+
+ dec_att_out, new_hidden_array, new_cell_array = decoder_step(
+ dec_step_emb, pre_feed, pre_hidden_array, pre_cell_array,
+ enc_memory)
+
+ projection = layers.matmul(dec_att_out, softmax_weight)
+
+ logits = layers.softmax(projection)
+ current_log = layers.elementwise_add(
+ x=layers.log(logits), y=alive_log_prob, axis=0)
+ base_1 = layers.cast(step_idx, 'float32') + 6.0
+ base_1 /= 6.0
+ length_penalty = layers.pow(base_1, alpha)
+
+ len_pen = layers.pow((
+ (5. + layers.cast(step_idx + 1, 'float32')) / 6.), alpha)
+
+ current_log = layers.reshape(current_log, shape=[1, -1])
+
+ current_log = current_log / length_penalty
+ topk_scores, topk_indices = layers.topk(
+ input=current_log, k=beam_size)
+
+ topk_scores = layers.reshape(topk_scores, shape=[-1])
+
+ topk_log_probs = topk_scores * length_penalty
+
+ generate_id = layers.reshape(
+ topk_indices, shape=[-1]) % self.tar_vocab_size
+
+ selected_beam = layers.reshape(
+ topk_indices, shape=[-1]) // self.tar_vocab_size
+
+ topk_finished = layers.equal(generate_id, eos_ids)
+
+ topk_finished = layers.cast(topk_finished, 'float32')
+
+ generate_id = layers.reshape(generate_id, shape=[-1, 1])
+
+ pre_tokens_list = layers.gather(tokens, selected_beam)
+
+ full_tokens_list = layers.concat(
+ [pre_tokens_list, generate_id], axis=1)
+
+
+ return full_tokens_list, topk_log_probs, topk_scores, topk_finished, selected_beam, generate_id, \
+ dec_att_out, new_hidden_array, new_cell_array
+
+ with while_op.block():
+ topk_seq, topk_log_probs, topk_scores, topk_finished, topk_beam, topk_generate_id, attention_out, new_hidden_array, new_cell_array = \
+ grow_top_k( step_idx, pre_tokens, pre_score, parent_idx)
+ alive_seq, alive_log_prob, _, alive_beam, alive_id = grow_alive(
+ topk_seq, topk_scores, topk_log_probs, topk_finished,
+ topk_beam, topk_generate_id)
+
+ finished_seq_2, finished_scores_2, finished_flags_2, _, _ = grow_finished(
+ finished_seq, finished_scores, finished_flag, topk_seq,
+ topk_scores, topk_finished)
+
+ finished_cond = is_finished(alive_log_prob, finished_scores_2,
+ finished_flags_2)
+
+ layers.increment(x=step_idx, value=1.0, in_place=True)
+
+ layers.assign(alive_beam, parent_idx)
+ layers.assign(alive_id, pre_tokens)
+ layers.assign(alive_log_prob, pre_score)
+ layers.assign(alive_seq, tokens)
+ layers.assign(finished_seq_2, finished_seq)
+ layers.assign(finished_scores_2, finished_scores)
+ layers.assign(finished_flags_2, finished_flag)
+
+ # update init_hidden, init_cell, input_feed
+ new_feed = layers.gather(attention_out, parent_idx)
+ layers.assign(new_feed, pre_feed)
+ for i in range(self.num_layers):
+ new_hidden_var = layers.gather(new_hidden_array[i],
+ parent_idx)
+ layers.assign(new_hidden_var, pre_hidden_array[i])
+ new_cell_var = layers.gather(new_cell_array[i], parent_idx)
+ layers.assign(new_cell_var, pre_cell_array[i])
+
+ length_cond = layers.less_than(x=step_idx, y=max_length)
+ layers.logical_and(x=length_cond, y=finished_cond, out=cond)
+
+ tokens_with_eos = tokens
+
+ all_seq = layers.concat([tokens_with_eos, finished_seq], axis=0)
+ all_score = layers.concat([pre_score, finished_scores], axis=0)
+ _, topk_index = layers.topk(all_score, k=beam_size)
+ topk_index = layers.reshape(topk_index, shape=[-1])
+ final_seq = layers.gather(all_seq, topk_index)
+ final_score = layers.gather(all_score, topk_index)
+
+ return final_seq
+ elif mode == 'greedy_search':
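+            # Greedy decoding: at each step take the single most likely token
+            # (top-1) and stop on <eos> (id 2) or at twice the source length.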
+ max_length = max_src_seq_len * 2
+ #max_length = layers.fill_constant( [1], dtype='int32', value = 10)
+ pre_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+ full_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+
+ score = layers.fill_constant([1], dtype='float32', value=0.0)
+
+ eos_ids = layers.fill_constant([1, 1], dtype='int64', value=2)
+
+ pre_hidden_array = []
+ pre_cell_array = []
+ pre_feed = layers.fill_constant(
+ [1, self.hidden_size], dtype='float32', value=0)
+ for i in range(self.num_layers):
+ pre_hidden_array.append(enc_last_hidden[i])
+ pre_cell_array.append(enc_last_cell[i])
+ #pre_hidden_array.append( layers.fill_constant( [1, hidden_size], dtype='float32', value=0) )
+ #pre_cell_array.append( layers.fill_constant( [1, hidden_size], dtype='float32', value=0) )
+
+ step_idx = layers.fill_constant(shape=[1], dtype='int32', value=0)
+ cond = layers.less_than(
+ x=step_idx, y=max_length) # default force_cpu=True
+ while_op = layers.While(cond)
+
+ with while_op.block():
+
+ dec_step_emb = layers.embedding(
+ input=pre_ids,
+ size=[self.tar_vocab_size, self.hidden_size],
+ dtype='float32',
+ is_sparse=False,
+ param_attr=fluid.ParamAttr(
+ name='target_embedding',
+ initializer=fluid.initializer.UniformInitializer(
+ low=-self.init_scale, high=self.init_scale)))
+
+ dec_att_out, new_hidden_array, new_cell_array = decoder_step(
+ dec_step_emb, pre_feed, pre_hidden_array, pre_cell_array,
+ self.enc_output)
+
+ projection = layers.matmul(dec_att_out, softmax_weight)
+
+ logits = layers.softmax(projection)
+ logits = layers.log(logits)
+
+ current_log = layers.elementwise_add(logits, score, axis=0)
+
+ topk_score, topk_indices = layers.topk(input=current_log, k=1)
+
+ new_ids = layers.concat([full_ids, topk_indices])
+ layers.assign(new_ids, full_ids)
+ #layers.Print( full_ids, message="ful ids")
+ layers.assign(topk_score, score)
+ layers.assign(topk_indices, pre_ids)
+ layers.assign(dec_att_out, pre_feed)
+ for i in range(self.num_layers):
+ layers.assign(new_hidden_array[i], pre_hidden_array[i])
+ layers.assign(new_cell_array[i], pre_cell_array[i])
+
+ layers.increment(x=step_idx, value=1.0, in_place=True)
+
+ eos_met = layers.not_equal(topk_indices, eos_ids)
+ length_cond = layers.less_than(x=step_idx, y=max_length)
+ layers.logical_and(x=length_cond, y=eos_met, out=cond)
+
+ return full_ids
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/base_model.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/base_model.py
new file mode 100644
index 0000000000000000000000000000000000000000..bebfc2f86ccf46e61639ac4bb723a9de8b08a0eb
--- /dev/null
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/base_model.py
@@ -0,0 +1,502 @@
+# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle.fluid.layers as layers
+import paddle.fluid as fluid
+from paddle.fluid.layers.control_flow import StaticRNN as PaddingRNN
+import numpy as np
+from paddle.fluid import ParamAttr
+from paddle.fluid.contrib.layers import basic_lstm, BasicLSTMUnit
+
+INF = 1. * 1e5
+alpha = 0.6
+
+
+class BaseModel(object):
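+    # Seq2seq baseline without attention: a stacked LSTM encoder whose final
+    # hidden/cell states initialize a stacked LSTM decoder of the same size.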
+ def __init__(self,
+ hidden_size,
+ src_vocab_size,
+ tar_vocab_size,
+ batch_size,
+ num_layers=1,
+ init_scale=0.1,
+ dropout=None,
+ batch_first=True):
+
+ self.hidden_size = hidden_size
+ self.src_vocab_size = src_vocab_size
+ self.tar_vocab_size = tar_vocab_size
+ self.batch_size = batch_size
+ self.num_layers = num_layers
+ self.init_scale = init_scale
+ self.dropout = dropout
+ self.batch_first = batch_first
+
+ def _build_data(self):
+ self.src = layers.data(name="src", shape=[-1, 1, 1], dtype='int64')
+ self.src_sequence_length = layers.data(
+ name="src_sequence_length", shape=[-1], dtype='int32')
+
+ self.tar = layers.data(name="tar", shape=[-1, 1, 1], dtype='int64')
+ self.tar_sequence_length = layers.data(
+ name="tar_sequence_length", shape=[-1], dtype='int32')
+ self.label = layers.data(name="label", shape=[-1, 1, 1], dtype='int64')
+
+    def _embedding(self):
+ self.src_emb = layers.embedding(
+ input=self.src,
+ size=[self.src_vocab_size, self.hidden_size],
+ dtype='float32',
+ is_sparse=False,
+ param_attr=fluid.ParamAttr(
+ name='source_embedding',
+ initializer=fluid.initializer.UniformInitializer(
+ low=-self.init_scale, high=self.init_scale)))
+ self.tar_emb = layers.embedding(
+ input=self.tar,
+ size=[self.tar_vocab_size, self.hidden_size],
+ dtype='float32',
+ is_sparse=False,
+ param_attr=fluid.ParamAttr(
+ name='target_embedding',
+ initializer=fluid.initializer.UniformInitializer(
+ low=-self.init_scale, high=self.init_scale)))
+
+ def _build_encoder(self):
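+        # Encode the source embeddings with a stacked unidirectional LSTM;
+        # returns the per-step outputs plus the final hidden and cell states.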
+ self.enc_output, enc_last_hidden, enc_last_cell = basic_lstm( self.src_emb, None, None, self.hidden_size, num_layers=self.num_layers, batch_first=self.batch_first, \
+ dropout_prob=self.dropout, \
+ param_attr = ParamAttr( initializer=fluid.initializer.UniformInitializer(low=-self.init_scale, high=self.init_scale) ), \
+ bias_attr = ParamAttr( initializer = fluid.initializer.Constant(0.0) ), \
+ sequence_length=self.src_sequence_length)
+
+ return self.enc_output, enc_last_hidden, enc_last_cell
+
+ def _build_decoder(self,
+ enc_last_hidden,
+ enc_last_cell,
+ mode='train',
+ beam_size=10):
+ softmax_weight = layers.create_parameter([self.hidden_size, self.tar_vocab_size], dtype="float32", name="softmax_weight", \
+ default_initializer=fluid.initializer.UniformInitializer(low=-self.init_scale, high=self.init_scale))
+ if mode == 'train':
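+            # Teacher forcing: run the decoder LSTM over the target embeddings,
+            # initialized with the encoder's final states, then project onto
+            # vocabulary logits with softmax_weight.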
+ dec_output, dec_last_hidden, dec_last_cell = basic_lstm( self.tar_emb, enc_last_hidden, enc_last_cell, \
+ self.hidden_size, num_layers=self.num_layers, \
+ batch_first=self.batch_first, \
+ dropout_prob=self.dropout, \
+ param_attr = ParamAttr( initializer=fluid.initializer.UniformInitializer(low=-self.init_scale, high=self.init_scale) ), \
+ bias_attr = ParamAttr( initializer = fluid.initializer.Constant(0.0) ))
+
+ dec_output = layers.matmul(dec_output, softmax_weight)
+
+ return dec_output
+ elif mode == 'beam_search' or mode == 'greedy_search':
+ dec_unit_list = []
+ name = 'basic_lstm'
+ for i in range(self.num_layers):
+ new_name = name + "_layers_" + str(i)
+ dec_unit_list.append(
+ BasicLSTMUnit(
+ new_name, self.hidden_size, dtype='float32'))
+
+ def decoder_step(current_in, pre_hidden_array, pre_cell_array):
+ new_hidden_array = []
+ new_cell_array = []
+
+ step_in = current_in
+ for i in range(self.num_layers):
+ pre_hidden = pre_hidden_array[i]
+ pre_cell = pre_cell_array[i]
+
+ new_hidden, new_cell = dec_unit_list[i](step_in, pre_hidden,
+ pre_cell)
+
+ new_hidden_array.append(new_hidden)
+ new_cell_array.append(new_cell)
+
+ step_in = new_hidden
+
+ return step_in, new_hidden_array, new_cell_array
+
+ if mode == 'beam_search':
+ max_src_seq_len = layers.shape(self.src)[1]
+ max_length = max_src_seq_len * 2
+ #max_length = layers.fill_constant( [1], dtype='int32', value = 10)
+ pre_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+ full_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+
+ score = layers.fill_constant([1], dtype='float32', value=0.0)
+
+ #eos_ids = layers.fill_constant( [1, 1], dtype='int64', value=2)
+
+ pre_hidden_array = []
+ pre_cell_array = []
+ pre_feed = layers.fill_constant(
+ [beam_size, self.hidden_size], dtype='float32', value=0)
+ for i in range(self.num_layers):
+ pre_hidden_array.append(
+ layers.expand(enc_last_hidden[i], [beam_size, 1]))
+ pre_cell_array.append(
+ layers.expand(enc_last_cell[i], [beam_size, 1]))
+
+ eos_ids = layers.fill_constant(
+ [beam_size], dtype='int64', value=2)
+ init_score = np.zeros((beam_size)).astype('float32')
+ init_score[1:] = -INF
+ pre_score = layers.assign(init_score)
+ #pre_score = layers.fill_constant( [1,], dtype='float32', value= 0.0)
+ tokens = layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=1)
+
+ enc_memory = layers.expand(self.enc_output, [beam_size, 1, 1])
+
+ pre_tokens = layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=1)
+
+ finished_seq = layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=0)
+ finished_scores = layers.fill_constant(
+ [beam_size], dtype='float32', value=-INF)
+ finished_flag = layers.fill_constant(
+ [beam_size], dtype='float32', value=0.0)
+
+ step_idx = layers.fill_constant(
+ shape=[1], dtype='int32', value=0)
+ cond = layers.less_than(
+ x=step_idx, y=max_length) # default force_cpu=True
+
+ parent_idx = layers.fill_constant([1], dtype='int32', value=0)
+ while_op = layers.While(cond)
+
+ def compute_topk_scores_and_seq(sequences,
+ scores,
+ scores_to_gather,
+ flags,
+ beam_size,
+ select_beam=None,
+ generate_id=None):
+ scores = layers.reshape(scores, shape=[1, -1])
+ _, topk_indexs = layers.topk(scores, k=beam_size)
+
+ topk_indexs = layers.reshape(topk_indexs, shape=[-1])
+
+ # gather result
+
+ top_seq = layers.gather(sequences, topk_indexs)
+ topk_flags = layers.gather(flags, topk_indexs)
+ topk_gather_scores = layers.gather(scores_to_gather,
+ topk_indexs)
+
+ if select_beam:
+ topk_beam = layers.gather(select_beam, topk_indexs)
+ else:
+ topk_beam = select_beam
+
+ if generate_id:
+ topk_id = layers.gather(generate_id, topk_indexs)
+ else:
+ topk_id = generate_id
+ return top_seq, topk_gather_scores, topk_flags, topk_beam, topk_id
+
+ def grow_alive(curr_seq, curr_scores, curr_log_probs,
+ curr_finished, select_beam, generate_id):
+ curr_scores += curr_finished * -INF
+ return compute_topk_scores_and_seq(
+ curr_seq,
+ curr_scores,
+ curr_log_probs,
+ curr_finished,
+ beam_size,
+ select_beam,
+ generate_id=generate_id)
+
+ def grow_finished(finished_seq, finished_scores, finished_flag,
+ curr_seq, curr_scores, curr_finished):
+ finished_seq = layers.concat(
+ [
+ finished_seq, layers.fill_constant(
+ [beam_size, 1], dtype='int64', value=1)
+ ],
+ axis=1)
+ curr_scores += (1.0 - curr_finished) * -INF
+ #layers.Print( curr_scores, message="curr scores")
+ curr_finished_seq = layers.concat(
+ [finished_seq, curr_seq], axis=0)
+ curr_finished_scores = layers.concat(
+ [finished_scores, curr_scores], axis=0)
+ curr_finished_flags = layers.concat(
+ [finished_flag, curr_finished], axis=0)
+
+ return compute_topk_scores_and_seq(
+ curr_finished_seq, curr_finished_scores,
+ curr_finished_scores, curr_finished_flags, beam_size)
+
+ def is_finished(alive_log_prob, finished_scores,
+ finished_in_finished):
+
+ max_out_len = 200
+ max_length_penalty = layers.pow(layers.fill_constant(
+ [1], dtype='float32', value=(
+ (5.0 + max_out_len) / 6.0)),
+ alpha)
+
+ lower_bound_alive_score = layers.slice(
+ alive_log_prob, starts=[0], ends=[1],
+ axes=[0]) / max_length_penalty
+
+                    lowest_score_of_finished_in_finished = finished_scores * finished_in_finished
+                    lowest_score_of_finished_in_finished += (
+                        1.0 - finished_in_finished) * -INF
+                    lowest_score_of_finished_in_finished = layers.reduce_min(
+                        lowest_score_of_finished_in_finished)
+
+                    met = layers.less_than(
+                        lower_bound_alive_score,
+                        lowest_score_of_finished_in_finished)
+ met = layers.cast(met, 'float32')
+ bound_is_met = layers.reduce_sum(met)
+
+ finished_eos_num = layers.reduce_sum(finished_in_finished)
+
+ finish_cond = layers.less_than(
+ finished_eos_num,
+ layers.fill_constant(
+ [1], dtype='float32', value=beam_size))
+
+ return finish_cond
+
+                def grow_top_k(step_idx, alive_seq, alive_log_prob, parent_idx):
+ pre_ids = alive_seq
+
+ dec_step_emb = layers.embedding(
+ input=pre_ids,
+ size=[self.tar_vocab_size, self.hidden_size],
+ dtype='float32',
+ is_sparse=False,
+ param_attr=fluid.ParamAttr(
+ name='target_embedding',
+ initializer=fluid.initializer.UniformInitializer(
+ low=-self.init_scale, high=self.init_scale)))
+
+ dec_att_out, new_hidden_array, new_cell_array = decoder_step(
+ dec_step_emb, pre_hidden_array, pre_cell_array)
+
+ projection = layers.matmul(dec_att_out, softmax_weight)
+
+ logits = layers.softmax(projection)
+ current_log = layers.elementwise_add(
+ x=layers.log(logits), y=alive_log_prob, axis=0)
+ base_1 = layers.cast(step_idx, 'float32') + 6.0
+ base_1 /= 6.0
+ length_penalty = layers.pow(base_1, alpha)
+
+ len_pen = layers.pow(((
+ 5. + layers.cast(step_idx + 1, 'float32')) / 6.), alpha)
+
+ current_log = layers.reshape(current_log, shape=[1, -1])
+
+ current_log = current_log / length_penalty
+ topk_scores, topk_indices = layers.topk(
+ input=current_log, k=beam_size)
+
+ topk_scores = layers.reshape(topk_scores, shape=[-1])
+
+ topk_log_probs = topk_scores * length_penalty
+
+ generate_id = layers.reshape(
+ topk_indices, shape=[-1]) % self.tar_vocab_size
+
+ selected_beam = layers.reshape(
+ topk_indices, shape=[-1]) // self.tar_vocab_size
+
+ topk_finished = layers.equal(generate_id, eos_ids)
+
+ topk_finished = layers.cast(topk_finished, 'float32')
+
+ generate_id = layers.reshape(generate_id, shape=[-1, 1])
+
+ pre_tokens_list = layers.gather(tokens, selected_beam)
+
+ full_tokens_list = layers.concat(
+ [pre_tokens_list, generate_id], axis=1)
+
+
+ return full_tokens_list, topk_log_probs, topk_scores, topk_finished, selected_beam, generate_id, \
+ dec_att_out, new_hidden_array, new_cell_array
+
+ with while_op.block():
+ topk_seq, topk_log_probs, topk_scores, topk_finished, topk_beam, topk_generate_id, attention_out, new_hidden_array, new_cell_array = \
+ grow_top_k( step_idx, pre_tokens, pre_score, parent_idx)
+ alive_seq, alive_log_prob, _, alive_beam, alive_id = grow_alive(
+ topk_seq, topk_scores, topk_log_probs, topk_finished,
+ topk_beam, topk_generate_id)
+
+ finished_seq_2, finished_scores_2, finished_flags_2, _, _ = grow_finished(
+ finished_seq, finished_scores, finished_flag, topk_seq,
+ topk_scores, topk_finished)
+
+ finished_cond = is_finished(
+ alive_log_prob, finished_scores_2, finished_flags_2)
+
+ layers.increment(x=step_idx, value=1.0, in_place=True)
+
+ layers.assign(alive_beam, parent_idx)
+ layers.assign(alive_id, pre_tokens)
+ layers.assign(alive_log_prob, pre_score)
+ layers.assign(alive_seq, tokens)
+ layers.assign(finished_seq_2, finished_seq)
+ layers.assign(finished_scores_2, finished_scores)
+ layers.assign(finished_flags_2, finished_flag)
+
+ # update init_hidden, init_cell, input_feed
+ new_feed = layers.gather(attention_out, parent_idx)
+ layers.assign(new_feed, pre_feed)
+ for i in range(self.num_layers):
+ new_hidden_var = layers.gather(new_hidden_array[i],
+ parent_idx)
+ layers.assign(new_hidden_var, pre_hidden_array[i])
+ new_cell_var = layers.gather(new_cell_array[i],
+ parent_idx)
+ layers.assign(new_cell_var, pre_cell_array[i])
+
+ length_cond = layers.less_than(x=step_idx, y=max_length)
+ layers.logical_and(x=length_cond, y=finished_cond, out=cond)
+
+ tokens_with_eos = tokens
+
+ all_seq = layers.concat([tokens_with_eos, finished_seq], axis=0)
+ all_score = layers.concat([pre_score, finished_scores], axis=0)
+ _, topk_index = layers.topk(all_score, k=beam_size)
+ topk_index = layers.reshape(topk_index, shape=[-1])
+ final_seq = layers.gather(all_seq, topk_index)
+ final_score = layers.gather(all_score, topk_index)
+
+ return final_seq
+ elif mode == 'greedy_search':
+ max_src_seq_len = layers.shape(self.src)[1]
+ max_length = max_src_seq_len * 2
+ #max_length = layers.fill_constant( [1], dtype='int32', value = 10)
+ pre_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+ full_ids = layers.fill_constant([1, 1], dtype='int64', value=1)
+
+ score = layers.fill_constant([1], dtype='float32', value=0.0)
+
+ eos_ids = layers.fill_constant([1, 1], dtype='int64', value=2)
+
+ pre_hidden_array = []
+ pre_cell_array = []
+ pre_feed = layers.fill_constant(
+ [1, self.hidden_size], dtype='float32', value=0)
+ for i in range(self.num_layers):
+ pre_hidden_array.append(enc_last_hidden[i])
+ pre_cell_array.append(enc_last_cell[i])
+ #pre_hidden_array.append( layers.fill_constant( [1, hidden_size], dtype='float32', value=0) )
+ #pre_cell_array.append( layers.fill_constant( [1, hidden_size], dtype='float32', value=0) )
+
+ step_idx = layers.fill_constant(
+ shape=[1], dtype='int32', value=0)
+ cond = layers.less_than(
+ x=step_idx, y=max_length) # default force_cpu=True
+ while_op = layers.While(cond)
+
+ with while_op.block():
+
+ dec_step_emb = layers.embedding(
+ input=pre_ids,
+ size=[self.tar_vocab_size, self.hidden_size],
+ dtype='float32',
+ is_sparse=False,
+ param_attr=fluid.ParamAttr(
+ name='target_embedding',
+ initializer=fluid.initializer.UniformInitializer(
+ low=-self.init_scale, high=self.init_scale)))
+
+ dec_att_out, new_hidden_array, new_cell_array = decoder_step(
+ dec_step_emb, pre_hidden_array, pre_cell_array)
+
+ projection = layers.matmul(dec_att_out, softmax_weight)
+
+ logits = layers.softmax(projection)
+ logits = layers.log(logits)
+
+ current_log = layers.elementwise_add(logits, score, axis=0)
+
+ topk_score, topk_indices = layers.topk(
+ input=current_log, k=1)
+
+ new_ids = layers.concat([full_ids, topk_indices])
+ layers.assign(new_ids, full_ids)
+ #layers.Print( full_ids, message="ful ids")
+ layers.assign(topk_score, score)
+ layers.assign(topk_indices, pre_ids)
+ layers.assign(dec_att_out, pre_feed)
+ for i in range(self.num_layers):
+ layers.assign(new_hidden_array[i], pre_hidden_array[i])
+ layers.assign(new_cell_array[i], pre_cell_array[i])
+
+ layers.increment(x=step_idx, value=1.0, in_place=True)
+
+ eos_met = layers.not_equal(topk_indices, eos_ids)
+ length_cond = layers.less_than(x=step_idx, y=max_length)
+ layers.logical_and(x=length_cond, y=eos_met, out=cond)
+
+ return full_ids
+
+            raise Exception("unreachable decoder mode: " + mode)
+        else:
+            raise Exception("decoder mode not supported: " + mode)
+
+ def _compute_loss(self, dec_output):
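+        # Token-level cross entropy masked by target length: averaged over the
+        # batch dimension, then summed over time steps.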
+ loss = layers.softmax_with_cross_entropy(
+ logits=dec_output, label=self.label, soft_label=False)
+
+ loss = layers.reshape(loss, shape=[self.batch_size, -1])
+
+ max_tar_seq_len = layers.shape(self.tar)[1]
+ tar_mask = layers.sequence_mask(
+ self.tar_sequence_length, maxlen=max_tar_seq_len, dtype='float32')
+ loss = loss * tar_mask
+ loss = layers.reduce_mean(loss, dim=[0])
+ loss = layers.reduce_sum(loss)
+
+ loss.permissions = True
+
+ return loss
+
+ def _beam_search(self, enc_last_hidden, enc_last_cell):
+ pass
+
+ def build_graph(self, mode='train', beam_size=10):
+ if mode == 'train' or mode == 'eval':
+ self._build_data()
+            self._embedding()
+ enc_output, enc_last_hidden, enc_last_cell = self._build_encoder()
+ dec_output = self._build_decoder(enc_last_hidden, enc_last_cell)
+
+ loss = self._compute_loss(dec_output)
+ return loss
+ elif mode == "beam_search" or mode == 'greedy_search':
+ self._build_data()
+            self._embedding()
+ enc_output, enc_last_hidden, enc_last_cell = self._build_encoder()
+ dec_output = self._build_decoder(
+ enc_last_hidden, enc_last_cell, mode=mode, beam_size=beam_size)
+
+ return dec_output
+ else:
+            raise Exception("unsupported mode: " + mode)
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/data/download_en-vi.sh b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/data/download_en-vi.sh
new file mode 100755
index 0000000000000000000000000000000000000000..ae61044bcd34b84c35cf252871535be2fecb7a2e
--- /dev/null
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/data/download_en-vi.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+# IWSLT'15 English-Vietnamese is a small dataset containing 133k parallel
+# sentence pairs. This script downloads the data from the Stanford NLP website.
+#
+# Usage:
+# ./download_en-vi.sh output_path
+#
+# If output_path is not specified, a dir named "./en-vi" will be created and
+# used as the output path.
+
+set -ex
+OUTPUT_PATH="${1:-en-vi}"
+SITE_PATH="https://nlp.stanford.edu/projects/nmt/data"
+
+mkdir -v -p $OUTPUT_PATH
+
+# Download the IWSLT'15 dataset from the Stanford website.
+echo "Begin to download training dataset train.en and train.vi."
+wget "$SITE_PATH/iwslt15.en-vi/train.en" -O "$OUTPUT_PATH/train.en"
+wget "$SITE_PATH/iwslt15.en-vi/train.vi" -O "$OUTPUT_PATH/train.vi"
+
+echo "Begin to download dev dataset tst2012.en and tst2012.vi."
+wget "$SITE_PATH/iwslt15.en-vi/tst2012.en" -O "$OUTPUT_PATH/tst2012.en"
+wget "$SITE_PATH/iwslt15.en-vi/tst2012.vi" -O "$OUTPUT_PATH/tst2012.vi"
+
+echo "Begin to download test dataset tst2013.en and tst2013.vi."
+wget "$SITE_PATH/iwslt15.en-vi/tst2013.en" -O "$OUTPUT_PATH/tst2013.en"
+wget "$SITE_PATH/iwslt15.en-vi/tst2013.vi" -O "$OUTPUT_PATH/tst2013.vi"
+
+echo "Begin to ownload vocab file vocab.en and vocab.vi."
+wget "$SITE_PATH/iwslt15.en-vi/vocab.en" -O "$OUTPUT_PATH/vocab.en"
+wget "$SITE_PATH/iwslt15.en-vi/vocab.vi" -O "$OUTPUT_PATH/vocab.vi"
+
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/bi_rnn.png b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/bi_rnn.png
deleted file mode 100644
index 9d8efd50a49d0305586f550344472ab94c93bed3..0000000000000000000000000000000000000000
Binary files a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/bi_rnn.png and /dev/null differ
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/decoder_attention.png b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/decoder_attention.png
deleted file mode 100644
index 1b355e7786d25487a3f564af758c2c52c43b4690..0000000000000000000000000000000000000000
Binary files a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/decoder_attention.png and /dev/null differ
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/encoder_attention.png b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/encoder_attention.png
deleted file mode 100644
index 28d7a15a3bd65262bde22a3f41b5aa78b46b368a..0000000000000000000000000000000000000000
Binary files a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/images/encoder_attention.png and /dev/null differ
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/infer.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/infer.py
index f042d5ef63f602ab1d892790c2826c459ef83e5e..9ac4c73e0a289a99267e2c6166b1bb06ff8430db 100644
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/infer.py
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/infer.py
@@ -17,120 +17,146 @@ from __future__ import division
from __future__ import print_function
import numpy as np
+import time
import os
-import six
+import random
+
+import math
import paddle
import paddle.fluid as fluid
import paddle.fluid.framework as framework
from paddle.fluid.executor import Executor
-from paddle.fluid.contrib.decoder.beam_search_decoder import *
+
+import reader
+
+import sys
+if sys.version[0] == '2':
+ reload(sys)
+ sys.setdefaultencoding("utf-8")
from args import *
-import attention_model
-import no_attention_model
+import logging
+import pickle
+
+from attention_model import AttentionModel
+
+from base_model import BaseModel
+SEED = 123
-def infer():
+
+def infer():
args = parse_args()
- # Inference
- if args.no_attention:
- translation_ids, translation_scores, feed_order = \
- no_attention_model.seq_to_seq_net(
- args.embedding_dim,
- args.encoder_size,
- args.decoder_size,
- args.dict_size,
- args.dict_size,
- True,
- beam_size=args.beam_size,
- max_length=args.max_length)
+ num_layers = args.num_layers
+ src_vocab_size = args.src_vocab_size
+ tar_vocab_size = args.tar_vocab_size
+ batch_size = args.batch_size
+ dropout = args.dropout
+ init_scale = args.init_scale
+ max_grad_norm = args.max_grad_norm
+ hidden_size = args.hidden_size
+ # inference process
+
+ print("src", src_vocab_size)
+
+ # Dropout uses the upscale_in_train implementation, so it can be removed
+ # at inference time by simply setting the dropout rate to 0.
+ if args.attention:
+ model = AttentionModel(
+ hidden_size,
+ src_vocab_size,
+ tar_vocab_size,
+ batch_size,
+ num_layers=num_layers,
+ init_scale=init_scale,
+ dropout=0.0)
else:
- translation_ids, translation_scores, feed_order = \
- attention_model.seq_to_seq_net(
- args.embedding_dim,
- args.encoder_size,
- args.decoder_size,
- args.dict_size,
- args.dict_size,
- True,
- beam_size=args.beam_size,
- max_length=args.max_length)
-
- test_batch_generator = paddle.batch(
- paddle.reader.shuffle(
- paddle.dataset.wmt14.test(args.dict_size), buf_size=1000),
- batch_size=args.batch_size,
- drop_last=False)
+ model = BaseModel(
+ hidden_size,
+ src_vocab_size,
+ tar_vocab_size,
+ batch_size,
+ num_layers=num_layers,
+ init_scale=init_scale,
+ dropout=0.0)
+
+ beam_size = args.beam_size
+ trans_res = model.build_graph(mode='beam_search', beam_size=beam_size)
+ main_program = fluid.default_main_program()
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = Executor(place)
exe.run(framework.default_startup_program())
- model_path = os.path.join(args.save_dir, str(args.pass_num))
- fluid.io.load_persistables(
- executor=exe,
- dirname=model_path,
- main_program=framework.default_main_program())
-
- src_dict, trg_dict = paddle.dataset.wmt14.get_dict(args.dict_size)
-
- feed_list = [
- framework.default_main_program().global_block().var(var_name)
- for var_name in feed_order[0:1]
- ]
- feeder = fluid.DataFeeder(feed_list, place)
-
- for batch_id, data in enumerate(test_batch_generator()):
- # The value of batch_size may vary in the last batch
- batch_size = len(data)
-
- # Setup initial ids and scores lod tensor
- init_ids_data = np.array([0 for _ in range(batch_size)], dtype='int64')
- init_scores_data = np.array(
- [1. for _ in range(batch_size)], dtype='float32')
- init_ids_data = init_ids_data.reshape((batch_size, 1))
- init_scores_data = init_scores_data.reshape((batch_size, 1))
- init_recursive_seq_lens = [1] * batch_size
- init_recursive_seq_lens = [
- init_recursive_seq_lens, init_recursive_seq_lens
- ]
- init_ids = fluid.create_lod_tensor(init_ids_data,
- init_recursive_seq_lens, place)
- init_scores = fluid.create_lod_tensor(init_scores_data,
- init_recursive_seq_lens, place)
-
- # Feed dict for inference
- feed_dict = feeder.feed([[x[0]] for x in data])
- feed_dict['init_ids'] = init_ids
- feed_dict['init_scores'] = init_scores
-
- fetch_outs = exe.run(framework.default_main_program(),
- feed=feed_dict,
- fetch_list=[translation_ids, translation_scores],
- return_numpy=False)
-
- # Split the output words by lod levels
- lod_level_1 = fetch_outs[0].lod()[1]
- token_array = np.array(fetch_outs[0])
- result = []
- for i in six.moves.xrange(len(lod_level_1) - 1):
- sentence_list = [
- trg_dict[token]
- for token in token_array[lod_level_1[i]:lod_level_1[i + 1]]
- ]
- sentence = " ".join(sentence_list[1:-1])
- result.append(sentence)
- lod_level_0 = fetch_outs[0].lod()[0]
- paragraphs = [
- result[lod_level_0[i]:lod_level_0[i + 1]]
- for i in six.moves.xrange(len(lod_level_0) - 1)
- ]
-
- for paragraph in paragraphs:
- print(paragraph)
+ source_vocab_file = args.vocab_prefix + "." + args.src_lang
+ infer_file = args.infer_file
+
+ infer_data = reader.raw_mono_data(source_vocab_file, infer_file)
+
+ def prepare_input(batch, epoch_id=0, with_lr=True):
+ src_ids, src_mask, tar_ids, tar_mask = batch
+ res = {}
+ src_ids = src_ids.reshape((src_ids.shape[0], src_ids.shape[1], 1))
+ in_tar = tar_ids[:, :-1]
+ label_tar = tar_ids[:, 1:]
+
+ in_tar = in_tar.reshape((in_tar.shape[0], in_tar.shape[1], 1))
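+ # the decoder generates its own inputs during decoding, so the target
+ # ids and labels fed here are only placeholders (all zeros)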
+ in_tar = np.zeros_like(in_tar, dtype='int64')
+ label_tar = label_tar.reshape(
+ (label_tar.shape[0], label_tar.shape[1], 1))
+ label_tar = np.zeros_like(label_tar, dtype='int64')
+
+ res['src'] = src_ids
+ res['tar'] = in_tar
+ res['label'] = label_tar
+ res['src_sequence_length'] = src_mask
+ res['tar_sequence_length'] = tar_mask
+
+ return res, np.sum(tar_mask)
+
+ dir_name = args.reload_model
+ print("dir name", dir_name)
+ fluid.io.load_params(exe, dir_name)
+
+ infer_data_iter = reader.get_data_iter(infer_data, 1, mode='eval')
+
+ tar_id2vocab = []
+ tar_vocab_file = args.vocab_prefix + "." + args.tar_lang
+ with open(tar_vocab_file, "r") as f:
+ for line in f.readlines():
+ tar_id2vocab.append(line.strip())
+
+ infer_output_file = args.infer_output_file
+
+ out_file = open(infer_output_file, 'w')
+
+ for batch_id, batch in enumerate(infer_data_iter):
+ input_data_feed, word_num = prepare_input(batch, epoch_id=0)
+
+ fetch_outs = exe.run(feed=input_data_feed,
+ fetch_list=[trans_res.name],
+ use_program_cache=False)
+
+ res = [tar_id2vocab[e] for e in fetch_outs[0].reshape(-1)]
+
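+ # drop the leading <s> token and truncate the output at the first </s>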
+ res = res[1:]
+
+ new_res = []
+ for ele in res:
+ if ele == "</s>":
+ break
+ new_res.append(ele)
+
+ out_file.write(' '.join(new_res))
+ out_file.write('\n')
+
+ out_file.close()
if __name__ == '__main__':
- infer()
+ infer()
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/infer.sh b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/infer.sh
new file mode 100644
index 0000000000000000000000000000000000000000..ffd48da6adc4ecc080f504d3b6a6a244fa6eedd9
--- /dev/null
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/infer.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+set -ex
+export CUDA_VISIBLE_DEVICES=0
+
+python infer.py \
+ --src_lang en --tar_lang vi \
+ --attention True \
+ --num_layers 2 \
+ --hidden_size 512 \
+ --src_vocab_size 17191 \
+ --tar_vocab_size 7709 \
+ --batch_size 128 \
+ --dropout 0.2 \
+ --init_scale 0.1 \
+ --max_grad_norm 5.0 \
+ --vocab_prefix data/en-vi/vocab \
+ --infer_file data/en-vi/tst2013.en \
+ --reload_model ./model/epoch_10 \
+ --use_gpu True
+
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/no_attention_model.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/no_attention_model.py
deleted file mode 100644
index 57e7dbe42ad37bbd5d4c85ab4d58b2e1dd3d961b..0000000000000000000000000000000000000000
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/no_attention_model.py
+++ /dev/null
@@ -1,127 +0,0 @@
-# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import paddle.fluid.layers as layers
-from paddle.fluid.contrib.decoder.beam_search_decoder import *
-
-
-def seq_to_seq_net(embedding_dim, encoder_size, decoder_size, source_dict_dim,
- target_dict_dim, is_generating, beam_size, max_length):
- def encoder():
- # Encoder implementation of RNN translation
- src_word = layers.data(
- name="src_word", shape=[1], dtype='int64', lod_level=1)
- src_embedding = layers.embedding(
- input=src_word,
- size=[source_dict_dim, embedding_dim],
- dtype='float32',
- is_sparse=True)
-
- fc1 = layers.fc(input=src_embedding, size=encoder_size * 4, act='tanh')
- lstm_hidden0, lstm_0 = layers.dynamic_lstm(
- input=fc1, size=encoder_size * 4)
- encoder_out = layers.sequence_last_step(input=lstm_hidden0)
- return encoder_out
-
- def decoder_state_cell(context):
- # Decoder state cell, specifies the hidden state variable and its updater
- h = InitState(init=context, need_reorder=True)
- state_cell = StateCell(
- inputs={'x': None}, states={'h': h}, out_state='h')
-
- @state_cell.state_updater
- def updater(state_cell):
- current_word = state_cell.get_input('x')
- prev_h = state_cell.get_state('h')
- # make sure lod of h heritted from prev_h
- h = layers.fc(input=[prev_h, current_word],
- size=decoder_size,
- act='tanh')
- state_cell.set_state('h', h)
-
- return state_cell
-
- def decoder_train(state_cell):
- # Decoder for training implementation of RNN translation
- trg_word = layers.data(
- name="target_word", shape=[1], dtype='int64', lod_level=1)
- trg_embedding = layers.embedding(
- input=trg_word,
- size=[target_dict_dim, embedding_dim],
- dtype='float32',
- is_sparse=True)
-
- # A training decoder
- decoder = TrainingDecoder(state_cell)
-
- # Define the computation in each RNN step done by decoder
- with decoder.block():
- current_word = decoder.step_input(trg_embedding)
- decoder.state_cell.compute_state(inputs={'x': current_word})
- current_score = layers.fc(input=decoder.state_cell.get_state('h'),
- size=target_dict_dim,
- act='softmax')
- decoder.state_cell.update_states()
- decoder.output(current_score)
-
- return decoder()
-
- def decoder_infer(state_cell):
- # Decoder for inference implementation
- init_ids = layers.data(
- name="init_ids", shape=[1], dtype="int64", lod_level=2)
- init_scores = layers.data(
- name="init_scores", shape=[1], dtype="float32", lod_level=2)
-
- # A beam search decoder for inference
- decoder = BeamSearchDecoder(
- state_cell=state_cell,
- init_ids=init_ids,
- init_scores=init_scores,
- target_dict_dim=target_dict_dim,
- word_dim=embedding_dim,
- input_var_dict={},
- topk_size=50,
- sparse_emb=True,
- max_len=max_length,
- beam_size=beam_size,
- end_id=1,
- name=None)
- decoder.decode()
- translation_ids, translation_scores = decoder()
-
- return translation_ids, translation_scores
-
- context = encoder()
- state_cell = decoder_state_cell(context)
-
- if not is_generating:
- label = layers.data(
- name="target_next_word", shape=[1], dtype='int64', lod_level=1)
-
- rnn_out = decoder_train(state_cell)
-
- cost = layers.cross_entropy(input=rnn_out, label=label)
- avg_cost = layers.mean(x=cost)
-
- feeding_list = ['src_word', 'target_word', 'target_next_word']
- return avg_cost, feeding_list
- else:
- translation_ids, translation_scores = decoder_infer(state_cell)
- feeding_list = ['src_word']
- return translation_ids, translation_scores, feeding_list
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/reader.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/reader.py
new file mode 100644
index 0000000000000000000000000000000000000000..258d042021f2c82e1043d1281fd73d13bcda4aac
--- /dev/null
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/reader.py
@@ -0,0 +1,210 @@
+# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Utilities for parsing PTB text files."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import collections
+import os
+import sys
+import numpy as np
+
+Py3 = sys.version_info[0] == 3
+
+UNK_ID = 0
+
+
+def _read_words(filename):
+ with open(filename, "r") as f:
+ if Py3:
+ return f.read().replace("\n", "").split()
+ else:
+ return f.read().decode("utf-8").replace("\n", "").split()
+
+
+def read_all_line(filename):
+ data = []
+ with open(filename, "r") as f:
+ for line in f.readlines():
+ data.append(line.strip())
+ return data
+
+
+def _build_vocab(filename):
+
+ vocab_dict = {}
+ ids = 0
+ with open(filename, "r") as f:
+ for line in f.readlines():
+ vocab_dict[line.strip()] = ids
+ ids += 1
+
+ print("vocab word num", ids)
+
+ return vocab_dict
+
+
+def _para_file_to_ids(src_file, tar_file, src_vocab, tar_vocab):
+
+ src_data = []
+ with open(src_file, "r") as f_src:
+ for line in f_src.readlines():
+ words = line.strip().split()
+ ids = [src_vocab[w] if w in src_vocab else UNK_ID for w in words]
+
+ src_data.append(ids)
+
+ tar_data = []
+ with open(tar_file, "r") as f_tar:
+ for line in f_tar.readlines():
+ words = line.strip().split()
+ ids = [tar_vocab[w] if w in tar_vocab else UNK_ID for w in words]
+
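+ # wrap each target sentence with the start token <s> (id 1) and the
+ # end token </s> (id 2), matching the order of the vocab files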
+ ids = [1] + ids + [2]
+
+ tar_data.append(ids)
+
+ return src_data, tar_data
+
+
+def filter_len(src, tar, max_sequence_len=50):
+ new_src = []
+ new_tar = []
+
+ for id1, id2 in zip(src, tar):
+ if len(id1) > max_sequence_len:
+ id1 = id1[:max_sequence_len]
+ if len(id2) > max_sequence_len + 2:
+ id2 = id2[:max_sequence_len + 2]
+
+ new_src.append(id1)
+ new_tar.append(id2)
+
+ return new_src, new_tar
+
+
+def raw_data(src_lang,
+ tar_lang,
+ vocab_prefix,
+ train_prefix,
+ eval_prefix,
+ test_prefix,
+ max_sequence_len=50):
+
+ src_vocab_file = vocab_prefix + "." + src_lang
+ tar_vocab_file = vocab_prefix + "." + tar_lang
+
+ src_train_file = train_prefix + "." + src_lang
+ tar_train_file = train_prefix + "." + tar_lang
+
+ src_eval_file = eval_prefix + "." + src_lang
+ tar_eval_file = eval_prefix + "." + tar_lang
+
+ src_test_file = test_prefix + "." + src_lang
+ tar_test_file = test_prefix + "." + tar_lang
+
+ src_vocab = _build_vocab(src_vocab_file)
+ tar_vocab = _build_vocab(tar_vocab_file)
+
+ train_src, train_tar = _para_file_to_ids( src_train_file, tar_train_file, \
+ src_vocab, tar_vocab )
+ train_src, train_tar = filter_len(
+ train_src, train_tar, max_sequence_len=max_sequence_len)
+ eval_src, eval_tar = _para_file_to_ids( src_eval_file, tar_eval_file, \
+ src_vocab, tar_vocab )
+
+ test_src, test_tar = _para_file_to_ids( src_test_file, tar_test_file, \
+ src_vocab, tar_vocab )
+
+ return ( train_src, train_tar), (eval_src, eval_tar), (test_src, test_tar),\
+ (src_vocab, tar_vocab)
+
+
+def raw_mono_data(vocab_file, file_path):
+
+ src_vocab = _build_vocab(vocab_file)
+
+ test_src, test_tar = _para_file_to_ids( file_path, file_path, \
+ src_vocab, src_vocab )
+
+ return (test_src, test_tar)
+
+
+def get_data_iter(raw_data, batch_size, mode='train'):
+
+ src_data, tar_data = raw_data
+
+ data_len = len(src_data)
+
+ index = np.arange(data_len)
+ if mode == "train":
+ np.random.shuffle(index)
+
+ def to_pad_np(data, source=False):
+ max_len = 0
+ for ele in data:
+ if len(ele) > max_len:
+ max_len = len(ele)
+
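+ # pad with id 2 (</s>) up to the longest sequence in the batch;
+ # mask records the effective length of each sequence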
+ ids = np.ones((batch_size, max_len), dtype='int64') * 2
+ mask = np.zeros((batch_size), dtype='int32')
+
+ for i, ele in enumerate(data):
+ ids[i, :len(ele)] = ele
+ if not source:
+ mask[i] = len(ele) - 1
+ else:
+ mask[i] = len(ele)
+
+ return ids, mask
+
+ b_src = []
+
+ cache_num = 20
+ if mode != "train":
+ cache_num = 1
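+ # buffer cache_num batches, sort the buffer by source length, and then
+ # cut it into batches so similarly-sized sentences are grouped together,
+ # which reduces the amount of padding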
+ for j in range(data_len):
+ if len(b_src) == batch_size * cache_num:
+ # build batch size
+
+ # sort
+ new_cache = sorted(b_src, key=lambda k: len(k[0]))
+
+ for i in range(cache_num):
+ batch_data = new_cache[i * batch_size:(i + 1) * batch_size]
+ src_cache = [w[0] for w in batch_data]
+ tar_cache = [w[1] for w in batch_data]
+ src_ids, src_mask = to_pad_np(src_cache, source=True)
+ tar_ids, tar_mask = to_pad_np(tar_cache)
+
+ yield (src_ids, src_mask, tar_ids, tar_mask)
+
+ b_src = []
+
+ b_src.append((src_data[index[j]], tar_data[index[j]]))
+ if len(b_src) == batch_size * cache_num:
+ new_cache = sorted(b_src, key=lambda k: len(k[0]))
+
+ for i in range(cache_num):
+ batch_data = new_cache[i * batch_size:(i + 1) * batch_size]
+ src_cache = [w[0] for w in batch_data]
+ tar_cache = [w[1] for w in batch_data]
+ src_ids, src_mask = to_pad_np(src_cache, source=True)
+ tar_ids, tar_mask = to_pad_np(tar_cache)
+
+ yield (src_ids, src_mask, tar_ids, tar_mask)
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/run.sh b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/run.sh
new file mode 100644
index 0000000000000000000000000000000000000000..cf48282deea544f5ca2d233a4af7336fb664cc65
--- /dev/null
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/run.sh
@@ -0,0 +1,22 @@
+#!/bin/bash
+
+set -ex
+export CUDA_VISIBLE_DEVICES=0
+
+python train.py \
+ --src_lang en --tar_lang vi \
+ --attention True \
+ --num_layers 2 \
+ --hidden_size 512 \
+ --src_vocab_size 17191 \
+ --tar_vocab_size 7709 \
+ --batch_size 128 \
+ --dropout 0.2 \
+ --init_scale 0.1 \
+ --max_grad_norm 5.0 \
+ --train_data_prefix data/en-vi/train \
+ --eval_data_prefix data/en-vi/tst2012 \
+ --test_data_prefix data/en-vi/tst2013 \
+ --vocab_prefix data/en-vi/vocab \
+ --use_gpu True
+
diff --git a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/train.py b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/train.py
index fbb93eab1f0eba05d30b75957b7fb18e42592375..60053c4f3c05a2e7cc6b80d69d1efa87b40cff5d 100644
--- a/PaddleNLP/unarchived/neural_machine_translation/rnn_search/train.py
+++ b/PaddleNLP/unarchived/neural_machine_translation/rnn_search/train.py
@@ -19,150 +19,175 @@ from __future__ import print_function
import numpy as np
import time
import os
+import random
+
+import math
import paddle
import paddle.fluid as fluid
import paddle.fluid.framework as framework
from paddle.fluid.executor import Executor
-from paddle.fluid.contrib.decoder.beam_search_decoder import *
+
+import reader
+
+import sys
+if sys.version[0] == '2':
+ reload(sys)
+ sys.setdefaultencoding("utf-8")
from args import *
-import attention_model
-import no_attention_model
+from base_model import BaseModel
+from attention_model import AttentionModel
+import logging
+import pickle
+
+SEED = 123
def train():
args = parse_args()
- if args.enable_ce:
- framework.default_startup_program().random_seed = 111
-
+ num_layers = args.num_layers
+ src_vocab_size = args.src_vocab_size
+ tar_vocab_size = args.tar_vocab_size
+ batch_size = args.batch_size
+ dropout = args.dropout
+ init_scale = args.init_scale
+ max_grad_norm = args.max_grad_norm
+ hidden_size = args.hidden_size
# Training process
- if args.no_attention:
- avg_cost, feed_order = no_attention_model.seq_to_seq_net(
- args.embedding_dim,
- args.encoder_size,
- args.decoder_size,
- args.dict_size,
- args.dict_size,
- False,
- beam_size=args.beam_size,
- max_length=args.max_length)
- else:
- avg_cost, feed_order = attention_model.seq_to_seq_net(
- args.embedding_dim,
- args.encoder_size,
- args.decoder_size,
- args.dict_size,
- args.dict_size,
- False,
- beam_size=args.beam_size,
- max_length=args.max_length)
+ if args.attention:
+ model = AttentionModel(
+ hidden_size,
+ src_vocab_size,
+ tar_vocab_size,
+ batch_size,
+ num_layers=num_layers,
+ init_scale=init_scale,
+ dropout=dropout)
+ else:
+ model = BaseModel(
+ hidden_size,
+ src_vocab_size,
+ tar_vocab_size,
+ batch_size,
+ num_layers=num_layers,
+ init_scale=init_scale,
+ dropout=dropout)
+
+ loss = model.build_graph()
# clone from default main program and use it as the validation program
main_program = fluid.default_main_program()
- inference_program = fluid.default_main_program().clone()
-
- optimizer = fluid.optimizer.Adam(
- learning_rate=args.learning_rate,
- regularization=fluid.regularizer.L2DecayRegularizer(
- regularization_coeff=1e-5))
-
- optimizer.minimize(avg_cost)
-
- # Disable shuffle for Continuous Evaluation only
- if not args.enable_ce:
- train_batch_generator = paddle.batch(
- paddle.reader.shuffle(
- paddle.dataset.wmt14.train(args.dict_size), buf_size=1000),
- batch_size=args.batch_size,
- drop_last=False)
-
- test_batch_generator = paddle.batch(
- paddle.reader.shuffle(
- paddle.dataset.wmt14.test(args.dict_size), buf_size=1000),
- batch_size=args.batch_size,
- drop_last=False)
+ inference_program = fluid.default_main_program().clone(for_test=True)
+
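+ # clip gradients by their global norm to stabilize training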
+ fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByGlobalNorm(
+ clip_norm=max_grad_norm))
+
+ lr = args.learning_rate
+ opt_type = args.optimizer
+ if opt_type == "sgd":
+ optimizer = fluid.optimizer.SGD(lr)
+ elif opt_type == "adam":
+ optimizer = fluid.optimizer.Adam(lr)
else:
- train_batch_generator = paddle.batch(
- paddle.dataset.wmt14.train(args.dict_size),
- batch_size=args.batch_size,
- drop_last=False)
+ print("only support [sgd|adam]")
+ raise Exception("opt type not support")
- test_batch_generator = paddle.batch(
- paddle.dataset.wmt14.test(args.dict_size),
- batch_size=args.batch_size,
- drop_last=False)
+ optimizer.minimize(loss)
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
exe = Executor(place)
exe.run(framework.default_startup_program())
- feed_list = [
- main_program.global_block().var(var_name) for var_name in feed_order
- ]
- feeder = fluid.DataFeeder(feed_list, place)
-
- def validation():
- # Use test set as validation each pass
+ train_data_prefix = args.train_data_prefix
+ eval_data_prefix = args.eval_data_prefix
+ test_data_prefix = args.test_data_prefix
+ vocab_prefix = args.vocab_prefix
+ src_lang = args.src_lang
+ tar_lang = args.tar_lang
+ print("begin to load data")
+ raw_data = reader.raw_data(src_lang, tar_lang, vocab_prefix,
+ train_data_prefix, eval_data_prefix,
+ test_data_prefix, args.max_len)
+ print("finished load data")
+ train_data, valid_data, test_data, _ = raw_data
+
+ def prepare_input(batch, epoch_id=0, with_lr=True):
+ src_ids, src_mask, tar_ids, tar_mask = batch
+ res = {}
+ src_ids = src_ids.reshape((src_ids.shape[0], src_ids.shape[1], 1))
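+ # teacher forcing: the decoder input is the target without its last
+ # token and the label is the target without its first token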
+ in_tar = tar_ids[:, :-1]
+ label_tar = tar_ids[:, 1:]
+
+ in_tar = in_tar.reshape((in_tar.shape[0], in_tar.shape[1], 1))
+ label_tar = label_tar.reshape(
+ (label_tar.shape[0], label_tar.shape[1], 1))
+
+ res['src'] = src_ids
+ res['tar'] = in_tar
+ res['label'] = label_tar
+ res['src_sequence_length'] = src_mask
+ res['tar_sequence_length'] = tar_mask
+
+ return res, np.sum(tar_mask)
+
+ # get train epoch size
+ def eval(data, epoch_id=0):
+ eval_data_iter = reader.get_data_iter(data, batch_size, mode='eval')
total_loss = 0.0
- count = 0
- val_feed_list = [
- inference_program.global_block().var(var_name)
- for var_name in feed_order
- ]
- val_feeder = fluid.DataFeeder(val_feed_list, place)
-
- for batch_id, data in enumerate(test_batch_generator()):
- val_fetch_outs = exe.run(inference_program,
- feed=val_feeder.feed(data),
- fetch_list=[avg_cost],
- return_numpy=False)
-
- total_loss += np.array(val_fetch_outs[0])[0]
- count += 1
-
- return total_loss / count
-
- for pass_id in range(1, args.pass_num + 1):
- pass_start_time = time.time()
- words_seen = 0
- for batch_id, data in enumerate(train_batch_generator()):
- words_seen += len(data) * 2
-
- fetch_outs = exe.run(framework.default_main_program(),
- feed=feeder.feed(data),
- fetch_list=[avg_cost])
-
- avg_cost_train = np.array(fetch_outs[0])
- print('pass_id=%d, batch_id=%d, train_loss: %f' %
- (pass_id, batch_id, avg_cost_train))
- # This is for continuous evaluation only
- if args.enable_ce and batch_id >= 100:
- break
-
- pass_end_time = time.time()
- test_loss = validation()
- time_consumed = pass_end_time - pass_start_time
- words_per_sec = words_seen / time_consumed
- print("pass_id=%d, test_loss: %f, words/s: %f, sec/pass: %f" %
- (pass_id, test_loss, words_per_sec, time_consumed))
-
- # This log is for continuous evaluation only
- if args.enable_ce:
- print("kpis\ttrain_cost\t%f" % avg_cost_train)
- print("kpis\ttest_cost\t%f" % test_loss)
- print("kpis\ttrain_duration\t%f" % time_consumed)
-
- if pass_id % args.save_interval == 0:
- model_path = os.path.join(args.save_dir, str(pass_id))
- if not os.path.isdir(model_path):
- os.makedirs(model_path)
-
- fluid.io.save_persistables(
- executor=exe,
- dirname=model_path,
- main_program=framework.default_main_program())
+ word_count = 0.0
+ for batch_id, batch in enumerate(eval_data_iter):
+ input_data_feed, word_num = prepare_input(
+ batch, epoch_id, with_lr=False)
+ fetch_outs = exe.run(inference_program,
+ feed=input_data_feed,
+ fetch_list=[loss.name],
+ use_program_cache=False)
+
+ cost_train = np.array(fetch_outs[0])
+
+ total_loss += cost_train * batch_size
+ word_count += word_num
+
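+ # perplexity: exponential of the average cross-entropy per target word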
+ ppl = np.exp(total_loss / word_count)
+
+ return ppl
+
+ max_epoch = args.max_epoch
+ for epoch_id in range(max_epoch):
+ start_time = time.time()
+ print("epoch id", epoch_id)
+ train_data_iter = reader.get_data_iter(train_data, batch_size)
+
+ total_loss = 0
+ word_count = 0.0
+ for batch_id, batch in enumerate(train_data_iter):
+
+ input_data_feed, word_num = prepare_input(batch, epoch_id=epoch_id)
+ fetch_outs = exe.run(feed=input_data_feed,
+ fetch_list=[loss.name],
+ use_program_cache=True)
+
+ cost_train = np.array(fetch_outs[0])
+
+ total_loss += cost_train * batch_size
+ word_count += word_num
+
+ if batch_id > 0 and batch_id % 100 == 0:
+ print("ppl", batch_id, np.exp(total_loss / word_count))
+ total_loss = 0.0
+ word_count = 0.0
+
+ dir_name = args.model_path + "/epoch_" + str(epoch_id)
+ print("begin to save", dir_name)
+ fluid.io.save_params(exe, dir_name)
+ print("save finished")
+ dev_ppl = eval(valid_data)
+ print("dev ppl", dev_ppl)
+ test_ppl = eval(test_data)
+ print("test ppl", test_ppl)
if __name__ == '__main__':