Commit a29003b7 authored by Hongyu Liu, committed by Yibing Liu

Add seq2seq padding (#2603)

* change seq2seq to padding impl; test=develop

* add bleu result; test=develop

* fix formate; test=develop

* fix formate; test=develop
Parent 273e300f
#!/bin/bash
# This file is only used for continuous evaluation.
model_file='train.py'
python $model_file --pass_num 1 --learning_rate 0.001 --save_interval 10 --enable_ce | python _ce.py
@@ -10,8 +10,11 @@
```
├── args.py              # training, inference, and model parameters
├── train.py             # main training program
├── infer.py             # main inference program
├── run.sh               # launch script with the default training configuration
├── infer.sh             # decoding script with the default inference configuration
├── attention_model.py   # translation model with an attention mechanism
└── base_model.py        # translation model without an attention mechanism
```
## Introduction
@@ -19,116 +22,93 @@
In recent years, deep learning has continually driven new breakthroughs in machine translation. Models that map the source language directly to the target language with a neural network, i.e. end-to-end neural machine translation (End-to-End NMT), have gradually become mainstream and are commonly referred to simply as NMT models.

This directory contains two classic machine translation models: a base model (without an attention mechanism) and a translation model with an attention mechanism. Their performance has since been surpassed by many newer models (such as [Transformer](https://arxiv.org/abs/1706.03762)), but beyond machine translation these models are the foundation of many sequence-to-sequence (Seq2Seq) models, and models for many other NLP problems build on them; they therefore remain important in NLP and are widely used as baselines.

The example models in this directory demonstrate how to implement an RNN model with an attention mechanism in Paddle Fluid to solve Seq2Seq problems, and how to use a decoder with beam search. If you simply need a model with strong machine translation quality, we recommend the [Paddle Fluid implementation of Transformer](https://github.com/PaddlePaddle/models/tree/develop/fluid/neural_machine_translation/transformer).
## Model Overview

The RNN Search model uses the classic encoder-decoder framework to solve Seq2Seq problems: an encoder first encodes the source sequence into a vector, and a decoder then decodes that vector into the target sequence. This mirrors how humans approach translation: first parse the source sentence and understand its meaning, then write a target-language sentence that expresses that meaning. Both the encoder and the decoder are usually implemented with RNNs. For the underlying theory and mathematics, see [Deep Learning 101](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html).

In this model, the encoder is a multi-layer LSTM-based encoder; for decoding, we use an RNN decoder with an attention mechanism, and also provide a decoder without attention for comparison; for prediction, we use beam search to generate the target sentence. These methods are introduced below.

### Attention Mechanism

If the encoding stage outputs a single fixed-size vector, two problems follow: 1) whether the source sequence is 5 words or 50 words long, encoding all of its semantic and syntactic information into a fixed-size vector places very high demands on the model, especially for long sentences; 2) intuitively, a human translator pays more attention to the source fragments most relevant to the current output, and the focus shifts as translation proceeds, whereas a fixed-size vector effectively gives every part of the source equal attention at every step, which is unreasonable. Bahdanau et al. \[[4](#references)\] therefore introduced the attention mechanism, which decodes against encoded context fragments and thereby addresses feature learning for long sentences. The decoder structure under attention is described below.

Unlike in the simple decoder, $z_i$ is computed here as (since GitHub does not render LaTeX natively, see [this page](http://www.paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html) for rendered formulas):

$$z_{i+1}=\phi _{\theta '}\left ( c_i,u_i,z_i \right )$$

Here the source sentence is represented by per-word context fragments $c_i$: for each word $u_i$ of the target language there is a specific $c_i$ corresponding to it, computed as:

$$c_i=\sum _{j=1}^{T}a_{ij}h_j,\quad a_i=\left[ a_{i1},a_{i2},...,a_{iT}\right ]$$

As the formula shows, attention is realized as a weighted average over the encoder RNN states $h_j$ at each time step. The weight $a_{ij}$ expresses how strongly the $i$-th target word attends to the $j$-th source word, and is computed as:

$$a_{ij} = {exp(e_{ij}) \over {\sum_{k=1}^T exp(e_{ik})}}$$

$$e_{ij} = {align(z_i, h_j)}$$

where $align$ is an alignment model that measures how well the $i$-th target word matches the $j$-th source word, computed from the $i$-th decoder hidden state $z_i$ and the $j$-th source context fragment $h_j$. Traditional alignment models map each target word explicitly to one or more source words (hard alignment); the attention model instead uses soft alignment: every target-source word pair carries some association whose strength is a real number computed by the model, so the alignment can be folded into the whole NMT framework and trained by backpropagation.

<p align="center">
<img src="images/decoder_attention.png" width=500><br/>
Figure 1. Decoder with an attention mechanism
</p>
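To make the soft alignment above concrete, here is a minimal NumPy sketch of one attention step; the dot-product scorer and the array names (`decoder_state`, `encoder_states`) are illustrative assumptions, not the exact `align` model used in `attention_model.py`:

```python
import numpy as np

def soft_attention(decoder_state, encoder_states):
    """One step of soft attention.

    decoder_state:  z_i, shape (hidden,)
    encoder_states: h_1..h_T stacked into shape (T, hidden)
    Returns the context vector c_i and the weights a_i.
    """
    # e_ij = align(z_i, h_j); a plain dot product stands in for align here
    scores = encoder_states @ decoder_state          # shape (T,)
    # a_ij = softmax of e_ij over the source positions
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # c_i = sum_j a_ij * h_j
    context = weights @ encoder_states               # shape (hidden,)
    return context, weights

context, weights = soft_attention(np.random.rand(8), np.random.rand(5, 8))
print(weights.sum())  # the attention weights sum to 1
```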
### Beam Search

[Beam search](http://en.wikipedia.org/wiki/Beam_search) is a heuristic graph-search algorithm that looks for the optimal nodes to expand within a limited set of a graph or tree. It is typically used in systems whose solution space is very large (such as machine translation and speech recognition), where memory cannot hold all expanded solutions. For instance, when translating "`<s>你好<e>`" in a machine translation task, even if the target dictionary contains only three words (`<s>`, `<e>`, `hello`), infinitely many sentences can be generated (the number of times `hello` repeats is unbounded); beam search lets us find comparatively good translations among them.

Beam search builds the search tree with a breadth-first strategy. At each level of the tree, it sorts the nodes by a heuristic cost (in this tutorial, the sum of the log-probabilities of the generated words) and keeps only a predetermined number of nodes (commonly called the beam width or beam size). Only these nodes are expanded at the next level; the others are pruned away. In other words, the higher-quality nodes are kept and the lower-quality ones pruned, so the space and time used by the search drop sharply, at the cost of no longer guaranteeing an optimal solution.

In the decoding stage with beam search, the goal is to maximize the probability of the generated sequence. The procedure is (see the sketch after this list):

1. At each time step, compute the next hidden state $z_{i+1}$ from the encoding $c$ of the source sentence, the $i$-th generated target word $u_i$, and the decoder hidden state $z_i$ at step $i$.
2. Normalize $z_{i+1}$ with `softmax` to obtain the probability distribution $p_{i+1}$ over the $(i+1)$-th target word.
3. Sample the word $u_{i+1}$ according to $p_{i+1}$.
4. Repeat steps 1-3 until the end-of-sentence token `<e>` is generated or the maximum sentence length is exceeded.

Note: $z_{i+1}$ and $p_{i+1}$ are computed with the same formulas as in the decoder, and since each generation step is greedy, the result is not guaranteed to be globally optimal.
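Below is a self-contained toy sketch of this procedure; the `toy_step` distribution stands in for the decoder's softmax and is purely illustrative, not `infer.py`'s decoder:

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=10):
    """Toy beam search scored by summed log-probabilities.

    step_fn(prefix) must return a dict mapping each candidate next token
    to its probability given the generated prefix.
    """
    beams = [([start_token], 0.0)]  # (token sequence, accumulated log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, prob in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(prob)))
        # keep only the beam_size best hypotheses; the rest are pruned
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == end_token else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# A toy next-word distribution over a tiny target vocabulary, echoing the
# "<s> 你好 <e>" example above.
def toy_step(prefix):
    return {"hello": 0.6, "<e>": 0.4}

print(beam_search(toy_step, "<s>", "<e>", beam_size=2, max_len=5))
```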
## Data Introduction

This tutorial uses the English-to-Vietnamese portion of the [IWSLT'15 English-Vietnamese data](https://nlp.stanford.edu/projects/nmt/) as the training corpus, the tst2012 data as the development set, and the tst2013 data as the test set.

### Obtaining the Data

```sh
cd data && sh download_en-vi.sh
```

## Training

`run.sh` wraps the main training program with the default configuration. To start training with the default parameters, simply run:

```sh
sh run.sh
```

The full training command it issues is:
```sh
python train.py \
--src_lang en --tar_lang vi \
--attention True \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.2 \
--init_scale 0.1 \
--max_grad_norm 5.0 \
--train_data_prefix data/en-vi/train \
--eval_data_prefix data/en-vi/tst2012 \
--test_data_prefix data/en-vi/tst2013 \
--vocab_prefix data/en-vi/vocab \
--use_gpu True
```
The training program saves one model checkpoint at the end of every epoch (a minimal sketch of the checkpointing call follows).
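The checkpoint layout is one directory per epoch under `--model_path`. A sketch of the call made in `train.py`, assuming an executor `exe` from the training session:

```python
import paddle.fluid as fluid

def save_checkpoint(exe, model_path, epoch_id):
    # one directory per epoch, e.g. ./model/epoch_10
    dir_name = model_path + "/epoch_" + str(epoch_id)
    fluid.io.save_params(exe, dir_name)
```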
After training finishes, the `infer.py` script can be used for prediction. By default it decodes with beam search, loads the model saved after epoch 10, and decodes the test data set:

```sh
sh infer.sh
```

To run prediction on a different data file, simply change the `--infer_file` argument. The full inference command is:
```sh
python infer.py \
--src_lang en --tar_lang vi \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.2 \
--init_scale 0.1 \
--max_grad_norm 5.0 \
--vocab_prefix data/en-vi/vocab \
--infer_file data/en-vi/tst2013.en \
--reload_model model_new/epoch_10/ \
--use_gpu True
```
## Results

Single model, beam_size = 10:

| Model          | tst2012 BLEU | tst2013 BLEU |
| -------------- | ------------ | ------------ |
| no attention   | 11.58        | 12.20        |
| with attention | 22.21        | 25.30        |

## References

1. Koehn P. [Statistical machine translation](https://books.google.com.hk/books?id=4v_Cx1wIMLkC&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false)[M]. Cambridge University Press, 2009.
2. Cho K, Van Merriënboer B, Gulcehre C, et al. [Learning phrase representations using RNN encoder-decoder for statistical machine translation](http://www.aclweb.org/anthology/D/D14/D14-1179.pdf)[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014: 1724-1734.
3. Chung J, Gulcehre C, Cho K H, et al. [Empirical evaluation of gated recurrent neural networks on sequence modeling](https://arxiv.org/abs/1412.3555)[J]. arXiv preprint arXiv:1412.3555, 2014.
4. Bahdanau D, Cho K, Bengio Y. [Neural machine translation by jointly learning to align and translate](https://arxiv.org/abs/1409.0473)[C]//Proceedings of ICLR 2015, 2015.
5. Papineni K, Roukos S, Ward T, et al. [BLEU: a method for automatic evaluation of machine translation](http://dl.acm.org/citation.cfm?id=1073135)[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2002: 311-318.

<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">This tutorial</span> was created by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> and is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
# This file is only used for the continuous evaluation test!
import os
import sys
sys.path.append(os.environ['ceroot'])
from kpi import CostKpi, DurationKpi, AccKpi
# NOTE: kpi.py should be shared across models in some way!!!!
train_cost_kpi = CostKpi('train_cost', 0.02, 0, actived=False)
test_cost_kpi = CostKpi('test_cost', 0.005, 0, actived=False)
train_duration_kpi = DurationKpi('train_duration', 0.06, 0, actived=False)
tracking_kpis = [
    train_cost_kpi,
    test_cost_kpi,
    train_duration_kpi,
]
def parse_log(log):
    '''
    This method should be implemented by model developers.

    The suggestion: each line in the log should be a tab-separated
    key-value record; the loop below consumes lines of the form
    "kpis\t<kpi name>\t<value>", for example:
    "
    kpis\ttrain_cost\t1.0
    kpis\ttest_cost\t1.0
    kpis\ttrain_duration\t1.2
    "
    '''
    for line in log.split('\n'):
        fs = line.strip().split('\t')
        print(fs)
        if len(fs) == 3 and fs[0] == 'kpis':
            print("-----%s" % fs)
            kpi_name = fs[1]
            kpi_value = float(fs[2])
            yield kpi_name, kpi_value


def log_to_ce(log):
    kpi_tracker = {}
    for kpi in tracking_kpis:
        kpi_tracker[kpi.name] = kpi

    for (kpi_name, kpi_value) in parse_log(log):
        print(kpi_name, kpi_value)
        kpi_tracker[kpi_name].add_record(kpi_value)
        kpi_tracker[kpi_name].persist()


if __name__ == '__main__':
    log = sys.stdin.read()
    print("*****")
    print(log)
    print("****")
    log_to_ce(log)
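def _selftest():
    # A hypothetical spot check of parse_log's expected input format
    # (not part of the CE pipeline); call it manually if needed.
    sample = "kpis\ttrain_cost\t1.0\nkpis\ttest_cost\t2.0"
    assert list(parse_log(sample)) == [('train_cost', 1.0), ('test_cost', 2.0)]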
@@ -23,76 +23,95 @@ import distutils.util
def parse_args():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--train_data_prefix", type=str, help="file prefix for train data")
    parser.add_argument(
        "--eval_data_prefix", type=str, help="file prefix for eval data")
    parser.add_argument(
        "--test_data_prefix", type=str, help="file prefix for test data")
    parser.add_argument(
        "--vocab_prefix", type=str, help="file prefix for vocab")
    parser.add_argument("--src_lang", type=str, help="source language suffix")
    parser.add_argument("--tar_lang", type=str, help="target language suffix")
    parser.add_argument(
        "--attention",
        # note: argparse's type=bool treats any non-empty string,
        # including "False", as True
        type=bool,
        default=False,
        help="Whether to use the attention model")
    parser.add_argument(
        "--optimizer",
        type=str,
        default='adam',
        help="optimizer to use; only [sgd|adam] are supported")
    parser.add_argument(
        "--learning_rate",
        type=float,
        default=0.001,
        help="learning rate for the optimizer")
    parser.add_argument(
        "--num_layers",
        type=int,
        default=1,
        help="number of layers of the encoder and decoder")
    parser.add_argument(
        "--hidden_size",
        type=int,
        default=100,
        help="hidden size of the encoder and decoder")
    parser.add_argument("--src_vocab_size", type=int, help="source vocab size")
    parser.add_argument("--tar_vocab_size", type=int, help="target vocab size")
    parser.add_argument(
        "--batch_size", type=int, help="batch size of each step")
    parser.add_argument(
        "--max_epoch", type=int, default=12, help="max epoch for the training")
    parser.add_argument(
        "--max_len",
        type=int,
        default=50,
        help="max length for source and target sentences")
    parser.add_argument(
        "--dropout", type=float, default=0.0, help="drop probability")
    parser.add_argument(
        "--init_scale",
        type=float,
        default=0.0,
        help="init scale for parameters")
    parser.add_argument(
        "--max_grad_norm",
        type=float,
        default=5.0,
        help="max grad norm for global norm clipping")
    parser.add_argument(
        "--model_path",
        type=str,
        default='./model',
        help="path under which to save the model")
    parser.add_argument(
        "--reload_model", type=str, help="model to reload for inference")
    parser.add_argument(
        "--infer_file", type=str, help="file name for inference")
    parser.add_argument(
        "--infer_output_file",
        type=str,
        default='./infer_output',
        help="file name for the inference output")
    parser.add_argument(
        "--beam_size", type=int, default=10, help="beam size for beam search")
    parser.add_argument(
        "--enable_ce",
        action='store_true',
        help="If set, run the task with continuous evaluation logs.")
    parser.add_argument(
        '--use_gpu',
        type=bool,
        default=False,
        help='Whether to use gpu [True|False]')

    args = parser.parse_args()
    return args
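# A hedged smoke test (not part of the original file): exercise the parser
# with a minimal, illustrative argument set when this module is run directly.
if __name__ == '__main__':
    import sys
    sys.argv = ['args.py', '--src_lang', 'en', '--tar_lang', 'vi']
    print(parse_args())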
#!/bin/sh
# IWSLT'15 English-Vietnamese is a small dataset containing 133K parallel sentence pairs.
# This script downloads the data from the Stanford NLP website.
#
# Usage:
#   ./download_en-vi.sh output_path
#
# If output_path is not specified, a directory named "./en-vi" will be created
# and used as the output path.
set -ex
OUTPUT_PATH="${1:-en-vi}"
SITE_PATH="https://nlp.stanford.edu/projects/nmt/data"
mkdir -v -p $OUTPUT_PATH
# Download the IWSLT'15 small dataset from the Stanford website.
echo "Begin to download training dataset train.en and train.vi."
wget "$SITE_PATH/iwslt15.en-vi/train.en" -O "$OUTPUT_PATH/train.en"
wget "$SITE_PATH/iwslt15.en-vi/train.vi" -O "$OUTPUT_PATH/train.vi"
echo "Begin to download dev dataset tst2012.en and tst2012.vi."
wget "$SITE_PATH/iwslt15.en-vi/tst2012.en" -O "$OUTPUT_PATH/tst2012.en"
wget "$SITE_PATH/iwslt15.en-vi/tst2012.vi" -O "$OUTPUT_PATH/tst2012.vi"
echo "Begin to download test dataset tst2013.en and tst2013.vi."
wget "$SITE_PATH/iwslt15.en-vi/tst2013.en" -O "$OUTPUT_PATH/tst2013.en"
wget "$SITE_PATH/iwslt15.en-vi/tst2013.vi" -O "$OUTPUT_PATH/tst2013.vi"
echo "Begin to ownload vocab file vocab.en and vocab.vi."
wget "$SITE_PATH/iwslt15.en-vi/vocab.en" -O "$OUTPUT_PATH/vocab.en"
wget "$SITE_PATH/iwslt15.en-vi/vocab.vi" -O "$OUTPUT_PATH/vocab.vi"
@@ -17,120 +17,146 @@ from __future__ import division
from __future__ import print_function
import numpy as np
import time
import os
import random
import math

import paddle
import paddle.fluid as fluid
import paddle.fluid.framework as framework
from paddle.fluid.executor import Executor

import reader

import sys
if sys.version[0] == '2':
    reload(sys)
    sys.setdefaultencoding("utf-8")

from args import *
import logging
import pickle

from attention_model import AttentionModel
from base_model import BaseModel

SEED = 123
def infer():
    args = parse_args()

    num_layers = args.num_layers
    src_vocab_size = args.src_vocab_size
    tar_vocab_size = args.tar_vocab_size
    batch_size = args.batch_size
    dropout = args.dropout
    init_scale = args.init_scale
    max_grad_norm = args.max_grad_norm
    hidden_size = args.hidden_size

    # inference process
    print("src", src_vocab_size)

    # dropout here uses the upscale_in_train strategy, so it can simply be
    # disabled at inference time by setting dropout to 0
    if args.attention:
        model = AttentionModel(
            hidden_size,
            src_vocab_size,
            tar_vocab_size,
            batch_size,
            num_layers=num_layers,
            init_scale=init_scale,
            dropout=0.0)
    else:
        model = BaseModel(
            hidden_size,
            src_vocab_size,
            tar_vocab_size,
            batch_size,
            num_layers=num_layers,
            init_scale=init_scale,
            dropout=0.0)

    beam_size = args.beam_size
    trans_res = model.build_graph(mode='beam_search', beam_size=beam_size)

    main_program = fluid.default_main_program()

    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
    exe = Executor(place)
    exe.run(framework.default_startup_program())

    source_vocab_file = args.vocab_prefix + "." + args.src_lang
    infer_file = args.infer_file
    infer_data = reader.raw_mono_data(source_vocab_file, infer_file)

    def prepare_input(batch, epoch_id=0, with_lr=True):
        src_ids, src_mask, tar_ids, tar_mask = batch
        res = {}
        src_ids = src_ids.reshape((src_ids.shape[0], src_ids.shape[1], 1))
        in_tar = tar_ids[:, :-1]
        label_tar = tar_ids[:, 1:]

        # the target side is unknown at decoding time, so feed all-zero
        # placeholders with the right shapes
        in_tar = in_tar.reshape((in_tar.shape[0], in_tar.shape[1], 1))
        in_tar = np.zeros_like(in_tar, dtype='int64')
        label_tar = label_tar.reshape(
            (label_tar.shape[0], label_tar.shape[1], 1))
        label_tar = np.zeros_like(label_tar, dtype='int64')

        res['src'] = src_ids
        res['tar'] = in_tar
        res['label'] = label_tar
        res['src_sequence_length'] = src_mask
        res['tar_sequence_length'] = tar_mask

        return res, np.sum(tar_mask)

    dir_name = args.reload_model
    print("dir name", dir_name)
    fluid.io.load_params(exe, dir_name)

    infer_data_iter = reader.get_data_iter(infer_data, 1, mode='eval')

    # build the id -> word mapping for the target language
    tar_id2vocab = []
    tar_vocab_file = args.vocab_prefix + "." + args.tar_lang
    with open(tar_vocab_file, "r") as f:
        for line in f.readlines():
            tar_id2vocab.append(line.strip())

    infer_output_file = args.infer_output_file
    out_file = open(infer_output_file, 'w')

    for batch_id, batch in enumerate(infer_data_iter):
        input_data_feed, word_num = prepare_input(batch, epoch_id=0)
        fetch_outs = exe.run(feed=input_data_feed,
                             fetch_list=[trans_res.name],
                             use_program_cache=False)

        # strip the leading <s> and cut the output at the first </s>
        res = [tar_id2vocab[e] for e in fetch_outs[0].reshape(-1)]
        res = res[1:]
        new_res = []
        for ele in res:
            if ele == "</s>":
                break
            new_res.append(ele)

        out_file.write(' '.join(new_res))
        out_file.write('\n')

    out_file.close()


if __name__ == '__main__':
    infer()
#!/bin/bash
set -ex
export CUDA_VISIBLE_DEVICES=0
python infer.py \
--src_lang en --tar_lang vi \
--attention True \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.2 \
--init_scale 0.1 \
--max_grad_norm 5.0 \
--vocab_prefix data/en-vi/vocab \
--infer_file data/en-vi/tst2013.en \
--reload_model ./model/epoch_10 \
--use_gpu True
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid.layers as layers
from paddle.fluid.contrib.decoder.beam_search_decoder import *
def seq_to_seq_net(embedding_dim, encoder_size, decoder_size, source_dict_dim,
                   target_dict_dim, is_generating, beam_size, max_length):
    def encoder():
        # Encoder implementation of RNN translation
        src_word = layers.data(
            name="src_word", shape=[1], dtype='int64', lod_level=1)
        src_embedding = layers.embedding(
            input=src_word,
            size=[source_dict_dim, embedding_dim],
            dtype='float32',
            is_sparse=True)

        fc1 = layers.fc(input=src_embedding, size=encoder_size * 4, act='tanh')
        lstm_hidden0, lstm_0 = layers.dynamic_lstm(
            input=fc1, size=encoder_size * 4)
        encoder_out = layers.sequence_last_step(input=lstm_hidden0)
        return encoder_out

    def decoder_state_cell(context):
        # Decoder state cell, specifies the hidden state variable and its updater
        h = InitState(init=context, need_reorder=True)
        state_cell = StateCell(
            inputs={'x': None}, states={'h': h}, out_state='h')

        @state_cell.state_updater
        def updater(state_cell):
            current_word = state_cell.get_input('x')
            prev_h = state_cell.get_state('h')
            # make sure the lod of h is inherited from prev_h
            h = layers.fc(input=[prev_h, current_word],
                          size=decoder_size,
                          act='tanh')
            state_cell.set_state('h', h)

        return state_cell

    def decoder_train(state_cell):
        # Decoder for the training implementation of RNN translation
        trg_word = layers.data(
            name="target_word", shape=[1], dtype='int64', lod_level=1)
        trg_embedding = layers.embedding(
            input=trg_word,
            size=[target_dict_dim, embedding_dim],
            dtype='float32',
            is_sparse=True)

        # A training decoder
        decoder = TrainingDecoder(state_cell)

        # Define the computation done by the decoder in each RNN step
        with decoder.block():
            current_word = decoder.step_input(trg_embedding)
            decoder.state_cell.compute_state(inputs={'x': current_word})
            current_score = layers.fc(input=decoder.state_cell.get_state('h'),
                                      size=target_dict_dim,
                                      act='softmax')
            decoder.state_cell.update_states()
            decoder.output(current_score)

        return decoder()

    def decoder_infer(state_cell):
        # Decoder for the inference implementation
        init_ids = layers.data(
            name="init_ids", shape=[1], dtype="int64", lod_level=2)
        init_scores = layers.data(
            name="init_scores", shape=[1], dtype="float32", lod_level=2)

        # A beam search decoder for inference
        decoder = BeamSearchDecoder(
            state_cell=state_cell,
            init_ids=init_ids,
            init_scores=init_scores,
            target_dict_dim=target_dict_dim,
            word_dim=embedding_dim,
            input_var_dict={},
            topk_size=50,
            sparse_emb=True,
            max_len=max_length,
            beam_size=beam_size,
            end_id=1,
            name=None)
        decoder.decode()
        translation_ids, translation_scores = decoder()
        return translation_ids, translation_scores

    context = encoder()
    state_cell = decoder_state_cell(context)

    if not is_generating:
        label = layers.data(
            name="target_next_word", shape=[1], dtype='int64', lod_level=1)
        rnn_out = decoder_train(state_cell)
        cost = layers.cross_entropy(input=rnn_out, label=label)
        avg_cost = layers.mean(x=cost)
        feeding_list = ['src_word', 'target_word', 'target_next_word']
        return avg_cost, feeding_list
    else:
        translation_ids, translation_scores = decoder_infer(state_cell)
        feeding_list = ['src_word']
        return translation_ids, translation_scores, feeding_list
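# A hedged usage sketch (not part of the original file): builds the training
# graph of this legacy model with illustrative sizes when run directly; the
# parameter values below are assumptions, not taken from run.sh.
if __name__ == '__main__':
    avg_cost, feeding_list = seq_to_seq_net(
        embedding_dim=512,
        encoder_size=512,
        decoder_size=512,
        source_dict_dim=30000,
        target_dict_dim=30000,
        is_generating=False,
        beam_size=3,
        max_length=50)
    print("feeding order:", feeding_list)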
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Utilities for parsing PTB text files."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import os
import sys
import numpy as np
Py3 = sys.version_info[0] == 3
UNK_ID = 0
def _read_words(filename):
    with open(filename, "r") as f:
        if Py3:
            return f.read().replace("\n", "<eos>").split()
        else:
            return f.read().decode("utf-8").replace("\n", "<eos>").split()


def read_all_line(filename):
    data = []
    with open(filename, "r") as f:
        for line in f.readlines():
            data.append(line.strip())
    return data


def _build_vocab(filename):
    vocab_dict = {}
    ids = 0
    with open(filename, "r") as f:
        for line in f.readlines():
            vocab_dict[line.strip()] = ids
            ids += 1

    print("vocab word num", ids)

    return vocab_dict


def _para_file_to_ids(src_file, tar_file, src_vocab, tar_vocab):
    src_data = []
    with open(src_file, "r") as f_src:
        for line in f_src.readlines():
            arra = line.strip().split()
            ids = [src_vocab[w] if w in src_vocab else UNK_ID for w in arra]
            src_data.append(ids)

    tar_data = []
    with open(tar_file, "r") as f_tar:
        for line in f_tar.readlines():
            arra = line.strip().split()
            ids = [tar_vocab[w] if w in tar_vocab else UNK_ID for w in arra]
            # wrap the target sentence with <s> (id 1) and </s> (id 2)
            ids = [1] + ids + [2]
            tar_data.append(ids)

    return src_data, tar_data


def filter_len(src, tar, max_sequence_len=50):
    new_src = []
    new_tar = []

    for id1, id2 in zip(src, tar):
        if len(id1) > max_sequence_len:
            id1 = id1[:max_sequence_len]

        # the target keeps two extra slots for <s> and </s>
        if len(id2) > max_sequence_len + 2:
            id2 = id2[:max_sequence_len + 2]

        new_src.append(id1)
        new_tar.append(id2)

    return new_src, new_tar


def raw_data(src_lang,
             tar_lang,
             vocab_prefix,
             train_prefix,
             eval_prefix,
             test_prefix,
             max_sequence_len=50):

    src_vocab_file = vocab_prefix + "." + src_lang
    tar_vocab_file = vocab_prefix + "." + tar_lang

    src_train_file = train_prefix + "." + src_lang
    tar_train_file = train_prefix + "." + tar_lang

    src_eval_file = eval_prefix + "." + src_lang
    tar_eval_file = eval_prefix + "." + tar_lang

    src_test_file = test_prefix + "." + src_lang
    tar_test_file = test_prefix + "." + tar_lang

    src_vocab = _build_vocab(src_vocab_file)
    tar_vocab = _build_vocab(tar_vocab_file)

    train_src, train_tar = _para_file_to_ids(src_train_file, tar_train_file,
                                             src_vocab, tar_vocab)
    train_src, train_tar = filter_len(
        train_src, train_tar, max_sequence_len=max_sequence_len)
    eval_src, eval_tar = _para_file_to_ids(src_eval_file, tar_eval_file,
                                           src_vocab, tar_vocab)
    test_src, test_tar = _para_file_to_ids(src_test_file, tar_test_file,
                                           src_vocab, tar_vocab)

    return (train_src, train_tar), (eval_src, eval_tar), \
        (test_src, test_tar), (src_vocab, tar_vocab)


def raw_mono_data(vocab_file, file_path):
    src_vocab = _build_vocab(vocab_file)

    test_src, test_tar = _para_file_to_ids(file_path, file_path,
                                           src_vocab, src_vocab)

    return (test_src, test_tar)


def get_data_iter(raw_data, batch_size, mode='train'):
    src_data, tar_data = raw_data

    data_len = len(src_data)

    index = np.arange(data_len)
    if mode == "train":
        np.random.shuffle(index)

    def to_pad_np(data, source=False):
        max_len = 0
        for ele in data:
            if len(ele) > max_len:
                max_len = len(ele)

        # pad to the longest sentence in the batch with </s> (id 2)
        ids = np.ones((batch_size, max_len), dtype='int64') * 2
        mask = np.zeros((batch_size), dtype='int32')

        for i, ele in enumerate(data):
            ids[i, :len(ele)] = ele
            if not source:
                # exclude the trailing </s> from the target length
                mask[i] = len(ele) - 1
            else:
                mask[i] = len(ele)

        return ids, mask

    b_src = []

    # cache several batches and sort them by source length, so that
    # sentences of similar length are padded together
    cache_num = 20
    if mode != "train":
        cache_num = 1

    for j in range(data_len):
        if len(b_src) == batch_size * cache_num:
            new_cache = sorted(b_src, key=lambda k: len(k[0]))

            for i in range(cache_num):
                batch_data = new_cache[i * batch_size:(i + 1) * batch_size]
                src_cache = [w[0] for w in batch_data]
                tar_cache = [w[1] for w in batch_data]
                src_ids, src_mask = to_pad_np(src_cache, source=True)
                tar_ids, tar_mask = to_pad_np(tar_cache)
                yield (src_ids, src_mask, tar_ids, tar_mask)

            b_src = []

        b_src.append((src_data[index[j]], tar_data[index[j]]))

    # note: a leftover cache that does not exactly fill
    # batch_size * cache_num examples is dropped
    if len(b_src) == batch_size * cache_num:
        new_cache = sorted(b_src, key=lambda k: len(k[0]))

        for i in range(cache_num):
            batch_data = new_cache[i * batch_size:(i + 1) * batch_size]
            src_cache = [w[0] for w in batch_data]
            tar_cache = [w[1] for w in batch_data]
            src_ids, src_mask = to_pad_np(src_cache, source=True)
            tar_ids, tar_mask = to_pad_np(tar_cache)
            yield (src_ids, src_mask, tar_ids, tar_mask)
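# A hedged usage sketch (not part of the original file): feed tiny id
# sequences through get_data_iter to show the padded shapes it yields.
if __name__ == '__main__':
    demo = ([[3, 4, 5], [6, 7]], [[1, 8, 9, 2], [1, 10, 2]])  # (src, tar) ids
    for src_ids, src_mask, tar_ids, tar_mask in get_data_iter(
            demo, batch_size=2, mode='eval'):
        # src_ids/tar_ids are padded to the longest sentence in the batch
        print(src_ids.shape, src_mask, tar_ids.shape, tar_mask)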
#!/bin/bash
set -ex
export CUDA_VISIBLE_DEVICES=0
python train.py \
--src_lang en --tar_lang vi \
--attention True \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.2 \
--init_scale 0.1 \
--max_grad_norm 5.0 \
--train_data_prefix data/en-vi/train \
--eval_data_prefix data/en-vi/tst2012 \
--test_data_prefix data/en-vi/tst2013 \
--vocab_prefix data/en-vi/vocab \
--use_gpu True
@@ -19,150 +19,175 @@ from __future__ import print_function
import numpy as np
import time
import os
import random
import math

import paddle
import paddle.fluid as fluid
import paddle.fluid.framework as framework
from paddle.fluid.executor import Executor

import reader

import sys
if sys.version[0] == '2':
    reload(sys)
    sys.setdefaultencoding("utf-8")

from args import *
from base_model import BaseModel
from attention_model import AttentionModel
import logging
import pickle

SEED = 123
def train():
    args = parse_args()

    num_layers = args.num_layers
    src_vocab_size = args.src_vocab_size
    tar_vocab_size = args.tar_vocab_size
    batch_size = args.batch_size
    dropout = args.dropout
    init_scale = args.init_scale
    max_grad_norm = args.max_grad_norm
    hidden_size = args.hidden_size

    # fix the random seed for continuous evaluation only
    if args.enable_ce:
        framework.default_startup_program().random_seed = 111

    if args.attention:
        model = AttentionModel(
            hidden_size,
            src_vocab_size,
            tar_vocab_size,
            batch_size,
            num_layers=num_layers,
            init_scale=init_scale,
            dropout=dropout)
    else:
        model = BaseModel(
            hidden_size,
            src_vocab_size,
            tar_vocab_size,
            batch_size,
            num_layers=num_layers,
            init_scale=init_scale,
            dropout=dropout)

    loss = model.build_graph()

    # clone from the default main program and use it as the validation program
    main_program = fluid.default_main_program()
    inference_program = fluid.default_main_program().clone(for_test=True)

    fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByGlobalNorm(
        clip_norm=max_grad_norm))

    lr = args.learning_rate
    opt_type = args.optimizer
    if opt_type == "sgd":
        optimizer = fluid.optimizer.SGD(lr)
    elif opt_type == "adam":
        optimizer = fluid.optimizer.Adam(lr)
    else:
        print("only support [sgd|adam]")
        raise Exception("opt type not support")

    optimizer.minimize(loss)

    place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
    exe = Executor(place)
    exe.run(framework.default_startup_program())

    train_data_prefix = args.train_data_prefix
    eval_data_prefix = args.eval_data_prefix
    test_data_prefix = args.test_data_prefix
    vocab_prefix = args.vocab_prefix
    src_lang = args.src_lang
    tar_lang = args.tar_lang
    print("begin to load data")
    raw_data = reader.raw_data(src_lang, tar_lang, vocab_prefix,
                               train_data_prefix, eval_data_prefix,
                               test_data_prefix, args.max_len)
    print("finished load data")
    train_data, valid_data, test_data, _ = raw_data

    def prepare_input(batch, epoch_id=0, with_lr=True):
        src_ids, src_mask, tar_ids, tar_mask = batch
        res = {}
        src_ids = src_ids.reshape((src_ids.shape[0], src_ids.shape[1], 1))
        # shift the target by one step: feed tar[:-1], predict tar[1:]
        in_tar = tar_ids[:, :-1]
        label_tar = tar_ids[:, 1:]

        in_tar = in_tar.reshape((in_tar.shape[0], in_tar.shape[1], 1))
        label_tar = label_tar.reshape(
            (label_tar.shape[0], label_tar.shape[1], 1))

        res['src'] = src_ids
        res['tar'] = in_tar
        res['label'] = label_tar
        res['src_sequence_length'] = src_mask
        res['tar_sequence_length'] = tar_mask

        return res, np.sum(tar_mask)

    def eval(data, epoch_id=0):
        # evaluate perplexity on dev/test data with the validation program
        eval_data_iter = reader.get_data_iter(data, batch_size, mode='eval')
        total_loss = 0.0
        word_count = 0.0
        for batch_id, batch in enumerate(eval_data_iter):
            input_data_feed, word_num = prepare_input(
                batch, epoch_id, with_lr=False)
            fetch_outs = exe.run(inference_program,
                                 feed=input_data_feed,
                                 fetch_list=[loss.name],
                                 use_program_cache=False)

            cost_train = np.array(fetch_outs[0])

            total_loss += cost_train * batch_size
            word_count += word_num

        # perplexity is the exponential of the per-word cross-entropy
        ppl = np.exp(total_loss / word_count)

        return ppl

    max_epoch = args.max_epoch
    for epoch_id in range(max_epoch):
        start_time = time.time()
        print("epoch id", epoch_id)
        train_data_iter = reader.get_data_iter(train_data, batch_size)

        total_loss = 0
        word_count = 0.0
        for batch_id, batch in enumerate(train_data_iter):
            input_data_feed, word_num = prepare_input(
                batch, epoch_id=epoch_id)
            fetch_outs = exe.run(feed=input_data_feed,
                                 fetch_list=[loss.name],
                                 use_program_cache=True)

            cost_train = np.array(fetch_outs[0])

            total_loss += cost_train * batch_size
            word_count += word_num

            if batch_id > 0 and batch_id % 100 == 0:
                print("ppl", batch_id, np.exp(total_loss / word_count))
                total_loss = 0.0
                word_count = 0.0

        dir_name = args.model_path + "/epoch_" + str(epoch_id)
        print("begin to save", dir_name)
        fluid.io.save_params(exe, dir_name)
        print("save finished")

        dev_ppl = eval(valid_data)
        print("dev ppl", dev_ppl)
        test_ppl = eval(test_data)
        print("test ppl", test_ppl)


if __name__ == '__main__':
    train()