Commit 7d018ca8 authored by LiuChiaChi

Update seq2seq to use the 2.0-beta API

Parent: 295c16b6
Running the example models in this directory requires PaddlePaddle Fluid 1.7. If your installed PaddlePaddle version is lower than this requirement, please follow the [installation documentation](https://www.paddlepaddle.org.cn/#quick-start) to update it.
Running the example models in this directory requires PaddlePaddle 2.0-beta. If your installed PaddlePaddle version is lower than this requirement, please follow the [installation documentation](https://www.paddlepaddle.org.cn/#quick-start) to update it.
# Sequence to Sequence (Seq2Seq)
@@ -11,11 +11,8 @@
├── reader.py # data reading script
├── download.py # data download script
├── train.py # main training script
├── infer.py # main inference script
├── run.sh # launch script with the default training configuration
├── infer.sh # decoding script with the default configuration
├── attention_model.py # translation model with attention
└── base_model.py # translation model without attention
└── attention_model.py # translation model with attention
```
## Introduction
@@ -24,7 +21,7 @@ Sequence to Sequence (Seq2Seq), using an encoder-decoder (Encoder-Decoder)
This directory contains a classic Seq2Seq example, machine translation, implemented both as a base model (without attention) and as a translation model with attention. A Seq2Seq translation model mimics how humans approach translation: first parse the source sentence and understand its meaning, then write the target-language sentence according to that meaning. For more on the principles and mathematical formulation of machine translation, we recommend [Deep Learning 101](http://paddlepaddle.org/documentation/docs/zh/1.2/beginners_guide/basics/machine_translation/index.html)
**This directory demonstrates how to implement a standard Seq2Seq model with the Paddle Fluid 1.7 dygraph API.** A static-graph implementation of the same network structure is available at [Seq2Seq](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleTextGEN/seq2seq)
**This directory demonstrates how to implement a standard Seq2Seq model with the PaddlePaddle 2.0-beta dygraph API.** A static-graph implementation of the same network structure is available at [Seq2Seq](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/PaddleTextGEN/seq2seq)
## Model Overview
@@ -52,55 +49,34 @@ sh run.sh
```
python train.py \
--src_lang en --tar_lang vi \
--attention True \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.2 \
--init_scale 0.1 \
--max_grad_norm 5.0 \
--train_data_prefix data/en-vi/train \
--eval_data_prefix data/en-vi/tst2012 \
--test_data_prefix data/en-vi/tst2013 \
--vocab_prefix data/en-vi/vocab \
--use_gpu True \
--model_path ./attention_models
--src_lang en --tar_lang vi \
--attention True \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.0 \
--init_scale 0.2 \
--max_grad_norm 5.0 \
--train_data_prefix data/en-vi/train \
--eval_data_prefix data/en-vi/tst2012 \
--test_data_prefix data/en-vi/tst2013 \
--vocab_prefix data/en-vi/vocab \
--use_gpu True \
--model_path attention_models \
--enable_ce \
--learning_rate 0.002 \
--dtype float64 \
--padding_idx 2 \
--optimizer sgd
```
The training program saves a model checkpoint at the end of every epoch.
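These checkpoints can later be restored with the dygraph API, as done in this directory's infer.py. A minimal sketch (the constructor arguments mirror infer.py's usage and the README example above; they must match your actual training configuration):
```python
import paddle.fluid as fluid
from attention_model import AttentionModel

with fluid.dygraph.guard():
    # Rebuild the network with the hyper-parameters used for training.
    model = AttentionModel(512, 17191, 7709, 128,
                           beam_size=10, num_layers=2,
                           init_scale=0.1, dropout=0.0, mode='beam_search')
    # load_dygraph returns (parameter dict, optimizer dict).
    state_dict, _ = fluid.dygraph.load_dygraph("attention_models/epoch_10")
    model.set_dict(state_dict)
    model.eval()
```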
## Model Inference
Once training is complete, you can run inference with the infer.sh script. By default it decodes the test set with beam search, loading the model saved at the 10th epoch.
```
sh infer.sh
```
To run inference on a different data file, just modify the --infer_file argument.
```
python infer.py \
--attention True \
--src_lang en --tar_lang vi \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.2 \
--init_scale 0.1 \
--max_grad_norm 5.0 \
--vocab_prefix data/en-vi/vocab \
--infer_file data/en-vi/tst2013.en \
--reload_model attention_models/epoch_10 \
--infer_output_file attention_infer_output/infer_output.txt \
--beam_size 10 \
--use_gpu True
```
TODO
## Evaluation
@@ -109,19 +85,3 @@ python infer.py \
```sh
mosesdecoder/scripts/generic/multi-bleu.perl tst2013.vi < infer_output.txt
```
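If the mosesdecoder scripts are not available locally, they can be obtained by cloning https://github.com/moses-smt/mosesdecoder.git; the multi-bleu.perl script lives under scripts/generic/ in that repository.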
Each model was trained 10 times; each run used the model saved at the 10th epoch for inference, with beam_size=10. Results are shown below (the 10 runs are sorted in ascending order for readability):
```
> no attention
tst2012 BLEU:
[10.75 10.85 10.9 10.94 10.97 11.01 11.01 11.04 11.13 11.4]
tst2013 BLEU:
[10.71 10.71 10.74 10.76 10.91 10.94 11.02 11.16 11.21 11.44]
> with attention
tst2012 BLEU:
[21.14 22.34 22.54 22.65 22.71 22.71 23.08 23.15 23.3 23.4]
tst2013 BLEU:
[23.41 24.79 25.11 25.12 25.19 25.24 25.39 25.61 25.61 25.63]
```
@@ -52,16 +52,24 @@ def parse_args():
default=0.001,
help="learning rate for optimizer")
parser.add_argument(
"--padding_idx",
type=int,
default=0,
help="padding_idx of embedding")
parser.add_argument(
"--num_layers",
type=int,
default=1,
help="layers number of encoder and decoder")
parser.add_argument(
"--hidden_size",
type=int,
default=100,
help="hidden size of encoder and decoder")
parser.add_argument("--src_vocab_size", type=int, help="source vocab size")
parser.add_argument("--tar_vocab_size", type=int, help="target vocab size")
@@ -114,6 +122,12 @@ def parse_args():
default=False,
help='Whether using gpu [True|False]')
parser.add_argument(
"--dtype",
type=str,
default='float32',
help="data type of tensor in network")
parser.add_argument(
"--enable_ce",
action='store_true',
@@ -128,5 +142,7 @@
type=str,
default='./seq2seq.profile',
help="the profiler output file path. (used for benchmark)")
args = parser.parse_args()
return args
This diff has been collapsed.
# -*- coding: utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import paddle.fluid as fluid
import numpy as np
from paddle.fluid import ParamAttr
from paddle.fluid.dygraph import to_variable
from paddle.fluid.dygraph.nn import Embedding, Linear
from rnn import BasicLSTMUnit
INF = 1. * 1e5
alpha = 0.6
uniform_initializer = lambda x: fluid.initializer.UniformInitializer(low=-x, high=x)
zero_constant = fluid.initializer.Constant(0.0)
class BaseModel(fluid.dygraph.Layer):
def __init__(self,
hidden_size,
src_vocab_size,
tar_vocab_size,
batch_size,
num_layers=1,
init_scale=0.1,
dropout=None,
beam_size=1,
beam_start_token=1,
beam_end_token=2,
beam_max_step_num=100,
mode='train'):
super(BaseModel, self).__init__()
self.hidden_size = hidden_size
self.src_vocab_size = src_vocab_size
self.tar_vocab_size = tar_vocab_size
self.batch_size = batch_size
self.num_layers = num_layers
self.init_scale = init_scale
self.dropout = dropout
self.beam_size = beam_size
self.beam_start_token = beam_start_token
self.beam_end_token = beam_end_token
self.beam_max_step_num = beam_max_step_num
self.mode = mode
self.kinf = 1e9
param_attr = ParamAttr(initializer=uniform_initializer(self.init_scale))
bias_attr = ParamAttr(initializer=zero_constant)
forget_bias = 1.0
self.src_embeder = Embedding(
size=[self.src_vocab_size, self.hidden_size],
param_attr=fluid.ParamAttr(
initializer=uniform_initializer(init_scale)))
self.tar_embeder = Embedding(
size=[self.tar_vocab_size, self.hidden_size],
is_sparse=False,
param_attr=fluid.ParamAttr(
initializer=uniform_initializer(init_scale)))
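# Build `num_layers` stacked LSTM cells for the encoder, registering each
# as a sublayer so its parameters are tracked by this dygraph Layer.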
self.enc_units = []
for i in range(num_layers):
self.enc_units.append(
self.add_sublayer(
"enc_units_%d" % i,
BasicLSTMUnit(
hidden_size=self.hidden_size,
input_size=self.hidden_size,
param_attr=param_attr,
bias_attr=bias_attr,
forget_bias=forget_bias)))
self.dec_units = []
for i in range(num_layers):
self.dec_units.append(
self.add_sublayer(
"dec_units_%d" % i,
BasicLSTMUnit(
hidden_size=self.hidden_size,
input_size=self.hidden_size,
param_attr=param_attr,
bias_attr=bias_attr,
forget_bias=forget_bias)))
self.fc = fluid.dygraph.nn.Linear(
self.hidden_size,
self.tar_vocab_size,
param_attr=param_attr,
bias_attr=False)
def _transpose_batch_time(self, x):
return fluid.layers.transpose(x, [1, 0] + list(range(2, len(x.shape))))
def _merge_batch_beams(self, x):
return fluid.layers.reshape(x, shape=(-1, x.shape[2]))
def _split_batch_beams(self, x):
return fluid.layers.reshape(x, shape=(-1, self.beam_size, x.shape[1]))
def _expand_to_beam_size(self, x):
x = fluid.layers.unsqueeze(x, [1])
expand_times = [1] * len(x.shape)
expand_times[1] = self.beam_size
x = fluid.layers.expand(x, expand_times)
return x
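# Masked state update used below: where step_mask == 1 take the newly
# computed state, where step_mask == 0 (padding) keep the old state,
# i.e. new * mask + old * (1 - mask).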
def _real_state(self, state, new_state, step_mask):
new_state = fluid.layers.elementwise_mul(new_state, step_mask, axis=0) - \
fluid.layers.elementwise_mul(state, (step_mask - 1), axis=0)
return new_state
def _gather(self, x, indices, batch_pos):
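# Stack (batch_pos, indices) into (batch, beam, 2) coordinates so that
# gather_nd picks, for each batch row, the beam entries named by indices.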
topk_coordinates = fluid.layers.stack([batch_pos, indices], axis=2)
return fluid.layers.gather_nd(x, topk_coordinates)
def forward(self, inputs):
inputs = [fluid.dygraph.to_variable(np_inp) for np_inp in inputs]
src, tar, label, src_sequence_length, tar_sequence_length = inputs
if src.shape[0] < self.batch_size:
self.batch_size = src.shape[0]
src_emb = self.src_embeder(self._transpose_batch_time(src))
enc_hidden = to_variable(
np.zeros(
(self.num_layers, self.batch_size, self.hidden_size),
dtype='float32'))
enc_cell = to_variable(
np.zeros(
(self.num_layers, self.batch_size, self.hidden_size),
dtype='float32'))
max_seq_len = src_emb.shape[0]
enc_len_mask = fluid.layers.sequence_mask(
src_sequence_length, maxlen=max_seq_len, dtype="float32")
enc_len_mask = fluid.layers.transpose(enc_len_mask, [1, 0])
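# Unroll the encoder manually over time (time-major): at every step the
# stacked LSTM cells run once, and padded positions are masked afterwards
# so their states carry over unchanged.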
enc_states = [[enc_hidden, enc_cell]]
for l in range(max_seq_len):
step_input = src_emb[l]
step_mask = enc_len_mask[l]
enc_hidden, enc_cell = enc_states[l]
new_enc_hidden, new_enc_cell = [], []
for i in range(self.num_layers):
new_hidden, new_cell = self.enc_units[i](
step_input, enc_hidden[i], enc_cell[i])
new_enc_hidden.append(new_hidden)
new_enc_cell.append(new_cell)
if self.dropout is not None and self.dropout > 0.0:
step_input = fluid.layers.dropout(
new_hidden,
dropout_prob=self.dropout,
dropout_implementation='upscale_in_train')
else:
step_input = new_hidden
new_enc_hidden = [
self._real_state(enc_hidden[i], new_enc_hidden[i], step_mask)
for i in range(self.num_layers)
]
new_enc_cell = [
self._real_state(enc_cell[i], new_enc_cell[i], step_mask)
for i in range(self.num_layers)
]
enc_states.append([new_enc_hidden, new_enc_cell])
if self.mode in ['train', 'eval']:
dec_hidden, dec_cell = enc_states[-1]
tar_emb = self.tar_embeder(self._transpose_batch_time(tar))
max_seq_len = tar_emb.shape[0]
dec_output = []
for step_idx in range(max_seq_len):
step_input = tar_emb[step_idx]
new_dec_hidden, new_dec_cell = [], []
for i in range(self.num_layers):
new_hidden, new_cell = self.dec_units[i](
step_input, dec_hidden[i], dec_cell[i])
new_dec_hidden.append(new_hidden)
new_dec_cell.append(new_cell)
if self.dropout is not None and self.dropout > 0.0:
step_input = fluid.layers.dropout(
new_hidden,
dropout_prob=self.dropout,
dropout_implementation='upscale_in_train')
else:
step_input = new_hidden
dec_output.append(step_input)
dec_hidden, dec_cell = new_dec_hidden, new_dec_cell
dec_output = fluid.layers.stack(dec_output)
dec_output = self.fc(self._transpose_batch_time(dec_output))
loss = fluid.layers.softmax_with_cross_entropy(
logits=dec_output, label=label, soft_label=False)
loss = fluid.layers.squeeze(loss, axes=[2])
max_tar_seq_len = fluid.layers.shape(tar)[1]
tar_mask = fluid.layers.sequence_mask(
tar_sequence_length, maxlen=max_tar_seq_len, dtype='float32')
loss = loss * tar_mask
loss = fluid.layers.reduce_mean(loss, dim=[0])
loss = fluid.layers.reduce_sum(loss)
return loss
elif self.mode in ['beam_search']:
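# Beam search decoding: expand the final encoder state to
# (batch_size, beam_size), then at each step score all
# beam_size * vocab_size continuations, keep the top beam_size of them,
# and record token and parent ids so the final hypotheses can be
# reconstructed with gather_tree.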
batch_beam_shape = (self.batch_size, self.beam_size)
vocab_size_tensor = to_variable(np.full((1), self.tar_vocab_size))
start_token_tensor = to_variable(
np.full(
batch_beam_shape, self.beam_start_token, dtype='int64'))
end_token_tensor = to_variable(
np.full(
batch_beam_shape, self.beam_end_token, dtype='int64'))
step_input = self.tar_embeder(start_token_tensor)
beam_finished = to_variable(
np.full(
batch_beam_shape, 0, dtype='float32'))
beam_state_log_probs = to_variable(
np.array(
[[0.] + [-self.kinf] * (self.beam_size - 1)],
dtype="float32"))
beam_state_log_probs = fluid.layers.expand(beam_state_log_probs,
[self.batch_size, 1])
dec_hidden, dec_cell = enc_states[-1]
dec_hidden = [
self._expand_to_beam_size(state) for state in dec_hidden
]
dec_cell = [self._expand_to_beam_size(state) for state in dec_cell]
batch_pos = fluid.layers.expand(
fluid.layers.unsqueeze(
to_variable(
np.arange(
0, self.batch_size, 1, dtype="int64")), [1]),
[1, self.beam_size])
predicted_ids = []
parent_ids = []
for step_idx in range(self.beam_max_step_num):
if fluid.layers.reduce_sum(1 - beam_finished).numpy()[0] == 0:
break
step_input = self._merge_batch_beams(step_input)
new_dec_hidden, new_dec_cell = [], []
dec_hidden = [
self._merge_batch_beams(state) for state in dec_hidden
]
dec_cell = [
self._merge_batch_beams(state) for state in dec_cell
]
for i in range(self.num_layers):
new_hidden, new_cell = self.dec_units[i](
step_input, dec_hidden[i], dec_cell[i])
new_dec_hidden.append(new_hidden)
new_dec_cell.append(new_cell)
if self.dropout is not None and self.dropout > 0.0:
step_input = fluid.layers.dropout(
new_hidden,
dropout_prob=self.dropout,
dropout_implementation='upscale_in_train')
else:
step_input = new_hidden
cell_outputs = self._split_batch_beams(step_input)
cell_outputs = self.fc(cell_outputs)
# Beam_search_step:
step_log_probs = fluid.layers.log(
fluid.layers.softmax(cell_outputs))
noend_array = [-self.kinf] * self.tar_vocab_size
noend_array[self.beam_end_token] = 0  # [-kinf, -kinf, ..., 0, -kinf, ...]
noend_mask_tensor = to_variable(
np.array(
noend_array, dtype='float32'))
# set finished position to one-hot probability of <eos>
step_log_probs = fluid.layers.elementwise_mul(
fluid.layers.expand(fluid.layers.unsqueeze(beam_finished, [2]), [1, 1, self.tar_vocab_size]),
noend_mask_tensor, axis=-1) - \
fluid.layers.elementwise_mul(step_log_probs, (beam_finished - 1), axis=0)
log_probs = fluid.layers.elementwise_add(
x=step_log_probs, y=beam_state_log_probs, axis=0)
scores = fluid.layers.reshape(
log_probs, [-1, self.beam_size * self.tar_vocab_size])
topk_scores, topk_indices = fluid.layers.topk(
input=scores, k=self.beam_size)
beam_indices = fluid.layers.elementwise_floordiv(
topk_indices, vocab_size_tensor) # in which beam
token_indices = fluid.layers.elementwise_mod(
topk_indices, vocab_size_tensor) # position in beam
next_log_probs = self._gather(scores, topk_indices, batch_pos)
new_dec_hidden = [
self._split_batch_beams(state) for state in new_dec_hidden
]
new_dec_cell = [
self._split_batch_beams(state) for state in new_dec_cell
]
new_dec_hidden = [
self._gather(x, beam_indices, batch_pos)
for x in new_dec_hidden
]
new_dec_cell = [
self._gather(x, beam_indices, batch_pos)
for x in new_dec_cell
]
next_finished = self._gather(beam_finished, beam_indices,
batch_pos)
next_finished = fluid.layers.cast(next_finished, "bool")
next_finished = fluid.layers.logical_or(
next_finished,
fluid.layers.equal(token_indices, end_token_tensor))
next_finished = fluid.layers.cast(next_finished, "float32")
# prepare for next step
dec_hidden, dec_cell = new_dec_hidden, new_dec_cell
beam_finished = next_finished
beam_state_log_probs = next_log_probs
step_input = self.tar_embeder(
token_indices) # remove unsqueeze in v1.7
predicted_ids.append(token_indices)
parent_ids.append(beam_indices)
predicted_ids = fluid.layers.stack(predicted_ids)
parent_ids = fluid.layers.stack(parent_ids)
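# gather_tree backtracks through parent_ids from the last step to turn
# per-step beam choices into complete output sequences.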
predicted_ids = fluid.layers.gather_tree(predicted_ids, parent_ids)
predicted_ids = self._transpose_batch_time(predicted_ids)
return predicted_ids
else:
print("not support mode ", self.mode)
raise Exception("not support mode: " + self.mode)
# -*- coding: utf-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import time
import os
import random
import logging
import math
import io
import paddle
import paddle.fluid as fluid
import reader
import sys
line_tok = '\n'
space_tok = ' '
if sys.version[0] == '2':
reload(sys)
sys.setdefaultencoding("utf-8")
line_tok = u'\n'
space_tok = u' '
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
logger.setLevel(logging.INFO)
from args import *
import pickle
from attention_model import AttentionModel
from base_model import BaseModel
def infer():
args = parse_args()
num_layers = args.num_layers
src_vocab_size = args.src_vocab_size
tar_vocab_size = args.tar_vocab_size
batch_size = args.batch_size
dropout = args.dropout
init_scale = args.init_scale
max_grad_norm = args.max_grad_norm
hidden_size = args.hidden_size
# inference process
print("src", src_vocab_size)
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
with fluid.dygraph.guard(place):
# Dropout uses upscale_in_train, so it can be removed at inference time;
# we therefore build the model with dropout=0.0.
if args.attention:
model = AttentionModel(
hidden_size,
src_vocab_size,
tar_vocab_size,
batch_size,
beam_size=args.beam_size,
num_layers=num_layers,
init_scale=init_scale,
dropout=0.0,
mode='beam_search')
else:
model = BaseModel(
hidden_size,
src_vocab_size,
tar_vocab_size,
batch_size,
beam_size=args.beam_size,
num_layers=num_layers,
init_scale=init_scale,
dropout=0.0,
mode='beam_search')
source_vocab_file = args.vocab_prefix + "." + args.src_lang
infer_file = args.infer_file
infer_data = reader.raw_mono_data(source_vocab_file, infer_file)
def prepare_input(batch, epoch_id=0):
src_ids, src_mask, tar_ids, tar_mask = batch
src_ids = src_ids.reshape((src_ids.shape[0], src_ids.shape[1]))
in_tar = tar_ids[:, :-1]
label_tar = tar_ids[:, 1:]
in_tar = in_tar.reshape((in_tar.shape[0], in_tar.shape[1]))
label_tar = label_tar.reshape(
(label_tar.shape[0], label_tar.shape[1], 1))
inputs = [src_ids, in_tar, label_tar, src_mask, tar_mask]
return inputs, np.sum(tar_mask)
dir_name = args.reload_model
print("dir name", dir_name)
state_dict, _ = fluid.dygraph.load_dygraph(dir_name)
model.set_dict(state_dict)
model.eval()
train_data_iter = reader.get_data_iter(
infer_data, batch_size, mode='infer')
tar_id2vocab = []
tar_vocab_file = args.vocab_prefix + "." + args.tar_lang
with io.open(tar_vocab_file, "r", encoding='utf-8') as f:
for line in f.readlines():
tar_id2vocab.append(line.strip())
infer_output_file = args.infer_output_file
infer_output_dir = os.path.dirname(infer_output_file)
if infer_output_dir and not os.path.exists(infer_output_dir):
os.mkdir(infer_output_dir)
with io.open(infer_output_file, 'w', encoding='utf-8') as out_file:
for batch_id, batch in enumerate(train_data_iter):
input_data_feed, word_num = prepare_input(batch, epoch_id=0)
outputs = model(input_data_feed)
for i in range(outputs.shape[0]):
ins = outputs[i].numpy()
res = [tar_id2vocab[int(e)] for e in ins[:, 0].reshape(-1)]
new_res = []
for ele in res:
if ele == "</s>":
break
new_res.append(ele)
out_file.write(space_tok.join(new_res))
out_file.write(line_tok)
def check_version():
"""
Log error and exit when the installed version of paddlepaddle is
not satisfied.
"""
err = "PaddlePaddle version 1.6 or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code." \
try:
fluid.require_version('1.6.0')
except Exception as e:
logger.error(err)
sys.exit(1)
if __name__ == '__main__':
check_version()
infer()
#!/bin/bash
export CUDA_VISIBLE_DEVICES=7
python infer.py \
--attention True \
--src_lang en --tar_lang vi \
--num_layers 2 \
--hidden_size 512 \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 1 \
--dropout 0.2 \
--init_scale 0.1 \
--max_grad_norm 5.0 \
--vocab_prefix data/en-vi/vocab \
--infer_file data/en-vi/tst2013.en \
--reload_model attention_models/epoch_10 \
--infer_output_file attention_infer_output/infer_output.txt \
--beam_size 10 \
--use_gpu True
@@ -12,11 +12,15 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Utilities for parsing PTB text files."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import paddle
from paddle.io import Dataset
import collections
import os
import io
@@ -131,7 +135,7 @@ def raw_data(src_lang,
test_src, test_tar = _para_file_to_ids( src_test_file, tar_test_file, \
src_vocab, tar_vocab )
return ( train_src, train_tar), (eval_src, eval_tar), (test_src, test_tar),\
return (train_src, train_tar), (eval_src, eval_tar), (test_src, test_tar),\
(src_vocab, tar_vocab)
@@ -145,75 +149,77 @@ def raw_mono_data(vocab_file, file_path):
return (test_src, test_tar)
def get_data_iter(raw_data,
batch_size,
mode='train',
enable_ce=False,
cache_num=20):
class IWSLTDataset(Dataset):
def __init__(self, raw_data):
super(IWSLTDataset, self).__init__()
self.raw_data = raw_data
self.src_data, self.tar_data = self.sort_data(raw_data)
self.num_samples = len(self.src_data)
def sort_data(self, raw_data):
src_data, trg_data = raw_data
data_pair = []
for src, trg in zip(src_data, trg_data):
data_pair.append([src, trg])
sorted_data_pair = sorted(data_pair, key=lambda k: len(k[0]))
src_data = [data_pair[0] for data_pair in sorted_data_pair]
trg_data = [data_pair[1] for data_pair in sorted_data_pair]
return src_data, trg_data
def __getitem__(self, idx):
src_ids, tar_ids = np.asarray(self.src_data[idx]), np.asarray(
self.tar_data[idx])
src_mask, tar_mask = len(src_ids), len(tar_ids)
return src_ids, tar_ids, src_mask, tar_mask
def __len__(self):
return self.num_samples
def get_max_seq_len(self):
src_max_seq_len = 0
trg_max_seq_len = 0
for data in self.src_data:
src_max_seq_len = max(src_max_seq_len, len(data))
for data in self.tar_data:
trg_max_seq_len = max(trg_max_seq_len, len(data))
src_max_seq_len = min(src_max_seq_len, self.max_seq_len)
trg_max_seq_len = min(trg_max_seq_len, self.max_seq_len)
return src_max_seq_len, trg_max_seq_len
class DataCollector():
def __init__(self):
super(DataCollector, self).__init__()
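# Batch collator for DataLoader: pads source/target ids to the batch max
# length (pad id 2) and, in __call__, builds the decoder input and label
# by shifting the target by one token.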
def pad(self, data, max_len, source=False):
bs = len(data)
ids = np.ones((bs, max_len), dtype='int64') * 2
for i, ele in enumerate(data):
ids[i, :len(ele)] = ele
return ids
src_data, tar_data = raw_data
def __call__(self, samples):
batch_size = len(samples)
data_len = len(src_data)
src_ids_np = np.asarray([sample[0] for sample in samples])
tar_ids_np = np.asarray([sample[1] for sample in samples])
src_len = np.asarray([sample[2] for sample in samples])
trg_len = np.asarray([sample[3] for sample in samples])
index = np.arange(data_len)
if mode == "train" and not enable_ce:
np.random.shuffle(index)
max_source_len = max(max(src_len), 1)
max_tar_len = max(max(trg_len), 1)
def to_pad_np(data, source=False):
max_len = 0
bs = min(batch_size, len(data))
for ele in data:
if len(ele) > max_len:
max_len = len(ele)
src_ids_pad_np = self.pad(src_ids_np, max_source_len, source=True)
tar_ids_pad_np = self.pad(tar_ids_np, max_tar_len)
ids = np.ones((bs, max_len), dtype='int64') * 2
mask = np.zeros((bs), dtype='int32')
in_tar = tar_ids_pad_np[:, :-1]
label_tar = tar_ids_pad_np[:, 1:]
label_tar = label_tar.reshape(
(label_tar.shape[0], label_tar.shape[1], 1))
for i, ele in enumerate(data):
ids[i, :len(ele)] = ele
if not source:
mask[i] = len(ele) - 1
else:
mask[i] = len(ele)
return ids, mask
b_src = []
if mode != "train":
cache_num = 1
for j in range(data_len):
if len(b_src) == batch_size * cache_num:
# build batch size
# sort
if mode == 'infer':
new_cache = b_src
else:
new_cache = sorted(b_src, key=lambda k: len(k[0]))
for i in range(cache_num):
batch_data = new_cache[i * batch_size:(i + 1) * batch_size]
src_cache = [w[0] for w in batch_data]
tar_cache = [w[1] for w in batch_data]
src_ids, src_mask = to_pad_np(src_cache, source=True)
tar_ids, tar_mask = to_pad_np(tar_cache)
yield (src_ids, src_mask, tar_ids, tar_mask)
b_src = []
b_src.append((src_data[index[j]], tar_data[index[j]]))
if len(b_src) == batch_size * cache_num or mode == 'infer':
if mode == 'infer':
new_cache = b_src
else:
new_cache = sorted(b_src, key=lambda k: len(k[0]))
for i in range(cache_num):
batch_end = min(len(new_cache), (i + 1) * batch_size)
batch_data = new_cache[i * batch_size:batch_end]
src_cache = [w[0] for w in batch_data]
tar_cache = [w[1] for w in batch_data]
src_ids, src_mask = to_pad_np(src_cache, source=True)
tar_ids, tar_mask = to_pad_np(tar_cache)
yield (src_ids, src_mask, tar_ids, tar_mask)
input_data_feed = src_ids_pad_np, in_tar, label_tar, src_len, trg_len
return input_data_feed
from paddle.fluid import layers
from paddle.fluid.dygraph import Layer
class BasicLSTMUnit(Layer):
"""
****
BasicLSTMUnit class, Using basic operator to build LSTM
The algorithm can be described as the code below.
.. math::
i_t &= \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + b_i)
f_t &= \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + b_f + forget_bias )
o_t &= \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + b_o)
\\tilde{c_t} &= tanh(W_{cx}x_t + W_{ch}h_{t-1} + b_c)
c_t &= f_t \odot c_{t-1} + i_t \odot \\tilde{c_t}
h_t &= o_t \odot tanh(c_t)
- $W$ terms denote weight matrices (e.g. $W_{ix}$ is the matrix
of weights from the input gate to the input)
- The $b$ terms denote bias vectors (e.g. $b_i$ is the input gate bias vector).
- sigmoid is the logistic sigmoid function.
- $i, f, o$ and $c$ are the input gate, forget gate, output gate,
and cell activation vectors, respectively, all of which have the same size as
the cell output activation vector $h$.
- The :math:`\odot` is the element-wise product of the vectors.
- :math:`tanh` is the activation function.
- :math:`\\tilde{c_t}` is also called candidate hidden state,
which is computed based on the current input and the previous hidden state.
Args:
name_scope(string) : The name scope used to identify parameter and bias name
hidden_size (integer): The hidden size used in the Unit.
param_attr(ParamAttr|None): The parameter attribute for the learnable
weight matrix. Note:
If it is set to None or one attribute of ParamAttr, lstm_unit will
create ParamAttr as param_attr. If the Initializer of the param_attr
is not set, the parameter is initialized with Xavier. Default: None.
bias_attr (ParamAttr|None): The parameter attribute for the bias
of LSTM unit.
If it is set to None or one attribute of ParamAttr, lstm_unit will
create ParamAttr as bias_attr. If the Initializer of the bias_attr
is not set, the bias is initialized as zero. Default: None.
gate_activation (function|None): The activation function for gates (actGate).
Default: 'fluid.layers.sigmoid'
activation (function|None): The activation function for cells (actNode).
Default: 'fluid.layers.tanh'
forget_bias(float|1.0): forget bias used when computing forget gate
dtype(string): data type used in this unit
"""
def __init__(self,
hidden_size,
input_size,
param_attr=None,
bias_attr=None,
gate_activation=None,
activation=None,
forget_bias=1.0,
dtype='float32'):
super(BasicLSTMUnit, self).__init__(dtype)
self._hidden_size = hidden_size
self._param_attr = param_attr
self._bias_attr = bias_attr
self._gate_activation = gate_activation or layers.sigmoid
self._activation = activation or layers.tanh
self._forget_bias = layers.fill_constant(
[1], dtype=dtype, value=forget_bias)
self._forget_bias.stop_gradient = False
self._dtype = dtype
self._input_size = input_size
self._weight = self.create_parameter(
attr=self._param_attr,
shape=[self._input_size + self._hidden_size, 4 * self._hidden_size],
dtype=self._dtype)
self._bias = self.create_parameter(
attr=self._bias_attr,
shape=[4 * self._hidden_size],
dtype=self._dtype,
is_bias=True)
def forward(self, input, pre_hidden, pre_cell):
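# Fused gate computation: concatenate [x_t, h_{t-1}], multiply by one
# [input_size + hidden_size, 4 * hidden_size] weight matrix, then split
# into input (i), candidate (j), forget (f) and output (o) pre-activations.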
concat_input_hidden = layers.concat([input, pre_hidden], 1)
gate_input = layers.matmul(x=concat_input_hidden, y=self._weight)
gate_input = layers.elementwise_add(gate_input, self._bias)
i, j, f, o = layers.split(gate_input, num_or_sections=4, dim=-1)
new_cell = layers.elementwise_add(
layers.elementwise_mul(
pre_cell,
layers.sigmoid(layers.elementwise_add(f, self._forget_bias))),
layers.elementwise_mul(layers.sigmoid(i), layers.tanh(j)))
new_hidden = layers.tanh(new_cell) * layers.sigmoid(o)
return new_hidden, new_cell
\ No newline at end of file
@@ -9,12 +9,17 @@ python train.py \
--src_vocab_size 17191 \
--tar_vocab_size 7709 \
--batch_size 128 \
--dropout 0.2 \
--init_scale 0.1 \
--dropout 0.0 \
--init_scale 0.2 \
--max_grad_norm 5.0 \
--train_data_prefix data/en-vi/train \
--eval_data_prefix data/en-vi/tst2012 \
--test_data_prefix data/en-vi/tst2013 \
--vocab_prefix data/en-vi/vocab \
--use_gpu True \
--model_path attention_models
--model_path attention_models \
--enable_ce \
--learning_rate 0.002 \
--dtype float64 \
--optimizer adam \
--max_epoch 1
@@ -26,10 +26,10 @@ import math
import contextlib
import paddle
import paddle.fluid as fluid
from paddle.fluid.clip import GradientClipByGlobalNorm
from paddle.io import DataLoader, BatchSampler
import reader as reader
import reader
from reader import IWSLTDataset, DataCollector
import sys
if sys.version[0] == '2':
@@ -37,182 +37,159 @@ if sys.version[0] == '2':
sys.setdefaultencoding("utf-8")
from args import *
from base_model import BaseModel
from attention_model import AttentionModel
import logging
import pickle
SEED = 102
paddle.framework.manual_seed(SEED)
args = parse_args()
paddle.set_default_dtype(args.dtype)
if args.enable_ce:
np.random.seed(102)
random.seed(102)
def create_model(args):
model = AttentionModel(
args.hidden_size,
args.src_vocab_size,
args.tar_vocab_size,
num_layers=args.num_layers,
init_scale=args.init_scale,
padding_idx=args.padding_idx,
dropout=args.dropout,
dtype=args.dtype)
return model
def create_data_loader(args, place):
print("begin to load data")
raw_data = reader.raw_data(args.src_lang, args.tar_lang, args.vocab_prefix,
args.train_data_prefix, args.eval_data_prefix,
args.test_data_prefix, args.max_len)
train_data, val_data, test_data, _ = raw_data
batch_fn = DataCollector()
def _create_data_loader(data, batch_fn):
dataset = IWSLTDataset(data)
bs = BatchSampler(
dataset=dataset,
shuffle=False,
batch_size=args.batch_size,
drop_last=False)
loader = DataLoader(
dataset,
places=place,
return_list=True,
batch_sampler=bs,
collate_fn=batch_fn)
return loader
train_loader = _create_data_loader(train_data, batch_fn)
val_loader = _create_data_loader(val_data, batch_fn)
test_loader = _create_data_loader(test_data, batch_fn)
return train_loader, val_loader, test_loader
def main():
args = parse_args()
print(args)
num_layers = args.num_layers
src_vocab_size = args.src_vocab_size
tar_vocab_size = args.tar_vocab_size
batch_size = args.batch_size
dropout = args.dropout
init_scale = args.init_scale
max_grad_norm = args.max_grad_norm
hidden_size = args.hidden_size
place = fluid.CUDAPlace(0) if args.use_gpu else fluid.CPUPlace()
with fluid.dygraph.guard(place):
#args.enable_ce = True
if args.enable_ce:
fluid.default_startup_program().random_seed = 102
fluid.default_main_program().random_seed = 102
np.random.seed(102)
random.seed(102)
# Training process
if args.attention:
model = AttentionModel(
hidden_size,
src_vocab_size,
tar_vocab_size,
batch_size,
num_layers=num_layers,
init_scale=init_scale,
dropout=dropout)
else:
model = BaseModel(
hidden_size,
src_vocab_size,
tar_vocab_size,
batch_size,
num_layers=num_layers,
init_scale=init_scale,
dropout=dropout)
global_norm_clip = GradientClipByGlobalNorm(max_grad_norm)
lr = args.learning_rate
opt_type = args.optimizer
if opt_type == "sgd":
optimizer = fluid.optimizer.SGD(lr,
parameter_list=model.parameters(),
grad_clip=global_norm_clip)
elif opt_type == "adam":
optimizer = fluid.optimizer.Adam(
lr,
parameter_list=model.parameters(),
grad_clip=global_norm_clip)
else:
print("only support [sgd|adam]")
raise Exception("opt type not support")
train_data_prefix = args.train_data_prefix
eval_data_prefix = args.eval_data_prefix
test_data_prefix = args.test_data_prefix
vocab_prefix = args.vocab_prefix
src_lang = args.src_lang
tar_lang = args.tar_lang
print("begin to load data")
raw_data = reader.raw_data(src_lang, tar_lang, vocab_prefix,
train_data_prefix, eval_data_prefix,
test_data_prefix, args.max_len)
print("finished load data")
train_data, valid_data, test_data, _ = raw_data
def prepare_input(batch, epoch_id=0):
src_ids, src_mask, tar_ids, tar_mask = batch
src_ids = src_ids.reshape((src_ids.shape[0], src_ids.shape[1]))
in_tar = tar_ids[:, :-1]
label_tar = tar_ids[:, 1:]
in_tar = in_tar.reshape((in_tar.shape[0], in_tar.shape[1]))
label_tar = label_tar.reshape(
(label_tar.shape[0], label_tar.shape[1], 1))
inputs = [src_ids, in_tar, label_tar, src_mask, tar_mask]
return inputs, np.sum(tar_mask)
# get train epoch size
def eval(data, epoch_id=0):
model.eval()
eval_data_iter = reader.get_data_iter(data, batch_size, mode='eval')
total_loss = 0.0
word_count = 0.0
for batch_id, batch in enumerate(eval_data_iter):
input_data_feed, word_num = prepare_input(batch, epoch_id)
loss = model(input_data_feed)
total_loss += loss * batch_size
word_count += word_num
ppl = np.exp(total_loss.numpy() / word_count)
model.train()
return ppl
ce_time = []
ce_ppl = []
max_epoch = args.max_epoch
for epoch_id in range(max_epoch):
epoch_start = time.time()
model.train()
if args.enable_ce:
train_data_iter = reader.get_data_iter(
train_data, batch_size, enable_ce=True)
else:
train_data_iter = reader.get_data_iter(train_data, batch_size)
total_loss = 0
word_count = 0.0
place = paddle.CUDAPlace(0) if args.use_gpu else paddle.CPUPlace()
paddle.disable_static()
paddle.framework.manual_seed(SEED)
if args.enable_ce:
np.random.seed(102)
random.seed(102)
model = create_model(args)
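# In the 2.0 API, global-norm gradient clipping is attached to the
# optimizer through grad_clip instead of being applied manually after
# backward().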
global_norm_clip = paddle.nn.GradientClipByGlobalNorm(args.max_grad_norm)
lr = args.learning_rate
opt_type = args.optimizer
if opt_type == "sgd":
optimizer = paddle.optimizer.SGD(lr,
parameters=model.parameters(),
grad_clip=global_norm_clip)
elif opt_type == "adam":
optimizer = paddle.optimizer.Adam(
lr, parameters=model.parameters(), grad_clip=global_norm_clip)
else:
print("only support [sgd|adam]")
raise Exception("opt type not support")
train_loader, val_loader, test_loader = create_data_loader(args, place)
ce_time = []
ce_ppl = []
loss_list = []
max_epoch = args.max_epoch
for epoch_id in range(max_epoch):
epoch_start = time.time()
model.train()
total_loss = 0
word_count = 0.0
batch_times, epoch_times = [], []
batch_start = time.time()
for batch_id, input_data_feed in enumerate(train_loader):
word_num = paddle.sum(input_data_feed[4])
batch_size = input_data_feed[4].shape[0]
word_count += word_num.numpy()
batch_reader_end = time.time()
loss = model(input_data_feed)
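# Standard dygraph update: clear gradients, backpropagate, then step.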
optimizer.clear_grad()
loss.backward()
optimizer.step()
train_batch_cost = time.time() - batch_start
batch_times.append(train_batch_cost)
epoch_times.append(train_batch_cost)
total_loss += loss * batch_size
if batch_id > 0 and batch_id % 100 == 0:
print(
"-- Epoch:[%d]; Batch:[%d]; ppl: %.5f, avg_batch_cost: %.5f s, reader_cost: %.5f s"
% (epoch_id, batch_id, np.exp(
total_loss.numpy() / word_count), sum(batch_times) /
len(batch_times), batch_reader_end - batch_start))
ce_ppl.append(np.exp(total_loss.numpy() / word_count))
total_loss = 0.0
word_count = 0.0
batch_times = []
batch_start = time.time()
for batch_id, batch in enumerate(train_data_iter):
batch_reader_end = time.time()
input_data_feed, word_num = prepare_input(
batch, epoch_id=epoch_id)
word_count += word_num
loss = model(input_data_feed)
loss.backward()
optimizer.minimize(loss)
model.clear_gradients()
total_loss += loss * batch_size
train_batch_cost = time.time() - batch_start
batch_times.append(train_batch_cost)
if batch_id > 0 and batch_id % 100 == 0:
print(
"-- Epoch:[%d]; Batch:[%d]; ppl: %.5f, batch_cost: %.5f s, reader_cost: %.5f s"
% (epoch_id, batch_id, np.exp(total_loss.numpy() /
word_count),
train_batch_cost, batch_reader_end - batch_start))
ce_ppl.append(np.exp(total_loss.numpy() / word_count))
total_loss = 0.0
word_count = 0.0
batch_start = time.time()
train_epoch_cost = time.time() - epoch_start
print(
"\nTrain epoch:[%d]; epoch_cost: %.5f s; avg_batch_cost: %.5f s/step\n"
% (epoch_id, train_epoch_cost,
sum(batch_times) / len(batch_times)))
ce_time.append(train_epoch_cost)
dir_name = os.path.join(args.model_path, "epoch_" + str(epoch_id))
print("begin to save", dir_name)
paddle.fluid.save_dygraph(model.state_dict(), dir_name)
print("save finished")
dev_ppl = eval(valid_data)
print("dev ppl", dev_ppl)
test_ppl = eval(test_data)
print("test ppl", test_ppl)
if args.enable_ce:
card_num = get_cards()
_ppl = 0
_time = 0
try:
_time = ce_time[-1]
_ppl = ce_ppl[-1]
except:
print("ce info error")
print("kpis\ttrain_duration_card%s\t%s" % (card_num, _time))
print("kpis\ttrain_ppl_card%s\t%f" % (card_num, _ppl))
train_epoch_cost = time.time() - epoch_start
print(
"\nTrain epoch:[%d]; epoch_cost: %.5f s; avg_batch_cost: %.5f s/step\n"
% (epoch_id, train_epoch_cost, sum(epoch_times) / len(epoch_times)))
ce_time.append(train_epoch_cost)
dir_name = os.path.join(args.model_path, "epoch_" + str(epoch_id))
print("begin to save", dir_name)
paddle.save(model.state_dict(), dir_name)
print("save finished")
dev_ppl = eval(model, val_loader)
print("dev ppl", dev_ppl)
test_ppl = eval(model, test_loader)
print("test ppl", test_ppl)
epoch_times = []
if args.enable_ce:
card_num = get_cards()
_ppl = 0
_time = 0
try:
_time = ce_time[-1]
_ppl = ce_ppl[-1]
except:
print("ce info error")
print("kpis\ttrain_duration_card%s\t%s" % (card_num, _time))
print("kpis\ttrain_ppl_card%s\t%f" % (card_num, _ppl))
def get_cards():
@@ -223,22 +200,24 @@ def get_cards():
return num
def check_version():
"""
Log error and exit when the installed version of paddlepaddle is
not satisfied.
"""
err = "PaddlePaddle version 1.6 or higher is required, " \
"or a suitable develop version is satisfied as well. \n" \
"Please make sure the version is good with your code." \
def eval(model, data_loader, epoch_id=0):
model.eval()
total_loss = 0.0
word_count = 0.0
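# Dataset perplexity: exp(total token loss / number of target tokens).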
for batch_id, input_data_feed in enumerate(data_loader):
word_num = paddle.sum(input_data_feed[4])
batch_size = input_data_feed[4].shape[0]
word_count += word_num.numpy()
loss = model(input_data_feed)
total_loss += loss * batch_size
try:
fluid.require_version('1.6.0')
except Exception as e:
logger.error(err)
sys.exit(1)
ppl = np.exp(total_loss.numpy() / word_count)
model.train()
return ppl
if __name__ == '__main__':
check_version()
main()