Add simultaneous translation models (#1626)

# transformer_nist_wait_1
|Model Name|transformer_nist_wait_1|
| :--- | :---: |
|Category|Simultaneous Translation|
|Network|transformer|
|Dataset|NIST 2008 Chinese-English translation dataset|
|Fine-tuning supported|No|
|Module Size|377MB|
|Latest update date|2021-09-17|
|Data indicators|-|
## I. Basic Information
- ### Module Introduction
- Simultaneous translation means translating before the source sentence is complete; its goal is to automate simultaneous interpretation, producing the translation in parallel with the incoming source language with a latency of only a few seconds.
STACL is a translation architecture for simultaneous translation proposed in the paper [STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/) and is applicable to all simultaneous-translation scenarios.
- The main advantages of STACL are:
- The prefix-to-prefix architecture has anticipation ability: it can emit a target word before the corresponding source word has been seen, overcoming word-order differences such as SOV→SVO
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133761990-13e55d0f-5c3a-476c-8865-5808d13cba97.png"> <br />
</p>
The main difference from a conventional machine translation model is whether the full source sentence is required at translation time. In the figure above, the Seq2Seq model must wait until the entire source sentence (1-5) has been fed into the Encoder before the Decoder starts decoding, whereas the STACL architecture adopts a wait-k policy (wait-2 in the figure): as soon as only two source words (1 and 2) have been fed into the Encoder, the Decoder can start decoding and predict the first word of the target sentence.
- The wait-k policy can predict the target sentence without the full source sentence, enabling an arbitrary word-level latency while maintaining high translation quality.
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133762098-6ea6f3ca-0d70-4a0a-981d-0fcc6f3cd96b.png"> <br />
</p>
The wait-k policy first waits for the first k source words and then translates concurrently with the rest of the source sentence, so the output always lags k words behind the input. It is inspired by human simultaneous interpreters, who usually start translating a few seconds into the speaker's speech and finish a few seconds after the speaker ends. For example, with k=2 the first target word is predicted from the first 2 source words, the second target word from the first 3 source words, and so on. In the figure above, (a) simultaneous: our wait-2 starts decoding and predicts "pres." as soon as "布什" and "总统" have been read, while (b) non-simultaneous baseline is a conventional translation model that only starts decoding after the whole sentence "布什 总统 在 莫斯科 与 普京 会晤" has been read. A minimal sketch at the end of this section illustrates this prefix schedule.
- This PaddleHub Module is based on the transformer network and uses the wait-1 policy for Chinese-to-English translation.
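- The following minimal sketch (illustrative only, not part of this module's API) shows how many source words a wait-k model may read before emitting each target word; with k=1, as used by this module, the first target word can be emitted after a single source word has been read.
- ```python
def waitk_source_prefix_lengths(k, src_len, tgt_len):
    """A wait-k model may read at most min(k + t, src_len) source tokens
    before emitting the t-th (0-based) target token."""
    return [min(k + t, src_len) for t in range(tgt_len)]

print(waitk_source_prefix_lengths(1, 7, 7))  # [1, 2, 3, 4, 5, 6, 7]
print(waitk_source_prefix_lengths(2, 7, 7))  # [2, 3, 4, 5, 6, 7, 7]
```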
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install transformer_nist_wait_1
```
- If you have trouble installing, please refer to: [Windows quickstart](../../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import paddlehub as hub
model = hub.Module(name="transformer_nist_wait_1")
# Input data (simulating streaming input for simultaneous interpretation)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
for t in text:
print("input: {}".format(t))
result = model.translate(t)
print("model output: {}\n".format(result))
# input: 他
# model output: he
#
# input: 他还
# model output: he also
#
# input: 他还说
# model output: he also said
#
# input: 他还说现在
# model output: he also said that
#
# input: 他还说现在正在
# model output: he also said that he
#
# input: 他还说现在正在为
# model output: he also said that he is
#
# input: 他还说现在正在为这
# model output: he also said that he is making
#
# input: 他还说现在正在为这一
# model output: he also said that he is making preparations
#
# input: 他还说现在正在为这一会议
# model output: he also said that he is making preparations for
#
# input: 他还说现在正在为这一会议作出
# model output: he also said that he is making preparations for this
#
# input: 他还说现在正在为这一会议作出安排
# model output: he also said that he is making preparations for this meeting
#
# input: 他还说现在正在为这一会议作出安排。
# model output: he also said that he is making preparations for this meeting .
```
- ### 2. API
- ```python
__init__(max_length=256, max_out_len=256)
```
- Initializes the module; the maximum length of the input text can be configured.
- **Parameters**
- max_length(int): the maximum length of the input text, default 256.
- max_out_len(int): the maximum decoding length of the output text; content beyond this length is truncated, default 256.
- ```python
translate(text, use_gpu=False)
```
- Prediction API: takes source-language text (simulating streaming speech input in simultaneous interpretation) and returns the decoded target-language translation.
- **Parameters**
- text(str): the source-language input text, of type str
- use_gpu(bool): whether to use the GPU for prediction, default False
- **Return**
- result(str): the translated target-language text.
## IV. Server Deployment
- PaddleHub Serving can deploy an online simultaneous translation service, and the API can be used in online web applications.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m transformer_nist_wait_1
```
- The model loading process is shown during startup; once the service has started successfully, it prints:
- ```shell
Loading transformer_nist_wait_1 successful.
```
- This completes the deployment of the serving API; the default port is 8866.
- **NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and fetch the result
- ```python
import requests
import json
# Input data (simulating streaming input for simultaneous interpretation)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
# Specify the prediction method as transformer_nist_wait_1 and send a POST request; the content-type should be json
# HOST_IP is the server's IP address
url = "http://HOST_IP:8866/predict/transformer_nist_wait_1"
headers = {"Content-Type": "application/json"}
for t in text:
    print("input: {}".format(t))
    r = requests.post(url=url, headers=headers, data=json.dumps(t))
    # Print the prediction result returned by the service
    print("model output: {}\n".format(r.json()))
```
- For more information on PaddleHub Serving, please refer to: [Serving Deployment](../../../../../docs/docs_ch/tutorial/serving.md)
## V. Release Note
* 1.0.0
  First release
```shell
hub install transformer_nist_wait_1==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
class DecoderLayer(nn.TransformerDecoderLayer):
def __init__(self, *args, **kwargs):
super(DecoderLayer, self).__init__(*args, **kwargs)
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
residual = tgt
if self.normalize_before:
tgt = self.norm1(tgt)
if cache is None:
tgt = self.self_attn(tgt, tgt, tgt, tgt_mask, None)
else:
tgt, incremental_cache = self.self_attn(tgt, tgt, tgt, tgt_mask,
cache[0])
tgt = residual + self.dropout1(tgt)
if not self.normalize_before:
tgt = self.norm1(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm2(tgt)
if len(memory) == 1:
# Full sent
tgt = self.cross_attn(tgt, memory[0], memory[0], memory_mask, None)
else:
# Wait-k policy
cross_attn_outputs = []
for i in range(tgt.shape[1]):
q = tgt[:, i:i + 1, :]
if i >= len(memory):
e = memory[-1]
else:
e = memory[i]
cross_attn_outputs.append(
self.cross_attn(q, e, e, memory_mask[:, :, i:i + 1, :
e.shape[1]], None))
tgt = paddle.concat(cross_attn_outputs, axis=1)
tgt = residual + self.dropout2(tgt)
if not self.normalize_before:
tgt = self.norm2(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm3(tgt)
tgt = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
tgt = residual + self.dropout3(tgt)
if not self.normalize_before:
tgt = self.norm3(tgt)
return tgt if cache is None else (tgt, (incremental_cache, ))
class Decoder(nn.TransformerDecoder):
"""
    PaddlePaddle 2.1 casts memory_mask.dtype to memory.dtype, but in STACL
    memory is a list of encoder outputs (one per source prefix) and has no
    dtype attribute, so forward is overridden here.
"""
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
output = tgt
new_caches = []
for i, mod in enumerate(self.layers):
if cache is None:
output = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=None)
else:
output, new_cache = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=cache[i])
new_caches.append(new_cache)
if self.norm is not None:
output = self.norm(output)
return output if cache is None else (output, new_caches)
class SimultaneousTransformer(nn.Layer):
"""
    Simultaneous Transformer with the wait-k policy.
"""
def __init__(self,
src_vocab_size,
trg_vocab_size,
max_length=256,
n_layer=6,
n_head=8,
d_model=512,
d_inner_hid=2048,
dropout=0.1,
weight_sharing=False,
bos_id=0,
eos_id=1,
waitk=-1):
super(SimultaneousTransformer, self).__init__()
self.trg_vocab_size = trg_vocab_size
self.emb_dim = d_model
self.bos_id = bos_id
self.eos_id = eos_id
self.dropout = dropout
self.waitk = waitk
self.n_layer = n_layer
self.n_head = n_head
self.d_model = d_model
self.src_word_embedding = WordEmbedding(
vocab_size=src_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.src_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
if weight_sharing:
assert src_vocab_size == trg_vocab_size, (
"Vocabularies in source and target should be same for weight sharing."
)
self.trg_word_embedding = self.src_word_embedding
self.trg_pos_embedding = self.src_pos_embedding
else:
self.trg_word_embedding = WordEmbedding(
vocab_size=trg_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.trg_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
encoder_layer = nn.TransformerEncoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, True])
encoder_norm = nn.LayerNorm(d_model)
self.encoder = nn.TransformerEncoder(
encoder_layer=encoder_layer, num_layers=n_layer, norm=encoder_norm)
decoder_layer = DecoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, False, True])
decoder_norm = nn.LayerNorm(d_model)
self.decoder = Decoder(
decoder_layer=decoder_layer, num_layers=n_layer, norm=decoder_norm)
if weight_sharing:
self.linear = lambda x: paddle.matmul(
x=x, y=self.trg_word_embedding.word_embedding.weight, transpose_y=True)
else:
self.linear = nn.Linear(
in_features=d_model,
out_features=trg_vocab_size,
bias_attr=False)
def forward(self, src_word, trg_word):
src_max_len = paddle.shape(src_word)[-1]
trg_max_len = paddle.shape(trg_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_slf_attn_bias = paddle.tensor.triu(
(paddle.ones(
(trg_max_len, trg_max_len),
dtype=paddle.get_default_dtype()) * -np.inf),
1)
trg_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, trg_max_len, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
trg_pos = paddle.cast(
trg_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=trg_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
with paddle.static.amp.fp16_guard():
if self.waitk >= src_max_len or self.waitk == -1:
# Full sentence
enc_outputs = [
self.encoder(
enc_input, src_mask=src_slf_attn_bias)
]
else:
# Wait-k policy
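                # enc_outputs[j] holds the encoding of the first (waitk + j) source
                # tokens; the decoder's j-th step cross-attends to exactly that prefix.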
enc_outputs = []
for i in range(self.waitk, src_max_len + 1):
enc_output = self.encoder(
enc_input[:, :i, :],
src_mask=src_slf_attn_bias[:, :, :, :i])
enc_outputs.append(enc_output)
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
dec_output = self.decoder(
dec_input,
enc_outputs,
tgt_mask=trg_slf_attn_bias,
memory_mask=trg_src_attn_bias)
predict = self.linear(dec_output)
return predict
def beam_search(self, src_word, beam_size=4, max_len=256, waitk=-1):
# TODO: "Speculative Beam Search for Simultaneous Translation"
raise NotImplementedError
def greedy_search(self,
src_word,
max_len=256,
waitk=-1,
caches=None,
bos_id=None):
"""
        greedy_search uses a streaming reader. The encoder does not have to be
        run repeatedly: each incoming sub-sentence only needs one additional
        encoder call. It therefore takes the previous decoder states (caches)
        and the id of the last token generated in the previous step.
"""
src_max_len = paddle.shape(src_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, 1, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
enc_outputs = [self.encoder(enc_input, src_mask=src_slf_attn_bias)]
# constant number
batch_size = enc_outputs[-1].shape[0]
max_len = (
enc_outputs[-1].shape[1] + 20) if max_len is None else max_len
end_token_tensor = paddle.full(
shape=[batch_size, 1], fill_value=self.eos_id, dtype="int64")
predict_ids = []
log_probs = paddle.full(
shape=[batch_size, 1], fill_value=0, dtype="float32")
if not bos_id:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=self.bos_id, dtype="int64")
else:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=bos_id, dtype="int64")
# init states (caches) for transformer
if not caches:
caches = self.decoder.gen_cache(enc_outputs[-1], do_zip=False)
for i in range(max_len):
trg_pos = paddle.full(
shape=trg_word.shape, fill_value=i, dtype="int64")
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
if waitk < 0 or i >= len(enc_outputs):
                # If decoding with the full sentence (waitk < 0), or the decoder step
                # exceeds the number of available source prefixes, attend to the whole
                # source encoding
_e = enc_outputs[-1]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
else:
_e = enc_outputs[i]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
dec_output = paddle.reshape(
dec_output, shape=[-1, dec_output.shape[-1]])
logits = self.linear(dec_output)
step_log_probs = paddle.log(F.softmax(logits, axis=-1))
log_probs = paddle.add(x=step_log_probs, y=log_probs)
scores = log_probs
topk_scores, topk_indices = paddle.topk(x=scores, k=1)
finished = paddle.equal(topk_indices, end_token_tensor)
trg_word = topk_indices
log_probs = topk_scores
predict_ids.append(topk_indices)
if paddle.all(finished).numpy():
break
predict_ids = paddle.stack(predict_ids, axis=0)
finished_seq = paddle.transpose(predict_ids, [1, 2, 0])
finished_scores = topk_scores
return finished_seq, finished_scores, caches
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import jieba
import paddle
from paddlenlp.transformers import position_encoding_init
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from transformer_nist_wait_1.model import SimultaneousTransformer
from transformer_nist_wait_1.processor import STACLTokenizer, predict
@moduleinfo(
name="transformer_nist_wait_1",
version="1.0.0",
summary="",
author="PaddlePaddle",
author_email="",
type="nlp/simultaneous_translation",
)
class STTransformer():
"""
Transformer model for simultaneous translation.
"""
# Model config
model_config = {
# Number of head used in multi-head attention.
"n_head": 8,
# Number of sub-layers to be stacked in the encoder and decoder.
"n_layer": 6,
# The dimension for word embeddings, which is also the last dimension of
# the input and output of multi-head attention, position-wise feed-forward
# networks, encoder and decoder.
"d_model": 512,
}
def __init__(self,
max_length=256,
max_out_len=256,
):
super(STTransformer, self).__init__()
bpe_codes_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_1", "assets", "2M.zh2en.dict4bpe.zh")
src_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_1", "assets", "nist.20k.zh.vocab")
trg_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_1", "assets", "nist.10k.en.vocab")
params_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_1", "assets", "transformer.pdparams")
self.max_length = max_length
self.max_out_len = max_out_len
self.tokenizer = STACLTokenizer(
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
)
src_vocab_size = self.tokenizer.src_vocab_size
trg_vocab_size = self.tokenizer.trg_vocab_size
self.transformer = SimultaneousTransformer(
src_vocab_size,
trg_vocab_size,
max_length=self.max_length,
n_layer=self.model_config['n_layer'],
n_head=self.model_config['n_head'],
d_model=self.model_config['d_model'],
)
model_dict = paddle.load(params_fpath)
# To avoid a longer length than training, reset the size of position
# encoding to max_length
model_dict["src_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
model_dict["trg_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
self.transformer.load_dict(model_dict)
@serving
def translate(self, text, use_gpu=False):
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
# Word segmentation
text = ' '.join(jieba.cut(text))
# For decoding max length
decoder_max_length = 1
# For decoding cache
cache = None
# For decoding start token id
bos_id = None
# Current source word index
i = 0
        # Whether the current source token ends the sentence; if True, decode up to max_out_len
is_last = False
# Tokenized id
user_input_tokenized = []
# Store the translation
result = []
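        # bpe_str: the BPE sub-word tokens after jieba segmentation;
        # tokenized_src: their ids in the source vocabulary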
bpe_str, tokenized_src = self.tokenizer.tokenize(text)
while i < len(tokenized_src):
user_input_tokenized.append(tokenized_src[i])
if bpe_str[i] in ['。', '?', '!']:
is_last = True
result, cache, bos_id = predict(
user_input_tokenized,
decoder_max_length,
is_last,
cache,
bos_id,
result,
self.tokenizer,
self.transformer,
max_out_len=self.max_out_len)
i += 1
return " ".join(result)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
from paddlenlp.data import Vocab
from subword_nmt import subword_nmt
class STACLTokenizer:
"""
Jieba+BPE, and convert tokens to ids.
"""
def __init__(self,
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
special_token=["<s>", "<e>", "<unk>"]):
bpe_parser = subword_nmt.create_apply_bpe_parser()
bpe_args = bpe_parser.parse_args(args=['-c', bpe_codes_fpath])
self.bpe = subword_nmt.BPE(bpe_args.codes, bpe_args.merges,
bpe_args.separator, None,
bpe_args.glossaries)
self.src_vocab = Vocab.load_vocabulary(
src_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.trg_vocab = Vocab.load_vocabulary(
trg_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.src_vocab_size = len(self.src_vocab)
self.trg_vocab_size = len(self.trg_vocab)
def tokenize(self, text):
bpe_str = self.bpe.process_line(text)
ids = self.src_vocab.to_indices(bpe_str.split())
return bpe_str.split(), ids
def post_process_seq(seq,
bos_idx=0,
eos_idx=1,
output_bos=False,
output_eos=False):
"""
Post-process the decoded sequence.
"""
eos_pos = len(seq) - 1
for i, idx in enumerate(seq):
if idx == eos_idx:
eos_pos = i
break
seq = [
idx for idx in seq[:eos_pos + 1]
if (output_bos or idx != bos_idx) and (output_eos or idx != eos_idx)
]
return seq
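# Illustrative example: post_process_seq([5, 7, 1, 9], bos_idx=0, eos_idx=1)
# keeps tokens up to the first eos and drops bos/eos ids, returning [5, 7].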
def predict(tokenized_src,
decoder_max_length,
is_last,
cache,
bos_id,
result,
tokenizer,
transformer,
n_best=1,
max_out_len=256,
eos_idx=1,
waitk=1,
):
# Set evaluate mode
transformer.eval()
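    # Wait-k gate: until at least `waitk` source tokens have arrived, emit nothing
    # and leave the decoder state (cache, bos_id) untouched.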
if len(tokenized_src) < waitk:
return result, cache, bos_id
with paddle.no_grad():
paddle.disable_static()
input_src = tokenized_src
if is_last:
decoder_max_length = max_out_len
input_src += [eos_idx]
src_word = paddle.to_tensor(input_src).unsqueeze(axis=0)
finished_seq, finished_scores, cache = transformer.greedy_search(
src_word,
max_len=decoder_max_length,
waitk=waitk,
caches=cache,
bos_id=bos_id)
finished_seq = finished_seq.numpy()
for beam_idx, beam in enumerate(finished_seq[0]):
if beam_idx >= n_best:
break
id_list = post_process_seq(beam)
if len(id_list) == 0:
continue
bos_id = id_list[-1]
word_list = tokenizer.trg_vocab.to_tokens(id_list)
for word in word_list:
result.append(word)
res = ' '.join(word_list).replace('@@ ', '')
paddle.enable_static()
return result, cache, bos_id
# transformer_nist_wait_3
|Model Name|transformer_nist_wait_3|
| :--- | :---: |
|Category|Simultaneous Translation|
|Network|transformer|
|Dataset|NIST 2008 Chinese-English translation dataset|
|Fine-tuning supported|No|
|Module Size|377MB|
|Latest update date|2021-09-17|
|Data indicators|-|
## I. Basic Information
- ### Module Introduction
- Simultaneous translation means translating before the source sentence is complete; its goal is to automate simultaneous interpretation, producing the translation in parallel with the incoming source language with a latency of only a few seconds.
STACL is a translation architecture for simultaneous translation proposed in the paper [STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/) and is applicable to all simultaneous-translation scenarios.
- The main advantages of STACL are:
- The prefix-to-prefix architecture has anticipation ability: it can emit a target word before the corresponding source word has been seen, overcoming word-order differences such as SOV→SVO
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133761990-13e55d0f-5c3a-476c-8865-5808d13cba97.png"> <br />
</p>
The main difference from a conventional machine translation model is whether the full source sentence is required at translation time. In the figure above, the Seq2Seq model must wait until the entire source sentence (1-5) has been fed into the Encoder before the Decoder starts decoding, whereas the STACL architecture adopts a wait-k policy (wait-2 in the figure): as soon as only two source words (1 and 2) have been fed into the Encoder, the Decoder can start decoding and predict the first word of the target sentence.
- The wait-k policy can predict the target sentence without the full source sentence, enabling an arbitrary word-level latency while maintaining high translation quality.
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133762098-6ea6f3ca-0d70-4a0a-981d-0fcc6f3cd96b.png"> <br />
</p>
The wait-k policy first waits for the first k source words and then translates concurrently with the rest of the source sentence, so the output always lags k words behind the input. It is inspired by human simultaneous interpreters, who usually start translating a few seconds into the speaker's speech and finish a few seconds after the speaker ends. For example, with k=2 the first target word is predicted from the first 2 source words, the second target word from the first 3 source words, and so on. In the figure above, (a) simultaneous: our wait-2 starts decoding and predicts "pres." as soon as "布什" and "总统" have been read, while (b) non-simultaneous baseline is a conventional translation model that only starts decoding after the whole sentence "布什 总统 在 莫斯科 与 普京 会晤" has been read. A minimal sketch at the end of this section shows how the number of emitted target words grows as more source words are read.
- This PaddleHub Module is based on the transformer network and uses the wait-3 policy for Chinese-to-English translation.
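- The following minimal sketch (illustrative only, not part of this module's API) shows the maximum number of target words a wait-k model may have emitted after reading a given number of source words; with k=3, the module emits nothing until three source words have been read, which is why the first inputs in the prediction example of Section III return empty output.
- ```python
def waitk_emitted_targets(k, num_src_read):
    """Upper bound on the number of target words a wait-k model has emitted
    after reading num_src_read source words (before the sentence ends)."""
    return max(0, num_src_read - k + 1)

for n in range(1, 8):
    print(n, "source words read ->", waitk_emitted_targets(3, n), "target words emitted")
# 1 -> 0, 2 -> 0, 3 -> 1, 4 -> 2, 5 -> 3, 6 -> 4, 7 -> 5
```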
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install transformer_nist_wait_3
```
- If you have trouble installing, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import paddlehub as hub
model = hub.Module(name="transformer_nist_wait_3")
# Input data (simulating streaming input for simultaneous interpretation)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
for t in text:
print("input: {}".format(t))
result = model.translate(t)
print("model output: {}\n".format(result))
# input: 他
# model output:
#
# input: 他还
# model output:
#
# input: 他还说
# model output: he
#
# input: 他还说现在
# model output: he also
#
# input: 他还说现在正在
# model output: he also said
#
# input: 他还说现在正在为
# model output: he also said that
#
# input: 他还说现在正在为这
# model output: he also said that he
#
# input: 他还说现在正在为这一
# model output: he also said that he is
#
# input: 他还说现在正在为这一会议
# model output: he also said that he is making
#
# input: 他还说现在正在为这一会议作出
# model output: he also said that he is making preparations
#
# input: 他还说现在正在为这一会议作出安排
# model output: he also said that he is making preparations for
#
# input: 他还说现在正在为这一会议作出安排。
# model output: he also said that he is making preparations for this meeting .
```
- ### 2. API
- ```python
__init__(max_length=256, max_out_len=256)
```
- Initializes the module; the maximum length of the input text can be configured.
- **Parameters**
- max_length(int): the maximum length of the input text, default 256.
- max_out_len(int): the maximum decoding length of the output text; content beyond this length is truncated, default 256.
- ```python
translate(text, use_gpu=False)
```
- Prediction API: takes source-language text (simulating streaming speech input in simultaneous interpretation) and returns the decoded target-language translation.
- **Parameters**
- text(str): the source-language input text, of type str
- use_gpu(bool): whether to use the GPU for prediction, default False
- **Return**
- result(str): the translated target-language text.
## IV. Server Deployment
- PaddleHub Serving can deploy an online simultaneous translation service, and the API can be used in online web applications.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m transformer_nist_wait_3
```
- The model loading process is shown during startup; once the service has started successfully, it prints:
- ```shell
Loading transformer_nist_wait_3 successful.
```
- This completes the deployment of the serving API; the default port is 8866.
- **NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and fetch the result
- ```python
import requests
import json
# Input data (simulating streaming input for simultaneous interpretation)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
# Specify the prediction method as transformer_nist_wait_3 and send a POST request; the content-type should be json
# HOST_IP is the server's IP address
url = "http://HOST_IP:8866/predict/transformer_nist_wait_3"
headers = {"Content-Type": "application/json"}
for t in text:
    print("input: {}".format(t))
    r = requests.post(url=url, headers=headers, data=json.dumps(t))
    # Print the prediction result returned by the service
    print("model output: {}\n".format(r.json()))
```
- For more information on PaddleHub Serving, please refer to: [Serving Deployment](../../../../docs/docs_ch/tutorial/serving.md)
## V. Release Note
* 1.0.0
  First release
```shell
hub install transformer_nist_wait_3==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
class DecoderLayer(nn.TransformerDecoderLayer):
def __init__(self, *args, **kwargs):
super(DecoderLayer, self).__init__(*args, **kwargs)
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
residual = tgt
if self.normalize_before:
tgt = self.norm1(tgt)
if cache is None:
tgt = self.self_attn(tgt, tgt, tgt, tgt_mask, None)
else:
tgt, incremental_cache = self.self_attn(tgt, tgt, tgt, tgt_mask,
cache[0])
tgt = residual + self.dropout1(tgt)
if not self.normalize_before:
tgt = self.norm1(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm2(tgt)
if len(memory) == 1:
# Full sent
tgt = self.cross_attn(tgt, memory[0], memory[0], memory_mask, None)
else:
# Wait-k policy
cross_attn_outputs = []
for i in range(tgt.shape[1]):
q = tgt[:, i:i + 1, :]
if i >= len(memory):
e = memory[-1]
else:
e = memory[i]
cross_attn_outputs.append(
self.cross_attn(q, e, e, memory_mask[:, :, i:i + 1, :
e.shape[1]], None))
tgt = paddle.concat(cross_attn_outputs, axis=1)
tgt = residual + self.dropout2(tgt)
if not self.normalize_before:
tgt = self.norm2(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm3(tgt)
tgt = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
tgt = residual + self.dropout3(tgt)
if not self.normalize_before:
tgt = self.norm3(tgt)
return tgt if cache is None else (tgt, (incremental_cache, ))
class Decoder(nn.TransformerDecoder):
"""
    PaddlePaddle 2.1 casts memory_mask.dtype to memory.dtype, but in STACL
    memory is a list of encoder outputs (one per source prefix) and has no
    dtype attribute, so forward is overridden here.
"""
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
output = tgt
new_caches = []
for i, mod in enumerate(self.layers):
if cache is None:
output = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=None)
else:
output, new_cache = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=cache[i])
new_caches.append(new_cache)
if self.norm is not None:
output = self.norm(output)
return output if cache is None else (output, new_caches)
class SimultaneousTransformer(nn.Layer):
"""
    Simultaneous Transformer with the wait-k policy.
"""
def __init__(self,
src_vocab_size,
trg_vocab_size,
max_length=256,
n_layer=6,
n_head=8,
d_model=512,
d_inner_hid=2048,
dropout=0.1,
weight_sharing=False,
bos_id=0,
eos_id=1,
waitk=-1):
super(SimultaneousTransformer, self).__init__()
self.trg_vocab_size = trg_vocab_size
self.emb_dim = d_model
self.bos_id = bos_id
self.eos_id = eos_id
self.dropout = dropout
self.waitk = waitk
self.n_layer = n_layer
self.n_head = n_head
self.d_model = d_model
self.src_word_embedding = WordEmbedding(
vocab_size=src_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.src_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
if weight_sharing:
assert src_vocab_size == trg_vocab_size, (
"Vocabularies in source and target should be same for weight sharing."
)
self.trg_word_embedding = self.src_word_embedding
self.trg_pos_embedding = self.src_pos_embedding
else:
self.trg_word_embedding = WordEmbedding(
vocab_size=trg_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.trg_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
encoder_layer = nn.TransformerEncoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, True])
encoder_norm = nn.LayerNorm(d_model)
self.encoder = nn.TransformerEncoder(
encoder_layer=encoder_layer, num_layers=n_layer, norm=encoder_norm)
decoder_layer = DecoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, False, True])
decoder_norm = nn.LayerNorm(d_model)
self.decoder = Decoder(
decoder_layer=decoder_layer, num_layers=n_layer, norm=decoder_norm)
if weight_sharing:
self.linear = lambda x: paddle.matmul(
x=x, y=self.trg_word_embedding.word_embedding.weight, transpose_y=True)
else:
self.linear = nn.Linear(
in_features=d_model,
out_features=trg_vocab_size,
bias_attr=False)
def forward(self, src_word, trg_word):
src_max_len = paddle.shape(src_word)[-1]
trg_max_len = paddle.shape(trg_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_slf_attn_bias = paddle.tensor.triu(
(paddle.ones(
(trg_max_len, trg_max_len),
dtype=paddle.get_default_dtype()) * -np.inf),
1)
trg_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, trg_max_len, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
trg_pos = paddle.cast(
trg_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=trg_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
with paddle.static.amp.fp16_guard():
if self.waitk >= src_max_len or self.waitk == -1:
# Full sentence
enc_outputs = [
self.encoder(
enc_input, src_mask=src_slf_attn_bias)
]
else:
# Wait-k policy
enc_outputs = []
for i in range(self.waitk, src_max_len + 1):
enc_output = self.encoder(
enc_input[:, :i, :],
src_mask=src_slf_attn_bias[:, :, :, :i])
enc_outputs.append(enc_output)
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
dec_output = self.decoder(
dec_input,
enc_outputs,
tgt_mask=trg_slf_attn_bias,
memory_mask=trg_src_attn_bias)
predict = self.linear(dec_output)
return predict
def beam_search(self, src_word, beam_size=4, max_len=256, waitk=-1):
# TODO: "Speculative Beam Search for Simultaneous Translation"
raise NotImplementedError
def greedy_search(self,
src_word,
max_len=256,
waitk=-1,
caches=None,
bos_id=None):
"""
        greedy_search uses a streaming reader. The encoder does not have to be
        run repeatedly: each incoming sub-sentence only needs one additional
        encoder call. It therefore takes the previous decoder states (caches)
        and the id of the last token generated in the previous step.
"""
src_max_len = paddle.shape(src_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, 1, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
enc_outputs = [self.encoder(enc_input, src_mask=src_slf_attn_bias)]
# constant number
batch_size = enc_outputs[-1].shape[0]
max_len = (
enc_outputs[-1].shape[1] + 20) if max_len is None else max_len
end_token_tensor = paddle.full(
shape=[batch_size, 1], fill_value=self.eos_id, dtype="int64")
predict_ids = []
log_probs = paddle.full(
shape=[batch_size, 1], fill_value=0, dtype="float32")
if not bos_id:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=self.bos_id, dtype="int64")
else:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=bos_id, dtype="int64")
# init states (caches) for transformer
if not caches:
caches = self.decoder.gen_cache(enc_outputs[-1], do_zip=False)
for i in range(max_len):
trg_pos = paddle.full(
shape=trg_word.shape, fill_value=i, dtype="int64")
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
if waitk < 0 or i >= len(enc_outputs):
                # If decoding with the full sentence (waitk < 0), or the decoder step
                # exceeds the number of available source prefixes, attend to the whole
                # source encoding
_e = enc_outputs[-1]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
else:
_e = enc_outputs[i]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
dec_output = paddle.reshape(
dec_output, shape=[-1, dec_output.shape[-1]])
logits = self.linear(dec_output)
step_log_probs = paddle.log(F.softmax(logits, axis=-1))
log_probs = paddle.add(x=step_log_probs, y=log_probs)
scores = log_probs
topk_scores, topk_indices = paddle.topk(x=scores, k=1)
finished = paddle.equal(topk_indices, end_token_tensor)
trg_word = topk_indices
log_probs = topk_scores
predict_ids.append(topk_indices)
if paddle.all(finished).numpy():
break
predict_ids = paddle.stack(predict_ids, axis=0)
finished_seq = paddle.transpose(predict_ids, [1, 2, 0])
finished_scores = topk_scores
return finished_seq, finished_scores, caches
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import jieba
import paddle
from paddlenlp.transformers import position_encoding_init
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from transformer_nist_wait_3.model import SimultaneousTransformer
from transformer_nist_wait_3.processor import STACLTokenizer, predict
@moduleinfo(
name="transformer_nist_wait_3",
version="1.0.0",
summary="",
author="PaddlePaddle",
author_email="",
type="nlp/simultaneous_translation",
)
class STTransformer():
"""
Transformer model for simultaneous translation.
"""
# Model config
model_config = {
# Number of head used in multi-head attention.
"n_head": 8,
# Number of sub-layers to be stacked in the encoder and decoder.
"n_layer": 6,
# The dimension for word embeddings, which is also the last dimension of
# the input and output of multi-head attention, position-wise feed-forward
# networks, encoder and decoder.
"d_model": 512,
}
def __init__(self,
max_length=256,
max_out_len=256,
):
super(STTransformer, self).__init__()
bpe_codes_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_3", "assets", "2M.zh2en.dict4bpe.zh")
src_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_3", "assets", "nist.20k.zh.vocab")
trg_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_3", "assets", "nist.10k.en.vocab")
params_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_3", "assets", "transformer.pdparams")
self.max_length = max_length
self.max_out_len = max_out_len
self.tokenizer = STACLTokenizer(
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
)
src_vocab_size = self.tokenizer.src_vocab_size
trg_vocab_size = self.tokenizer.trg_vocab_size
self.transformer = SimultaneousTransformer(
src_vocab_size,
trg_vocab_size,
max_length=self.max_length,
n_layer=self.model_config['n_layer'],
n_head=self.model_config['n_head'],
d_model=self.model_config['d_model'],
)
model_dict = paddle.load(params_fpath)
# To avoid a longer length than training, reset the size of position
# encoding to max_length
model_dict["src_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
model_dict["trg_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
self.transformer.load_dict(model_dict)
@serving
def translate(self, text, use_gpu=False):
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
# Word segmentation
text = ' '.join(jieba.cut(text))
# For decoding max length
decoder_max_length = 1
# For decoding cache
cache = None
# For decoding start token id
bos_id = None
# Current source word index
i = 0
        # Whether the current source token ends the sentence; if True, decode up to max_out_len
is_last = False
# Tokenized id
user_input_tokenized = []
# Store the translation
result = []
bpe_str, tokenized_src = self.tokenizer.tokenize(text)
while i < len(tokenized_src):
user_input_tokenized.append(tokenized_src[i])
if bpe_str[i] in ['。', '?', '!']:
is_last = True
result, cache, bos_id = predict(
user_input_tokenized,
decoder_max_length,
is_last,
cache,
bos_id,
result,
self.tokenizer,
self.transformer,
max_out_len=self.max_out_len)
i += 1
return " ".join(result)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
from paddlenlp.data import Vocab
from subword_nmt import subword_nmt
class STACLTokenizer:
"""
Jieba+BPE, and convert tokens to ids.
"""
def __init__(self,
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
special_token=["<s>", "<e>", "<unk>"]):
bpe_parser = subword_nmt.create_apply_bpe_parser()
bpe_args = bpe_parser.parse_args(args=['-c', bpe_codes_fpath])
self.bpe = subword_nmt.BPE(bpe_args.codes, bpe_args.merges,
bpe_args.separator, None,
bpe_args.glossaries)
self.src_vocab = Vocab.load_vocabulary(
src_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.trg_vocab = Vocab.load_vocabulary(
trg_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.src_vocab_size = len(self.src_vocab)
self.trg_vocab_size = len(self.trg_vocab)
def tokenize(self, text):
bpe_str = self.bpe.process_line(text)
ids = self.src_vocab.to_indices(bpe_str.split())
return bpe_str.split(), ids
def post_process_seq(seq,
bos_idx=0,
eos_idx=1,
output_bos=False,
output_eos=False):
"""
Post-process the decoded sequence.
"""
eos_pos = len(seq) - 1
for i, idx in enumerate(seq):
if idx == eos_idx:
eos_pos = i
break
seq = [
idx for idx in seq[:eos_pos + 1]
if (output_bos or idx != bos_idx) and (output_eos or idx != eos_idx)
]
return seq
def predict(tokenized_src,
decoder_max_length,
is_last,
cache,
bos_id,
result,
tokenizer,
transformer,
n_best=1,
max_out_len=256,
eos_idx=1,
waitk=3,
):
# Set evaluate mode
transformer.eval()
if len(tokenized_src) < waitk:
return result, cache, bos_id
with paddle.no_grad():
paddle.disable_static()
input_src = tokenized_src
if is_last:
decoder_max_length = max_out_len
input_src += [eos_idx]
src_word = paddle.to_tensor(input_src).unsqueeze(axis=0)
finished_seq, finished_scores, cache = transformer.greedy_search(
src_word,
max_len=decoder_max_length,
waitk=waitk,
caches=cache,
bos_id=bos_id)
finished_seq = finished_seq.numpy()
for beam_idx, beam in enumerate(finished_seq[0]):
if beam_idx >= n_best:
break
id_list = post_process_seq(beam)
if len(id_list) == 0:
continue
bos_id = id_list[-1]
word_list = tokenizer.trg_vocab.to_tokens(id_list)
for word in word_list:
result.append(word)
res = ' '.join(word_list).replace('@@ ', '')
paddle.enable_static()
return result, cache, bos_id
# transformer_nist_wait_5
|Model Name|transformer_nist_wait_5|
| :--- | :---: |
|Category|Simultaneous Translation|
|Network|transformer|
|Dataset|NIST 2008 Chinese-English translation dataset|
|Fine-tuning supported|No|
|Module Size|377MB|
|Latest update date|2021-09-17|
|Data indicators|-|
## I. Basic Information
- ### Module Introduction
- Simultaneous translation means translating before the source sentence is complete; its goal is to automate simultaneous interpretation, producing the translation in parallel with the incoming source language with a latency of only a few seconds.
STACL is a translation architecture for simultaneous translation proposed in the paper [STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/) and is applicable to all simultaneous-translation scenarios.
- The main advantages of STACL are:
- The prefix-to-prefix architecture has anticipation ability: it can emit a target word before the corresponding source word has been seen, overcoming word-order differences such as SOV→SVO
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133761990-13e55d0f-5c3a-476c-8865-5808d13cba97.png"> <br />
</p>
The main difference from a conventional machine translation model is whether the full source sentence is required at translation time. In the figure above, the Seq2Seq model must wait until the entire source sentence (1-5) has been fed into the Encoder before the Decoder starts decoding, whereas the STACL architecture adopts a wait-k policy (wait-2 in the figure): as soon as only two source words (1 and 2) have been fed into the Encoder, the Decoder can start decoding and predict the first word of the target sentence.
- The wait-k policy can predict the target sentence without the full source sentence, enabling an arbitrary word-level latency while maintaining high translation quality.
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133762098-6ea6f3ca-0d70-4a0a-981d-0fcc6f3cd96b.png"> <br />
</p>
The wait-k policy first waits for the first k source words and then translates concurrently with the rest of the source sentence, so the output always lags k words behind the input. It is inspired by human simultaneous interpreters, who usually start translating a few seconds into the speaker's speech and finish a few seconds after the speaker ends. For example, with k=2 the first target word is predicted from the first 2 source words, the second target word from the first 3 source words, and so on. In the figure above, (a) simultaneous: our wait-2 starts decoding and predicts "pres." as soon as "布什" and "总统" have been read, while (b) non-simultaneous baseline is a conventional translation model that only starts decoding after the whole sentence "布什 总统 在 莫斯科 与 普京 会晤" has been read.
- This PaddleHub Module is based on the transformer network and uses the wait-5 policy for Chinese-to-English translation.
## II. Installation
- ### 1. Environment Dependencies
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2. Installation
- ```shell
$ hub install transformer_nist_wait_5
```
- If you have trouble installing, please refer to: [Windows quickstart](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux quickstart](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS quickstart](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Prediction Code Example
- ```python
import paddlehub as hub
model = hub.Module(name="transformer_nist_wait_5")
# Input data (simulating streaming input for simultaneous interpretation)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
for t in text:
print("input: {}".format(t))
result = model.translate(t)
print("model output: {}\n".format(result))
# input: 他
# model output:
#
# input: 他还
# model output:
#
# input: 他还说
# model output:
#
# input: 他还说现在
# model output:
#
# input: 他还说现在正在
# model output: he
#
# input: 他还说现在正在为
# model output: he also
#
# input: 他还说现在正在为这
# model output: he also said
#
# input: 他还说现在正在为这一
# model output: he also said that
#
# input: 他还说现在正在为这一会议
# model output: he also said that he
#
# input: 他还说现在正在为这一会议作出
# model output: he also said that he was
#
# input: 他还说现在正在为这一会议作出安排
# model output: he also said that he was making
#
# input: 他还说现在正在为这一会议作出安排。
# model output: he also said that he was making arrangements for this meeting .
```
- ### 2. API
- ```python
__init__(max_length=256, max_out_len=256)
```
- Initializes the module; the maximum length of the input text can be configured.
- **Parameters**
- max_length(int): the maximum length of the input text, default 256.
- max_out_len(int): the maximum decoding length of the output text; content beyond this length is truncated, default 256.
- ```python
translate(text, use_gpu=False)
```
- Prediction API: takes source-language text (simulating streaming speech input in simultaneous interpretation) and returns the decoded target-language translation; a minimal one-shot usage sketch follows at the end of this section.
- **Parameters**
- text(str): the source-language input text, of type str
- use_gpu(bool): whether to use the GPU for prediction, default False
- **Return**
- result(str): the translated target-language text.
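- The snippet below is a minimal one-shot usage sketch (illustrative): translate also accepts a complete sentence in a single call, and the module still decodes it internally with the wait-5 policy, so the output equals the final step of the streaming example above.
- ```python
import paddlehub as hub

model = hub.Module(name="transformer_nist_wait_5")
# Single call with the complete sentence.
print(model.translate("他还说现在正在为这一会议作出安排。"))
# he also said that he was making arrangements for this meeting .
```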
## IV. Server Deployment
- PaddleHub Serving can deploy an online simultaneous translation service, and the API can be used in online web applications.
- ### Step 1: Start the PaddleHub Serving service
- Run the start command:
- ```shell
$ hub serving start -m transformer_nist_wait_5
```
- The model loading process is shown during startup; once the service has started successfully, it prints:
- ```shell
Loading transformer_nist_wait_5 successful.
```
- This completes the deployment of the serving API; the default port is 8866.
- **NOTE:** To run prediction on GPU, set the CUDA_VISIBLE_DEVICES environment variable before starting the service; otherwise it does not need to be set.
- ### Step 2: Send a prediction request
- With the server configured, the few lines of code below send a prediction request and fetch the result
- ```python
import requests
import json
# Input data (simulating streaming input for simultaneous interpretation)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
# Specify the prediction method as transformer_nist_wait_5 and send a POST request; the content-type should be json
# HOST_IP is the server's IP address
url = "http://HOST_IP:8866/predict/transformer_nist_wait_5"
headers = {"Content-Type": "application/json"}
for t in text:
    print("input: {}".format(t))
    r = requests.post(url=url, headers=headers, data=json.dumps(t))
    # Print the prediction result returned by the service
    print("model output: {}\n".format(r.json()))
```
- For more information on PaddleHub Serving, please refer to: [Serving Deployment](../../../../docs/docs_ch/tutorial/serving.md)
## V. Release Note
* 1.0.0
  First release
```shell
hub install transformer_nist_wait_5==1.0.0
```
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
class DecoderLayer(nn.TransformerDecoderLayer):
def __init__(self, *args, **kwargs):
super(DecoderLayer, self).__init__(*args, **kwargs)
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
residual = tgt
if self.normalize_before:
tgt = self.norm1(tgt)
if cache is None:
tgt = self.self_attn(tgt, tgt, tgt, tgt_mask, None)
else:
tgt, incremental_cache = self.self_attn(tgt, tgt, tgt, tgt_mask,
cache[0])
tgt = residual + self.dropout1(tgt)
if not self.normalize_before:
tgt = self.norm1(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm2(tgt)
if len(memory) == 1:
# Full sent
tgt = self.cross_attn(tgt, memory[0], memory[0], memory_mask, None)
else:
# Wait-k policy
cross_attn_outputs = []
for i in range(tgt.shape[1]):
q = tgt[:, i:i + 1, :]
if i >= len(memory):
e = memory[-1]
else:
e = memory[i]
cross_attn_outputs.append(
self.cross_attn(q, e, e, memory_mask[:, :, i:i + 1, :
e.shape[1]], None))
tgt = paddle.concat(cross_attn_outputs, axis=1)
tgt = residual + self.dropout2(tgt)
if not self.normalize_before:
tgt = self.norm2(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm3(tgt)
tgt = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
tgt = residual + self.dropout3(tgt)
if not self.normalize_before:
tgt = self.norm3(tgt)
return tgt if cache is None else (tgt, (incremental_cache, ))
class Decoder(nn.TransformerDecoder):
"""
    PaddlePaddle 2.1 casts memory_mask.dtype to memory.dtype, but in STACL
    memory is a list of encoder outputs (one per source prefix) and has no
    dtype attribute, so forward is overridden here.
"""
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
output = tgt
new_caches = []
for i, mod in enumerate(self.layers):
if cache is None:
output = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=None)
else:
output, new_cache = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=cache[i])
new_caches.append(new_cache)
if self.norm is not None:
output = self.norm(output)
return output if cache is None else (output, new_caches)
class SimultaneousTransformer(nn.Layer):
"""
    Simultaneous Transformer with the wait-k policy.
"""
def __init__(self,
src_vocab_size,
trg_vocab_size,
max_length=256,
n_layer=6,
n_head=8,
d_model=512,
d_inner_hid=2048,
dropout=0.1,
weight_sharing=False,
bos_id=0,
eos_id=1,
waitk=-1):
super(SimultaneousTransformer, self).__init__()
self.trg_vocab_size = trg_vocab_size
self.emb_dim = d_model
self.bos_id = bos_id
self.eos_id = eos_id
self.dropout = dropout
self.waitk = waitk
self.n_layer = n_layer
self.n_head = n_head
self.d_model = d_model
self.src_word_embedding = WordEmbedding(
vocab_size=src_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.src_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
if weight_sharing:
assert src_vocab_size == trg_vocab_size, (
"Vocabularies in source and target should be same for weight sharing."
)
self.trg_word_embedding = self.src_word_embedding
self.trg_pos_embedding = self.src_pos_embedding
else:
self.trg_word_embedding = WordEmbedding(
vocab_size=trg_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.trg_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
encoder_layer = nn.TransformerEncoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, True])
encoder_norm = nn.LayerNorm(d_model)
self.encoder = nn.TransformerEncoder(
encoder_layer=encoder_layer, num_layers=n_layer, norm=encoder_norm)
decoder_layer = DecoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, False, True])
decoder_norm = nn.LayerNorm(d_model)
self.decoder = Decoder(
decoder_layer=decoder_layer, num_layers=n_layer, norm=decoder_norm)
if weight_sharing:
self.linear = lambda x: paddle.matmul(
x=x, y=self.trg_word_embedding.word_embedding.weight, transpose_y=True)
else:
self.linear = nn.Linear(
in_features=d_model,
out_features=trg_vocab_size,
bias_attr=False)
def forward(self, src_word, trg_word):
src_max_len = paddle.shape(src_word)[-1]
trg_max_len = paddle.shape(trg_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_slf_attn_bias = paddle.tensor.triu(
(paddle.ones(
(trg_max_len, trg_max_len),
dtype=paddle.get_default_dtype()) * -np.inf),
1)
trg_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, trg_max_len, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
trg_pos = paddle.cast(
trg_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=trg_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
with paddle.static.amp.fp16_guard():
if self.waitk >= src_max_len or self.waitk == -1:
# Full sentence
enc_outputs = [
self.encoder(
enc_input, src_mask=src_slf_attn_bias)
]
else:
# Wait-k policy
enc_outputs = []
for i in range(self.waitk, src_max_len + 1):
enc_output = self.encoder(
enc_input[:, :i, :],
src_mask=src_slf_attn_bias[:, :, :, :i])
enc_outputs.append(enc_output)
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
dec_output = self.decoder(
dec_input,
enc_outputs,
tgt_mask=trg_slf_attn_bias,
memory_mask=trg_src_attn_bias)
predict = self.linear(dec_output)
return predict
def beam_search(self, src_word, beam_size=4, max_len=256, waitk=-1):
# TODO: "Speculative Beam Search for Simultaneous Translation"
raise NotImplementedError
def greedy_search(self,
src_word,
max_len=256,
waitk=-1,
caches=None,
bos_id=None):
"""
        greedy_search uses a streaming reader. The encoder does not have to be
        run repeatedly: each incoming sub-sentence only needs one additional
        encoder call. It therefore takes the previous decoder states (caches)
        and the id of the last token generated in the previous step.
"""
src_max_len = paddle.shape(src_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, 1, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
enc_outputs = [self.encoder(enc_input, src_mask=src_slf_attn_bias)]
# constant number
batch_size = enc_outputs[-1].shape[0]
max_len = (
enc_outputs[-1].shape[1] + 20) if max_len is None else max_len
end_token_tensor = paddle.full(
shape=[batch_size, 1], fill_value=self.eos_id, dtype="int64")
predict_ids = []
log_probs = paddle.full(
shape=[batch_size, 1], fill_value=0, dtype="float32")
if not bos_id:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=self.bos_id, dtype="int64")
else:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=bos_id, dtype="int64")
# init states (caches) for transformer
if not caches:
caches = self.decoder.gen_cache(enc_outputs[-1], do_zip=False)
for i in range(max_len):
trg_pos = paddle.full(
shape=trg_word.shape, fill_value=i, dtype="int64")
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
if waitk < 0 or i >= len(enc_outputs):
                # If decoding with the full sentence (waitk < 0), or the decoder step
                # exceeds the number of available source prefixes, attend to the whole
                # source encoding
_e = enc_outputs[-1]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
else:
_e = enc_outputs[i]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
dec_output = paddle.reshape(
dec_output, shape=[-1, dec_output.shape[-1]])
logits = self.linear(dec_output)
step_log_probs = paddle.log(F.softmax(logits, axis=-1))
log_probs = paddle.add(x=step_log_probs, y=log_probs)
scores = log_probs
topk_scores, topk_indices = paddle.topk(x=scores, k=1)
finished = paddle.equal(topk_indices, end_token_tensor)
trg_word = topk_indices
log_probs = topk_scores
predict_ids.append(topk_indices)
if paddle.all(finished).numpy():
break
predict_ids = paddle.stack(predict_ids, axis=0)
finished_seq = paddle.transpose(predict_ids, [1, 2, 0])
finished_scores = topk_scores
return finished_seq, finished_scores, caches
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import jieba
import paddle
from paddlenlp.transformers import position_encoding_init
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from transformer_nist_wait_5.model import SimultaneousTransformer
from transformer_nist_wait_5.processor import STACLTokenizer, predict
@moduleinfo(
name="transformer_nist_wait_5",
version="1.0.0",
summary="",
author="PaddlePaddle",
author_email="",
type="nlp/simultaneous_translation",
)
class STTransformer():
"""
Transformer model for simultaneous translation.
"""
# Model config
model_config = {
# Number of heads used in multi-head attention.
"n_head": 8,
# Number of sub-layers to be stacked in the encoder and decoder.
"n_layer": 6,
# The dimension for word embeddings, which is also the last dimension of
# the input and output of multi-head attention, position-wise feed-forward
# networks, encoder and decoder.
"d_model": 512,
}
def __init__(self,
max_length=256,
max_out_len=256,
):
super(STTransformer, self).__init__()
bpe_codes_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_5", "assets", "2M.zh2en.dict4bpe.zh")
src_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_5", "assets", "nist.20k.zh.vocab")
trg_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_5", "assets", "nist.10k.en.vocab")
params_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_5", "assets", "transformer.pdparams")
self.max_length = max_length
self.max_out_len = max_out_len
self.tokenizer = STACLTokenizer(
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
)
src_vocab_size = self.tokenizer.src_vocab_size
trg_vocab_size = self.tokenizer.trg_vocab_size
self.transformer = SimultaneousTransformer(
src_vocab_size,
trg_vocab_size,
max_length=self.max_length,
n_layer=self.model_config['n_layer'],
n_head=self.model_config['n_head'],
d_model=self.model_config['d_model'],
)
model_dict = paddle.load(params_fpath)
# To avoid a longer length than training, reset the size of position
# encoding to max_length
model_dict["src_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
model_dict["trg_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
self.transformer.load_dict(model_dict)
@serving
def translate(self, text, use_gpu=False):
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
# Word segmentation
text = ' '.join(jieba.cut(text))
# For decoding max length
decoder_max_length = 1
# For decoding cache
cache = None
# For decoding start token id
bos_id = None
# Current source word index
i = 0
# Whether the current source token ends the sentence; if so, decode up to max_out_len
is_last = False
# Tokenized id
user_input_tokenized = []
# Store the translation
result = []
bpe_str, tokenized_src = self.tokenizer.tokenize(text)
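# Feed source tokens one at a time to simulate streaming input; predict()
# decides whether enough source context has arrived to emit new target words.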
while i < len(tokenized_src):
user_input_tokenized.append(tokenized_src[i])
if bpe_str[i] in ['。', '?', '!']:
is_last = True
result, cache, bos_id = predict(
user_input_tokenized,
decoder_max_length,
is_last,
cache,
bos_id,
result,
self.tokenizer,
self.transformer,
max_out_len=self.max_out_len)
i += 1
return " ".join(result)
\ No newline at end of file
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
from paddlenlp.data import Vocab
from subword_nmt import subword_nmt
class STACLTokenizer:
"""
Tokenize with Jieba + BPE, then convert tokens to ids.
"""
def __init__(self,
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
special_token=["<s>", "<e>", "<unk>"]):
bpe_parser = subword_nmt.create_apply_bpe_parser()
bpe_args = bpe_parser.parse_args(args=['-c', bpe_codes_fpath])
self.bpe = subword_nmt.BPE(bpe_args.codes, bpe_args.merges,
bpe_args.separator, None,
bpe_args.glossaries)
self.src_vocab = Vocab.load_vocabulary(
src_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.trg_vocab = Vocab.load_vocabulary(
trg_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.src_vocab_size = len(self.src_vocab)
self.trg_vocab_size = len(self.trg_vocab)
def tokenize(self, text):
bpe_str = self.bpe.process_line(text)
ids = self.src_vocab.to_indices(bpe_str.split())
return bpe_str.split(), ids
def post_process_seq(seq,
bos_idx=0,
eos_idx=1,
output_bos=False,
output_eos=False):
"""
Post-process the decoded sequence.
"""
eos_pos = len(seq) - 1
for i, idx in enumerate(seq):
if idx == eos_idx:
eos_pos = i
break
seq = [
idx for idx in seq[:eos_pos + 1]
if (output_bos or idx != bos_idx) and (output_eos or idx != eos_idx)
]
return seq
def predict(tokenized_src,
decoder_max_length,
is_last,
cache,
bos_id,
result,
tokenizer,
transformer,
n_best=1,
max_out_len=256,
eos_idx=1,
waitk=5,
):
# Set evaluate mode
transformer.eval()
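# Wait-k gate: no decoding starts until at least `waitk` source tokens have been read.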
if len(tokenized_src) < waitk:
return result, cache, bos_id
with paddle.no_grad():
paddle.disable_static()
input_src = tokenized_src
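# On the final chunk, append <eos> and allow decoding up to max_out_len to finish the sentence.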
if is_last:
decoder_max_length = max_out_len
input_src += [eos_idx]
src_word = paddle.to_tensor(input_src).unsqueeze(axis=0)
finished_seq, finished_scores, cache = transformer.greedy_search(
src_word,
max_len=decoder_max_length,
waitk=waitk,
caches=cache,
bos_id=bos_id)
finished_seq = finished_seq.numpy()
for beam_idx, beam in enumerate(finished_seq[0]):
if beam_idx >= n_best:
break
id_list = post_process_seq(beam)
if len(id_list) == 0:
continue
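# The last generated token seeds the next incremental decoding call.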
bos_id = id_list[-1]
word_list = tokenizer.trg_vocab.to_tokens(id_list)
for word in word_list:
result.append(word)
res = ' '.join(word_list).replace('@@ ', '')
paddle.enable_static()
return result, cache, bos_id
\ No newline at end of file
# transformer_nist_wait_7
|模型名称|transformer_nist_wait_7|
| :--- | :---: |
|类别|同声传译|
|网络|transformer|
|数据集|NIST 2008-中英翻译数据集|
|是否支持Fine-tuning|否|
|模型大小|377MB|
|最新更新日期|2021-09-17|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- 同声传译(Simultaneous Translation),即在句子完成之前进行翻译,同声传译的目标是实现同声传译的自动化,它可以与源语言同时翻译,延迟时间只有几秒钟。
STACL 是论文 [STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/) 中针对同传提出的适用于所有同传场景的翻译架构。
- STACL 主要具有以下优势:
- Prefix-to-Prefix架构拥有预测能力,即在未看到源词的情况下仍然可以翻译出对应的目标词,克服了SOV→SVO等词序差异
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133761990-13e55d0f-5c3a-476c-8865-5808d13cba97.png"> <br />
</p>
和传统的机器翻译模型主要的区别在于翻译时是否需要利用全句的源句。上图中,Seq2Seq模型需要等到全句的源句(1-5)全部输入Encoder后,Decoder才开始解码进行翻译;而STACL架构采用了Wait-k(图中Wait-2)的策略,当源句只有两个词(1和2)输入到Encoder后,Decoder即可开始解码预测目标句的第一个词。
- Wait-k策略可以不需要全句的源句,直接预测目标句,可以实现任意的字级延迟,同时保持较高的翻译质量。
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133762098-6ea6f3ca-0d70-4a0a-981d-0fcc6f3cd96b.png"> <br />
</p>
Wait-k策略首先等待源句单词,然后与源句的其余部分同时翻译,即输出总是隐藏在输入后面。这是受到同声传译人员的启发,同声传译人员通常会在几秒钟内开始翻译演讲者的演讲,在演讲者结束几秒钟后完成。例如,如果k=2,第一个目标词使用前2个源词预测,第二个目标词使用前3个源词预测,以此类推。上图中,(a)simultaneous: our wait-2 等到"布什"和"总统"输入后就开始解码预测"pres.",而(b) non-simultaneous baseline 为传统的翻译模型,需要等到整句"布什 总统 在 莫斯科 与 普京 会晤"才开始解码预测。
- 该PaddleHub Module基于transformer网络结构,采用wait-7策略进行中文到英文的翻译。
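- 下面给出一个仅作示意的最小代码片段(假设性示例,并非本模块的实现),用来说明 wait-k 策略下解码第 t 个目标词时可以利用的源端词数:
- ```python
# 示意用的辅助函数(非模块实现)
def visible_source_len(t, k, src_len):
    """解码第 t 个目标词(t 从 1 开始)时,wait-k 策略可见的源端词数。"""
    if k < 0:  # k = -1 表示等待全句(wait-all)
        return src_len
    return min(k + t - 1, src_len)

# 以 k=7、源句共 11 个词为例:第 1 个目标词可见前 7 个源词,
# 第 2 个目标词可见前 8 个源词,依此类推,直到看完整个源句。
for t in range(1, 6):
    print(t, visible_source_len(t, k=7, src_len=11))
```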
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install transformer_nist_wait_7
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、预测代码示例
- ```python
import paddlehub as hub
model = hub.Module(name="transformer_nist_wait_7")
# 待预测数据(模拟同声传译实时输入)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
for t in text:
print("input: {}".format(t))
result = model.translate(t)
print("model output: {}\n".format(result))
# input: 他
# model output:
#
# input: 他还
# model output:
#
# input: 他还说
# model output:
#
# input: 他还说现在
# model output:
#
# input: 他还说现在正在
# model output:
#
# input: 他还说现在正在为
# model output:
#
# input: 他还说现在正在为这
# model output: he
#
# input: 他还说现在正在为这一
# model output: he also
#
# input: 他还说现在正在为这一会议
# model output: he also said
#
# input: 他还说现在正在为这一会议作出
# model output: he also said that
#
# input: 他还说现在正在为这一会议作出安排
# model output: he also said that arrangements
#
# input: 他还说现在正在为这一会议作出安排。
# model output: he also said that arrangements are now being made for this meeting .
```
- ### 2、 API
- ```python
__init__(max_length=256, max_out_len=256)
```
- 初始化module,可配置模型输入文本的最大长度以及输出文本的最大解码长度
- **参数**
- max_length(int): 输入文本的最大长度,默认值为256。
- max_out_len(int): 输出文本的最大解码长度,超过最大解码长度时会截断句子的后半部分,默认值为256。
- ```python
translate(text, use_gpu=False)
```
- 预测API,输入源语言的文本(模拟同传语音输入),解码后输出翻译后的目标语言文本。
- **参数**
- text(str): 输入源语言的文本,数据类型为str
- use_gpu(bool): 是否使用gpu进行预测,默认为False
- **返回**
- result(str): 翻译后的目标语言文本。
## 四、服务部署
- PaddleHub Serving可以部署一个在线同声传译服务,可以将此接口用于在线web应用。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m transformer_nist_wait_7
```
- 启动时会显示加载模型过程,启动成功后显示
- ```shell
Loading transformer_nist_wait_7 successful.
```
- 这样就完成了服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;否则无需设置。
- ### 第二步:发送预测请求
- 配置好服务端后,以下几行代码即可发送预测请求并获取预测结果
- ```python
import requests
import json
# 待预测数据(模拟同声传译实时输入)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
# 指定预测方法为transformer_nist_wait_7并发送post请求,Content-Type需指定为application/json
# HOST_IP为服务器IP
url = "http://HOST_IP:8866/predict/transformer_nist_wait_7"
headers = {"Content-Type": "application/json"}
for t in text:
print("input: {}".format(t))
r = requests.post(url=url, headers=headers, data=json.dumps({"text": t}))
# 打印预测结果
print("model output: {}\n".format(r.json()["results"]))
```
- 关于PaddleHub Serving更多信息参考:[服务部署](../../../../docs/docs_ch/tutorial/serving.md)
## 五、更新历史
* 1.0.0
初始发布
```shell
hub install transformer_nist_wait_7==1.0.0
```
\ No newline at end of file
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
class DecoderLayer(nn.TransformerDecoderLayer):
def __init__(self, *args, **kwargs):
super(DecoderLayer, self).__init__(*args, **kwargs)
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
residual = tgt
if self.normalize_before:
tgt = self.norm1(tgt)
if cache is None:
tgt = self.self_attn(tgt, tgt, tgt, tgt_mask, None)
else:
tgt, incremental_cache = self.self_attn(tgt, tgt, tgt, tgt_mask,
cache[0])
tgt = residual + self.dropout1(tgt)
if not self.normalize_before:
tgt = self.norm1(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm2(tgt)
if len(memory) == 1:
# Full sent
tgt = self.cross_attn(tgt, memory[0], memory[0], memory_mask, None)
else:
# Wait-k policy
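# Each target position i attends only to the encoder output computed from
# its own source prefix (memory[i], or the last one if i exceeds the list).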
cross_attn_outputs = []
for i in range(tgt.shape[1]):
q = tgt[:, i:i + 1, :]
if i >= len(memory):
e = memory[-1]
else:
e = memory[i]
cross_attn_outputs.append(
self.cross_attn(q, e, e, memory_mask[:, :, i:i + 1, :
e.shape[1]], None))
tgt = paddle.concat(cross_attn_outputs, axis=1)
tgt = residual + self.dropout2(tgt)
if not self.normalize_before:
tgt = self.norm2(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm3(tgt)
tgt = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
tgt = residual + self.dropout3(tgt)
if not self.normalize_before:
tgt = self.norm3(tgt)
return tgt if cache is None else (tgt, (incremental_cache, ))
class Decoder(nn.TransformerDecoder):
"""
PaddlePaddle 2.1 casts memory_mask.dtype to memory.dtype, but in STACL
the memory is a list, which has no dtype attribute.
"""
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
output = tgt
new_caches = []
for i, mod in enumerate(self.layers):
if cache is None:
output = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=None)
else:
output, new_cache = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=cache[i])
new_caches.append(new_cache)
if self.norm is not None:
output = self.norm(output)
return output if cache is None else (output, new_caches)
class SimultaneousTransformer(nn.Layer):
"""
Simultaneous Transformer supporting the wait-k policy.
"""
def __init__(self,
src_vocab_size,
trg_vocab_size,
max_length=256,
n_layer=6,
n_head=8,
d_model=512,
d_inner_hid=2048,
dropout=0.1,
weight_sharing=False,
bos_id=0,
eos_id=1,
waitk=-1):
super(SimultaneousTransformer, self).__init__()
self.trg_vocab_size = trg_vocab_size
self.emb_dim = d_model
self.bos_id = bos_id
self.eos_id = eos_id
self.dropout = dropout
self.waitk = waitk
self.n_layer = n_layer
self.n_head = n_head
self.d_model = d_model
self.src_word_embedding = WordEmbedding(
vocab_size=src_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.src_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
if weight_sharing:
assert src_vocab_size == trg_vocab_size, (
"Vocabularies in source and target should be same for weight sharing."
)
self.trg_word_embedding = self.src_word_embedding
self.trg_pos_embedding = self.src_pos_embedding
else:
self.trg_word_embedding = WordEmbedding(
vocab_size=trg_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.trg_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
encoder_layer = nn.TransformerEncoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, True])
encoder_norm = nn.LayerNorm(d_model)
self.encoder = nn.TransformerEncoder(
encoder_layer=encoder_layer, num_layers=n_layer, norm=encoder_norm)
decoder_layer = DecoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, False, True])
decoder_norm = nn.LayerNorm(d_model)
self.decoder = Decoder(
decoder_layer=decoder_layer, num_layers=n_layer, norm=decoder_norm)
if weight_sharing:
self.linear = lambda x: paddle.matmul(
x=x, y=self.trg_word_embedding.word_embedding.weight, transpose_y=True)
else:
self.linear = nn.Linear(
in_features=d_model,
out_features=trg_vocab_size,
bias_attr=False)
def forward(self, src_word, trg_word):
src_max_len = paddle.shape(src_word)[-1]
trg_max_len = paddle.shape(trg_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_slf_attn_bias = paddle.tensor.triu(
(paddle.ones(
(trg_max_len, trg_max_len),
dtype=paddle.get_default_dtype()) * -np.inf),
1)
trg_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, trg_max_len, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
trg_pos = paddle.cast(
trg_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=trg_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
with paddle.static.amp.fp16_guard():
if self.waitk >= src_max_len or self.waitk == -1:
# Full sentence
enc_outputs = [
self.encoder(
enc_input, src_mask=src_slf_attn_bias)
]
else:
# Wait-k policy
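# Encode every source prefix of length waitk, waitk+1, ..., src_max_len so
# that each decoding step can attend to exactly the prefix it is allowed to see.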
enc_outputs = []
for i in range(self.waitk, src_max_len + 1):
enc_output = self.encoder(
enc_input[:, :i, :],
src_mask=src_slf_attn_bias[:, :, :, :i])
enc_outputs.append(enc_output)
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
dec_output = self.decoder(
dec_input,
enc_outputs,
tgt_mask=trg_slf_attn_bias,
memory_mask=trg_src_attn_bias)
predict = self.linear(dec_output)
return predict
def beam_search(self, src_word, beam_size=4, max_len=256, waitk=-1):
# TODO: "Speculative Beam Search for Simultaneous Translation"
raise NotImplementedError
def greedy_search(self,
src_word,
max_len=256,
waitk=-1,
caches=None,
bos_id=None):
"""
greedy_search uses streaming reader. It doesn't need calling
encoder many times, an a sub-sentence just needs calling encoder once.
So, it needs previous state(caches) and last one of generated
tokens id last time.
"""
src_max_len = paddle.shape(src_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, 1, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
enc_outputs = [self.encoder(enc_input, src_mask=src_slf_attn_bias)]
# constant number
batch_size = enc_outputs[-1].shape[0]
max_len = (
enc_outputs[-1].shape[1] + 20) if max_len is None else max_len
end_token_tensor = paddle.full(
shape=[batch_size, 1], fill_value=self.eos_id, dtype="int64")
predict_ids = []
log_probs = paddle.full(
shape=[batch_size, 1], fill_value=0, dtype="float32")
if not bos_id:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=self.bos_id, dtype="int64")
else:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=bos_id, dtype="int64")
# init states (caches) for transformer
if not caches:
caches = self.decoder.gen_cache(enc_outputs[-1], do_zip=False)
for i in range(max_len):
trg_pos = paddle.full(
shape=trg_word.shape, fill_value=i, dtype="int64")
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
if waitk < 0 or i >= len(enc_outputs):
# if the decoder step is full sent or longer than all source
# step, then read the whole src
_e = enc_outputs[-1]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
else:
_e = enc_outputs[i]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
dec_output = paddle.reshape(
dec_output, shape=[-1, dec_output.shape[-1]])
logits = self.linear(dec_output)
step_log_probs = paddle.log(F.softmax(logits, axis=-1))
log_probs = paddle.add(x=step_log_probs, y=log_probs)
scores = log_probs
topk_scores, topk_indices = paddle.topk(x=scores, k=1)
finished = paddle.equal(topk_indices, end_token_tensor)
trg_word = topk_indices
log_probs = topk_scores
predict_ids.append(topk_indices)
if paddle.all(finished).numpy():
break
predict_ids = paddle.stack(predict_ids, axis=0)
finished_seq = paddle.transpose(predict_ids, [1, 2, 0])
finished_scores = topk_scores
return finished_seq, finished_scores, caches
\ No newline at end of file
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import jieba
import paddle
from paddlenlp.transformers import position_encoding_init
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from transformer_nist_wait_7.model import SimultaneousTransformer
from transformer_nist_wait_7.processor import STACLTokenizer, predict
@moduleinfo(
name="transformer_nist_wait_7",
version="1.0.0",
summary="",
author="PaddlePaddle",
author_email="",
type="nlp/simultaneous_translation",
)
class STTransformer():
"""
Transformer model for simultaneous translation.
"""
# Model config
model_config = {
# Number of heads used in multi-head attention.
"n_head": 8,
# Number of sub-layers to be stacked in the encoder and decoder.
"n_layer": 6,
# The dimension for word embeddings, which is also the last dimension of
# the input and output of multi-head attention, position-wise feed-forward
# networks, encoder and decoder.
"d_model": 512,
}
def __init__(self,
max_length=256,
max_out_len=256,
):
super(STTransformer, self).__init__()
bpe_codes_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_7", "assets", "2M.zh2en.dict4bpe.zh")
src_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_7", "assets", "nist.20k.zh.vocab")
trg_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_7", "assets", "nist.10k.en.vocab")
params_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_7", "assets", "transformer.pdparams")
self.max_length = max_length
self.max_out_len = max_out_len
self.tokenizer = STACLTokenizer(
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
)
src_vocab_size = self.tokenizer.src_vocab_size
trg_vocab_size = self.tokenizer.trg_vocab_size
self.transformer = SimultaneousTransformer(
src_vocab_size,
trg_vocab_size,
max_length=self.max_length,
n_layer=self.model_config['n_layer'],
n_head=self.model_config['n_head'],
d_model=self.model_config['d_model'],
)
model_dict = paddle.load(params_fpath)
# To avoid a longer length than training, reset the size of position
# encoding to max_length
model_dict["src_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
model_dict["trg_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
self.transformer.load_dict(model_dict)
@serving
def translate(self, text, use_gpu=False):
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
# Word segmentation
text = ' '.join(jieba.cut(text))
# For decoding max length
decoder_max_length = 1
# For decoding cache
cache = None
# For decoding start token id
bos_id = None
# Current source word index
i = 0
# Whether the current source token ends the sentence; if so, decode up to max_out_len
is_last = False
# Tokenized id
user_input_tokenized = []
# Store the translation
result = []
bpe_str, tokenized_src = self.tokenizer.tokenize(text)
while i < len(tokenized_src):
user_input_tokenized.append(tokenized_src[i])
if bpe_str[i] in ['。', '?', '!']:
is_last = True
result, cache, bos_id = predict(
user_input_tokenized,
decoder_max_length,
is_last,
cache,
bos_id,
result,
self.tokenizer,
self.transformer,
max_out_len=self.max_out_len)
i += 1
return " ".join(result)
\ No newline at end of file
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
from paddlenlp.data import Vocab
from subword_nmt import subword_nmt
class STACLTokenizer:
"""
Tokenize with Jieba + BPE, then convert tokens to ids.
"""
def __init__(self,
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
special_token=["<s>", "<e>", "<unk>"]):
bpe_parser = subword_nmt.create_apply_bpe_parser()
bpe_args = bpe_parser.parse_args(args=['-c', bpe_codes_fpath])
self.bpe = subword_nmt.BPE(bpe_args.codes, bpe_args.merges,
bpe_args.separator, None,
bpe_args.glossaries)
self.src_vocab = Vocab.load_vocabulary(
src_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.trg_vocab = Vocab.load_vocabulary(
trg_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.src_vocab_size = len(self.src_vocab)
self.trg_vocab_size = len(self.trg_vocab)
def tokenize(self, text):
bpe_str = self.bpe.process_line(text)
ids = self.src_vocab.to_indices(bpe_str.split())
return bpe_str.split(), ids
def post_process_seq(seq,
bos_idx=0,
eos_idx=1,
output_bos=False,
output_eos=False):
"""
Post-process the decoded sequence.
"""
eos_pos = len(seq) - 1
for i, idx in enumerate(seq):
if idx == eos_idx:
eos_pos = i
break
seq = [
idx for idx in seq[:eos_pos + 1]
if (output_bos or idx != bos_idx) and (output_eos or idx != eos_idx)
]
return seq
def predict(tokenized_src,
decoder_max_length,
is_last,
cache,
bos_id,
result,
tokenizer,
transformer,
n_best=1,
max_out_len=256,
eos_idx=1,
waitk=7,
):
# Set evaluate mode
transformer.eval()
if len(tokenized_src) < waitk:
return result, cache, bos_id
with paddle.no_grad():
paddle.disable_static()
input_src = tokenized_src
if is_last:
decoder_max_length = max_out_len
input_src += [eos_idx]
src_word = paddle.to_tensor(input_src).unsqueeze(axis=0)
finished_seq, finished_scores, cache = transformer.greedy_search(
src_word,
max_len=decoder_max_length,
waitk=waitk,
caches=cache,
bos_id=bos_id)
finished_seq = finished_seq.numpy()
for beam_idx, beam in enumerate(finished_seq[0]):
if beam_idx >= n_best:
break
id_list = post_process_seq(beam)
if len(id_list) == 0:
continue
bos_id = id_list[-1]
word_list = tokenizer.trg_vocab.to_tokens(id_list)
for word in word_list:
result.append(word)
res = ' '.join(word_list).replace('@@ ', '')
paddle.enable_static()
return result, cache, bos_id
\ No newline at end of file
# transformer_nist_wait_all
|模型名称|transformer_nist_wait_all|
| :--- | :---: |
|类别|同声传译|
|网络|transformer|
|数据集|NIST 2008-中英翻译数据集|
|是否支持Fine-tuning|否|
|模型大小|377MB|
|最新更新日期|2021-09-17|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- 同声传译(Simultaneous Translation),即在句子完成之前进行翻译,同声传译的目标是实现同声传译的自动化,它可以与源语言同时翻译,延迟时间只有几秒钟。
STACL 是论文 [STACL: Simultaneous Translation with Implicit Anticipation and Controllable Latency using Prefix-to-Prefix Framework](https://www.aclweb.org/anthology/P19-1289/) 中针对同传提出的适用于所有同传场景的翻译架构。
- STACL 主要具有以下优势:
- Prefix-to-Prefix架构拥有预测能力,即在未看到源词的情况下仍然可以翻译出对应的目标词,克服了SOV→SVO等词序差异
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133761990-13e55d0f-5c3a-476c-8865-5808d13cba97.png"> <br />
</p>
和传统的机器翻译模型主要的区别在于翻译时是否需要利用全句的源句。上图中,Seq2Seq模型需要等到全句的源句(1-5)全部输入Encoder后,Decoder才开始解码进行翻译;而STACL架构采用了Wait-k(图中Wait-2)的策略,当源句只有两个词(1和2)输入到Encoder后,Decoder即可开始解码预测目标句的第一个词。
- Wait-k策略可以不需要全句的源句,直接预测目标句,可以实现任意的字级延迟,同时保持较高的翻译质量。
<p align="center">
<img src="https://user-images.githubusercontent.com/40840292/133762098-6ea6f3ca-0d70-4a0a-981d-0fcc6f3cd96b.png"> <br />
</p>
Wait-k策略首先等待源句单词,然后与源句的其余部分同时翻译,即输出总是隐藏在输入后面。这是受到同声传译人员的启发,同声传译人员通常会在几秒钟内开始翻译演讲者的演讲,在演讲者结束几秒钟后完成。例如,如果k=2,第一个目标词使用前2个源词预测,第二个目标词使用前3个源词预测,以此类推。上图中,(a)simultaneous: our wait-2 等到"布什"和"总统"输入后就开始解码预测"pres.",而(b) non-simultaneous baseline 为传统的翻译模型,需要等到整句"布什 总统 在 莫斯科 与 普京 会晤"才开始解码预测。
- 该PaddleHub Module基于transformer网络结构,采用的策略是等到全句结束再进行中文到英文的翻译,即waitk=-1。
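- 下面给出一个仅作示意的最小代码片段(假设性示例,并非本模块的实现),用来说明"全句等待"(waitk=-1)策略只有在读到句末标点后才触发一次完整解码:
- ```python
# 示意用的辅助函数(非模块实现)
SENTENCE_END = {"。", "?", "!"}

def should_decode(tokens_so_far):
    """wait-all(waitk=-1)策略:仅当读到句末标点时才开始解码。"""
    return bool(tokens_so_far) and tokens_so_far[-1] in SENTENCE_END

stream = ["他", "还", "说", "现在", "正在", "为", "这", "一", "会议", "作出", "安排", "。"]
prefix = []
for tok in stream:
    prefix.append(tok)
    print("读入: {:<4} 是否开始解码: {}".format(tok, should_decode(prefix)))
```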
## 二、安装
- ### 1、环境依赖
- paddlepaddle >= 2.1.0
- paddlehub >= 2.1.0 | [如何安装PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- ### 2、安装
- ```shell
$ hub install transformer_nist_wait_all
```
- 如您安装时遇到问题,可参考:[零基础windows安装](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [零基础Linux安装](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [零基础MacOS安装](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 三、模型API预测
- ### 1、预测代码示例
- ```python
import paddlehub as hub
model = hub.Module(name="transformer_nist_wait_all")
# 待预测数据(模拟同声传译实时输入)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
for t in text:
print("input: {}".format(t))
result = model.translate(t)
print("model output: {}\n".format(result))
# input: 他
# model output:
#
# input: 他还
# model output:
#
# input: 他还说
# model output:
#
# input: 他还说现在
# model output:
#
# input: 他还说现在正在
# model output:
#
# input: 他还说现在正在为
# model output:
#
# input: 他还说现在正在为这
# model output:
#
# input: 他还说现在正在为这一
# model output:
#
# input: 他还说现在正在为这一会议
# model output:
#
# input: 他还说现在正在为这一会议作出
# model output:
#
# input: 他还说现在正在为这一会议作出安排
# model output:
#
# input: 他还说现在正在为这一会议作出安排。
# model output: he also said that arrangements are now being made for this meeting .
```
- ### 2、 API
- ```python
__init__(max_length=256, max_out_len=256)
```
- 初始化module,可配置模型输入文本的最大长度以及输出文本的最大解码长度
- **参数**
- max_length(int): 输入文本的最大长度,默认值为256。
- max_out_len(int): 输出文本的最大解码长度,超过最大解码长度时会截断句子的后半部分,默认值为256。
- ```python
translate(text, use_gpu=False)
```
- 预测API,输入源语言的文本(模拟同传语音输入),解码后输出翻译后的目标语言文本。
- **参数**
- text(str): 输入源语言的文本,数据类型为str
- use_gpu(bool): 是否使用gpu进行预测,默认为False
- **返回**
- result(str): 翻译后的目标语言文本。
## 四、服务部署
- PaddleHub Serving可以部署一个在线同声传译服务,可以将此接口用于在线web应用。
- ### 第一步:启动PaddleHub Serving
- 运行启动命令:
- ```shell
$ hub serving start -m transformer_nist_wait_all
```
- 启动时会显示加载模型过程,启动成功后显示
- ```shell
Loading transformer_nist_wait_all successful.
```
- 这样就完成了服务化API的部署,默认端口号为8866。
- **NOTE:** 如使用GPU预测,则需要在启动服务之前设置CUDA_VISIBLE_DEVICES环境变量;否则无需设置。
- ### 第二步:发送预测请求
- 配置好服务端后,以下几行代码即可发送预测请求并获取预测结果
- ```python
import requests
import json
# 待预测数据(模拟同声传译实时输入)
text = [
"他",
"他还",
"他还说",
"他还说现在",
"他还说现在正在",
"他还说现在正在为",
"他还说现在正在为这",
"他还说现在正在为这一",
"他还说现在正在为这一会议",
"他还说现在正在为这一会议作出",
"他还说现在正在为这一会议作出安排",
"他还说现在正在为这一会议作出安排。",
]
# 指定预测方法为transformer_nist_wait_all并发送post请求,Content-Type需指定为application/json
# HOST_IP为服务器IP
url = "http://HOST_IP:8866/predict/transformer_nist_wait_all"
headers = {"Content-Type": "application/json"}
for t in text:
print("input: {}".format(t))
r = requests.post(url=url, headers=headers, data=json.dumps({"text": t}))
# 打印预测结果
print("model output: {}\n".format(r.json()["results"]))
```
- 关于PaddleHub Serving更多信息参考:[服务部署](../../../../docs/docs_ch/tutorial/serving.md)
## 五、更新历史
* 1.0.0
初始发布
```shell
hub install transformer_nist_wait_all==1.0.0
```
\ No newline at end of file
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
class DecoderLayer(nn.TransformerDecoderLayer):
def __init__(self, *args, **kwargs):
super(DecoderLayer, self).__init__(*args, **kwargs)
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
residual = tgt
if self.normalize_before:
tgt = self.norm1(tgt)
if cache is None:
tgt = self.self_attn(tgt, tgt, tgt, tgt_mask, None)
else:
tgt, incremental_cache = self.self_attn(tgt, tgt, tgt, tgt_mask,
cache[0])
tgt = residual + self.dropout1(tgt)
if not self.normalize_before:
tgt = self.norm1(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm2(tgt)
if len(memory) == 1:
# Full sent
tgt = self.cross_attn(tgt, memory[0], memory[0], memory_mask, None)
else:
# Wait-k policy
cross_attn_outputs = []
for i in range(tgt.shape[1]):
q = tgt[:, i:i + 1, :]
if i >= len(memory):
e = memory[-1]
else:
e = memory[i]
cross_attn_outputs.append(
self.cross_attn(q, e, e, memory_mask[:, :, i:i + 1, :
e.shape[1]], None))
tgt = paddle.concat(cross_attn_outputs, axis=1)
tgt = residual + self.dropout2(tgt)
if not self.normalize_before:
tgt = self.norm2(tgt)
residual = tgt
if self.normalize_before:
tgt = self.norm3(tgt)
tgt = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
tgt = residual + self.dropout3(tgt)
if not self.normalize_before:
tgt = self.norm3(tgt)
return tgt if cache is None else (tgt, (incremental_cache, ))
class Decoder(nn.TransformerDecoder):
"""
PaddlePaddle 2.1 casts memory_mask.dtype to memory.dtype, but in STACL
the memory is a list, which has no dtype attribute.
"""
def forward(self, tgt, memory, tgt_mask=None, memory_mask=None, cache=None):
output = tgt
new_caches = []
for i, mod in enumerate(self.layers):
if cache is None:
output = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=None)
else:
output, new_cache = mod(output,
memory,
tgt_mask=tgt_mask,
memory_mask=memory_mask,
cache=cache[i])
new_caches.append(new_cache)
if self.norm is not None:
output = self.norm(output)
return output if cache is None else (output, new_caches)
class SimultaneousTransformer(nn.Layer):
"""
Simultaneous Transformer supporting the wait-k policy.
"""
def __init__(self,
src_vocab_size,
trg_vocab_size,
max_length=256,
n_layer=6,
n_head=8,
d_model=512,
d_inner_hid=2048,
dropout=0.1,
weight_sharing=False,
bos_id=0,
eos_id=1,
waitk=-1):
super(SimultaneousTransformer, self).__init__()
self.trg_vocab_size = trg_vocab_size
self.emb_dim = d_model
self.bos_id = bos_id
self.eos_id = eos_id
self.dropout = dropout
self.waitk = waitk
self.n_layer = n_layer
self.n_head = n_head
self.d_model = d_model
self.src_word_embedding = WordEmbedding(
vocab_size=src_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.src_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
if weight_sharing:
assert src_vocab_size == trg_vocab_size, (
"Vocabularies in source and target should be same for weight sharing."
)
self.trg_word_embedding = self.src_word_embedding
self.trg_pos_embedding = self.src_pos_embedding
else:
self.trg_word_embedding = WordEmbedding(
vocab_size=trg_vocab_size, emb_dim=d_model, bos_id=self.bos_id)
self.trg_pos_embedding = PositionalEmbedding(
emb_dim=d_model, max_length=max_length+1)
encoder_layer = nn.TransformerEncoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, True])
encoder_norm = nn.LayerNorm(d_model)
self.encoder = nn.TransformerEncoder(
encoder_layer=encoder_layer, num_layers=n_layer, norm=encoder_norm)
decoder_layer = DecoderLayer(
d_model=d_model,
nhead=n_head,
dim_feedforward=d_inner_hid,
dropout=dropout,
activation='relu',
normalize_before=True,
bias_attr=[False, False, True])
decoder_norm = nn.LayerNorm(d_model)
self.decoder = Decoder(
decoder_layer=decoder_layer, num_layers=n_layer, norm=decoder_norm)
if weight_sharing:
self.linear = lambda x: paddle.matmul(
x=x, y=self.trg_word_embedding.word_embedding.weight, transpose_y=True)
else:
self.linear = nn.Linear(
in_features=d_model,
out_features=trg_vocab_size,
bias_attr=False)
def forward(self, src_word, trg_word):
src_max_len = paddle.shape(src_word)[-1]
trg_max_len = paddle.shape(trg_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_slf_attn_bias = paddle.tensor.triu(
(paddle.ones(
(trg_max_len, trg_max_len),
dtype=paddle.get_default_dtype()) * -np.inf),
1)
trg_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, trg_max_len, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
trg_pos = paddle.cast(
trg_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=trg_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
with paddle.static.amp.fp16_guard():
if self.waitk >= src_max_len or self.waitk == -1:
# Full sentence
enc_outputs = [
self.encoder(
enc_input, src_mask=src_slf_attn_bias)
]
else:
# Wait-k policy
enc_outputs = []
for i in range(self.waitk, src_max_len + 1):
enc_output = self.encoder(
enc_input[:, :i, :],
src_mask=src_slf_attn_bias[:, :, :, :i])
enc_outputs.append(enc_output)
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
dec_output = self.decoder(
dec_input,
enc_outputs,
tgt_mask=trg_slf_attn_bias,
memory_mask=trg_src_attn_bias)
predict = self.linear(dec_output)
return predict
def beam_search(self, src_word, beam_size=4, max_len=256, waitk=-1):
# TODO: "Speculative Beam Search for Simultaneous Translation"
raise NotImplementedError
def greedy_search(self,
src_word,
max_len=256,
waitk=-1,
caches=None,
bos_id=None):
"""
greedy_search uses streaming reader. It doesn't need calling
encoder many times, an a sub-sentence just needs calling encoder once.
So, it needs previous state(caches) and last one of generated
tokens id last time.
"""
src_max_len = paddle.shape(src_word)[-1]
base_attn_bias = paddle.cast(
src_word == self.bos_id,
dtype=paddle.get_default_dtype()).unsqueeze([1, 2]) * -1e9
src_slf_attn_bias = base_attn_bias
src_slf_attn_bias.stop_gradient = True
trg_src_attn_bias = paddle.tile(base_attn_bias, [1, 1, 1, 1])
src_pos = paddle.cast(
src_word != self.bos_id, dtype="int64") * paddle.arange(
start=0, end=src_max_len)
src_emb = self.src_word_embedding(src_word)
src_pos_emb = self.src_pos_embedding(src_pos)
src_emb = src_emb + src_pos_emb
enc_input = F.dropout(
src_emb, p=self.dropout,
training=self.training) if self.dropout else src_emb
enc_outputs = [self.encoder(enc_input, src_mask=src_slf_attn_bias)]
# constant number
batch_size = enc_outputs[-1].shape[0]
max_len = (
enc_outputs[-1].shape[1] + 20) if max_len is None else max_len
end_token_tensor = paddle.full(
shape=[batch_size, 1], fill_value=self.eos_id, dtype="int64")
predict_ids = []
log_probs = paddle.full(
shape=[batch_size, 1], fill_value=0, dtype="float32")
if not bos_id:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=self.bos_id, dtype="int64")
else:
trg_word = paddle.full(
shape=[batch_size, 1], fill_value=bos_id, dtype="int64")
# init states (caches) for transformer
if not caches:
caches = self.decoder.gen_cache(enc_outputs[-1], do_zip=False)
for i in range(max_len):
trg_pos = paddle.full(
shape=trg_word.shape, fill_value=i, dtype="int64")
trg_emb = self.trg_word_embedding(trg_word)
trg_pos_emb = self.trg_pos_embedding(trg_pos)
trg_emb = trg_emb + trg_pos_emb
dec_input = F.dropout(
trg_emb, p=self.dropout,
training=self.training) if self.dropout else trg_emb
if waitk < 0 or i >= len(enc_outputs):
# if the decoder step is full sent or longer than all source
# step, then read the whole src
_e = enc_outputs[-1]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
else:
_e = enc_outputs[i]
dec_output, caches = self.decoder(
dec_input, [_e], None,
trg_src_attn_bias[:, :, :, :_e.shape[1]], caches)
dec_output = paddle.reshape(
dec_output, shape=[-1, dec_output.shape[-1]])
logits = self.linear(dec_output)
step_log_probs = paddle.log(F.softmax(logits, axis=-1))
log_probs = paddle.add(x=step_log_probs, y=log_probs)
scores = log_probs
topk_scores, topk_indices = paddle.topk(x=scores, k=1)
finished = paddle.equal(topk_indices, end_token_tensor)
trg_word = topk_indices
log_probs = topk_scores
predict_ids.append(topk_indices)
if paddle.all(finished).numpy():
break
predict_ids = paddle.stack(predict_ids, axis=0)
finished_seq = paddle.transpose(predict_ids, [1, 2, 0])
finished_scores = topk_scores
return finished_seq, finished_scores, caches
\ No newline at end of file
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import jieba
import paddle
from paddlenlp.transformers import position_encoding_init
from paddlenlp.transformers import WordEmbedding, PositionalEmbedding
from paddlehub.env import MODULE_HOME
from paddlehub.module.module import moduleinfo, serving
from transformer_nist_wait_all.model import SimultaneousTransformer
from transformer_nist_wait_all.processor import STACLTokenizer, predict
@moduleinfo(
name="transformer_nist_wait_all",
version="1.0.0",
summary="",
author="PaddlePaddle",
author_email="",
type="nlp/simultaneous_translation",
)
class STTransformer():
"""
Transformer model for simultaneous translation.
"""
# Model config
model_config = {
# Number of heads used in multi-head attention.
"n_head": 8,
# Number of sub-layers to be stacked in the encoder and decoder.
"n_layer": 6,
# The dimension for word embeddings, which is also the last dimension of
# the input and output of multi-head attention, position-wise feed-forward
# networks, encoder and decoder.
"d_model": 512,
}
def __init__(self,
max_length=256,
max_out_len=256,
):
super(STTransformer, self).__init__()
bpe_codes_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_all", "assets", "2M.zh2en.dict4bpe.zh")
src_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_all", "assets", "nist.20k.zh.vocab")
trg_vocab_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_all", "assets", "nist.10k.en.vocab")
params_fpath = os.path.join(MODULE_HOME, "transformer_nist_wait_all", "assets", "transformer.pdparams")
self.max_length = max_length
self.max_out_len = max_out_len
self.tokenizer = STACLTokenizer(
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
)
src_vocab_size = self.tokenizer.src_vocab_size
trg_vocab_size = self.tokenizer.trg_vocab_size
self.transformer = SimultaneousTransformer(
src_vocab_size,
trg_vocab_size,
max_length=self.max_length,
n_layer=self.model_config['n_layer'],
n_head=self.model_config['n_head'],
d_model=self.model_config['d_model'],
)
model_dict = paddle.load(params_fpath)
# To avoid a longer length than training, reset the size of position
# encoding to max_length
model_dict["src_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
model_dict["trg_pos_embedding.pos_encoder.weight"] = position_encoding_init(
self.max_length + 1, self.model_config['d_model'])
self.transformer.load_dict(model_dict)
@serving
def translate(self, text, use_gpu=False):
paddle.set_device('gpu') if use_gpu else paddle.set_device('cpu')
# Word segmentation
text = ' '.join(jieba.cut(text))
# For decoding max length
decoder_max_length = 1
# For decoding cache
cache = None
# For decoding start token id
bos_id = None
# Current source word index
i = 0
# Whether the current source token ends the sentence; if so, decode up to max_out_len
is_last = False
# Tokenized id
user_input_tokenized = []
# Store the translation
result = []
bpe_str, tokenized_src = self.tokenizer.tokenize(text)
while i < len(tokenized_src):
user_input_tokenized.append(tokenized_src[i])
if bpe_str[i] in ['。', '?', '!']:
is_last = True
result, cache, bos_id = predict(
user_input_tokenized,
decoder_max_length,
is_last,
cache,
bos_id,
result,
self.tokenizer,
self.transformer,
max_out_len=self.max_out_len)
i += 1
return " ".join(result)
\ No newline at end of file
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle
from paddlenlp.data import Vocab
from subword_nmt import subword_nmt
class STACLTokenizer:
"""
Tokenize with Jieba + BPE, then convert tokens to ids.
"""
def __init__(self,
bpe_codes_fpath,
src_vocab_fpath,
trg_vocab_fpath,
special_token=["<s>", "<e>", "<unk>"]):
bpe_parser = subword_nmt.create_apply_bpe_parser()
bpe_args = bpe_parser.parse_args(args=['-c', bpe_codes_fpath])
self.bpe = subword_nmt.BPE(bpe_args.codes, bpe_args.merges,
bpe_args.separator, None,
bpe_args.glossaries)
self.src_vocab = Vocab.load_vocabulary(
src_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.trg_vocab = Vocab.load_vocabulary(
trg_vocab_fpath,
bos_token=special_token[0],
eos_token=special_token[1],
unk_token=special_token[2])
self.src_vocab_size = len(self.src_vocab)
self.trg_vocab_size = len(self.trg_vocab)
def tokenize(self, text):
bpe_str = self.bpe.process_line(text)
ids = self.src_vocab.to_indices(bpe_str.split())
return bpe_str.split(), ids
def post_process_seq(seq,
bos_idx=0,
eos_idx=1,
output_bos=False,
output_eos=False):
"""
Post-process the decoded sequence.
"""
eos_pos = len(seq) - 1
for i, idx in enumerate(seq):
if idx == eos_idx:
eos_pos = i
break
seq = [
idx for idx in seq[:eos_pos + 1]
if (output_bos or idx != bos_idx) and (output_eos or idx != eos_idx)
]
return seq
def predict(tokenized_src,
decoder_max_length,
is_last,
cache,
bos_id,
result,
tokenizer,
transformer,
n_best=1,
max_out_len=256,
eos_idx=1,
waitk=-1,
):
# Set evaluate mode
transformer.eval()
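# Wait-all policy: decoding is deferred until the sentence-final punctuation arrives (is_last=True).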
if not is_last:
return result, cache, bos_id
with paddle.no_grad():
paddle.disable_static()
input_src = tokenized_src
if is_last:
decoder_max_length = max_out_len
input_src += [eos_idx]
src_word = paddle.to_tensor(input_src).unsqueeze(axis=0)
finished_seq, finished_scores, cache = transformer.greedy_search(
src_word,
max_len=decoder_max_length,
waitk=waitk,
caches=cache,
bos_id=bos_id)
finished_seq = finished_seq.numpy()
for beam_idx, beam in enumerate(finished_seq[0]):
if beam_idx >= n_best:
break
id_list = post_process_seq(beam)
if len(id_list) == 0:
continue
bos_id = id_list[-1]
word_list = tokenizer.trg_vocab.to_tokens(id_list)
for word in word_list:
result.append(word)
res = ' '.join(word_list).replace('@@ ', '')
paddle.enable_static()
return result, cache, bos_id
\ No newline at end of file