PaddlePaddle / models
Commit 9f9f3a94
Authored by Cao Ying on Oct 15, 2017; committed by GitHub on Oct 15, 2017
Merge pull request #380 from peterzhang2029/nested_seq_refine
Fix doc style.
Parents: eeb85f25, 327fc655
Showing 7 changed files with 212 additions and 206 deletions:
- image_classification/README.md (+1, -1)
- image_classification/index.html (+1, -1)
- nested_sequence/text_classification/.gitignore (+2, -0)
- nested_sequence/text_classification/README.md (+13, -11)
- nested_sequence/text_classification/index.html (+13, -11)
- scheduled_sampling/README.md (+91, -91)
- scheduled_sampling/index.html (+91, -91)
image_classification/README.md
...
@@ -235,4 +235,4 @@ parameters.init_from_tar(gzip.open('Paddle_ResNet50.tar.gz', 'r'))
### Notes
The names of the files in the model archive correspond one-to-one with the parameter names in the model configuration, and parameter loading relies on this correspondence. All of the pretrained models we provide use the configuration from the example code. If you modify the network configuration, take care that the parameter names in the configuration still match the file names in the archive.
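Because loading depends on this name match, a quick pre-flight check can catch a mismatch before calling `init_from_tar`. The sketch below uses only Python's standard `tarfile` module. It makes two assumptions:
- Each parameter is stored as a regular file member named after the parameter.
- `expected_names` is a hypothetical input you would collect from your own network configuration.

If the archive also holds metadata members, filter them out before comparing.

```python
import tarfile


def check_param_names(tar_path, expected_names):
    """Compare the file members of a .tar.gz archive with expected parameter names.

    Returns (missing, unexpected): names the configuration expects but the
    archive lacks, and archive members the configuration does not mention.
    Assumes one regular-file member per parameter, named after the parameter.
    """
    with tarfile.open(tar_path, "r:gz") as tar:
        archived = {member.name for member in tar.getmembers() if member.isfile()}
    expected = set(expected_names)
    missing = sorted(expected - archived)
    unexpected = sorted(archived - expected)
    return missing, unexpected
```

A non-empty `missing` list means some layer in the modified configuration uses a parameter name that the pretrained archive does not provide.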
image_classification/index.html
...
@@ -277,7 +277,7 @@ parameters.init_from_tar(gzip.open('Paddle_ResNet50.tar.gz', 'r'))
### Notes
The names of the files in the model archive correspond one-to-one with the parameter names in the model configuration, and parameter loading relies on this correspondence. All of the pretrained models we provide use the configuration from the example code. If you modify the network configuration, take care that the parameter names in the configuration still match the file names in the archive.
</div>
<!-- You can change the lines below now. -->
...
nested_sequence/text_classification/.gitignore
0 → 100644
.DS_Store
*.pyc
nested_sequence/text_classification/README.md
...
@@ -26,7 +26,7 @@ The PaddlePaddle code implementing this network structure is in `network_conf.py`.
```python
nest_group = paddle.layer.recurrent_group(input=[paddle.layer.SubsequenceInput(emb),
                                                 hidden_size],
                                          step=cnn_cov_group)
```
...
@@ -40,10 +40,10 @@ The CNN network is implemented as follows:
```python
def cnn_cov_group(group_input, hidden_size):
    """
    Convolution group definition.
    :param group_input: The input of this layer.
    :type group_input: LayerOutput
    :params hidden_size: The size of the fully connected layer.
    :type hidden_size: int
    """
    conv3 = paddle.networks.sequence_conv_pool(
    ...
```
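@@ -63,11 +63,13 @@ The text sequence convolution module with pooling that PaddlePaddle already provides: `pad
After the representation vector of each sentence is obtained, all sentence vectors are passed through an average pooling layer to produce a single vector representing the sample. That vector then goes through a fully connected layer that outputs the final prediction. The code is as follows: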
```python
avg_pool = paddle.layer.pooling(input=nest_group,
                                pooling_type=paddle.pooling.Avg(),
                                agg_level=paddle.layer.AggregateLevel.TO_NO_SEQUENCE)
prob = paddle.layer.mixed(size=class_num,
                          input=[paddle.layer.full_matrix_projection(input=avg_pool)],
                          act=paddle.activation.Softmax())
```
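The pooling and output step can be mirrored in plain NumPy to make the shapes concrete. This is an illustrative sketch, not PaddlePaddle code. Its assumptions:
- The per-sentence vectors produced by `nest_group` are already stacked into a `(num_sentences, hidden_size)` array.
- `weight` and the optional `bias` are hypothetical stand-ins for the parameters of the projection inside `prob`.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D vector."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def classify_document(sentence_vectors, weight, bias=None):
    """Average-pool sentence vectors into one document vector, then apply FC + softmax.

    sentence_vectors: (num_sentences, hidden_size), one row per sentence.
    weight: (hidden_size, class_num); bias: optional (class_num,).
    """
    # Averaging over the sentence axis collapses the sequence to one vector,
    # analogous to agg_level=AggregateLevel.TO_NO_SEQUENCE in the layer above.
    doc_vector = sentence_vectors.mean(axis=0)
    logits = doc_vector @ weight
    if bias is not None:
        logits = logits + bias
    return softmax(logits)
```

However many sentences a document has, the output is a single probability vector of length `class_num`.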
## Install Dependencies
```bash
...
```
@@ -122,10 +124,10 @@ python infer.py --model_path 'models/params_pass_00000.tar.gz'
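The input data format is as follows: each line is one sample, with fields separated by `\t`. The first column is the class label and the second column is the input text. Here are two example lines: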
```
positive This movie is very good. The actor is so handsome.
negative What a terrible movie. I waste so much time.
```
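Because this example works with nested sequences, each line ultimately has to become a label plus a list of sentences, each of which is a list of words. The sketch below shows one way to do that. Several choices are assumptions for illustration, not necessarily what the project's own reader does:
- the label-to-id mapping;
- splitting sentences on `.`, `!` and `?`;
- lowercasing.

A real reader would also map words to ids with a dictionary.

```python
import re

# Hypothetical label mapping for the two example classes.
LABEL_IDS = {"negative": 0, "positive": 1}


def parse_line(line):
    """Turn one tab-separated 'label, text' line into (label_id, [[word, ...], ...]).

    Sentences are split on '.', '!' or '?' and words on whitespace
    (an assumed, deliberately simple tokenization).
    """
    label, text = line.rstrip("\n").split("\t", 1)
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    return LABEL_IDS[label], [s.lower().split() for s in sentences]
```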
2. Write the data reader interface
...
nested_sequence/text_classification/index.html
...
@@ -68,7 +68,7 @@ The PaddlePaddle code implementing this network structure is in `network_conf.py`.
``` python
nest_group = paddle.layer.recurrent_group(input=[paddle.layer.SubsequenceInput(emb),
                                                 hidden_size],
                                          step=cnn_cov_group)
```
...
@@ -82,10 +82,10 @@ The CNN network is implemented as follows:
```python
def cnn_cov_group(group_input, hidden_size):
    """
    Convolution group definition.
    :param group_input: The input of this layer.
    :type group_input: LayerOutput
    :params hidden_size: The size of the fully connected layer.
    :type hidden_size: int
    """
    conv3 = paddle.networks.sequence_conv_pool(
    ...
```
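@@ -105,11 +105,13 @@ The text sequence convolution module with pooling that PaddlePaddle already provides: `pad
After the representation vector of each sentence is obtained, all sentence vectors are passed through an average pooling layer to produce a single vector representing the sample. That vector then goes through a fully connected layer that outputs the final prediction. The code is as follows: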
```python
avg_pool = paddle.layer.pooling(input=nest_group,
                                pooling_type=paddle.pooling.Avg(),
                                agg_level=paddle.layer.AggregateLevel.TO_NO_SEQUENCE)
prob = paddle.layer.mixed(size=class_num,
                          input=[paddle.layer.full_matrix_projection(input=avg_pool)],
                          act=paddle.activation.Softmax())
```
## Install Dependencies
```bash
...
```
@@ -164,10 +166,10 @@ python infer.py --model_path 'models/params_pass_00000.tar.gz'
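The input data format is as follows: each line is one sample, with fields separated by `\t`. The first column is the class label and the second column is the input text. Here are two example lines: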
```
positive This movie is very good. The actor is so handsome.
negative What a terrible movie. I waste so much time.
```
2. Write the data reader interface
...
scheduled_sampling/README.md
...
@@ -60,52 +60,52 @@ class RandomScheduleGenerator:
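The `__init__` method initializes the class. Its `schedule_type` parameter selects the decay schedule; the options are `constant`, `linear`, `exponential`, and `inverse_sigmoid`:
- `constant` uses a fixed $\epsilon_i$ for every mini-batch.
- `linear` decays linearly.
- `exponential` decays exponentially.
- `inverse_sigmoid` applies inverse sigmoid decay.

The `__init__` parameters `a` and `b` are the parameters of the decay schedule and need to be tuned on the validation set. `self.schedule_computers` maps each decay type to a function that computes $\epsilon_i$. The last line assigns the function selected by `schedule_type` to `self.schedule_computer`.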
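Written out as formulas, the four lambdas in `self.schedule_computers` compute the following. Here $d$ is `data_processed_`, the number of samples processed so far:

$$
\epsilon_i =
\begin{cases}
a, & \text{constant} \\
\max\left(a,\; 1 - \dfrac{d}{b}\right), & \text{linear} \\
a^{d/b}, & \text{exponential} \\
\dfrac{b}{b + \exp\left(\dfrac{d\,a}{b}\right)}, & \text{inverse\_sigmoid}
\end{cases}
$$

Each schedule uses `a` and `b` differently:
- `linear`: $a$ acts as a floor.
- `exponential`: $a$ should lie in $(0, 1)$ so that $\epsilon_i$ starts at 1 and decays.
- `inverse_sigmoid`: $\epsilon_i$ starts at $b/(b+1)$ and decreases toward 0.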
```python
def __init__(self, schedule_type, a, b):
    """
    schduled_type: is the type of the decay. It supports constant, linear,
    exponential, and inverse_sigmoid right now.
    a: parameter of the decay (MUST BE DOUBLE)
    b: parameter of the decay (MUST BE DOUBLE)
    """
    self.schedule_type = schedule_type
    self.a = a
    self.b = b
    self.data_processed_ = 0
    self.schedule_computers = {
        "constant": lambda a, b, d: a,
        "linear": lambda a, b, d: max(a, 1 - d / b),
        "exponential": lambda a, b, d: pow(a, d / b),
        "inverse_sigmoid": lambda a, b, d: b / (b + math.exp(d * a / b)),
    }
    assert (self.schedule_type in self.schedule_computers)
    self.schedule_computer = self.schedule_computers[self.schedule_type]
```
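To see how `a` and `b` shape each curve, the standalone sketch below restates the same four formulas and evaluates them at chosen data counts. It converts its arguments to `float` first. This is why the docstring says "MUST BE DOUBLE": under Python 2, `d / b` with two integers truncates. The parameter values in the usage lines are arbitrary illustrations, not tuned settings, and `epsilon_curve` is a hypothetical helper name.

```python
import math

# Same formulas as self.schedule_computers; d = number of samples processed so far.
SCHEDULES = {
    "constant": lambda a, b, d: a,
    "linear": lambda a, b, d: max(a, 1.0 - d / b),
    "exponential": lambda a, b, d: pow(a, d / b),
    "inverse_sigmoid": lambda a, b, d: b / (b + math.exp(d * a / b)),
}


def epsilon_curve(schedule_type, a, b, data_counts):
    """Return epsilon_i for each value in data_counts under the given schedule."""
    compute = SCHEDULES[schedule_type]
    a, b = float(a), float(b)  # guard against integer division under Python 2
    return [compute(a, b, float(d)) for d in data_counts]


if __name__ == "__main__":
    counts = [0, 500, 1000, 2000]
    print("linear", epsilon_curve("linear", 0.1, 1000, counts))
    print("exponential", epsilon_curve("exponential", 0.5, 1000, counts))
    print("inverse_sigmoid", epsilon_curve("inverse_sigmoid", 10, 1000, counts))
```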
`getScheduleRate` computes $\epsilon_i$ from the decay function and the amount of data processed so far.
```python
def getScheduleRate(self):
    """
    Get the schedule sampling rate. Usually not needed to be called by the users
    """
    return self.schedule_computer(self.a, self.b, self.data_processed_)
```
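The `processBatch` method samples according to the probability $\epsilon_i$ to produce `indexes`. Each element of `indexes` is `0` with probability $\epsilon_i$ and `1` with probability $1-\epsilon_i$. `indexes` determines whether the decoder receives the ground-truth element or the generated element as input: `0` means the ground-truth element is used, and `1` means the generated element is used.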
```python
def processBatch(self, batch_size):
    """
    Get a batch_size of sampled indexes. These indexes can be passed to a
    MultiplexLayer to select from the grouth truth and generated samples
    from the last time step.
    """
    rate = self.getScheduleRate()
    numbers = np.random.rand(batch_size)
    indexes = (numbers >= rate).astype('int32').tolist()
    self.data_processed_ += batch_size
    return indexes
```
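The comparison `numbers >= rate` in `processBatch` is what makes `0` appear with probability $\epsilon_i$: a uniform draw from $[0, 1)$ falls below `rate` with exactly that probability. The sketch below isolates the rule with a seeded NumPy generator so it can be checked empirically. The function name is hypothetical, and the class's bookkeeping (`data_processed_`) is left out.

```python
import numpy as np


def sample_true_token_flags(rate, batch_size, rng):
    """Reproduce processBatch's rule: 0 (use ground truth) with probability rate, else 1."""
    numbers = rng.random_sample(batch_size)  # uniform samples in [0, 1)
    return (numbers >= rate).astype('int32')


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    flags = sample_true_token_flags(0.8, 100000, rng)
    print("fraction of ground-truth steps:", (flags == 0).mean())
```

The two extremes behave as you would expect. With $\epsilon_i = 1$, every step is fed the ground truth (pure teacher forcing). With $\epsilon_i = 0$, every step is fed the generated element.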
Scheduled Sampling adds one input, `true_token_flag`, to the sequence-to-sequence model to control what the decoder receives as input.
...
@@ -148,62 +148,62 @@ def gen_schedule_data(reader):
During training, the decoder function that `recurrent_group` calls at each step is as follows:
```python
def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word,
                                     true_token_flag):
    """
    The decoder step for training.
    :param enc_vec: the encoder vector for attention
    :type enc_vec: LayerOutput
    :param enc_proj: the encoder projection for attention
    :type enc_proj: LayerOutput
    :param true_word: the ground-truth target word
    :type true_word: LayerOutput
    :param true_token_flag: the flag of using the ground-truth target word
    :type true_token_flag: LayerOutput
    :return: the softmax output layer
    :rtype: LayerOutput
    """
    decoder_mem = paddle.layer.memory(
        name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
    context = paddle.networks.simple_attention(
        encoded_sequence=enc_vec,
        encoded_proj=enc_proj,
        decoder_state=decoder_mem)
    gru_out_memory = paddle.layer.memory(
        name='gru_out', size=target_dict_dim)
    generated_word = paddle.layer.max_id(input=gru_out_memory)
    generated_word_emb = paddle.layer.embedding(
        input=generated_word,
        size=word_vector_dim,
        param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
    current_word = paddle.layer.multiplex(
        input=[true_token_flag, true_word, generated_word_emb])
    with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += paddle.layer.full_matrix_projection(input=context)
        decoder_inputs += paddle.layer.full_matrix_projection(
            input=current_word)
    gru_step = paddle.layer.gru_step(
        name='gru_decoder',
        input=decoder_inputs,
        output_mem=decoder_mem,
        size=decoder_size)
    with paddle.layer.mixed(
            name='gru_out',
            size=target_dict_dim,
            bias_attr=True,
            act=paddle.activation.Softmax()) as out:
        out += paddle.layer.full_matrix_projection(input=gru_step)
    return out
```
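This function uses the `memory` layer `gru_out_memory` to remember the output of the previous time step. It takes the word with the highest probability in `gru_out_memory` as the generated word, `generated_word`.

The `multiplex` layer chooses between the ground-truth element `true_word` and the generated element (the embedding of `generated_word`), and the chosen result becomes the decoder input. It takes three inputs: `true_token_flag`, `true_word`, and `generated_word_emb`. For each element:
- If the value in `true_token_flag` is `0`, the `multiplex` layer outputs the corresponding element of `true_word`.
- If the value is `1`, it outputs the corresponding element of `generated_word_emb`.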
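The per-element selection that `multiplex` performs here can be pictured with NumPy advanced indexing. This is only a model of the behavior described above, not PaddlePaddle's implementation. Its assumptions:
- The function name `multiplex_rows` is hypothetical.
- The flags are given as a plain array.
- The two candidates stand in for `true_word` and `generated_word_emb` as `(batch_size, dim)` arrays.

```python
import numpy as np


def multiplex_rows(flags, *candidates):
    """Row i of the result is row i of candidates[flags[i]].

    With candidates = (true_word, generated_word_emb), flag 0 picks the
    ground-truth embedding and flag 1 picks the generated one.
    """
    stacked = np.stack(candidates)        # (num_candidates, batch_size, dim)
    rows = np.arange(stacked.shape[1])
    return stacked[np.asarray(flags), rows]
```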
...
scheduled_sampling/index.html
...
@@ -102,52 +102,52 @@ class RandomScheduleGenerator:
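The `__init__` method initializes the class. Its `schedule_type` parameter selects the decay schedule; the options are `constant`, `linear`, `exponential`, and `inverse_sigmoid`:
- `constant` uses a fixed $\epsilon_i$ for every mini-batch.
- `linear` decays linearly.
- `exponential` decays exponentially.
- `inverse_sigmoid` applies inverse sigmoid decay.

The `__init__` parameters `a` and `b` are the parameters of the decay schedule and need to be tuned on the validation set. `self.schedule_computers` maps each decay type to a function that computes $\epsilon_i$. The last line assigns the function selected by `schedule_type` to `self.schedule_computer`.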
```python
def __init__(self, schedule_type, a, b):
    """
    schduled_type: is the type of the decay. It supports constant, linear,
    exponential, and inverse_sigmoid right now.
    a: parameter of the decay (MUST BE DOUBLE)
    b: parameter of the decay (MUST BE DOUBLE)
    """
    self.schedule_type = schedule_type
    self.a = a
    self.b = b
    self.data_processed_ = 0
    self.schedule_computers = {
        "constant": lambda a, b, d: a,
        "linear": lambda a, b, d: max(a, 1 - d / b),
        "exponential": lambda a, b, d: pow(a, d / b),
        "inverse_sigmoid": lambda a, b, d: b / (b + math.exp(d * a / b)),
    }
    assert (self.schedule_type in self.schedule_computers)
    self.schedule_computer = self.schedule_computers[self.schedule_type]
```
`getScheduleRate` computes $\epsilon_i$ from the decay function and the amount of data processed so far.
```python
def getScheduleRate(self):
    """
    Get the schedule sampling rate. Usually not needed to be called by the users
    """
    return self.schedule_computer(self.a, self.b, self.data_processed_)
```
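The `processBatch` method samples according to the probability $\epsilon_i$ to produce `indexes`. Each element of `indexes` is `0` with probability $\epsilon_i$ and `1` with probability $1-\epsilon_i$. `indexes` determines whether the decoder receives the ground-truth element or the generated element as input: `0` means the ground-truth element is used, and `1` means the generated element is used.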
```python
def processBatch(self, batch_size):
    """
    Get a batch_size of sampled indexes. These indexes can be passed to a
    MultiplexLayer to select from the grouth truth and generated samples
    from the last time step.
    """
    rate = self.getScheduleRate()
    numbers = np.random.rand(batch_size)
    indexes = (numbers >= rate).astype('int32').tolist()
    self.data_processed_ += batch_size
    return indexes
```
Scheduled Sampling adds one input, `true_token_flag`, to the sequence-to-sequence model to control what the decoder receives as input.
...
@@ -190,62 +190,62 @@ def gen_schedule_data(reader):
During training, the decoder function that `recurrent_group` calls at each step is as follows:
```python
def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word,
                                     true_token_flag):
    """
    The decoder step for training.
    :param enc_vec: the encoder vector for attention
    :type enc_vec: LayerOutput
    :param enc_proj: the encoder projection for attention
    :type enc_proj: LayerOutput
    :param true_word: the ground-truth target word
    :type true_word: LayerOutput
    :param true_token_flag: the flag of using the ground-truth target word
    :type true_token_flag: LayerOutput
    :return: the softmax output layer
    :rtype: LayerOutput
    """
    decoder_mem = paddle.layer.memory(
        name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)
    context = paddle.networks.simple_attention(
        encoded_sequence=enc_vec,
        encoded_proj=enc_proj,
        decoder_state=decoder_mem)
    gru_out_memory = paddle.layer.memory(
        name='gru_out', size=target_dict_dim)
    generated_word = paddle.layer.max_id(input=gru_out_memory)
    generated_word_emb = paddle.layer.embedding(
        input=generated_word,
        size=word_vector_dim,
        param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))
    current_word = paddle.layer.multiplex(
        input=[true_token_flag, true_word, generated_word_emb])
    with paddle.layer.mixed(size=decoder_size * 3) as decoder_inputs:
        decoder_inputs += paddle.layer.full_matrix_projection(input=context)
        decoder_inputs += paddle.layer.full_matrix_projection(
            input=current_word)
    gru_step = paddle.layer.gru_step(
        name='gru_decoder',
        input=decoder_inputs,
        output_mem=decoder_mem,
        size=decoder_size)
    with paddle.layer.mixed(
            name='gru_out',
            size=target_dict_dim,
            bias_attr=True,
            act=paddle.activation.Softmax()) as out:
        out += paddle.layer.full_matrix_projection(input=gru_step)
    return out
```
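This function uses the `memory` layer `gru_out_memory` to remember the output of the previous time step. It takes the word with the highest probability in `gru_out_memory` as the generated word, `generated_word`.

The `multiplex` layer chooses between the ground-truth element `true_word` and the generated element (the embedding of `generated_word`), and the chosen result becomes the decoder input. It takes three inputs: `true_token_flag`, `true_word`, and `generated_word_emb`. For each element:
- If the value in `true_token_flag` is `0`, the `multiplex` layer outputs the corresponding element of `true_word`.
- If the value is `1`, it outputs the corresponding element of `generated_word_emb`.
...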