GRU parameter loading question
Created by: wangty163
Hi, I'm trying to use the `fluid.layers.dynamic_gru` layer, but its output never matches my hand computation. Given that I already have the values of every parameter the GRU uses, how should I set the GRU's weight/parameter matrices?
- Model / parameter structure:
  - 3 time steps, with a 1-dimensional input feature per step
  - 2 hidden units in the GRU
- Here is my hand calculation:
```python
import numpy as np

u_w = np.array(
    [[ 0.05270875,  0.07485402],
     [ 0.5579587 ,  0.1529336 ],
     [-0.7364301 , -0.2818719 ]], dtype=np.float32)
u_b = np.array([1., 1.], dtype=np.float32)
r_w = np.array(
    [[ 0.32575977, -0.8603296 ],
     [ 0.5314517 , -0.6671432 ],
     [ 0.84039783,  0.53213954]], dtype=np.float32)
r_b = np.array([1., 1.], dtype=np.float32)
c_w = np.array(
    [[ 0.13200128,  0.22489548],
     [-0.55870104,  0.8221725 ],
     [ 0.19546998,  0.54274595]], dtype=np.float32)
c_b = np.array([0., 0.], dtype=np.float32)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def call(inputs, state):
    u = np.matmul(np.concatenate([inputs, state], axis=0), u_w) + u_b
    u = sigmoid(u)
    r = np.matmul(np.concatenate([inputs, state], axis=0), r_w) + r_b
    r = sigmoid(r)
    r_state = r * state
    c = np.matmul(np.concatenate([inputs, r_state], axis=0), c_w) + c_b
    c = np.tanh(c)
    new_h = u * state + (1 - u) * c
    return new_h

state_0 = np.array([0., 0.])
state_1 = call(np.array([0.65115523]), state_0)
print('state_1:', state_1)
state_2 = call(np.array([0.8680488]), state_1)
print('state_2:', state_2)
state_3 = call(np.array([0.57991964]), state_2)
print('state_3:', state_3)
# -----------------------------------------------------------
# state_1: [0.02248567 0.0377275 ]
# state_2: [0.04507692 0.0841474 ]
# state_3: [0.05100743 0.11097222]
```
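As a consistency check, the three per-gate products above can also be computed from column-concatenated matrices, in the same "update/reset gates first, candidate last" packing that the `(·, 6)`-shaped fluid parameters hint at. A minimal numpy sketch (`call_packed` is just an illustrative name, not fluid API):

```python
import numpy as np

# Same per-gate parameters as in the hand calculation above.
u_w = np.array([[ 0.05270875,  0.07485402],
                [ 0.5579587 ,  0.1529336 ],
                [-0.7364301 , -0.2818719 ]], dtype=np.float32)
r_w = np.array([[ 0.32575977, -0.8603296 ],
                [ 0.5314517 , -0.6671432 ],
                [ 0.84039783,  0.53213954]], dtype=np.float32)
c_w = np.array([[ 0.13200128,  0.22489548],
                [-0.55870104,  0.8221725 ],
                [ 0.19546998,  0.54274595]], dtype=np.float32)
u_b = np.array([1., 1.], dtype=np.float32)
r_b = np.array([1., 1.], dtype=np.float32)
c_b = np.array([0., 0.], dtype=np.float32)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Pack update- and reset-gate weights column-wise; the candidate stays
# separate because it multiplies r * state rather than state.
gate_w = np.hstack([u_w, r_w])           # (3, 4)
gate_b = np.concatenate([u_b, r_b])      # (4,)

def call_packed(inputs, state):
    xh = np.concatenate([inputs, state])
    g = sigmoid(xh @ gate_w + gate_b)
    u, r = g[:2], g[2:]
    c = np.tanh(np.concatenate([inputs, r * state]) @ c_w + c_b)
    return u * state + (1 - u) * c
```

This is algebraically identical to the per-gate version, so it produces the same `state_1`..`state_3`.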
- Here is the fluid network structure I implemented (note: the trailing space in the original `name="input_fea "` was removed):

```python
hidden_dim = 2
input_fea = fluid.layers.data(dtype='float32', shape=[1],
                              name="input_fea", lod_level=1)
input = fluid.layers.fc(input_fea, size=hidden_dim * 3, bias_attr=False, act=None)
output = fluid.layers.dynamic_gru(input, size=hidden_dim, origin_mode=True)
```
- The network then contains these parameters:

```
fc_0.w_0   # shape (1, 6)
gru_0.w_0  # shape (2, 6)
gru_0.b_0  # shape (1, 6)
```
What is the correspondence between `u_w`, `u_b`, `r_w`, `r_b`, `c_w`, `c_b` and `fc_0.w_0`, `gru_0.w_0`, `gru_0.b_0`? And if my fluid network structure is wrong, please show the correct one (both the fc and the GRU can have a bias; I only enabled the GRU's bias and I'm not sure whether that's a problem) along with the mapping between the variables above. Thanks!
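My current guess at the mapping, based only on the parameter shapes and on the column layout described in the documentation of Paddle's GRU op (update/reset-gate columns first, candidate columns last). Both the row split and the gate order here are assumptions; if the results still disagree, swapping the `u` and `r` column blocks would be the first thing I'd try:

```python
import numpy as np

# Per-gate parameters from the hand calculation above. Row 0 of each
# (3, 2) matrix multiplies the 1-d input; rows 1-2 multiply the previous
# hidden state.
u_w = np.array([[ 0.05270875,  0.07485402],
                [ 0.5579587 ,  0.1529336 ],
                [-0.7364301 , -0.2818719 ]], dtype=np.float32)
r_w = np.array([[ 0.32575977, -0.8603296 ],
                [ 0.5314517 , -0.6671432 ],
                [ 0.84039783,  0.53213954]], dtype=np.float32)
c_w = np.array([[ 0.13200128,  0.22489548],
                [-0.55870104,  0.8221725 ],
                [ 0.19546998,  0.54274595]], dtype=np.float32)
u_b = np.array([1., 1.], dtype=np.float32)
r_b = np.array([1., 1.], dtype=np.float32)
c_b = np.array([0., 0.], dtype=np.float32)

# Hypothetical packing (an assumption, not confirmed fluid behavior):
# the fc owns the input rows, dynamic_gru owns the recurrent rows and
# all the biases -- which would also mean bias_attr=False on the fc is fine.
fc_w  = np.hstack([u_w[:1], r_w[:1], c_w[:1]])    # (1, 6) -> fc_0.w_0
gru_w = np.hstack([u_w[1:], r_w[1:], c_w[1:]])    # (2, 6) -> gru_0.w_0
gru_b = np.hstack([u_b, r_b, c_b])[None, :]       # (1, 6) -> gru_0.b_0
```

If this layout is right, the arrays could then be written into the parameters after running the startup program, e.g. `fluid.global_scope().find_var('fc_0.w_0').get_tensor().set(fc_w, place)`; recomputing the recurrence from `fc_w`, `gru_w`, `gru_b` in numpy reproduces the hand-computed `state_1`..`state_3` above, so the packing is at least self-consistent.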