Migrating LSTM from the old API to Fluid, and Fluid's high memory usage
Created by: wfeagle
With batch_size=256 the model uses a lot of memory; this is probably because hidden_dim is fairly large (hid_dim=256 overall, which is then multiplied by 4 for the LSTM gates).
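To get a rough sense of where the memory goes, here is a back-of-the-envelope parameter count for the bi-LSTM below (pure Python arithmetic; `dict_dim=10000` is a hypothetical vocabulary size, not from the original post):

```python
# Rough parameter count for the bilstm_net defined below.
# dict_dim is a hypothetical vocabulary size, for illustration only.
emb_dim, hid_dim, dict_dim = 64, 256, 10000

embedding = dict_dim * emb_dim                   # embedding table
fc_in = emb_dim * 4 * hid_dim + 4 * hid_dim      # input projection to the 4 gates (one direction)
lstm = hid_dim * 4 * hid_dim + 7 * hid_dim       # recurrent weights + bias/peepholes (one direction)

total = embedding + 2 * (fc_in + lstm)           # two directions
print(total)                                     # parameters before the attention/output layers
```

Note that activations usually dominate: each fc output is `batch_size * seq_len` rows of `4 * hid_dim = 1024` floats, so long sequences at batch_size=256 add up quickly.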
There are currently three questions.

(1) The old-version lstm has a built-in dropout attribute. After switching to Fluid, in the code below (mainly the lstm part), should the dropout be applied before the lstm, or after it? Or is neither order equivalent to the old version? If neither is equivalent, how can the old behavior be reproduced? Old-version code:

```
layer_attr = ExtraLayerAttribute(drop_rate=0.5)
lstm = lstmemory(input=hidden, layer_attr=layer_attr)
```
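For what it's worth, my understanding (worth double-checking against the old-version docs) is that `drop_rate` in `ExtraLayerAttribute` applied dropout to the layer's *output*, so the closer Fluid counterpart would be lstm first, then dropout. A minimal sketch under that assumption, where `hidden` is the projected input as in the old code:

```python
import paddle.fluid as fluid

def lstm_with_output_dropout(hidden, hid_dim, dropout_rate=0.5):
    # Assumption: old-API drop_rate acts on the layer output, so we
    # run the LSTM first and drop out its hidden states afterwards.
    lstm_h, _ = fluid.layers.dynamic_lstm(
        input=hidden, size=hid_dim * 4, is_reverse=False)
    return fluid.layers.dropout(
        x=lstm_h, dropout_prob=dropout_rate, is_test=False)
```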
(3) In the old version, does `settings` apply to every layer of the network? If so, in the new code below, does every layer have to be configured individually through a `ParamAttr` parameter attribute instead? Old-version code:

```
settings(
    batch_size=256,
    learning_method=AdamOptimizer(),
    learning_rate=1e-3,
    regularization=L2Regularization(8e-4),
    gradient_clipping_threshold=25)
```
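On (3): the optimizer-level pieces of `settings` (learning rate, regularization, gradient clipping) can be set once on the Fluid optimizer rather than per layer; `ParamAttr` is only needed where a layer should deviate from those defaults (as with the embedding's `emb_lr`). A sketch of the Fluid counterpart, assuming `avg_cost` comes from the network below; whether the old threshold maps to clipping by global norm or by value is an assumption worth verifying:

```python
import paddle.fluid as fluid

# Global gradient clipping (counterpart of gradient_clipping_threshold=25;
# assuming global-norm semantics).
fluid.clip.set_gradient_clip(
    fluid.clip.GradientClipByGlobalNorm(clip_norm=25.0))

# Optimizer-level learning rate and L2 regularization
# (counterparts of learning_rate=1e-3 and L2Regularization(8e-4)).
optimizer = fluid.optimizer.Adam(
    learning_rate=1e-3,
    regularization=fluid.regularizer.L2Decay(regularization_coeff=8e-4))
optimizer.minimize(avg_cost)
```

`batch_size` has no optimizer counterpart in Fluid; it is determined by how the reader/feeder batches the data.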
New-version network structure:

```python
def bilstm_net(data,
               label,
               dict_dim,
               class_dim,
               emb_dim=64,
               hid_dim=256,
               emb_lr=1e-2,
               dropout_rate=0.5):
    """
    Bi-Lstm net
    """
    # embedding layer
    emb = fluid.layers.embedding(
        input=data,
        size=[dict_dim, emb_dim],
        param_attr=fluid.ParamAttr(learning_rate=emb_lr))
    # bi-lstm layer
    fc0 = fluid.layers.fc(input=emb, size=hid_dim * 4)
    dropout0 = fluid.layers.dropout(
        x=fc0, dropout_prob=dropout_rate, is_test=False)
    lstm_h, c = fluid.layers.dynamic_lstm(
        input=dropout0, size=hid_dim * 4, is_reverse=False)
    rfc0 = fluid.layers.fc(input=emb, size=hid_dim * 4)
    rdropout0 = fluid.layers.dropout(
        x=rfc0, dropout_prob=dropout_rate, is_test=False)
    rlstm_h, rc = fluid.layers.dynamic_lstm(
        input=rdropout0, size=hid_dim * 4, is_reverse=True)
    # concat layer
    lstm_concat = fluid.layers.concat(input=[lstm_h, rlstm_h], axis=1)
    # attention
    attention_vec = fluid.layers.fc(
        input=lstm_concat, size=hid_dim // 2, act='tanh')
    attention_weight = fluid.layers.fc(
        input=attention_vec, size=1, bias_attr=False)
    attention_weight = fluid.layers.sequence_softmax(
        input=attention_weight)
    weight_reshape = fluid.layers.reshape(x=attention_weight, shape=[-1])
    scaled = fluid.layers.elementwise_mul(
        x=lstm_concat, y=weight_reshape, axis=0)
    context = fluid.layers.sequence_pool(input=scaled, pool_type='sum')
    # softmax layer
    prediction = fluid.layers.fc(input=context, size=class_dim, act='softmax')
    cost = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_cost = fluid.layers.mean(x=cost)
    acc = fluid.layers.accuracy(input=prediction, label=label)
    return avg_cost, acc, prediction
```