Incorrect results when computing inside a while_loop
Created by: Ramlinbird
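Goal: the input is a label of shape [-1, -1], and its shape changes at every step. Each value along the second dimension should be expanded into one_hot form, giving [-1, -1, self._padding_seqlen], and then reduce_sum is applied over dim=1, giving [-1, self._padding_seqlen] (this is how the multi-class label is produced).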
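As a framework-free reference for the computation described above, here is a minimal sketch in plain Python. The helper names `one_hot` and `multi_hot` and the sample batch are made up for illustration, and the sketch assumes that out-of-range entries (such as the -1 padding) contribute an all-zero row, which is how `allow_out_of_range=True` is expected to behave.

```python
def one_hot(index, depth):
    """Length-`depth` one-hot row; an out-of-range index (e.g. -1) gives all zeros."""
    return [1.0 if index == k else 0.0 for k in range(depth)]


def multi_hot(label, depth):
    """One-hot every entry of each row, then sum over dim=1."""
    result = []
    for row in label:
        encoded = [one_hot(v, depth) for v in row]  # [seq_len, depth]
        # reduce_sum over dim=1: add up the one-hot rows position by position
        result.append([sum(e[k] for e in encoded) for k in range(depth)])
    return result


# Hypothetical batch of two rows; -1 marks padding.
label = [[0, 2, 2, -1], [3, -1, -1, -1]]
out = multi_hot(label, depth=4)
print(out)
```

Each output row counts how often every class id occurs in the corresponding input row, so the total over the whole output equals the number of in-range labels.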
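During training I found that one_hot could not be applied to label directly. After rewriting the computation as a loop with while_loop, the actual output no longer matches the expected output. The computation inside the loop does not seem to use the "counter" at all; every iteration appears to compute the same thing.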
label = layers.squeeze(label, axes=[1])
layers.Print(label, message="debug label: ")
### Original approach: applying one_hot to label directly occasionally ran out of memory, so it was replaced by the loop below
label_oh_0 = fluid.one_hot(label, self._padding_seqlen - 2, allow_out_of_range=True)
layers.Print(layers.reduce_sum(label_oh_0), message="debug expected_label_sum: ")
label_oh_0 = layers.reduce_sum(label_oh_0, dim=1)
###
label_oh = fluid.one_hot(label[:, 0], self._padding_seqlen - 2, allow_out_of_range=True)
layers.Print(layers.reduce_sum(label_oh), message="debug label_oh: ")
def cond(ind, label, label_oh, depth):  # parameters correspond to loop_vars
    return ind < layers.shape(label)[1]
def body(ind, label, label_oh, depth):  # parameters correspond to loop_vars
    layers.Print(ind, message="debug ind-: ")
    #### Something seems wrong here: judging from each printed "debug label_oh-:", the same value appears to be added every time
    layers.Print(layers.reduce_sum(label_oh), message="debug label_oh-: ")
    label_oh += fluid.one_hot(label[:, ind], depth, allow_out_of_range=True)
    #####
    ind += 1
    return [ind, label, label_oh, depth]
ind = layers.fill_constant(shape=[1], dtype='int32', value=1)  # loop counter
depth = self._padding_seqlen - 2
layers.Print(depth, message="debug depth: ")
ind, label, label_oh, depth = layers.while_loop(cond, body, [ind, label, label_oh, depth], name=scope_name + "while_loop")
label = layers.cast(label_oh, "float32")
layers.Print(layers.reduce_sum(label_oh), message="debug {}_label_ohs: ".format(scope_name))
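To make the intended behavior of the loop concrete, the following plain-Python sketch accumulates one column per iteration, with the counter selecting a different column each time. It is not Paddle code: `one_hot_column` and `multi_hot_by_column` are hypothetical helpers, the batch is made up, and out-of-range entries are assumed to yield zeros.

```python
def one_hot_column(label, col, depth):
    """One-hot the `col`-th entry of every row -> [batch, depth]; out-of-range gives zeros."""
    return [[1.0 if row[col] == k else 0.0 for k in range(depth)] for row in label]


def multi_hot_by_column(label, depth):
    """Accumulate per-column one-hot matrices, mirroring what the while_loop is meant to do."""
    acc = one_hot_column(label, 0, depth)  # same starting point as label_oh in the post
    ind = 1                                # the counter starts at 1, as in the post
    while ind < len(label[0]):             # cond: ind < shape(label)[1]
        step = one_hot_column(label, ind, depth)  # must depend on the current ind
        acc = [[a + s for a, s in zip(ra, rs)] for ra, rs in zip(acc, step)]
        ind += 1
    return acc


# Hypothetical batch of two rows; -1 marks padding.
label = [[0, 2, 2, -1], [3, -1, -1, -1]]
by_column = multi_hot_by_column(label, depth=4)
total = sum(sum(r) for r in by_column)
print(by_column, total)
```

If every iteration really reads a different column, the final total equals the number of in-range labels in the batch, which is the same quantity that "debug expected_label_sum" reports for the direct one_hot.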
Print output from one step:
Tensor[squeeze_6.tmp_0]
shape: [389,5,]
dtype: l
data: -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
1589175039 debug depth: The place is:CUDAPlace(0)
Tensor[tmp_143]
shape: [1,]
dtype: i
data: 72,
1589175039 debug expected_label_sum: The place is:CUDAPlace(0)
Tensor[reduce_sum_32.tmp_0]
shape: [1,]
dtype: f
data: 116, ## expected value: 116
1589175039 debug ind-: The place is:CUDAPlace(0)
Tensor[fill_constant_0.tmp_0]
shape: [1,]
dtype: i
data: 1,
1589175039 debug label_oh-: The place is:CUDAPlace(0)
Tensor[reduce_sum_34.tmp_0]
shape: [1,]
dtype: f
data: 50,
1589175039 debug ind-: The place is:CUDAPlace(0)
Tensor[fill_constant_0.tmp_0]
shape: [1,]
dtype: i
data: 2,
1589175039 debug label_oh-: The place is:CUDAPlace(0)
Tensor[reduce_sum_34.tmp_0]
shape: [1,]
dtype: f
data: 96, ## here 96 - 50 = 46
1589175039 debug ind-: The place is:CUDAPlace(0)
Tensor[fill_constant_0.tmp_0]
shape: [1,]
dtype: i
data: 3,
1589175039 debug label_oh-: The place is:CUDAPlace(0)
Tensor[reduce_sum_34.tmp_0]
shape: [1,]
dtype: f
data: 142, ## here 142 - 96 = 46
1589175039 debug ind-: The place is:CUDAPlace(0)
Tensor[fill_constant_0.tmp_0]
shape: [1,]
dtype: i
data: 4,
1589175039 debug label_oh-: The place is:CUDAPlace(0)
Tensor[reduce_sum_34.tmp_0]
shape: [1,]
dtype: f
data: 188,
1589175039 debug label_oh: The place is:CUDAPlace(0)
Tensor[reduce_sum_33.tmp_0]
shape: [1,]
dtype: f
data: 50,
1589175039 debug emcell__label_ohs: The place is:CUDAPlace(0)
Tensor[reduce_sum_35.tmp_0]
shape: [1,]
dtype: f
data: 234, # the final result is 234, which does not match the expected 116.
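The arithmetic behind the log can be checked with a few lines of Python. The numbers below are copied from the printed output; the variable names are only for illustration.

```python
# Sums printed by "debug label_oh-:" at the start of iterations ind = 1..4,
# followed by the final sum printed after the loop.
logged = [50, 96, 142, 188, 234]
expected_total = 116  # from "debug expected_label_sum"

# Amount added to label_oh by each of the four iterations.
increments = [b - a for a, b in zip(logged, logged[1:])]

# What columns 1..4 together should have contributed, given that column 0 gives 50.
remaining_expected = expected_total - logged[0]

print(increments, sum(increments), remaining_expected)
```

Every iteration adds exactly 46, so the loop contributes 184 where only 66 is expected. A constant increment is consistent with the suspicion that the same one_hot result is added each time regardless of ind, although the log alone does not show which column is being read.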