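Is there any difference in how the loss is computed between nce in paddle fluid 1.2 and nce in paddle v2? With the same network structure, the loss values differ greatly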
Created by: huashaosmile
The v2 version uses nce like this:
self.nce_layer = paddle.layer.nce(
    input=self.norm_fc_2,
    label=self._label_layer,
    num_classes=label_dim,
    param_attr=paddle.attr.Param(name='nce_w', initial_mean=0.0, initial_std=1.0),
    bias_attr=paddle.attr.Param(name='nce_b', initial_mean=0.0, initial_std=1.0),
    num_neg_samples=self._num_neg_samples,
    neg_distribution=self._label_freq
)
The fluid version uses nce like this:
w_param = fluid.default_main_program().global_block().create_parameter(
    shape=[label_dim, hidden2_dim],
    dtype='float32',
    name='nce_w',
    default_initializer=fluid.initializer.Normal())
b_param = fluid.default_main_program().global_block().create_parameter(
    shape=[label_dim, 1],
    dtype='float32',
    name='nce_b',
    default_initializer=fluid.initializer.Normal())
# Configure the NCE layer
if not self._is_infer:
    self.nce_layer = fluid.layers.nce(
        input=self.norm_fc_2,
        sampler='custom_dist',
        custom_dist=self._dist,
        label=self._label_layer,
        num_total_classes=label_dim,
        param_attr=fluid.ParamAttr(name='nce_w'),
        bias_attr=fluid.ParamAttr(name='nce_b'),
        num_neg_samples=self._num_neg_samples
    )
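
For comparing the two loss values, it may help to have a hand-computed reference. The following is only a minimal NumPy sketch of the textbook NCE loss for a single example, not the implementation of either paddle v2 or fluid. The helper name `nce_loss_reference` and all of its arguments are hypothetical, and the bias is taken as a 1-D vector for simplicity.

```python
import numpy as np


def nce_loss_reference(x, w, b, label, neg_samples, noise_prob):
    """Textbook NCE loss for one example (hypothetical helper, not Paddle code).

    x:           hidden vector, shape [hidden_dim]
    w:           class weights, shape [num_classes, hidden_dim]
    b:           class biases, shape [num_classes]
    label:       index of the true class
    neg_samples: indices of the sampled noise classes
    noise_prob:  noise distribution q over classes, shape [num_classes]
    """
    neg_samples = np.asarray(neg_samples)
    noise_prob = np.asarray(noise_prob, dtype=np.float64)
    k = len(neg_samples)

    def log_sigmoid(z):
        # Numerically stable log(sigmoid(z)).
        return -np.logaddexp(0.0, -z)

    # P(D=1 | c) = sigmoid(s_c - log(k * q(c))), with score s_c = w_c . x + b_c.
    s_true = w[label] @ x + b[label]
    delta_true = s_true - np.log(k * noise_prob[label])

    s_neg = w[neg_samples] @ x + b[neg_samples]
    delta_neg = s_neg - np.log(k * noise_prob[neg_samples])

    # The true class should be classified as data, the noise samples as noise.
    return -(log_sigmoid(delta_true) + np.sum(log_sigmoid(-delta_neg)))
```

Feeding the same input, the same `nce_w` / `nce_b` values, and the same negative samples through such a reference could show whether the gap comes from the formula itself, from how the per-sample costs are reduced (summed or averaged), or from differences in initialization and sampling.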
Also, in actual use, with the same network structure and the same data, training with fluid is about 3 times slower than with v2