Shrink batch_norm_grad's inputs
Created by: tonyyang-svail
Currently, batch_norm_grad_op is generated by the default grad_op_maker, which declares a @GRAD input for every forward output. This is not ideal, since MeanOut@GRAD and VarianceOut@GRAD are never used, yet still appear in the ProgramDesc:
```
Op(batch_norm_grad),
inputs:{
  Bias[batch_norm_0.b_0[64]({})],
  Mean[batch_norm_0.w_1[64]({})],
  MeanOut[batch_norm_0.w_1[64]({})],
  MeanOut@GRAD[batch_norm_0.w_1@GRAD[64]({})],
  SavedMean[batch_norm_0.tmp_0[64]({})],
  SavedMean@GRAD[batch_norm_0.tmp_0@GRAD[64]({})],
  SavedVariance[batch_norm_0.tmp_1[64]({})],
  SavedVariance@GRAD[batch_norm_0.tmp_1@GRAD[64]({})],
  Scale[batch_norm_0.w_0[64]({})],
  Variance[batch_norm_0.w_2[64]({})],
  VarianceOut[batch_norm_0.w_2[64]({})],
  VarianceOut@GRAD[batch_norm_0.w_2@GRAD[64]({})],
  X[conv2d_0.tmp_0[10, 64, 100, 100]({})],
  Y[batch_norm_0.tmp_2[10, 64, 100, 100]({})],
  Y@GRAD[batch_norm_0.tmp_2@GRAD[10, 64, 100, 100]({})]},
outputs:{
  Bias@GRAD[batch_norm_0.b_0@GRAD[64]({})],
  Mean@GRAD[],
  Scale@GRAD[batch_norm_0.w_0@GRAD[64]({})],
  Variance@GRAD[],
  X@GRAD[conv2d_0.tmp_0@GRAD[10, 64, 100, 100]({})]
}
```