Problem with BatchNorm in Fluid
Created by: qingqing01
Currently, the moving mean and variance in `batch_norm` are created as parameters with `trainable=False`:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/layers/nn.py#L1512
```python
mean = helper.create_parameter(
    attr=ParamAttr(
        name=moving_mean_name, initializer=Constant(0.0), trainable=False),
    shape=param_shape,
    dtype=input.dtype)
mean.stop_gradient = True

variance = helper.create_parameter(
    attr=ParamAttr(
        name=moving_variance_name,
        initializer=Constant(1.0),
        trainable=False),
    shape=param_shape,
    dtype=input.dtype)
variance.stop_gradient = True
```
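For context, the ProgramDesc discussed below can be dumped by simply printing the program after building the network and appending the optimizer. A minimal sketch (the actual MobileNet-SSD training script is not shown in this issue, and `build_mobilenet_ssd` is a hypothetical helper):

```python
import paddle.fluid as fluid

# Build the network and optimizer in the default program first, e.g.:
#   loss = build_mobilenet_ssd(...)   # hypothetical model-building helper
#   fluid.optimizer.SGD(learning_rate=0.001).minimize(loss)
#
# Printing a Program renders its ProgramDesc as a proto-style string.
print(fluid.default_main_program())
```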
But when inspecting the ProgramDesc proto string, there are still some computation operators related to the moving mean and variance. For example, when I print the ProgramDesc of MobileNet-SSD on one GPU, the following proto string involves the moving variance (`batch_norm_x.w_2` is the moving variance of layer `batch_norm_x`):
```protobuf
ops {
  inputs {
    parameter: "X"
    arguments: "batch_norm_2.w_2"
  }
  outputs {
    parameter: "Out"
    arguments: "_generated_var_52"
  }
  type: "scale"
  attrs {
    name: "scale"
    type: FLOAT
    f: 4.99999987369e-05
  }
}
ops {
  inputs {
    parameter: "X"
    arguments: "batch_norm_2.w_2@GRAD"
  }
  inputs {
    parameter: "Y"
    arguments: "_generated_var_52"
  }
  outputs {
    parameter: "Out"
    arguments: "batch_norm_2.w_2@GRAD"
  }
  type: "elementwise_add"
  attrs {
    name: "axis"
    type: INT
    i: -1
  }
}
```
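These two ops match the usual L2 weight-decay pattern `grad = grad + coeff * param`, with `coeff` of about 5e-05. My reading (an assumption, not confirmed by the proto alone) is that the moving variance is being treated like an ordinary trainable parameter by the regularization pass. The sketch below just restates what the two ops compute, using hypothetical shapes and values:

```python
import numpy as np

# An interpretation of the two ops above, not code from Paddle:
# the L2 weight-decay update grad = grad + coeff * param, applied to
# the moving variance -- a parameter that should never be updated.
coeff = 4.99999987369e-05                        # the "scale" attr above
moving_variance = np.ones(32, dtype=np.float32)  # stands in for batch_norm_2.w_2
grad = np.zeros_like(moving_variance)            # stands in for batch_norm_2.w_2@GRAD

scaled = coeff * moving_variance   # the "scale" op -> _generated_var_52
grad = grad + scaled               # the "elementwise_add" op writes back to @GRAD
```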