Problem with BatchNorm in Fluid
Created by: qingqing01
Currently, the moving mean and variance in `batch_norm` are created as parameters with `trainable=False`:
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/layers/nn.py#L1512
```python
mean = helper.create_parameter(
    attr=ParamAttr(
        name=moving_mean_name, initializer=Constant(0.0), trainable=False),
    shape=param_shape,
    dtype=input.dtype)
mean.stop_gradient = True

variance = helper.create_parameter(
    attr=ParamAttr(
        name=moving_variance_name,
        initializer=Constant(1.0),
        trainable=False),
    shape=param_shape,
    dtype=input.dtype)
variance.stop_gradient = True
```
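For context, the ProgramDesc discussed below can be dumped by simply printing the program after building the network and appending the optimizer. A minimal sketch (the actual MobileNet-SSD training script is not shown in this issue, and `build_mobilenet_ssd` is a hypothetical helper):

```python
import paddle.fluid as fluid

# Build the network and optimizer in the default program first, e.g.:
#   loss = build_mobilenet_ssd(...)   # hypothetical model-building helper
#   fluid.optimizer.SGD(learning_rate=0.001).minimize(loss)
#
# Printing a Program renders its ProgramDesc as a proto-style string.
print(fluid.default_main_program())
```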
But when inspecting the ProgramDesc proto string, there are still some computation operators related to the moving mean and variance. For example, when I print the ProgramDesc of MobileNet-SSD on one GPU, the following proto string involves the moving variance (`batch_norm_x.w_2` is the moving variance of layer `batch_norm_x`):
```protobuf
ops {
  inputs {
    parameter: "X"
    arguments: "batch_norm_2.w_2"
  }
  outputs {
    parameter: "Out"
    arguments: "_generated_var_52"
  }
  type: "scale"
  attrs {
    name: "scale"
    type: FLOAT
    f: 4.99999987369e-05
  }
}
ops {
  inputs {
    parameter: "X"
    arguments: "batch_norm_2.w_2@GRAD"
  }
  inputs {
    parameter: "Y"
    arguments: "_generated_var_52"
  }
  outputs {
    parameter: "Out"
    arguments: "batch_norm_2.w_2@GRAD"
  }
  type: "elementwise_add"
  attrs {
    name: "axis"
    type: INT
    i: -1
  }
}
```
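These two ops match the usual L2 weight-decay pattern `grad = grad + coeff * param`, with `coeff` of about 5e-05. My reading (an assumption, not confirmed by the proto alone) is that the moving variance is being treated like an ordinary trainable parameter by the regularization pass. The sketch below just restates what the two ops compute, using hypothetical shapes and values:

```python
import numpy as np

# An interpretation of the two ops above, not code from Paddle:
# the L2 weight-decay update grad = grad + coeff * param, applied to
# the moving variance -- a parameter that should never be updated.
coeff = 4.99999987369e-05                        # the "scale" attr above
moving_variance = np.ones(32, dtype=np.float32)  # stands in for batch_norm_2.w_2
grad = np.zeros_like(moving_variance)            # stands in for batch_norm_2.w_2@GRAD

scaled = coeff * moving_variance   # the "scale" op -> _generated_var_52
grad = grad + scaled               # the "elementwise_add" op writes back to @GRAD
```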