Initializer

class paddle.v2.fluid.initializer.Initializer

Base class for variable initializers

Defines the common interface for variable initializers. Initializers add operations to the init program that initialize variables. Users should not use this class directly; instead, use one of its concrete implementations.

ConstantInitializer

class paddle.v2.fluid.initializer.ConstantInitializer(value=0.0)

Implements the constant initializer
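As a minimal sketch of the behavior (a NumPy stand-in for illustration, not Paddle's actual implementation), a constant initializer simply fills the parameter tensor with a single value, `0.0` by default:

```python
import numpy as np

def constant_init(shape, value=0.0):
    """Illustrative stand-in: fill a parameter tensor with one value."""
    return np.full(shape, value, dtype=np.float32)

# A 2x3 weight matrix initialized to the default value of zero.
w = constant_init((2, 3))
```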

UniformInitializer

class paddle.v2.fluid.initializer.UniformInitializer(low=-1.0, high=1.0, seed=0)

Implements the random uniform distribution initializer

NormalInitializer

class paddle.v2.fluid.initializer.NormalInitializer(loc=0.0, scale=1.0, seed=0)

Implements the random normal (Gaussian) distribution initializer
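A sketch of the intended behavior (NumPy stand-in, assuming the usual N(loc, scale) semantics for the parameters): each weight is drawn from a Gaussian with mean `loc` and standard deviation `scale`:

```python
import numpy as np

def normal_init(shape, loc=0.0, scale=1.0, seed=0):
    """Illustrative stand-in: draw weights from N(loc, scale)."""
    rng = np.random.RandomState(seed)
    return rng.normal(loc, scale, size=shape).astype(np.float32)

# Many samples so the empirical mean/std are close to loc/scale.
w = normal_init((1000,), loc=0.0, scale=0.1)
```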

XavierInitializer

class paddle.v2.fluid.initializer.XavierInitializer(uniform=True, fan_in=None, fan_out=None, seed=0)

Implements the Xavier initializer

This class implements the Xavier weight initializer from the paper Understanding the difficulty of training deep feedforward neural networks[1] by Xavier Glorot and Yoshua Bengio.

This initializer is designed to keep the scale of the gradients approximately the same in all layers. In the uniform case, weights are drawn from the range [-x, x], where x = sqrt(6 / (fan_in + fan_out)). In the normal case, the mean is 0 and the standard deviation is sqrt(2 / (fan_in + fan_out)).
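The scaling rules above can be checked numerically. This is an illustrative NumPy sketch (assuming `fan_in`/`fan_out` default to the two dimensions of a 2-D weight, which is the conventional choice), not the Paddle implementation:

```python
import math
import numpy as np

def xavier_init(shape, uniform=True, fan_in=None, fan_out=None, seed=0):
    """Illustrative sketch of Xavier/Glorot scaling for a 2-D weight."""
    fan_in = fan_in if fan_in is not None else shape[0]
    fan_out = fan_out if fan_out is not None else shape[1]
    rng = np.random.RandomState(seed)
    if uniform:
        x = math.sqrt(6.0 / (fan_in + fan_out))  # range is [-x, x]
        return rng.uniform(-x, x, size=shape).astype(np.float32)
    std = math.sqrt(2.0 / (fan_in + fan_out))    # mean 0, std as above
    return rng.normal(0.0, std, size=shape).astype(np.float32)

# fan_in=300, fan_out=100, so the uniform bound is sqrt(6/400).
w = xavier_init((300, 100))
```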

References

[1] Glorot, X., & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. International Conference on Artificial Intelligence and Statistics. (http://proceedings.mlr.press/v9/glorot10a.html)

MSRAInitializer

class paddle.v2.fluid.initializer.MSRAInitializer(uniform=True, fan_in=None, seed=0)

Implements the MSRA initializer, also known as the Kaiming initializer

This class implements the weight initialization from the paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification[1] by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. It is a robust initialization method that specifically accounts for rectifier nonlinearities. In the uniform case, weights are drawn from the range [-x, x], where x = sqrt(6 / fan_in). In the normal case, the mean is 0 and the standard deviation is sqrt(2 / fan_in).
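Note that unlike Xavier, only `fan_in` enters the scale. A hedged NumPy sketch of that rule (assuming `fan_in` defaults to the first dimension of a 2-D weight; not the Paddle implementation):

```python
import math
import numpy as np

def msra_init(shape, uniform=True, fan_in=None, seed=0):
    """Illustrative sketch of MSRA/Kaiming scaling; only fan_in matters."""
    fan_in = fan_in if fan_in is not None else shape[0]
    rng = np.random.RandomState(seed)
    if uniform:
        x = math.sqrt(6.0 / fan_in)   # range is [-x, x]
        return rng.uniform(-x, x, size=shape).astype(np.float32)
    std = math.sqrt(2.0 / fan_in)     # mean 0, std as above
    return rng.normal(0.0, std, size=shape).astype(np.float32)

# fan_in=512, so the uniform bound is sqrt(6/512).
w = msra_init((512, 256))
```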

References

[1] He, K., Zhang, X., Ren, S., & Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. (https://arxiv.org/abs/1502.01852)