# Parameter Attribute Design and Initializers

Created by: reyoung

We extracted `param_initializers` from `param_attr` in PR #5760.
## Problems
Previously, five arguments were passed via `param_attr`: name, initializer, regularizer, learning rate, and trainable. If we extract each of them into an individual layer argument, several problems arise:
- The number of layer arguments increases.
  - From `fc(param_attr=...)` to `fc(param_name=..., param_initializer=..., param_regularizer=..., param_learning_rate=..., param_trainable=...)`.
- It is hard to add more fields to `param_attr`.
  - Suppose we want to add an attribute, say `param_XXX`, to every parameter. We would then have to change the implementation of every layer.
- It leads to contradictory API combinations.
  - For example, what does `fc(use_bias=False, bias_initializer=UniformInitializer(-1.0, 1.0))` mean?
- It is hard to specify attributes for multiple parameters.
  - For example, `fc(input=[i1, i2], param_name=["w1", "w2"], size=100, param_learning_rate=[1.0, 0.5])`.
## Solution
I think we should unify all parameter arguments into one strongly typed `ParamAttr`. It could be:
```python
class ParamAttr(object):
    def __init__(self,
                 name=None,
                 initializer=None,
                 regularizer=None,
                 learning_rate=1.0,
                 trainable=True):
        self.name = name
        self.initializer = initializer
        self.regularizer = regularizer
        self.learning_rate = learning_rate
        self.trainable = trainable
```
Users can then specify parameter arguments like this:
```python
fc(input=[i1, i2], param_attr=[
       ParamAttr(name='w1', initializer=Uniform(-1, 1)),
       ParamAttr(name='w2', initializer=Uniform(0, 1))
   ], bias_attr=False)
```
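The layer-side handling of `param_attr` is not specified above. As a minimal sketch of one way it could work, each layer could normalize whatever it receives into a list of `ParamAttr`, one per parameter. The helper name `to_param_attrs` and its rules (`False` disables the parameter, `None` means defaults, a single `ParamAttr` is broadcast to all inputs) are assumptions for illustration, not part of the proposal:

```python
class ParamAttr(object):
    def __init__(self, name=None, initializer=None, regularizer=None,
                 learning_rate=1.0, trainable=True):
        self.name = name
        self.initializer = initializer
        self.regularizer = regularizer
        self.learning_rate = learning_rate
        self.trainable = trainable


def to_param_attrs(param_attr, length):
    """Normalize `param_attr` into a list of `length` ParamAttr objects.

    Hypothetical helper: the conversion rules below are assumptions
    about how a layer might interpret its `param_attr`/`bias_attr` input.
    """
    if param_attr is False:
        # Parameter disabled entirely, e.g. bias_attr=False.
        return [None] * length
    if param_attr is None:
        # No attribute given: fall back to default settings.
        return [ParamAttr() for _ in range(length)]
    if isinstance(param_attr, ParamAttr):
        # One attribute broadcast to every input.
        return [param_attr] * length
    # Otherwise assume a list with one entry per input.
    assert len(param_attr) == length
    return list(param_attr)


attrs = to_param_attrs([ParamAttr(name='w1'), ParamAttr(name='w2')], 2)
```

With such a helper, a layer like `fc` only ever deals with a uniform list internally, so the user-facing API stays small while still supporting the multi-parameter case from the Problems section.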