# Parameter Attribute Design and Initializers

Created by: reyoung

We extracted `param_initializers` from `param_attr` in PR #5760.
## Problems
Previously, five arguments were passed via `param_attr`: name, initializer, regularizer, learning rate, and trainable. If we extract each of them into an individual layer argument, several problems arise:
- The number of layer arguments increases.
  - From `fc(param_attr=...)` to `fc(param_name=..., param_initializer=..., param_regularizer=..., param_learning_rate=..., param_trainable=...)`.
- It is hard to add more fields to `param_attr`.
  - Suppose we want to add an attribute, say `param_XXX`, to every parameter. We would then have to change the implementation of every layer.
- It leads to contradictory API combinations.
  - For example, what does `fc(use_bias=False, bias_initializer=UniformInitializer(-1.0, 1.0))` mean?
- It is hard to specify attributes for multiple parameters.
  - For example, `fc(input=[i1, i2], param_name=["w1", "w2"], size=100, param_learning_rate=[1.0, 0.5])`.
## Solution
I think we should unify all parameter arguments into one strongly typed `ParamAttr`. It could be:
```python
class ParamAttr(object):
    def __init__(self,
                 name=None,
                 initializer=None,
                 regularizer=None,
                 learning_rate=1.0,
                 trainable=True):
        self.name = name
        self.initializer = initializer
        self.regularizer = regularizer
        self.learning_rate = learning_rate
        self.trainable = trainable
```
Users can then specify parameter arguments like this:
```python
fc(input=[i1, i2], param_attr=[
       ParamAttr(name='w1', initializer=Uniform(-1, 1)),
       ParamAttr(name='w2', initializer=Uniform(0, 1))
   ], bias_attr=False)
```
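The layer-side handling of `param_attr` is not specified above. As a minimal sketch of one way it could work, each layer could normalize whatever it receives into a list of `ParamAttr`, one per parameter. The helper name `to_param_attrs` and its rules (`False` disables the parameter, `None` means defaults, a single `ParamAttr` is broadcast to all inputs) are assumptions for illustration, not part of the proposal:

```python
class ParamAttr(object):
    def __init__(self, name=None, initializer=None, regularizer=None,
                 learning_rate=1.0, trainable=True):
        self.name = name
        self.initializer = initializer
        self.regularizer = regularizer
        self.learning_rate = learning_rate
        self.trainable = trainable


def to_param_attrs(param_attr, length):
    """Normalize `param_attr` into a list of `length` ParamAttr objects.

    Hypothetical helper: the conversion rules below are assumptions
    about how a layer might interpret its `param_attr`/`bias_attr` input.
    """
    if param_attr is False:
        # Parameter disabled entirely, e.g. bias_attr=False.
        return [None] * length
    if param_attr is None:
        # No attribute given: fall back to default settings.
        return [ParamAttr() for _ in range(length)]
    if isinstance(param_attr, ParamAttr):
        # One attribute broadcast to every input.
        return [param_attr] * length
    # Otherwise assume a list with one entry per input.
    assert len(param_attr) == length
    return list(param_attr)


attrs = to_param_attrs([ParamAttr(name='w1'), ParamAttr(name='w2')], 2)
```

With such a helper, a layer like `fc` only ever deals with a uniform list internally, so the user-facing API stays small while still supporting the multi-parameter case from the Problems section.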