The hyper parameters of paddle.optimizer do not work in the v2 API.
Created by: qingqing01
The hyper parameters passed to paddle.optimizer do not take effect in the v2 API. For example, configure the Momentum optimizer in the sentiment demo as follows:
optimizer = paddle.optimizer.Momentum(
    learning_rate=2e-3,
    momentum=0.9,
    gradient_clipping_threshold=25.0,
    regularization=paddle.optimizer.L2Regularization(rate=8e-4),
    model_average=paddle.optimizer.ModelAverage(average_window=0.5))
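In the v2 API this optimizer object is handed to the trainer as the update equation. A minimal sketch of that wiring, where cost and parameters are placeholders from the rest of the sentiment demo and are not defined here:

# `cost` and `parameters` are assumed to come from the rest of the demo.
trainer = paddle.trainer.SGD(cost=cost,
                             parameters=parameters,
                             update_equation=optimizer)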
Then print the proto-string of the config before this line in python/paddle/v2/trainer.py; it shows that the proto-string of the parameters does not contain the hyper parameters, such as the L2 regularization and the momentum. The momentum is 0 if you print it before this line in paddle/parameter/FirstOrderOptimizer.h. The proto-string of the parameters is as follows:
parameters {
name: "___embedding_layer_0__.w0"
size: 658816
initial_mean: 0.0
initial_std: 0.0139387206988
dims: 5147
dims: 128
initial_strategy: 0
initial_smart: true
}
parameters {
name: "___sequence_conv_pool_0___conv_fc.w0"
size: 49152
initial_mean: 0.0
initial_std: 0.051031036308
dims: 384
dims: 128
initial_strategy: 0
initial_smart: true
}
...
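Instead of eyeballing the dump, each parameters { ... } block can be scanned mechanically for the expected fields. This is a standalone sketch in plain Python, not part of Paddle; the field names are taken from the dumps in this issue:

import re

def missing_hyperparams(proto_text,
                        required=("momentum", "decay_rate",
                                  "gradient_clipping_threshold")):
    # Map each parameter name to the list of required fields that are
    # absent from its `parameters { ... }` block.
    missing = {}
    for block in re.findall(r"parameters \{([^}]*)\}", proto_text):
        m = re.search(r'name: "([^"]+)"', block)
        name = m.group(1) if m else "<unnamed>"
        absent = [f for f in required if f + ":" not in block]
        if absent:
            missing[name] = absent
    return missing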
The correct proto-string of the parameters, by contrast, should contain decay_rate and momentum (the rate 8e-4 from L2Regularization maps to decay_rate: 0.0008, and momentum=0.9 to momentum: 0.9), as follows:
parameters {
name: "___embedding_0__.w0"
size: 3840000
momentum: 0.9
initial_mean: 0.0
initial_std: 0.0057735026919
decay_rate: 0.0008
dims: 30000
dims: 128
initial_strategy: 0
initial_smart: true
gradient_clipping_threshold: 25.0
}
parameters {
name: "___fc_layer_0__.w0"
size: 65536
momentum: 0.9
initial_mean: 0.0
initial_std: 0.0883883476483
decay_rate: 0.0008
dims: 128
dims: 512
initial_strategy: 0
initial_smart: true
gradient_clipping_threshold: 25.0
}
...
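Hypothetical usage of the checker above, assuming actual_dump and expected_dump hold the two text dumps pasted in this issue:

print(missing_hyperparams(actual_dump))
# -> {'___embedding_layer_0__.w0': ['momentum', 'decay_rate',
#     'gradient_clipping_threshold'], ...}
print(missing_hyperparams(expected_dump))
# -> {} once every hyper parameter reaches the parameter config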