The optimizer must be defined before the network topology; otherwise a silent bug occurs.
Created by: lcy-seso
-
PaddlePaddle simplifies the way it parses the network configuration by introducing some global variables in this PR https://github.com/PaddlePaddle/Paddle/pull/2288.
-
The optimizer sets the default values for some parameters, including L2 regularization, gradient_clipping_threshold, and so on. These settings are stored in global variables, so the global variables must be correctly initialized before the network topology is parsed.
-
This means the optimizer must be defined before the network topology so that the global parameter settings take effect.
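The ordering dependency can be illustrated with a minimal, self-contained sketch. It does not use PaddlePaddle: `_global_defaults`, `ToyOptimizer`, `toy_layer`, and `reset_defaults` are hypothetical names that only mimic the behavior described above, assuming the optimizer writes its settings into global state and each layer copies that state at the moment it is defined.

```python
# Toy model of the mechanism; none of these names are PaddlePaddle APIs.
_global_defaults = {"gradient_clipping_threshold": None, "l2_rate": None}


class ToyOptimizer:
    """Constructing the optimizer writes its settings into global state."""

    def __init__(self, gradient_clipping_threshold=None, l2_rate=None):
        _global_defaults["gradient_clipping_threshold"] = gradient_clipping_threshold
        _global_defaults["l2_rate"] = l2_rate


def toy_layer(name):
    """Defining a layer snapshots whatever the global defaults are right now."""
    return {"name": name, **_global_defaults}


def reset_defaults():
    """Clear the global state so each scenario starts from scratch."""
    for key in _global_defaults:
        _global_defaults[key] = None


# Wrong order: the layer is defined first, so it captures the empty defaults.
reset_defaults()
layer_before = toy_layer("fc")
ToyOptimizer(gradient_clipping_threshold=10.0, l2_rate=8e-4)

# Right order: the optimizer is defined first, so the layer sees its settings.
reset_defaults()
ToyOptimizer(gradient_clipping_threshold=10.0, l2_rate=8e-4)
layer_after = toy_layer("fc")

print(layer_before["gradient_clipping_threshold"])  # None
print(layer_after["gradient_clipping_threshold"])   # 10.0
```

In the toy version, as in the report, the wrong order raises nothing; the layer simply keeps the unset defaults.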
-
The following code is wrong because gradient_clipping_threshold and L2 regularization will not take effect:

    # Wrong: the topology is parsed before the optimizer sets the global defaults.
    cost = seq2seq_net(source_dict_dim, target_dict_dim)
    parameters = paddle.parameters.create(cost)

    optimizer = paddle.optimizer.RMSProp(
        learning_rate=1e-3,
        gradient_clipping_threshold=10.0,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4))
-
The following order is correct:

    # Right: the optimizer is defined first, so its settings are in place
    # when the topology is parsed.
    optimizer = paddle.optimizer.RMSProp(
        learning_rate=1e-3,
        gradient_clipping_threshold=10.0,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4))

    cost = seq2seq_net(source_dict_dim, target_dict_dim)
    parameters = paddle.parameters.create(cost)
-
This is very dangerous for users because no error is reported: with the wrong order, the gradient clipping and regularization settings are silently ignored.
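One way to make the mistake visible is to fail loudly when an optimizer is created after part of the topology already exists. The sketch below is only an illustration of that idea under stated assumptions: `guarded_layer`, `GuardedOptimizer`, and `_layers_defined` are hypothetical names, not PaddlePaddle APIs, and this is not a description of how PaddlePaddle behaves.

```python
# Hypothetical guard; these names are illustrative, not PaddlePaddle APIs.
_layers_defined = 0


def guarded_layer(name):
    """Record that part of the topology has already been parsed."""
    global _layers_defined
    _layers_defined += 1
    return {"name": name}


class GuardedOptimizer:
    def __init__(self, gradient_clipping_threshold=None):
        # Fail loudly instead of silently ignoring the settings.
        if _layers_defined > 0:
            raise RuntimeError(
                "optimizer defined after %d layer(s); its global settings "
                "would not apply to them" % _layers_defined
            )
        self.gradient_clipping_threshold = gradient_clipping_threshold


# Wrong order: a layer exists before the optimizer is created.
guarded_layer("fc")
try:
    GuardedOptimizer(gradient_clipping_threshold=10.0)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)
```

A counter is used here purely for simplicity; any record of "topology parsing has started" would serve the same purpose.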