Created by: willthefrog
This bug is quite peculiar and hard to track down: when the learning rate for a parameter is set via `param_attr` and a learning rate scheduler is used, `append_optimizer_op` will fail.
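
For context, a minimal sketch of the setup that triggers the problem (names and values here are illustrative, not taken from the original report). It combines a per-parameter lr multiplier in `param_attr` with a scheduled global learning rate; the failure itself only surfaces in flows where the optimizer is applied against a separately cloned `train_prog`, which this standalone sketch does not reproduce end to end:

```python
import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')

# per-parameter lr multiplier set via param_attr (this is what triggers the scale op)
pred = fluid.layers.fc(input=x, size=1,
                       param_attr=fluid.ParamAttr(learning_rate=2.0))
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))

# learning rate scheduler: the global lr is itself a Variable produced by ops
lr = fluid.layers.exponential_decay(learning_rate=0.01,
                                    decay_steps=100, decay_rate=0.9)
sgd = fluid.optimizer.SGD(learning_rate=lr)
# works here; in the reported flow with a cloned train_prog,
# append_optimizer_op fails at this step
sgd.minimize(loss)
```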
It turns out that the per-parameter learning rate scaling is done in `_create_param_lr`, which basically adds a scale op. However, that scale op is appended to the program that owns the `global_learning_rate()` variable, which is still `orig_prog`, so the resulting scaled learning rate cannot be found in `train_prog`.
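
The scaling logic in question looks roughly like this (a paraphrase from memory, not a verbatim copy of the Paddle source):

```python
# Rough paraphrase of Optimizer._create_param_lr (not verbatim):
def _create_param_lr(self, param_and_grad):
    param_lr = param_and_grad[0].optimize_attr['learning_rate']
    if param_lr == 1.0:
        # no multiplier: just hand back the global lr variable
        return self._global_learning_rate()
    # the multiplication below appends a scale op (and its output variable) to the
    # program that owns the global lr variable, i.e. orig_prog, not train_prog
    return self._global_learning_rate() * param_lr
```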
The reason it previously worked without lr scaling is this: `clone()` creates a variable with the same name as the `global_learning_rate()` variable, and that same-named copy is what `append_optimizer_op` ends up using.
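
A tiny self-contained illustration of that name-based lookup (the variable name `learning_rate_0` and the standalone programs here are assumptions for demonstration, not the actual optimizer code):

```python
import paddle.fluid as fluid

orig_prog = fluid.Program()
with fluid.program_guard(orig_prog):
    # stand-in for the global_learning_rate() variable created by the scheduler
    lr = fluid.layers.create_global_var(shape=[1], value=0.01, dtype='float32',
                                        persistable=True, name='learning_rate_0')

train_prog = orig_prog.clone()
train_prog.global_block().var(lr.name)        # found: clone() copied the same-named var

scaled_lr = lr * 2.0                          # scale op/output go into lr's own block (orig_prog)
train_prog.global_block().var(scaled_lr.name) # raises ValueError: not found in train_prog
```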