Fix SGD learning_rate problem
Created by: jacquesqiao
problem
Some optimizer operators may change the learning_rate during training, so the learning_rate should be stored as a tensor in the scope rather than as a fixed attribute.
In a previous PR, we moved the learning_rate from an attribute to a tensor, but we read its value like this:
float lr = ctx.Input<Tensor>("LearningRate")->data<float>()[0];
This causes a problem when the tensor is a GPU tensor, because we cannot directly read a value from GPU memory on the host.
way to fix
Use an Eigen tensor to represent the learning_rate and let Eigen perform the computation, instead of reading its value out as a float. The Eigen expression is evaluated on the device where the tensor lives, so the same kernel code works for both CPU and GPU tensors.