Apply advantage function for reinforcement learning
Created by: reyoung
Users want to use Paddle for reinforcement learning (RL). In RL, the advantage function needs to be applied to the gradient or in the optimizer. To support this, we should let users write the training for-loop themselves.
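As a rough illustration of what such a user-written loop might look like, here is a minimal sketch in plain Python. It does not use any Paddle API. All names (`discounted_returns`, `advantages`, `policy_gradient_step`) are hypothetical, and the mean-return baseline and softmax policy are assumptions made only for the example. The point it shows is that the advantage scales the gradient of each step, which is why the loop needs to be under the user's control.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, baselines):
    """Advantage = return minus a baseline (assumed here: any per-step estimate)."""
    return [g - b for g, b in zip(returns, baselines)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, action, advantage, lr=0.1):
    """One gradient-ascent step on advantage * log pi(action).

    For a softmax policy, d log pi(action) / d logits = onehot(action) - probs.
    """
    probs = softmax(logits)
    grad = [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]
    return [w + lr * advantage * g for w, g in zip(logits, grad)]


# Hypothetical user-written training loop over one recorded episode.
episode = [(0, 1.0), (1, 0.0), (0, 1.0)]  # (action, reward) pairs, made up
logits = [0.0, 0.0]                       # policy parameters for two actions

rewards = [r for _, r in episode]
returns = discounted_returns(rewards, gamma=1.0)
baseline = sum(returns) / len(returns)    # assumed baseline: mean return
advs = advantages(returns, [baseline] * len(returns))

for (action, _), adv in zip(episode, advs):
    # The advantage weights the gradient before the parameter update.
    logits = policy_gradient_step(logits, action, adv)
```

In a framework setting, the same idea would presumably mean exposing the per-sample gradient or loss weighting to the user's loop, rather than hiding the update inside a fixed trainer.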
https://hackmit-baidu.slack.com/archives/C727D998C/p1505585517000037