Created by: Aurelius84
PR types
Others
PR changes
Others
Describe
Add a reinforcement learning unittest.
TODO: In RL, we select actions over multiple steps and then accumulate the loss before applying the optimization. However, all vars currently share the same inner scope, which causes problems in the backward pass. I am working on it and will fix it in PR https://github.com/PaddlePaddle/Paddle/pull/25579
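The following is a minimal sketch of the multi-step pattern described in the TODO. It uses plain Python rather than Paddle, and no Paddle API is assumed. A hypothetical two-action softmax policy selects one action per step. The per-step REINFORCE loss `-log pi(a) * reward` and its gradient are accumulated across the episode, and one optimization step is then applied to the accumulated result. The reward table and the `run_episode` and `train` helpers are illustrative assumptions, not the unittest added in this PR.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def run_episode(theta, rewards_per_action, num_steps, rng):
    """Select one action per step; accumulate the loss and its gradient (hypothetical helper)."""
    total_loss = 0.0
    grad = [0.0] * len(theta)
    for _ in range(num_steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rewards_per_action[action]
        # Per-step REINFORCE loss: -log pi(action) * reward.
        total_loss += -math.log(probs[action]) * reward
        # d(-log softmax(theta)[a]) / d theta_k = probs[k] - 1[k == a].
        for k in range(len(theta)):
            grad[k] += (probs[k] - (1.0 if k == action else 0.0)) * reward
    return total_loss, grad


def train(num_episodes=200, num_steps=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    rewards = [0.0, 1.0]  # hypothetical bandit: action 1 is the better one
    for _ in range(num_episodes):
        # Accumulate over all steps first, then apply one optimizer step.
        _, grad = run_episode(theta, rewards, num_steps, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

Because the parameters stay fixed within an episode, the gradient of the accumulated loss is simply the sum of the per-step gradients. In a framework, this requires every step's intermediate variables to remain available until the single backward pass, which is why sharing one inner scope across steps is a problem.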