Forked from PaddlePaddle / Paddle
* test=develop, add op_register_version for roll_op
* test=develop, optimize roll_op_cuda_kernel