* add new api/op kthvalue * kthvalue cuda kernel to cub sorting * fix example code error * throw errors instead of LOG in cuda sort * throw errors by Paddle_ENFORCE
* add new OP mode * rename trans-variable name and fix UT