Correctly use host_vector in LoDTensor and expose LoDTensor to Python. (!4001) · Merge request · PaddlePaddle / Paddle

Created by: qingqing01

Correctly use host_vector
- With the default allocator, the LoD information stored in a host_vector cannot be accessed from a CUDA kernel, so use thrust::system::cuda::experimental::pinned_allocator<T> instead.
- Add a unit test, lod_tensor_test.cu, to test LoDTensor on the GPU.
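
As a rough illustration of the allocator choice described above, the following is a minimal sketch and not the code from this merge request. The names Vector, Level, NumSequences, and SequenceLength are hypothetical, as is the assumption that one LoD level is a list of offsets. The sketch switches on __CUDACC__ (defined by nvcc) so that it also builds without CUDA, in which case it falls back to std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#include <thrust/host_vector.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>
// GPU build: back the vector with pinned (page-locked) host memory so that
// the LoD data can be reached from a CUDA kernel.
template <typename T>
using Vector = thrust::host_vector<
    T, thrust::system::cuda::experimental::pinned_allocator<T>>;
#else
// CPU-only build: ordinary pageable memory is sufficient.
template <typename T>
using Vector = std::vector<T>;
#endif

// Assumption: one LoD level is a list of offsets, e.g. {0, 2, 5}.
using Level = Vector<std::size_t>;

// Hypothetical helper: number of sequences described by one level.
inline std::size_t NumSequences(const Level& level) {
  return level.empty() ? 0 : level.size() - 1;
}

// Hypothetical helper: length of the i-th sequence in one level.
inline std::size_t SequenceLength(const Level& level, std::size_t i) {
  assert(i + 1 < level.size());  // i must name an existing sequence
  return level[i + 1] - level[i];
}
```

Because only the allocator template argument changes, the host-side code that fills or reads the offsets is the same in both builds.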
Expose LoDTensor to Python.
- Expose LoDTensor in pybind.
- Add unit tests.