Created by: qingqing01
Fix #3883 (closed)
- Use host_vector correctly.
  - With the default allocator in host_vector, the LoD information cannot be accessed in a CUDA kernel, so use the `thrust::system::cuda::experimental::pinned_allocator<T>` allocator instead.
  - Add the unit test `lod_tensor_test.cu` to test LoDTensor on GPU.
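
To illustrate the allocator change, here is a minimal sketch, not the code from this PR. It assumes a Thrust version that still ships the experimental `pinned_allocator` header and a device with unified virtual addressing, so that page-locked host memory can be dereferenced inside a kernel. The names `PinnedVector` and `SumSequenceLengths` and the offset values are hypothetical and exist only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/host_vector.h>
#include <thrust/memory.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>

// Hypothetical alias: a host_vector backed by page-locked (pinned) memory.
template <typename T>
using PinnedVector = thrust::host_vector<
    T, thrust::system::cuda::experimental::pinned_allocator<T>>;

// Hypothetical kernel: reads one level of LoD offsets directly from pinned
// host memory and writes the total number of covered elements to *out.
__global__ void SumSequenceLengths(const size_t* offsets, size_t n,
                                   size_t* out) {
  if (blockIdx.x == 0 && threadIdx.x == 0) {
    size_t total = 0;
    for (size_t i = 0; i + 1 < n; ++i) {
      total += offsets[i + 1] - offsets[i];
    }
    *out = total;
  }
}

int main() {
  // Illustrative offsets for three sequences of lengths 2, 3 and 4.
  PinnedVector<size_t> level;
  level.push_back(0);
  level.push_back(2);
  level.push_back(5);
  level.push_back(9);

  // The result slot also lives in pinned memory so the kernel can write it.
  PinnedVector<size_t> result(1, 0);

  SumSequenceLengths<<<1, 1>>>(thrust::raw_pointer_cast(level.data()),
                               level.size(),
                               thrust::raw_pointer_cast(result.data()));

  // Wait for the kernel before reading the result on the host.
  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 1;
  }
  std::printf("total elements: %zu\n", result[0]);
  return 0;
}
```

With the default `host_vector` allocator the buffer is ordinary pageable memory, and passing its pointer to the kernel in the same way would not be a valid device access.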
- Expose LoDTensor to Python.
  - Expose LoDTensor in pybind.
  - Add unit tests.