Created by: zhiqiu
PR types
Function optimization
PR changes
Others
Describe
Refine best_fit_allocator_test to support cuda-10.2
We found that the CUDA memory usage of each CUDADeviceContext increases from CUDA-10.1 to CUDA-10.2.
The original implementation of best_fit_allocator_test creates a new CUDADeviceContext for each thread, and a large number of contexts may use up CUDA memory.
This PR fixes that problem.
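
To illustrate the per-thread cost described above, here is a minimal, CPU-only sketch. It is not the code in this PR: `FakeContext`, `ContextsCreatedPerThread`, and `ContextsCreatedShared` are hypothetical names, and `FakeContext` only counts constructions rather than allocating CUDA memory. It contrasts creating one context per thread with sharing a single context across all threads, which is one possible way to reduce the number of contexts.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for CUDADeviceContext. It only counts how many
// instances were constructed; a real context would reserve device memory.
struct FakeContext {
  static std::atomic<int> created;
  FakeContext() { created.fetch_add(1); }
};
std::atomic<int> FakeContext::created{0};

// Per-thread pattern: every worker constructs its own context, so the
// number of contexts grows with the number of threads.
int ContextsCreatedPerThread(std::size_t num_threads) {
  FakeContext::created = 0;
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_threads; ++i) {
    workers.emplace_back([] {
      FakeContext ctx;  // one context per thread
      (void)ctx;
    });
  }
  for (auto& t : workers) t.join();
  return FakeContext::created.load();
}

// Shared pattern (an assumption, not necessarily this PR's approach):
// one context is constructed up front and every worker reuses it.
int ContextsCreatedShared(std::size_t num_threads) {
  FakeContext::created = 0;
  FakeContext shared;  // single context for all threads
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_threads; ++i) {
    workers.emplace_back([&shared] { (void)shared; });
  }
  for (auto& t : workers) t.join();
  return FakeContext::created.load();
}
```

With N threads, the first function constructs N contexts while the second constructs one regardless of N. A real shared context would also need to be safe for concurrent use, which this sketch does not address.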
