Created by: chengduoZH
WIP
The current CUDA Runtime documentation states, in its description of asynchronous memcpy behavior: "For transfers from device memory to pageable host memory, the function will return only once the copy has completed."
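To make the quoted behavior concrete, here is a minimal sketch (not code from this PR) that issues the same device-to-host `cudaMemcpyAsync` into either a pageable buffer (`malloc`) or a pinned buffer (`cudaMallocHost`). The helper name `d2h_copy_matches`, the `CHECK_CUDA` macro, and the test pattern are illustrative assumptions. Per the documentation, the pageable call returns only after the copy has completed, while the pinned call may return while the copy is still in flight, so the sketch synchronizes the stream in both cases before reading the host buffer.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper macro: abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                         \
  do {                                                           \
    cudaError_t err_ = (call);                                   \
    if (err_ != cudaSuccess) {                                   \
      std::fprintf(stderr, "%s failed: %s\n", #call,             \
                   cudaGetErrorString(err_));                    \
      std::exit(EXIT_FAILURE);                                   \
    }                                                            \
  } while (0)

// Hypothetical helper: copies n floats device -> host with cudaMemcpyAsync
// and returns true if the host buffer matches the pattern that was uploaded.
// use_pinned selects page-locked (cudaMallocHost) or pageable (malloc) memory.
bool d2h_copy_matches(size_t n, bool use_pinned) {
  assert(n > 0);
  const size_t bytes = n * sizeof(float);

  // Fill a source pattern and upload it with a blocking copy.
  std::vector<float> src(n);
  for (size_t i = 0; i < n; ++i) src[i] = static_cast<float>(i % 1024);
  float* dev = nullptr;
  CHECK_CUDA(cudaMalloc(&dev, bytes));
  CHECK_CUDA(cudaMemcpy(dev, src.data(), bytes, cudaMemcpyHostToDevice));

  // Destination buffer: pinned or pageable host memory.
  float* host = nullptr;
  if (use_pinned) {
    CHECK_CUDA(cudaMallocHost(&host, bytes));
  } else {
    host = static_cast<float*>(std::malloc(bytes));
    if (host == nullptr) std::exit(EXIT_FAILURE);
  }

  cudaStream_t stream;
  CHECK_CUDA(cudaStreamCreate(&stream));

  // Pageable destination: documented to return only once the copy is done.
  // Pinned destination: may return while the copy is still in flight.
  CHECK_CUDA(cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream));

  // Synchronize before reading; this is required for the pinned case and
  // harmless for the pageable case.
  CHECK_CUDA(cudaStreamSynchronize(stream));

  bool ok = true;
  for (size_t i = 0; i < n; ++i) {
    if (host[i] != src[i]) { ok = false; break; }
  }

  // Release resources with the deallocator that matches each allocation.
  CHECK_CUDA(cudaStreamDestroy(stream));
  if (use_pinned) {
    CHECK_CUDA(cudaFreeHost(host));
  } else {
    std::free(host);
  }
  CHECK_CUDA(cudaFree(dev));
  return ok;
}
```

Calling `d2h_copy_matches(1 << 20, false)` exercises the pageable path and `d2h_copy_matches(1 << 20, true)` the pinned path. Both should return true on a working device. The difference between them is when `cudaMemcpyAsync` returns, not the copied data.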