Created by: liu-plus-wei
On Windows, cudaStreamSynchronize randomly hangs when called from a multi-threaded environment, so replace it there with the cudaStreamQuery API.