Created by: kexinzhao
Right now, the data_type_transform test occasionally fails on GPU.
To fix this, we need to add an additional context.Wait() between the tensor copy and the transform on GPU, so that the copy has finished before the transform reads the data.
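
The ordering issue can be illustrated with a small host-only sketch. It does not use the real device context or tensor classes: `FakeDeviceContext`, `Enqueue`, and `CopyThenTransform` are hypothetical names invented for this illustration, and the "copy" is modeled with a deferred `std::async` task so that the effect of a missing wait is reproducible rather than intermittent.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU device context (not the real class).
// Queued work does not run until Wait() is called, which models a copy
// that has been issued but has not yet completed.
class FakeDeviceContext {
 public:
  void Enqueue(std::function<void()> task) {
    pending_.push_back(std::async(std::launch::deferred, std::move(task)));
  }

  // Blocks until every queued task has finished.
  void Wait() {
    for (auto& f : pending_) f.get();
    pending_.clear();
  }

 private:
  std::vector<std::future<void>> pending_;
};

// Copies src into a staging buffer via the context, then casts float -> int.
// If wait_before_transform is false, the cast reads the staging buffer
// before the copy has run and sees the initial zeros.
std::vector<int> CopyThenTransform(const std::vector<float>& src,
                                   bool wait_before_transform) {
  FakeDeviceContext context;
  std::vector<float> staged(src.size(), 0.0f);

  context.Enqueue([&staged, &src] { staged = src; });  // "tensor copy"

  if (wait_before_transform) {
    context.Wait();  // the extra synchronization point
  }

  std::vector<int> out(staged.size());
  for (std::size_t i = 0; i < staged.size(); ++i) {
    out[i] = static_cast<int>(staged[i]);  // "data type transform"
  }
  return out;
}
```

Calling `CopyThenTransform(src, true)` returns the truncated values of `src`, while `CopyThenTransform(src, false)` returns zeros because the transform ran before the copy. On a real GPU the outcome without the wait depends on timing, which would be consistent with the test failing only occasionally.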