Created by: reyoung
The GPU is an asynchronous device by default, so we should synchronize the computation when Python invokes `run`. That way, Python gets the correct computation result.
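
To illustrate why the sync matters, here is a minimal sketch in plain Python. It does not use this project's API: `FakeAsyncDevice`, `launch`, `synchronize`, and the `sync` flag are hypothetical names, and a background thread stands in for the GPU.

```python
import threading
import time


class FakeAsyncDevice:
    """Hypothetical stand-in for a GPU: launched work finishes later."""

    def __init__(self):
        self.output = None
        self._worker = None

    def launch(self, x):
        # Returns immediately, like an asynchronous kernel launch.
        def kernel():
            time.sleep(0.05)  # simulated compute latency
            self.output = x * 2

        self._worker = threading.Thread(target=kernel)
        self._worker.start()

    def synchronize(self):
        # Block until the launched work has finished.
        if self._worker is not None:
            self._worker.join()


def run(device, x, sync=True):
    """Launch the work and, if sync is set, wait before reading the result."""
    device.launch(x)
    if sync:
        device.synchronize()
    return device.output


if __name__ == "__main__":
    print(run(FakeAsyncDevice(), 21))              # waits for the result
    print(run(FakeAsyncDevice(), 21, sync=False))  # may read a stale value
```

With `sync=False`, `run` can return before the simulated kernel has written its output, which is the kind of incorrect result the sync is meant to prevent.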