Python GPU unit tests cannot fully run in parallel
Created by: QiJune
Paddle runs its unit tests in parallel with ctest -j6. However, each GPU Python unit test has to copy NumPy array data to the GPU and then copy the GPU data back into a NumPy array, and both copies are issued on the default CUDA stream.
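In each GPU Python unit test, the executor creates its own DeviceContext, which contains its own CUDA stream. As a result, the computations of different unit tests run in parallel, but the copies, which all go through the default stream, run sequentially. This needs to be refined, for example by issuing the copies on each DeviceContext's own stream instead of the default stream.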
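To illustrate the effect described above, here is a toy model in plain Python. It is not CUDA code and not Paddle's API: a lock stands in for a CUDA stream, because work on one stream does not overlap. The names `DEFAULT_STREAM`, `ToyDeviceContext`, `fake_copy` and `run_tests` are hypothetical. In the shared case, every test sends both of its copies through the one `DEFAULT_STREAM` lock, so copies from different tests wait on each other. In the other case, each test uses the stream of its own context.

```python
import threading
import time

# Toy model, not CUDA: a "stream" is a lock, so work submitted to the
# same stream runs one item at a time and never overlaps.
DEFAULT_STREAM = threading.Lock()  # shared by every test, like the default CUDA stream


class ToyDeviceContext:
    """Hypothetical stand-in for an executor's DeviceContext with its own stream."""

    def __init__(self):
        self.stream = threading.Lock()


def fake_copy(stream, seconds):
    """Pretend host<->device transfer that occupies `stream` for `seconds`."""
    with stream:
        time.sleep(seconds)


def run_tests(n_tests, copy_seconds, use_own_stream):
    """Run n_tests concurrent toy tests and return the wall-clock time taken."""

    def one_test():
        ctx = ToyDeviceContext()  # every test still creates its own context
        stream = ctx.stream if use_own_stream else DEFAULT_STREAM
        fake_copy(stream, copy_seconds)  # numpy array -> GPU
        fake_copy(stream, copy_seconds)  # GPU -> numpy array

    threads = [threading.Thread(target=one_test) for _ in range(n_tests)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("default stream:", run_tests(6, 0.05, use_own_stream=False))
    print("own streams:   ", run_tests(6, 0.05, use_own_stream=True))
```

With six tests and 0.05 s per copy, the shared-stream run takes at least 6 × 2 × 0.05 = 0.6 s, while the per-context run takes roughly the time of a single test. On a real GPU, the analogous change would be issuing the transfers with cudaMemcpyAsync on the DeviceContext's stream; such copies generally need page-locked (pinned) host memory to actually overlap with other work.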