Created by: xiaoxuesheng1234
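The log reports: "The number of CUDAPlace, which is used in ParallelExecutor, is 8. And the Program will be copied 8 copies." All 8 GPUs appear to be fully occupied, but the speed is the same as with a single GPU. Could someone please help me resolve this?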