Training on 2 GPUs is slower than on 1 GPU
Created by: hanwentao
Training with the following code is slower on 2 GPUs than on 1 GPU. The model is a dummy autoencoder. One iteration takes about 4 seconds on 1 GPU for an input of size 500k (i.e., 500,000 vectors of dimension 213), but about 5 seconds on 2 GPUs.
https://gist.github.com/hanwentao/c7fc71350204b835fd3c4e55bc6b934f
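For context, here is a minimal sketch of the kind of setup described, assuming a small fully connected autoencoder trained with `nn.DataParallel` on random data; the actual code is in the gist above, and all layer sizes and names here are illustrative guesses, not the gist's contents:

```python
import time
import torch
import torch.nn as nn

# Dimensions match the description: 500,000 vectors of size 213.
INPUT_DIM = 213
NUM_SAMPLES = 500_000  # assumption: the whole input is one "iteration"

class DummyAutoencoder(nn.Module):
    def __init__(self, dim=INPUT_DIM, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

device = torch.device("cuda")
model = DummyAutoencoder().to(device)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits each batch across GPUs

optimizer = torch.optim.Adam(model.parameters())
criterion = nn.MSELoss()
x = torch.randn(NUM_SAMPLES, INPUT_DIM, device=device)

for step in range(5):
    t0 = time.time()
    optimizer.zero_grad()
    loss = criterion(model(x), x)  # reconstruct the input
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()  # make the per-iteration timing meaningful
    print(f"iter {step}: {time.time() - t0:.2f}s, loss={loss.item():.4f}")
```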