Paddle hangs when using multiple GPUs with CUDA driver 367.35
Created by: reyoung
Issue reported in the Gitter chat.
With driver 367.35 and Titan X GPUs, Paddle hangs while merging gradients, and the NVIDIA NCCL benchmarks hang as well.
This appears to be a driver problem; the same hang has been reported with Torch/NCCL (https://github.com/NVIDIA/nccl/issues/39).
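
To check whether a machine is running the driver version named in this report, a minimal sketch along these lines may help. It assumes `nvidia-smi` is on the `PATH`; the helper names (`parse_driver_versions`, `is_affected`, `query_driver_versions`) are hypothetical and not part of Paddle, and the check only matches the exact version string 367.35, since the report does not say which other versions are affected.

```python
import subprocess

# Driver version named in this issue; other versions are not covered by the report.
AFFECTED_DRIVER = "367.35"


def parse_driver_versions(nvidia_smi_output):
    """Return the distinct driver versions found in nvidia-smi CSV output.

    The expected input is one version string per line (one line per GPU).
    """
    versions = []
    for line in nvidia_smi_output.splitlines():
        line = line.strip()
        if line and line not in versions:
            versions.append(line)
    return versions


def is_affected(versions, affected=AFFECTED_DRIVER):
    """Return True if the driver version from this issue is among `versions`."""
    return affected in versions


def query_driver_versions():
    """Ask nvidia-smi for the driver version reported by each GPU."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_driver_versions(output)
```

On a multi-GPU machine, `is_affected(query_driver_versions())` returning `True` would suggest the setup matches the one described here.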