GPU results are non-deterministic
Created by: daming-lu
When we fix all sources of randomness and run the same training twice on the GPU (same GPU core), the results differ by a tiny amount. This does NOT happen on CPU.
See the attachments. The demo code is in this PR
vimdiff w2v_nocuda_t1.txt w2v_nocuda_t2.txt   # no difference
vimdiff w2v_t1_cuda.txt w2v_t2_cuda.txt       # some tiny differences
w2v_nocuda_t1.txt w2v_nocuda_t2.txt w2v_t1_cuda.txt w2v_t2_cuda.txt
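
To put a number on the "tiny difference" instead of eyeballing it in vimdiff, something like the following sketch could be used. It assumes the attached logs are plain text with the numeric values embedded in them; the exact log format is not shown here, so the helper names (`extract_floats`, `max_abs_diff`, `compare_files`) and the regex-based parsing are illustrative assumptions, not part of the demo code in the PR.

```python
import re

# Assumption: numbers in the logs are plain or scientific-notation decimals,
# e.g. 3, 0.25, -1.5e-07. Integers such as pass or batch counters also match;
# they are identical in both runs, so they do not affect the result.
_NUMBER = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def extract_floats(text):
    """Return every number found in the text, in order of appearance."""
    return [float(token) for token in _NUMBER.findall(text)]


def max_abs_diff(text_a, text_b):
    """Largest absolute difference between corresponding numbers in two logs."""
    a = extract_floats(text_a)
    b = extract_floats(text_b)
    if len(a) != len(b):
        raise ValueError(
            "logs contain a different number of values: %d vs %d" % (len(a), len(b))
        )
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def compare_files(path_a, path_b):
    """Read two log files and return their largest absolute numeric difference."""
    with open(path_a) as fa, open(path_b) as fb:
        return max_abs_diff(fa.read(), fb.read())
```

For example, `compare_files("w2v_t1_cuda.txt", "w2v_t2_cuda.txt")` would return the largest gap between the two GPU runs, and the same call on the two `nocuda` files should return `0.0` if the CPU runs really are identical.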