* add vectorized bfloat16 atomicAdd
* fix compile error
* fix compile error again
* fix V100 compile error
* fix V100 compile again