Created by: xuezhong
Distributed training for Transformer. Two options:
1. device: set to CPU to run on CPU only, without a GPU.
2. local: can run on multiple machines.
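
As a rough illustration of how these two options might be exposed on a command line, here is a minimal sketch using Python's standard argparse module. The flag names --device and --local, their defaults, and the function parse_options are all hypothetical, chosen only to mirror the option names above. They are not taken from the actual training script, which defines what each option really does.

```python
import argparse


def parse_options(argv=None):
    """Parse the two options described above (hypothetical flag names)."""
    parser = argparse.ArgumentParser(
        description="Sketch of launcher options for distributed Transformer training"
    )
    # Hypothetical flag mirroring the "device" option:
    # "CPU" means run on CPU only, without a GPU.
    parser.add_argument("--device", choices=["CPU", "GPU"], default="GPU")
    # Hypothetical flag mirroring the "local" option. This sketch only records
    # the value; how it affects multi-machine runs is up to the real script.
    parser.add_argument("--local", action="store_true")
    return parser.parse_args(argv)
```

For example, parse_options(["--device", "CPU"]) yields an object whose device attribute is "CPU" and whose local attribute is False.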