Created by: sandyhouse
Adds support for distributed training on 1, 2, 4, and 8 machines, and for fp16 training.