checkpoint for distributed training.
* wip * wip * refactor no.1 dir structure test=develop * fix linking test=develop * fix includes test=develop * fix build test=develop * fix build test=develop