Refine runtime (#1108)
* only master machine saves plan and has event logger
* separate Data, Persistence, Cache, Log FileSystem config
* refine
* only specify data and snapshot path conf
* forbid multiple machines from using localfs as snapshot fs
* networkfs as localfs
* refine
* Store log to snapshot (#1109)
* use machine id, drop machine name
* ensure setting machine id
* allow save snapshot to localfs for distributed training (#1113)
* Snapshot to master (#1116)
* allow save snapshot to localfs for distributed training
* fix mdSave to master for model parallel
* fix review comment issues
* add sanity check for machine id
* rm useless comments
* update example
* Dev refine runtime add log stream mgr (#1142)
* add LogStreamMgr
* refine and refactor OutStream=>LogStream
* bugfix
* use LogStreamMgr to write graph, dot, plan, profile and proto
* refine
* simplify, remove LogStreamMgr (#1243)
* simplify, remove LogStreamMgr
* TeePersistentLogStream add static factory (#1244)