Created by: chenwhql
### PR types
New features

### PR changes
APIs

### Describe
To simplify writing dynamic-graph parallel training code, remove `DataParallel.scale_loss` & `DataParallel.apply_collective_grads`; their work is now done inside existing calls (see the sketch after this list):

- `DataParallel.scale_loss` -> `loss.backward()`
- `DataParallel.apply_collective_grads` -> `optimizer.step()/minimize()`
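
For reference, a minimal sketch of the simplified training loop under this change, assuming the Paddle 2.0 dynamic-graph API; the `LinearNet` model, `init_parallel_env`/`spawn` setup, and random inputs are illustrative assumptions, not part of this PR:

```python
import paddle
import paddle.nn as nn
import paddle.distributed as dist


class LinearNet(nn.Layer):
    """Toy model used only to illustrate the API change."""

    def __init__(self):
        super(LinearNet, self).__init__()
        self._linear = nn.Linear(10, 1)

    def forward(self, x):
        return self._linear(x)


def train():
    # 1. initialize the parallel environment
    dist.init_parallel_env()

    # 2. wrap the model for data parallel training
    dp_layer = paddle.DataParallel(LinearNet())
    opt = paddle.optimizer.SGD(
        learning_rate=0.001, parameters=dp_layer.parameters())

    # 3. run one training step on hypothetical random data
    inputs = paddle.randn([4, 10], dtype='float32')
    loss = dp_layer(inputs).mean()

    # Before this PR:
    #     loss = dp_layer.scale_loss(loss)
    #     loss.backward()
    #     dp_layer.apply_collective_grads()
    #     opt.minimize(loss)
    # After this PR, loss scaling happens inside backward() and the
    # collective gradient all-reduce inside step()/minimize():
    loss.backward()
    opt.step()
    opt.clear_grad()


if __name__ == '__main__':
    # spawn one training process per device
    dist.spawn(train)
```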
TODO:
- remove the two APIs from all related API example code (in next PR)
- move `scale_loss` and `apply_collective_grads` into C++ for better performance (in the future; the current form is kept only for the API stability of 2.0-RC)