Created by: chenwhql
PR types: New features
PR changes: APIs
Describe
In order to simplify the writing of dynamic parallel training code, remove DataParallel.scale_loss & DataParallel.apply_collective_grads. Their work is folded into the calls they map to (see the sketch after this list):
- DataParallel.scale_loss -> loss.backward()
- DataParallel.apply_collective_grads -> optimizer.step/minimize
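With this change, a dygraph data-parallel training loop needs no DataParallel-specific calls beyond wrapping the model. Below is a minimal sketch under the Paddle 2.0 dygraph API; the LinearNet model, the random batch, the SGD optimizer, and the spawn-based launch are illustrative assumptions, not part of this PR.

```python
import paddle
import paddle.distributed as dist


class LinearNet(paddle.nn.Layer):
    """Toy network used only to illustrate the loop (assumption)."""

    def __init__(self):
        super(LinearNet, self).__init__()
        self._linear = paddle.nn.Linear(10, 1)

    def forward(self, x):
        return self._linear(x)


def train():
    dist.init_parallel_env()  # set up the trainer process group

    model = paddle.DataParallel(LinearNet())
    opt = paddle.optimizer.SGD(learning_rate=0.01,
                               parameters=model.parameters())

    x = paddle.randn([4, 10], 'float32')  # dummy batch (assumption)
    loss = paddle.mean(model(x))

    # Before this PR: loss = model.scale_loss(loss)
    loss.backward()  # loss scaling now happens inside backward()

    # Before this PR: model.apply_collective_grads()
    opt.step()       # gradients are synchronized before the update
    opt.clear_grad()


if __name__ == '__main__':
    dist.spawn(train, nprocs=2)  # launch two trainers (illustrative)
```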
TODO:
- remove the two APIs from all related API example code (in the next PR)
- move scale_loss and apply_collective_grads into C++ for better performance (in the future; for now they stay in Python only for API stability of 2.0-RC; see the sketch below)