Created by: jacquesqiao
- Each trainer uses only part of the data, so that one batch across all trainers uses the same data as one batch on a single trainer.
- Fix the sparse update scale problem: sparse updates also need to be scaled by 1/trainer_num, the same as dense parameters.
- Fix the DebugStringEx crash.
- The scale op now supports SelectedRows as input.
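
The sharding and scaling described above can be illustrated with a minimal Python sketch. It is not the code in this PR: `SparseRows`, `shard_batch`, and `scale` are hypothetical names, `SparseRows` is only a simplified stand-in for a SelectedRows-style gradient, and the strided split is an assumed way of dividing a batch between trainers.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SparseRows:
    """Hypothetical stand-in for a SelectedRows-style sparse gradient."""
    rows: List[int]            # indices of the rows that carry a gradient
    values: List[List[float]]  # one value vector per entry in `rows`
    height: int                # number of rows in the full parameter


def shard_batch(batch: Sequence, trainer_id: int, trainer_num: int) -> list:
    """Give each trainer a disjoint, strided slice of one global batch (assumed scheme)."""
    return list(batch[trainer_id::trainer_num])


def scale(grad: SparseRows, factor: float) -> SparseRows:
    """Scale only the stored rows; the row indices and height are unchanged."""
    return SparseRows(
        rows=list(grad.rows),
        values=[[v * factor for v in row] for row in grad.values],
        height=grad.height,
    )


trainer_num = 2
batch = list(range(8))

# Together, the shards cover exactly the data one trainer would see in one batch.
shards = [shard_batch(batch, i, trainer_num) for i in range(trainer_num)]

# A sparse gradient is scaled by 1/trainer_num, just like a dense one.
grad = SparseRows(rows=[0, 3], values=[[2.0, 4.0], [6.0, 8.0]], height=10)
scaled = scale(grad, 1.0 / trainer_num)
```

Scaling touches only the rows that are actually present, so the sparse representation is preserved while the update magnitude matches the dense case.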