Created by: wanghaoshuang
Fix average_accumulate_op for parallel executor.
Fix model average on multiple GPUs.