Created by: yaoxuefeng6
Add a newly designed version of the downpour worker to improve training speed when a program has multiple losses.

1. Set a sparse table as "is_local" to pull all of its feasign embeddings from the parameter server before a training pass.
2. Set a sparse table as "is_async" to run independent ops asynchronously while this sparse table's embeddings are being pulled.
3. Use the DownpourOpt worker to reorder ops so that forward ops and backward ops are gathered by the loss they optimize in the program.

Example:

adam = fluid.optimizer.Adam(learning_rate=0.000005)
adam = fleet.distributed_optimizer(adam, strategy={"device_worker":"DownpourSGDOpt"})
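
To make the op reordering in point 3 more concrete, here is a toy sketch in plain Python. It is not the worker's actual implementation: the `Op` record, its `loss` and `phase` fields, and `group_ops_by_loss` are hypothetical names made up for illustration, and the sketch assumes each op is already tagged with the loss it belongs to.

```python
from collections import namedtuple

# Hypothetical op record for illustration only; this is not a Paddle type.
# phase is either "forward" or "backward".
Op = namedtuple("Op", ["name", "loss", "phase"])


def group_ops_by_loss(ops):
    """Reorder ops so that, per loss, forward ops come first and backward ops follow."""
    # Remember the order in which each loss first appears in the program.
    loss_order = []
    for op in ops:
        if op.loss not in loss_order:
            loss_order.append(op.loss)
    phase_rank = {"forward": 0, "backward": 1}
    # sorted() is stable, so ops keep their original relative order within a group.
    return sorted(ops, key=lambda op: (loss_order.index(op.loss), phase_rank[op.phase]))


# Two losses whose forward and backward ops are interleaved in the original program.
ops = [
    Op("fc_a", "loss_a", "forward"),
    Op("fc_b", "loss_b", "forward"),
    Op("fc_a_grad", "loss_a", "backward"),
    Op("fc_b_grad", "loss_b", "backward"),
]
reordered = [op.name for op in group_ops_by_loss(ops)]
```

After the reordering, the ops for `loss_a` are contiguous and precede those for `loss_b`, which is the kind of grouping that could let work for one loss proceed without waiting on unrelated ops.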