Created by: yaoxuefeng6
This PR adds a newly designed version of the DownpourOpt worker to get better speed when a program contains multiple losses. In the Python front end, we fix the _minimize function of the distributed optimizer to support multiple losses in one program. To improve speed in the multi-loss case, we add additional sparse-table attributes:
1. "is_local": before a training pass, pull all feasign embeddings from the parameter server and build a local table; during training, embeddings are pulled directly from this local table. This attribute can be used when a sparse table is only used for feed-forward.
2. "is_async": pull this sparse table's embeddings asynchronously, so that other independent ops can run in the worker while the pull is in flight.
3. Use the DownpourOpt worker to reorder ops, gathering forward and backward ops by the loss they optimize, so that ops can be run flexibly in this worker. Currently, this worker has good support for the two attributes mentioned above.

example:
adam = fluid.optimizer.Adam(learning_rate=0.000005)
adam = fleet.distributed_optimizer(adam, strategy={"device_worker":"DownpourSGDOpt"})
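To illustrate the "is_local" attribute described above, here is a toy Python sketch (not Paddle's actual implementation; all class and method names are illustrative): instead of pulling embeddings from the parameter server on every lookup, the worker pulls the whole table once before a training pass and serves lookups from a local copy.

```python
class ParameterServer:
    """Stands in for the remote PS; each pull would be a network round trip."""
    def __init__(self, table):
        self.table = table
        self.pull_calls = 0

    def pull(self, keys):
        self.pull_calls += 1
        return {k: self.table[k] for k in keys}


class LocalSparseTable:
    """Mirrors a PS sparse table locally for read-only (feed-forward) use."""
    def __init__(self, ps, all_keys):
        # One bulk pull before the pass replaces many per-batch pulls.
        self.embeddings = ps.pull(all_keys)

    def lookup(self, keys):
        # During training, embeddings come from the local table, not the PS.
        return [self.embeddings[k] for k in keys]


ps = ParameterServer({i: [0.1 * i] * 4 for i in range(10)})
local = LocalSparseTable(ps, list(range(10)))
for batch in ([1, 3], [5, 7], [2, 4]):
    _ = local.lookup(batch)
print(ps.pull_calls)  # one pull for the whole pass, not one per batch
```

This only makes sense for tables whose embeddings are not updated during the pass, which is why the attribute is restricted to feed-forward-only sparse tables.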
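The "is_async" attribute can be sketched the same way (again a toy illustration, not Paddle's scheduler): the sparse pull is started in the background, ops that do not depend on its result run while it is in flight, and dependent ops wait on the result.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def pull_sparse(keys):
    """Simulated pull-sparse with network latency."""
    time.sleep(0.05)
    return {k: [0.0] * 4 for k in keys}


def independent_op():
    """An op with no data dependency on the pulled embeddings."""
    return "done"


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(pull_sparse, [1, 2, 3])  # start the pull asynchronously
    result = independent_op()                     # runs while the pull is in flight
    embeddings = future.result()                  # block only before dependent ops
print(result, sorted(embeddings))
```

The speedup comes from overlapping communication (the pull) with computation (the independent ops), which is exactly what the reordered op schedule in the DownpourOpt worker enables.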