add the support for pipeline (!24560) · Merge request · PaddlePaddle / Paddle

add the support for pipeline !24560

Created by: sandyhouse

PR types

New features

PR changes

Others

Describe

This PR implements the pipeline_trainer and its device worker (i.e., section_worker) to support pipeline training. By pipeline, we mean that a program is split into multiple sub-programs (sections), each of which runs on a device. Pipelining has two main purposes: to train large-scale models that cannot fit on a single device, and to take advantage of the different strengths of heterogeneous devices to make training more efficient.
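To make the idea of sections concrete, here is a minimal sketch in plain Python. It is not the pipeline_trainer or section_worker code from this PR: threads stand in for devices, queues stand in for the hand-off between sections, and the names run_section and run_pipeline as well as the three lambda "sub-programs" are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the input stream


def run_section(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next section
            return
        outbox.put(fn(item))


def run_pipeline(section_fns, batches):
    """Run section_fns as a chain of worker threads; return outputs in input order."""
    # One queue in front of every section, plus one for the final outputs.
    queues = [queue.Queue() for _ in range(len(section_fns) + 1)]
    workers = [
        threading.Thread(target=run_section, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(section_fns)
    ]
    for w in workers:
        w.start()

    # Feed all batches; later sections start working while earlier ones still run.
    for batch in batches:
        queues[0].put(batch)
    queues[0].put(_STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results


# Hypothetical three-section "model": each lambda stands in for one sub-program.
sections = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs = run_pipeline(sections, [1, 2, 3])
print(outputs)
```

Because each section has a single worker and the queues are FIFO, the outputs come back in the same order as the input batches, while different batches can be in different sections at the same time.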

Currently, you have to use device_guard to assign the device on which each sub-program (section) runs. Because the train_from_dataset interface is used, the whole dataset is consumed in each iteration.

Todo:

Use Executor.run instead of Executor.train_from_dataset.
Use automatic pipelining with fleet instead of manual device_guard annotations.
