Fluid support for asynchronous training
Created by: jacquesqiao
Project
https://github.com/PaddlePaddle/Paddle/projects/61
Design
- Add async update design doc. https://github.com/PaddlePaddle/Paddle/pull/9932
- Add distributed training overview doc. https://github.com/PaddlePaddle/Paddle/pull/9937
Operators
- VariableResponse supports deserializing a variable into a local scope. #10060
- Refine listen_and_serv op: separate RunSyncLoop into its own method to prepare for RunAsyncLoop. #10080
- Split optimization ops on the pserver into independent blocks. #10123
- Create a sub-scope when necessary. #10124
- Add RunAsyncUpdate (no barrier and no lock) to listen_and_serv_op (see the Python sketch after this list). #9997 (closed)
  - Prepare the optimization block and its PrepareContext for each parameter.
  - Add a BlockQueue for each parameter block. The queue stores the gradient VariableMessages sent by trainers for that parameter.
  - Add a thread for each parameter to run its optimization block.
  - Each thread reads a gradient from its BlockQueue, creates a sub-scope to deserialize it, and then uses this sub-scope to run the optimization block.
  - Add one thread to serve parameter get requests from trainers out of the global scope. (We may need a thread pool to speed up the get path, but the gRPC interface seems to work only in a single thread; this needs a test.)
  - Trainers send_vars to and read_vars from the pserver without send_barrier and get_barrier.
- Use multiple threads to do the update. #10228
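The items above describe one possible structure for RunAsyncUpdate: one BlockQueue and one optimizer thread per parameter, gradients applied as soon as they arrive, and parameter gets served without a barrier. The schematic below is plain Python rather than Paddle code; the parameter names, the SGD update, and the stop sentinel are illustrative assumptions only.

```python
import queue
import threading

# Schematic of the per-parameter async update loop described above:
# one blocking queue and one optimizer thread per parameter, no barriers.
PARAMS = {"w1": 0.0, "w2": 0.0}              # stands in for the global scope
GRAD_QUEUES = {name: queue.Queue() for name in PARAMS}
LR = 0.01

def optimize_worker(name):
    """Consume gradients for one parameter and apply them as they arrive."""
    while True:
        grad = GRAD_QUEUES[name].get()       # blocks until a trainer sends a gradient
        if grad is None:                     # sentinel used to stop this sketch
            break
        # The real design creates a sub-scope to deserialize the gradient and
        # runs the parameter's optimization block; here we just apply SGD.
        PARAMS[name] -= LR * grad

workers = [threading.Thread(target=optimize_worker, args=(n,)) for n in PARAMS]
for w in workers:
    w.start()

# Trainers push gradients with no send_barrier; a get request simply reads the
# current parameter value from the global scope with no get_barrier.
GRAD_QUEUES["w1"].put(0.5)
GRAD_QUEUES["w2"].put(-0.2)

for q in GRAD_QUEUES.values():
    q.put(None)
for w in workers:
    w.join()
print(PARAMS)                                # e.g. {'w1': -0.005, 'w2': 0.002}
```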
Transpiler
- Dist-transpile the trainer program for async mode: no need to add the .trainer_n suffix to gradient blocks in async mode.
- Dist-transpile the pserver program for async mode: no need to aggregate gradient blocks. (Usage sketch below.)
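For reference, a minimal trainer-side usage sketch, assuming the distribute transpiler exposes a sync_mode flag to switch off aggregation; the flag name, endpoints, and trainer count below are assumptions, not the final API.

```python
import paddle.fluid as fluid

# Assumed API: sync_mode=False disables gradient aggregation on the pserver
# and the ".trainer_n" suffix on gradient blocks. Endpoints are placeholders.
t = fluid.DistributeTranspiler()
t.transpile(
    trainer_id=0,
    pservers="127.0.0.1:6174",
    trainers=2,
    sync_mode=False,
)
pserver_prog = t.get_pserver_program("127.0.0.1:6174")
trainer_prog = t.get_trainer_program()
```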
Consider
- Need to decide how to add learning rate decay in asynchronous training. Do we need lr_decay at all?
Benchmark
- Benchmark of Fluid async training. #10180 (closed)