Split send_op into fetch_vars_op and send_vars_op
Created by: Yancey1989
Currently, the trainer sends all gradients only after all the backward ops have finished, like:
w1-->opA->w2->opB->opB(backward)->w2'->opA(backward)->w1'->send(w1',w2')
In this process, the send op does not send any gradient until all the forward and backward ops are done.
But we could actually send w2' right after opB(backward) and w1' right after opA(backward); executing compute ops and IO ops in parallel would improve performance. On the other hand, the current SendOp not only does SEND, but also waits for all send requests to finish and receives the updated parameters from the pserver, so we also need to split it into multiple ops.
For sync update:
fetch(w1)-->opA->fetch(w2)->opB->opB(backward)->w2'->send(w2')->opA(backward)->w1'->send(w1')->send_barrier()
For async update, there is no send_barrier() op at the end of the process:
fetch(w1)-->opA->fetch(w2)->opB->opB(backward)->w2'->send(w2')->opA(backward)->w1'->send(w1')
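The two schedules above can be sketched in Python. The runner and op names below are purely illustrative stand-ins for the proposed fetch/send/barrier ops, not the actual Paddle operators:

```python
# Hypothetical sketch of the proposed schedules; the function names are
# illustrative only, not the real Paddle operators.

def run_trainer(sync=True):
    """Interleave compute ops with fetch/send IO ops for one iteration."""
    log = []

    def fetch(var):  log.append("fetch(%s)" % var)  # pull param from pserver
    def compute(op): log.append(op)                 # forward/backward op
    def send(grad):  log.append("send(%s)" % grad)  # push grad, non-blocking
    def barrier():   log.append("send_barrier()")   # wait for all sends

    fetch("w1"); compute("opA")
    fetch("w2"); compute("opB")
    compute("opB_backward"); send("w2'")  # send w2' as soon as it is ready
    compute("opA_backward"); send("w1'")
    if sync:
        barrier()  # sync update waits for every send; async update does not
    return log

print(run_trainer(sync=True))
print(run_trainer(sync=False))
```

The point of the sketch is that each send is issued immediately after the backward op that produces its gradient, so network IO overlaps with the remaining computation.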
TODO

- Implement AsyncSendOp, SendBarrierOp.
- Implement an IO threadpool to deal with async send.
- Enhance the distribute transpiler with the async send op.
- Update the benchmark report.
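The IO threadpool item can be sketched as a dedicated pool that queues send requests so compute threads never block on the network. This is a minimal illustration; the `rpc_send` stub, pool size, and function names are assumptions, not Paddle's actual RPC client:

```python
# Minimal sketch of an IO threadpool for async send; rpc_send is a stand-in
# stub, not Paddle's real RPC client.
from concurrent.futures import ThreadPoolExecutor

def rpc_send(grad_name):
    # Stand-in for the RPC call that pushes one gradient to the pserver.
    return "sent:" + grad_name

io_pool = ThreadPoolExecutor(max_workers=4)  # dedicated IO threads

def async_send(grad_name):
    """Queue the send on the IO pool and return immediately (AsyncSendOp)."""
    return io_pool.submit(rpc_send, grad_name)

def send_barrier(futures):
    """Block until every queued send has finished (SendBarrierOp)."""
    return [f.result() for f in futures]

futures = [async_send(g) for g in ("w2'", "w1'")]
print(send_barrier(futures))
```

With this split, sync update simply calls `send_barrier` at the end of the iteration, while async update skips it and lets the pool drain in the background.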