Created by: Xreki
In the recurrent_grad op, the inside grads need to be accumulated into the outside grad. In the current implementation, sum_ops are created to do this: if there are N inside grads, N - 1 sum_ops are created, each of which adds one inside grad to the outside grad.
In this PR, a single sum functor is called directly to add N inside grads.
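The difference between the two approaches can be sketched in plain Python. This is only an illustration, not the Paddle code: the function names are hypothetical, tensors are modelled as flat lists, and it assumes the outside grad starts as a copy of the first inside grad (which is how N inside grads would need only N - 1 additions).

```python
def accumulate_with_chained_sums(inside_grads):
    """Old scheme (sketch): one separate sum step per additional inside grad."""
    # Assumption: the outside grad is initialised from the first inside grad.
    outside = list(inside_grads[0])
    num_sum_steps = 0
    for grad in inside_grads[1:]:
        # Each step stands in for one sum_op adding one inside grad.
        outside = [o + g for o, g in zip(outside, grad)]
        num_sum_steps += 1
    return outside, num_sum_steps


def accumulate_with_single_sum(inside_grads):
    """New scheme (sketch): one call that adds all N inside grads at once."""
    # zip(*inside_grads) groups the i-th element of every inside grad.
    outside = [sum(values) for values in zip(*inside_grads)]
    return outside, 1


inside = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # N = 3 inside grads
chained_result, chained_steps = accumulate_with_chained_sums(inside)
fused_result, fused_calls = accumulate_with_single_sum(inside)
```

Both functions produce the same accumulated grad. The chained version takes N - 1 separate steps, while the fused version makes a single call, which is where the saved per-op overhead would come from.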
This PR was tested with the padding model of PaddingRNN.
| model | before | after | speedup |
|---|---|---|---|
| small | 116.2 | 93.6 | 24% |
| large | 148.4 | 134.3 | 10% |
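The speedup column is consistent with computing before / after - 1, which assumes the before and after columns are lower-is-better measurements such as elapsed time (the units are not stated here). A quick check under that assumption, with hypothetical names:

```python
def speedup_percent(before, after):
    """Relative speedup in percent, assuming lower values are better."""
    return (before / after - 1.0) * 100.0


small = round(speedup_percent(116.2, 93.6))
large = round(speedup_percent(148.4, 134.3))
```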