Created by: sneaxiy
This PR adds a double grad implementation in dygraph mode. The implementation is based on `paddle.fluid.dygraph.grad`, which calculates the gradients of `y` with respect to `x` (`x` and `y` can be any vars in the network).
StarGAN models with gradient penalty are tested in this PR.
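To illustrate why a gradient penalty needs double grad, here is a minimal pure-Python sketch. It does not use Paddle: `numeric_grad` is a hypothetical finite-difference stand-in for an autograd call, and `y = x**3` is an assumed toy function, not anything taken from the StarGAN model. The point is only that the penalty is built from `dy/dx`, so differentiating the penalty differentiates that gradient a second time.

```python
def numeric_grad(f, x, h=1e-3):
    """Central-difference estimate of df/dx at x (stand-in for an autograd call)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def y(x):
    # Hypothetical scalar "network output", chosen only for illustration.
    return x ** 3


def gradient_penalty(x):
    # First-order gradient dy/dx, pushed toward a target value of 1.
    dy_dx = numeric_grad(y, x)
    return (dy_dx - 1.0) ** 2


def penalty_grad(x):
    # Differentiating the penalty differentiates dy/dx again: the double grad.
    return numeric_grad(gradient_penalty, x)


x0 = 2.0
# Closed form for y = x**3: d/dx (3x^2 - 1)^2 = 2 * (3x^2 - 1) * 6x
analytic = 2.0 * (3.0 * x0 ** 2 - 1.0) * 6.0 * x0
numeric = penalty_grad(x0)
print(analytic, numeric)
```

In an autograd framework the inner gradient is itself recorded as part of the graph, so the second differentiation is exact rather than approximated as it is in this sketch.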
This PR also rewrites some of the original dygraph code, such as `VariableWrapper`, `SavedVariableWrapperList`, `GradOpNode`, etc. Tested on the PTB model, one epoch on a V100 CUDA 9 machine takes about 82-84s after this revision, which is the same as the original develop code.