Created by: sneaxiy
- Inside `PartialGradEngine`, if `create_graph = False`, each gradient accumulator can also be erased as soon as its gradient accumulation is finished, which saves GPU/CPU memory.
- The `VarBase` constructor that unpacks a `VarBase` from a `VariableWrapper` should check whether the `VariableWrapper` has a grad var, instead of taking a bool flag that tells it whether to create the grad var.
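
The two changes can be illustrated with a small Python model. This is only a sketch of the idea, not the C++ code in this PR: the classes below are simplified stand-ins, and `has_grad`, `GradAccumulator`, and `accumulate` are hypothetical names chosen for illustration. It also assumes that accumulators are kept alive when `create_graph` is true.

```python
from typing import Dict, List, Optional, Tuple


class VariableWrapper:
    """Simplified stand-in for the C++ VariableWrapper (hypothetical)."""

    def __init__(self, name: str, grad: Optional["VariableWrapper"] = None):
        self.name = name
        self.grad = grad  # None means no grad var is attached

    def has_grad(self) -> bool:
        return self.grad is not None


class VarBase:
    """Simplified stand-in for VarBase, built by unpacking a wrapper."""

    def __init__(self, wrapper: VariableWrapper):
        self.var = wrapper
        # Decide from the wrapper itself, not from a caller-supplied bool flag.
        self.grad_var = VarBase(wrapper.grad) if wrapper.has_grad() else None


class GradAccumulator:
    """Sums a fixed number of incoming gradients for one variable."""

    def __init__(self, expected: int):
        self.expected = expected
        self.received = 0
        self.total = 0.0

    def add(self, grad: float) -> bool:
        """Add one gradient; return True once all expected ones have arrived."""
        self.total += grad
        self.received += 1
        return self.received == self.expected


def accumulate(
    grads: Dict[str, List[float]], create_graph: bool
) -> Tuple[Dict[str, float], Dict[str, GradAccumulator]]:
    """Return the summed gradients and the accumulators that are still alive."""
    accumulators = {name: GradAccumulator(len(parts)) for name, parts in grads.items()}
    results: Dict[str, float] = {}
    for name, parts in grads.items():
        for g in parts:
            if accumulators[name].add(g):
                results[name] = accumulators[name].total
                if not create_graph:
                    # Accumulation is finished: release the accumulator early.
                    del accumulators[name]
    return results, accumulators
```

With `create_graph=False`, `accumulate` returns an empty accumulator map once every gradient has been summed. With `create_graph=True`, the accumulators remain in the map.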