Created by: qingqing01
- Update backward.py:
  - If none of an op's input grad vars appears in the outputs of the previous ops, do not append this op to the graph.
  - Only apply this strategy for double backward.
- Update some double backward ops.
- Update sum_op to judge whether a tensor is empty by numel or IsInitialized().
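
The two behaviors above can be illustrated with a minimal, framework-free sketch. It is not the code in backward.py or sum_op. `GradOp`, `prune_unreachable_grad_ops`, `sum_skipping_empty`, the `seed_grads` argument, and the `@GRAD` naming check are all hypothetical names chosen for illustration. It also assumes that ops with no grad inputs are kept, and that sum_op skips inputs it judges empty; the description above only says how emptiness is judged.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

GRAD_SUFFIX = "@GRAD"  # assumed naming convention for grad vars


@dataclass
class GradOp:
    """Hypothetical stand-in for a grad op: a type plus input/output var names."""
    type: str
    inputs: List[str]
    outputs: List[str]


def prune_unreachable_grad_ops(grad_ops: List[GradOp],
                               seed_grads: Set[str],
                               is_double_backward: bool) -> List[GradOp]:
    """Drop grad ops whose input grad vars were not produced by earlier ops.

    seed_grads holds the grad vars supplied from outside (assumption: they
    count as already available). Pruning is applied only for double backward.
    """
    if not is_double_backward:
        return list(grad_ops)

    produced = set(seed_grads)
    kept = []
    for op in grad_ops:
        grad_inputs = [name for name in op.inputs if name.endswith(GRAD_SUFFIX)]
        # Skip the op if it has grad inputs and none of them is available.
        if grad_inputs and not any(name in produced for name in grad_inputs):
            continue
        kept.append(op)
        produced.update(op.outputs)
    return kept


def sum_skipping_empty(tensors: List[Optional[List[float]]]) -> List[float]:
    """Element-wise sum that ignores empty inputs.

    None models an uninitialized tensor; [] models a tensor with numel == 0.
    """
    non_empty = [t for t in tensors if t is not None and len(t) > 0]
    if not non_empty:
        return []
    result = [0.0] * len(non_empty[0])
    for t in non_empty:
        for i, value in enumerate(t):
            result[i] += value
    return result


if __name__ == "__main__":
    ops = [
        GradOp("mul_grad", ["x", "out@GRAD"], ["x@GRAD"]),
        GradOp("relu_grad", ["y", "y_out@GRAD"], ["y@GRAD"]),
        GradOp("scale_grad", ["x@GRAD"], ["w@GRAD"]),
    ]
    kept = prune_unreachable_grad_ops(ops, {"out@GRAD"}, is_double_backward=True)
    print([op.type for op in kept])
    print(sum_skipping_empty([[1.0, 2.0], None, [], [3.0, 4.0]]))
```

In this sketch `relu_grad` is dropped because `y_out@GRAD` is never produced, while `scale_grad` is kept because `mul_grad` produces `x@GRAD` before it.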