@@ -106,9 +106,11 @@ See function `_addup_repetitive_outputs_` in `backward.py` for implementation de
In our framework, variables can be marked as *no_gradient*, which means that the gradient of the variable is unnecessary and can be considered as zero in model training. Obviously, when all the outputs of some `grad_op` are marked as *no_gradient*, the `grad_op` itself can be skipped in the backward pass.
Another situation is that all the gradient inputs of some `grad_op` are marked as *no_gradient*, which means all of them can be considered as zeros. Since `grad_op`s are in essence the propagation of gradients, all the outputs are definitely zeros when all the gradient inputs are zeros. Therefore this `grad_op` can also be skipped.
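To make the two skip conditions concrete, here is a minimal Python sketch. The plain-dict op record, the helper name `op_can_be_skipped`, and the use of the `@GRAD` suffix to spot gradient variables are simplifications assumed for illustration; they are not the actual structures used in `backward.py`.

```python
from typing import Dict, List, Set

# A grad_op is modeled as a plain dict of gradient variable names; the gradient
# of a variable X is assumed to be named "X@GRAD".


def op_can_be_skipped(op: Dict[str, List[str]], no_grad_set: Set[str]) -> bool:
    """Return True if this grad_op does no useful work and may be dropped."""
    # Condition 1: every output gradient is unwanted (marked no_gradient).
    if op["outputs"] and all(name in no_grad_set for name in op["outputs"]):
        return True
    # Condition 2: every gradient input is known to be zero, so the op would
    # only propagate zeros and its outputs can be treated as zero as well.
    grad_inputs = [name for name in op["inputs"] if name.endswith("@GRAD")]
    if grad_inputs and all(name in no_grad_set for name in grad_inputs):
        return True
    return False


# Example: a grad_op whose only output gradient is unwanted is skippable.
no_grad_set = {"emb@GRAD"}
lookup_grad_op = {"inputs": ["out@GRAD", "ids"], "outputs": ["emb@GRAD"]}
assert op_can_be_skipped(lookup_grad_op, no_grad_set)
```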
It should be noted that all these zero gradients still need to be created and initialized by something; otherwise, the following `grad_op`s that take them as inputs run the risk of using uninitialized memory. In our code, we employ `fill_zeros_like_op` to initialize them as all zeros.
This feature is implemented in the function `_remove_no_grad_branch_`. It checks newly created `grad_op`s one by one, removes those that can be skipped (for example, those whose outputs are all in `no_grad_set`), and inserts `fill_zeros_like_op` where necessary. We can get the `no_grad_set` from the `_append_backward_ops_` argument `no_grad_dict`, or generate it on the fly by scanning all variables' `no_gradient` attribute (True or False).
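The following sketch puts the whole pass together in the spirit of `_remove_no_grad_branch_`: it walks the generated `grad_op`s in backward execution order, drops the skippable ones, and materializes unwanted-but-still-read gradients with a `fill_zeros_like` op. The dict-based op model, the helper names, and the exact control flow are assumptions for illustration only, not the real implementation.

```python
from typing import Any, Dict, List, Set

GradOp = Dict[str, Any]  # {"type": str, "inputs": [names], "outputs": [names]}


def _can_skip(op: GradOp, no_grad_set: Set[str]) -> bool:
    # The same two conditions as in the previous sketch: every output gradient
    # is unwanted, or every gradient input is already known to be zero.
    outs = op["outputs"]
    grad_ins = [n for n in op["inputs"] if n.endswith("@GRAD")]
    return (bool(outs) and all(n in no_grad_set for n in outs)) or (
        bool(grad_ins) and all(n in no_grad_set for n in grad_ins)
    )


def remove_no_grad_branch(grad_ops: List[GradOp],
                          no_grad_set: Set[str]) -> List[GradOp]:
    """Drop skippable grad_ops and zero-fill unwanted gradients still read later.

    `grad_ops` is assumed to be in backward execution order.
    """
    kept: List[GradOp] = []
    for op in grad_ops:
        if _can_skip(op, no_grad_set):
            # Everything this op would have produced can now be treated as zero.
            no_grad_set.update(op["outputs"])
            continue
        # A surviving op may still read a gradient that nothing computes;
        # materialize it as zeros so the op never sees uninitialized memory.
        for name in op["inputs"]:
            if name.endswith("@GRAD") and name in no_grad_set:
                forward_name = name[: -len("@GRAD")]
                kept.append({"type": "fill_zeros_like",
                             "inputs": [forward_name],
                             "outputs": [name]})
                # Filled once; later ops simply reuse the zero tensor.
                no_grad_set.discard(name)
        kept.append(op)
    return kept
```

In this sketch, once a gradient has been zero-filled it is treated as an ordinary (all-zero) input by later ops, trading a little wasted computation for simplicity.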