# Error Clip

## Overview

Error clip is widely used in model training to prevent exploding gradients. It applies specific rules to adjust variables' gradients, preventing them from growing too large. With error clip, the values of a gradient are checked before they are consumed by the next `grad_op`, and shrunk if necessary.
## Usage
Users can assign different error clip methods or attributes to different `Variable`s by passing the `error_clip` parameter to `Variable`'s constructor:
```python
var = framework.Variable(..., error_clip=myErrorClip, ...)
```
The default value of `error_clip` is `None`, which means no error clip is employed. When it is not `None`, it should be an instance of a class derived from `BaseErrorClipAttr`. So far, `BaseErrorClipAttr` has only one derived class, `ErrorClipByValue`, whose constructor is:
```python
ErrorClipByValue(max, min=None)
```
`max` and `min` are the maximal and minimal clip thresholds, respectively. In the backward pass, every value of `var`'s gradient greater than `max` or less than `min` is clipped to `max` or `min`, respectively. When `min` is `None`, the minimal threshold is automatically set to `-max`.
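For intuition, the elementwise effect of a `[-5.0, 5.0]` clip can be reproduced with plain NumPy (an illustration only, not Paddle code):
```python
import numpy as np

# Elementwise semantics of ErrorClipByValue(max=5.0): values above 5.0
# become 5.0, values below -5.0 become -5.0, everything else is unchanged.
grad = np.array([-7.3, -2.0, 0.5, 6.1])
print(np.clip(grad, -5.0, 5.0))  # [-5.  -2.   0.5  5. ]
```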
So we can enable error clip with the threshold `[-5.0, 5.0]` for variable `var` by:
```python
var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...)
```
## Implementation
The `BaseErrorClipAttr` and its derived class `ErrorClipByValue` are defined in *clip.py*.
```python
class BaseErrorClipAttr(object):
    def append_clip_op(self, block, grad_name):
        raise NotImplementedError()


class ErrorClipByValue(BaseErrorClipAttr):
    def __init__(self, max, min=None):
        max = float(max)
        if min is None:
            min = -max
        else:
            min = float(min)
        self.max = max
        self.min = min

    def append_clip_op(self, block, grad_name):
        block.append_op(
            type="clip",
            inputs={"X": grad_name},
            outputs={"Out": grad_name},
            attrs={"min": self.min,
                   "max": self.max})
```
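As a quick sanity check of the constructor's defaulting behavior (plain Python, using only the class above):
```python
clip = ErrorClipByValue(max=5.0)
print(clip.min, clip.max)  # -5.0 5.0

clip = ErrorClipByValue(max=5.0, min=-1.0)
print(clip.min, clip.max)  # -1.0 5.0
```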
`BaseErrorClipAttr` has one main member function: `append_clip_op(self, block, grad_name)`.
This function creates a `clip_op` and appends it to the end of the given `block`. Because different error clip algorithms require different `clip_op`s, the function is declared as virtual in the base class (it simply raises `NotImplementedError`), and every derived class must implement its own version.
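For example, a norm-based rule could be added as another derived class. The sketch below is hypothetical (no such class exists in *clip.py* so far), and the `clip_by_norm` op type with its `max_norm` attribute is an assumption made for illustration:
```python
# Hypothetical second derived class, shown only to illustrate the
# extension point. The "clip_by_norm" op type and "max_norm" attribute
# are assumptions, not part of this design.
class ErrorClipByNorm(BaseErrorClipAttr):
    def __init__(self, max_norm):
        self.max_norm = float(max_norm)

    def append_clip_op(self, block, grad_name):
        block.append_op(
            type="clip_by_norm",
            inputs={"X": grad_name},
            outputs={"Out": grad_name},
            attrs={"max_norm": self.max_norm})
```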
These `clip_op`s should be inserted right after the `grad_op`s whose output gradients need to be clipped. That is equivalent to appending a `clip_op` to the end of the target block each time a new `grad_op` is added:
```python
for op_desc in grad_op_descs:
    new_op_desc = target_block.desc.append_op()
    new_op_desc.copy_from(op_desc)
    callback(block=target_block, context=grad_to_var)
```
Here we employ a callback function to do this job. In the `_append_backward_ops_` function, each time a `grad_op` is added to the `target_block`, the callback is invoked. The logic of appending `clip_op`s can be implemented inside the callback function.
The callback function for `clip_op` appending is defined in *clip.py*:
```python
def error_clip_callback(block, context):
    # the context is a grad_to_var map
    grad_to_var = context
    # the last op in the block is the grad_op that has just been appended
    op_desc = block.desc.op(block.desc.op_size() - 1)
    # pick out the outputs that are gradients of forward variables
    for grad_n in filter(lambda n: n in grad_to_var,
                         op_desc.output_arg_names()):
        fwd_var = block.var_recursive(grad_to_var[grad_n])
        error_clip = getattr(fwd_var, "error_clip", None)
        # append a clip_op only if the forward variable asks for one
        if error_clip is not None:
            error_clip.append_clip_op(block, grad_n)
```
This function takes a `block` and a `context` (which is actually a grad_to_var map) as inputs. It checks each output of the last `OpDesc` in the `block`. Notice that the last `OpDesc` of the `block` must be a `grad_op`, and its outputs must be some forward variables' gradients. If an output gradient's corresponding forward variable has an `error_clip` attribute, `error_clip_callback` calls that attribute's `append_clip_op` function to append the required `clip_op` to the `block`.
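To see the pieces working together without a full Paddle build, here is a minimal sketch that runs `error_clip_callback` against tiny mock objects standing in for Paddle's `Block` and `OpDesc`. All `Mock*` names are hypothetical, and only the methods the callback actually touches are implemented:
```python
# Minimal mocks (hypothetical): just enough surface for error_clip_callback.
class MockOpDesc(object):
    def __init__(self, output_names):
        self._output_names = output_names

    def output_arg_names(self):
        return self._output_names


class MockDesc(object):
    def __init__(self, op_descs):
        self._op_descs = op_descs

    def op_size(self):
        return len(self._op_descs)

    def op(self, i):
        return self._op_descs[i]


class MockVar(object):
    def __init__(self, error_clip=None):
        self.error_clip = error_clip


class MockBlock(object):
    def __init__(self, op_descs, vars):
        self.desc = MockDesc(op_descs)
        self._vars = vars
        self.appended = []  # records (op_type, grad_name) of appended ops

    def var_recursive(self, name):
        return self._vars[name]

    def append_op(self, type, inputs, outputs, attrs):
        self.appended.append((type, inputs["X"]))


# Forward variable "w" has clipping enabled; the last (and only) op's
# output "w@GRAD" is its gradient, so exactly one clip_op gets appended.
block = MockBlock(
    op_descs=[MockOpDesc(["w@GRAD"])],
    vars={"w": MockVar(error_clip=ErrorClipByValue(max=5.0))})
error_clip_callback(block, context={"w@GRAD": "w"})
print(block.appended)  # [('clip', 'w@GRAD')]
```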
<spanid="error-clip"></span><h1>Error Clip<aclass="headerlink"href="#error-clip"title="Permalink to this headline">¶</a></h1>
<divclass="section"id="overview">
<spanid="overview"></span><h2>Overview<aclass="headerlink"href="#overview"title="Permalink to this headline">¶</a></h2>
<p>Error clip is widely used in model training to prevent gradient exploding. It takes some specific rules to adjust variables’ gradients and prevent them from being too large. With it, values of a gradient will be checked before they are taken by the next <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> and be shrunk if necessary.</p>
</div>
<divclass="section"id="usage">
<spanid="usage"></span><h2>Usage<aclass="headerlink"href="#usage"title="Permalink to this headline">¶</a></h2>
<p>Users are allowed to assign different error clip methods or attributes to different <codeclass="docutils literal"><spanclass="pre">Variable</span></code>s. Users can specify it as a parameter of <codeclass="docutils literal"><spanclass="pre">Variable</span></code>‘s constructor:</p>
<p>The default value of <codeclass="docutils literal"><spanclass="pre">error_clip</span></code> is <codeclass="docutils literal"><spanclass="pre">None</span></code>, which means no error clip is employed. When it’s not <codeclass="docutils literal"><spanclass="pre">None</span></code>, it should take an object of <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code>‘s derived class. So far, <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code> has only one derived class: <codeclass="docutils literal"><spanclass="pre">ErrorClipByValue</span></code>, whose constructor is:</p>
<p><codeclass="docutils literal"><spanclass="pre">max</span></code> and <codeclass="docutils literal"><spanclass="pre">min</span></code> represent the maximal and minimal clip threshold respectively. In backward pass, all values of <codeclass="docutils literal"><spanclass="pre">var</span></code>‘s gradient greater than <codeclass="docutils literal"><spanclass="pre">max</span></code> or less than <codeclass="docutils literal"><spanclass="pre">min</span></code> will be clipped to <codeclass="docutils literal"><spanclass="pre">max</span></code> and <codeclass="docutils literal"><spanclass="pre">min</span></code> respectively. When the <codeclass="docutils literal"><spanclass="pre">min</span></code> is None, the minimal threshold will be assigned with <codeclass="docutils literal"><spanclass="pre">-max</span></code> automatically.</p>
<p>So we can enable the error clip with threshold <codeclass="docutils literal"><spanclass="pre">[-5.0,</span><spanclass="pre">5.0]</span></code> for variable <codeclass="docutils literal"><spanclass="pre">var</span></code> by:</p>
<spanid="implementation"></span><h2>Implementation<aclass="headerlink"href="#implementation"title="Permalink to this headline">¶</a></h2>
<p>The <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code> and its derived class <codeclass="docutils literal"><spanclass="pre">ErrorClipByValue</span></code> are defined in <em>clip.py</em>.</p>
<p>The <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code> have one main member functions: <codeclass="docutils literal"><spanclass="pre">append_clip_op(self,</span><spanclass="pre">block,</span><spanclass="pre">grad_name)</span></code>.</p>
<p>This function is used to create a <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> and append it to the end of given <codeclass="docutils literal"><spanclass="pre">block</span></code>. For different error clip algorithm require different <codeclass="docutils literal"><spanclass="pre">clip_op</span></code>, the function is defined as virtual in the base class. All derived classes must implement their own versions of this function.</p>
<p>These <codeclass="docutils literal"><spanclass="pre">clip_op</span></code>s should be inserted after <codeclass="docutils literal"><spanclass="pre">grad_op</span></code>s whose output gradients need to be clipped. It is equivalent to appending some <codeclass="docutils literal"><spanclass="pre">clip_op</span></code>s to the end of the target block every time a new <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> is added.</p>
<p>Here we employ a callback function to complete this kind of jobs. In <codeclass="docutils literal"><spanclass="pre">_append_backward_ops_</span></code> function, each time after a <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> is added to the <codeclass="docutils literal"><spanclass="pre">target_block</span></code>, a callback function is invoked. The logic of <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> appending can be implemented inside the callback function.</p>
<p>The callback function for <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> appending is defined in <em>clip.py</em>:</p>
<p>This function takes a <codeclass="docutils literal"><spanclass="pre">block</span></code> and a <codeclass="docutils literal"><spanclass="pre">context</span></code>(which is actually a grad_to_var map) as inputs. It checks each output of the last <codeclass="docutils literal"><spanclass="pre">OpDesc</span></code> in the <codeclass="docutils literal"><spanclass="pre">block</span></code>. Notice that the last <codeclass="docutils literal"><spanclass="pre">OpDesc</span></code> of the <codeclass="docutils literal"><spanclass="pre">block</span></code> must be a <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> and its outputs must be some forward variables’ gradients. If an output gradient’s corresponding forward variable has an attribute of <codeclass="docutils literal"><spanclass="pre">error_clip</span></code>, <codeclass="docutils literal"><spanclass="pre">error_clip_callback</span></code> will call the <codeclass="docutils literal"><spanclass="pre">error_clip</span></code>‘s <codeclass="docutils literal"><spanclass="pre">append_clip_op</span></code> function to append the required <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> into the <codeclass="docutils literal"><spanclass="pre">block</span></code>.</p>
Built with <ahref="http://sphinx-doc.org/">Sphinx</a> using a <ahref="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <ahref="https://readthedocs.org">Read the Docs</a>.
Error clip is widely used in model training to prevent gradient exploding. It takes some specific rules to adjust variables' gradients and prevent them from being too large. With it, values of a gradient will be checked before they are taken by the next `grad_op` and be shrunk if necessary.
## Usage
Users are allowed to assign different error clip methods or attributes to different `Variable`s. Users can specify it as a parameter of `Variable`'s constructor:
```python
var = framework.Variable(..., error_clip=myErrorClip, ...)
```
The default value of `error_clip` is `None`, which means no error clip is employed. When it's not `None`, it should take an object of `BaseErrorClipAttr`'s derived class. So far, `BaseErrorClipAttr` has only one derived class: `ErrorClipByValue`, whose constructor is:
```python
ErrorClipByValue(max, min=None)
```
`max` and `min` represent the maximal and minimal clip threshold respectively. In backward pass, all values of `var`'s gradient greater than `max` or less than `min` will be clipped to `max` and `min` respectively. When the `min` is None, the minimal threshold will be assigned with `-max` automatically.
So we can enable the error clip with threshold `[-5.0, 5.0]` for variable `var` by:
```python
var = framework.Variable(..., error_clip=ErrorClipByValue(max=5.0), ...)
```
## Implementation
The `BaseErrorClipAttr` and its derived class `ErrorClipByValue` are defined in *clip.py*.
```python
class BaseErrorClipAttr(object):
def append_clip_op(self, block, grad_name):
raise NotImplementedError()
class ErrorClipByValue(BaseErrorClipAttr):
def __init__(self, max, min=None):
max = float(max)
if min is None:
min = -max
else:
min = float(min)
self.max = max
self.min = min
def append_clip_op(self, block, grad_name):
block.append_op(
type="clip",
inputs={"X": grad_name},
outputs={"Out": grad_name},
attrs={"min": self.min,
"max": self.max})
```
The `BaseErrorClipAttr` have one main member functions: `append_clip_op(self, block, grad_name)`.
This function is used to create a `clip_op` and append it to the end of given `block`. For different error clip algorithm require different `clip_op`, the function is defined as virtual in the base class. All derived classes must implement their own versions of this function.
These `clip_op`s should be inserted after `grad_op`s whose output gradients need to be clipped. It is equivalent to appending some `clip_op`s to the end of the target block every time a new `grad_op` is added.
```python
for op_desc in grad_op_descs:
new_op_desc = target_block.desc.append_op()
new_op_desc.copy_from(op_desc)
callback(block=target_block, context=grad_to_var)
```
Here we employ a callback function to complete this kind of jobs. In `_append_backward_ops_` function, each time after a `grad_op` is added to the `target_block`, a callback function is invoked. The logic of `clip_op` appending can be implemented inside the callback function.
The callback function for `clip_op` appending is defined in *clip.py*:
```python
def error_clip_callback(block, context):
# the context is a grad_to_var map
grad_to_var = context
op_desc = block.desc.op(block.desc.op_size() - 1)
for grad_n in filter(lambda n: grad_to_var.has_key(n),
This function takes a `block` and a `context`(which is actually a grad\_to\_var map) as inputs. It checks each output of the last `OpDesc` in the `block`. Notice that the last `OpDesc` of the `block` must be a `grad_op` and its outputs must be some forward variables' gradients. If an output gradient's corresponding forward variable has an attribute of `error_clip`, `error_clip_callback` will call the `error_clip`'s `append_clip_op` function to append the required `clip_op` into the `block`.
<p>Error clip is widely used in model training to prevent gradient exploding. It takes some specific rules to adjust variables’ gradients and prevent them from being too large. With it, values of a gradient will be checked before they are taken by the next <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> and be shrunk if necessary.</p>
<p>Users are allowed to assign different error clip methods or attributes to different <codeclass="docutils literal"><spanclass="pre">Variable</span></code>s. Users can specify it as a parameter of <codeclass="docutils literal"><spanclass="pre">Variable</span></code>‘s constructor:</p>
<p>The default value of <codeclass="docutils literal"><spanclass="pre">error_clip</span></code> is <codeclass="docutils literal"><spanclass="pre">None</span></code>, which means no error clip is employed. When it’s not <codeclass="docutils literal"><spanclass="pre">None</span></code>, it should take an object of <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code>‘s derived class. So far, <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code> has only one derived class: <codeclass="docutils literal"><spanclass="pre">ErrorClipByValue</span></code>, whose constructor is:</p>
<p><codeclass="docutils literal"><spanclass="pre">max</span></code> and <codeclass="docutils literal"><spanclass="pre">min</span></code> represent the maximal and minimal clip threshold respectively. In backward pass, all values of <codeclass="docutils literal"><spanclass="pre">var</span></code>‘s gradient greater than <codeclass="docutils literal"><spanclass="pre">max</span></code> or less than <codeclass="docutils literal"><spanclass="pre">min</span></code> will be clipped to <codeclass="docutils literal"><spanclass="pre">max</span></code> and <codeclass="docutils literal"><spanclass="pre">min</span></code> respectively. When the <codeclass="docutils literal"><spanclass="pre">min</span></code> is None, the minimal threshold will be assigned with <codeclass="docutils literal"><spanclass="pre">-max</span></code> automatically.</p>
<p>So we can enable the error clip with threshold <codeclass="docutils literal"><spanclass="pre">[-5.0,</span><spanclass="pre">5.0]</span></code> for variable <codeclass="docutils literal"><spanclass="pre">var</span></code> by:</p>
<p>The <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code> and its derived class <codeclass="docutils literal"><spanclass="pre">ErrorClipByValue</span></code> are defined in <em>clip.py</em>.</p>
<p>The <codeclass="docutils literal"><spanclass="pre">BaseErrorClipAttr</span></code> have one main member functions: <codeclass="docutils literal"><spanclass="pre">append_clip_op(self,</span><spanclass="pre">block,</span><spanclass="pre">grad_name)</span></code>.</p>
<p>This function is used to create a <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> and append it to the end of given <codeclass="docutils literal"><spanclass="pre">block</span></code>. For different error clip algorithm require different <codeclass="docutils literal"><spanclass="pre">clip_op</span></code>, the function is defined as virtual in the base class. All derived classes must implement their own versions of this function.</p>
<p>These <codeclass="docutils literal"><spanclass="pre">clip_op</span></code>s should be inserted after <codeclass="docutils literal"><spanclass="pre">grad_op</span></code>s whose output gradients need to be clipped. It is equivalent to appending some <codeclass="docutils literal"><spanclass="pre">clip_op</span></code>s to the end of the target block every time a new <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> is added.</p>
<p>Here we employ a callback function to complete this kind of jobs. In <codeclass="docutils literal"><spanclass="pre">_append_backward_ops_</span></code> function, each time after a <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> is added to the <codeclass="docutils literal"><spanclass="pre">target_block</span></code>, a callback function is invoked. The logic of <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> appending can be implemented inside the callback function.</p>
<p>The callback function for <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> appending is defined in <em>clip.py</em>:</p>
<p>This function takes a <codeclass="docutils literal"><spanclass="pre">block</span></code> and a <codeclass="docutils literal"><spanclass="pre">context</span></code>(which is actually a grad_to_var map) as inputs. It checks each output of the last <codeclass="docutils literal"><spanclass="pre">OpDesc</span></code> in the <codeclass="docutils literal"><spanclass="pre">block</span></code>. Notice that the last <codeclass="docutils literal"><spanclass="pre">OpDesc</span></code> of the <codeclass="docutils literal"><spanclass="pre">block</span></code> must be a <codeclass="docutils literal"><spanclass="pre">grad_op</span></code> and its outputs must be some forward variables’ gradients. If an output gradient’s corresponding forward variable has an attribute of <codeclass="docutils literal"><spanclass="pre">error_clip</span></code>, <codeclass="docutils literal"><spanclass="pre">error_clip_callback</span></code> will call the <codeclass="docutils literal"><spanclass="pre">error_clip</span></code>‘s <codeclass="docutils literal"><spanclass="pre">append_clip_op</span></code> function to append the required <codeclass="docutils literal"><spanclass="pre">clip_op</span></code> into the <codeclass="docutils literal"><spanclass="pre">block</span></code>.</p>
Built with <ahref="http://sphinx-doc.org/">Sphinx</a> using a <ahref="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <ahref="https://readthedocs.org">Read the Docs</a>.