## Auto Gradient Checker Design

## Background

- The forward computation of an Operator is easy to verify, because it has a clear mathematical definition. **But** backpropagation is a notoriously difficult algorithm to debug and get right:
  1. You have to derive the correct backpropagation formula from the forward computation.
  2. You have to implement it correctly in C++.
  3. It is difficult to prepare test data.
- Auto gradient checking computes a numeric gradient using only the forward Operator and uses it as a reference for the backward Operator's result. It has several advantages:
  1. The numeric gradient checker only needs the forward Operator.
  2. The user only needs to prepare the input data for the forward Operator.
## Mathematical Theory
The following two documents from Stanford give a detailed explanation of how to compute the numeric gradient and why it is useful.
- [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization)
- [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96)
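The key idea (a standard statement of the method, not quoted from the pages above) is the central-difference approximation: for a scalar output $y$ and one input element $x_i$,

$$\frac{\partial y}{\partial x_i} \approx \frac{y(x_i + \Delta) - y(x_i - \Delta)}{2\Delta}$$

where $\Delta$ corresponds to the `delta` parameter in the interface below. This is exactly what the core algorithm later in this document computes element by element.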
## Numeric Gradient Implementation
### Python Interface
```python
def get_numeric_gradient(op,
                         input_values,
                         output_name,
                         input_to_check,
                         delta=0.005,
                         local_scope=None):
    """
    Get the numeric gradient for one input of an operator.

    :param op: C++ operator instance; could also be a network.
    :param input_values: The input variables. Should be a dictionary whose
        keys are variable names and whose values are numpy arrays.
    :param output_name: The name of the final output variable.
    :param input_to_check: The input variable whose gradient should be computed.
    :param delta: The perturbation value for the numeric gradient method. The
        smaller delta is, the more accurate the result will be, but if delta is
        too small it can cause numerical stability problems.
    :param local_scope: The local scope used by get_numeric_gradient.
    :return: The gradient array in numpy format.
    """
```
### Explanation

- Why `output_name` is needed
  - One Operator may have multiple outputs, and an independent gradient can be computed from each output. So the user has to specify which output to use for the calculation.
- Why `input_to_check` is needed
  - One Operator may have multiple inputs. The gradient Op can calculate the gradients of all these inputs at the same time, but the numeric gradient has to calculate them one by one. So `get_numeric_gradient` is designed to compute the gradient of a single input. If you need the gradients of multiple inputs, call `get_numeric_gradient` multiple times, as in the sketch below.
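For example, checking both inputs of a multiplication operator takes two separate calls. This is only an illustrative sketch: `create_op`, the operator type `"mul"`, and the variable names `X`, `Y`, and `Out` are assumptions; only `get_numeric_gradient`'s signature comes from this document.

```python
import numpy as np

op = create_op("mul")  # hypothetical helper that builds the forward operator
inputs = {
    "X": np.random.random((3, 4)).astype("float32"),
    "Y": np.random.random((4, 5)).astype("float32"),
}

# One call per input whose gradient we want to check.
dx = get_numeric_gradient(op, inputs, output_name="Out", input_to_check="X")
dy = get_numeric_gradient(op, inputs, output_name="Out", input_to_check="Y")
```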
### Core Algorithm Implementation
```python
# We only compute the gradient of one element at a time, and use a for loop
# to walk over every element of the input tensor.
for i in xrange(tensor_size):
    # Get one input element through its index i.
    origin = tensor_to_check.get_float_element(i)

    # Add delta to it, run the op, and get the sum of the result tensor.
    x_pos = origin + delta
    tensor_to_check.set_float_element(i, x_pos)
    y_pos = get_output()

    # Subtract delta from this element, run the op, and get the sum of the result tensor.
    x_neg = origin - delta
    tensor_to_check.set_float_element(i, x_neg)
    y_neg = get_output()

    # Restore the old value.
    tensor_to_check.set_float_element(i, origin)

    # Compute the gradient of this element and store it into a numpy array.
    gradient_flat[i] = (y_pos - y_neg) / delta / 2

# Reshape the gradient result to the shape of the source tensor.
```
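The snippet above depends on the PaddlePaddle tensor API. As a self-contained illustration of the same central-difference loop, the following NumPy sketch numerically checks the gradient of `y = sum(x**2)`, whose analytic gradient is `2 * x`:

```python
import numpy as np

def numeric_gradient(f, x, delta=0.005):
    """Central-difference gradient of a scalar-valued function f at point x."""
    grad = np.zeros(x.size)
    flat = x.ravel()  # view into x, so perturbing flat[i] perturbs x
    for i in range(flat.size):
        origin = flat[i]
        flat[i] = origin + delta
        y_pos = f(x)
        flat[i] = origin - delta
        y_neg = f(x)
        flat[i] = origin  # restore the original value
        grad[i] = (y_pos - y_neg) / delta / 2
    return grad.reshape(x.shape)

x = np.random.random((3, 4))
numeric = numeric_gradient(lambda t: np.sum(t ** 2), x)
analytic = 2 * x  # d/dx sum(x**2) = 2x
assert np.allclose(numeric, analytic, atol=1e-4)
```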
<liclass="toctree-l2"><aclass="reference internal"href="../getstarted/build_and_install/index_en.html">Install and Build</a><ul>
<liclass="toctree-l3"><aclass="reference internal"href="../getstarted/build_and_install/docker_install_en.html">PaddlePaddle in Docker Containers</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_en.html">Paddle On Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/usage/k8s/k8s_aws_en.html">Distributed PaddlePaddle Training on AWS with Kubernetes</a></li>
<liclass="toctree-l2"><aclass="reference internal"href="../howto/dev/new_layer_en.html">Write New Layers</a></li>
<spanid="auto-gradient-checker-design"></span><h1>Auto Gradient Checker Design<aclass="headerlink"href="#auto-gradient-checker-design"title="Permalink to this headline">¶</a></h1>
</div>
<divclass="section"id="backgraound">
<spanid="backgraound"></span><h1>Backgraound:<aclass="headerlink"href="#backgraound"title="Permalink to this headline">¶</a></h1>
<ulclass="simple">
<li>Operator forward computing is easy to check if the result is right because it has a clear definition. <strong>But</strong> backpropagation is a notoriously difficult algorithm to debug and get right:<ul>
<li><olclass="first">
<li>you should get the right backpropagation formula according to the forward computation.</li>
</ol>
</li>
<li><olclass="first">
<li>you should implement it right in CPP.</li>
</ol>
</li>
<li><olclass="first">
<li>it’s difficult to prepare test data.</li>
</ol>
</li>
</ul>
</li>
<li>Auto gradient check gets a numeric gradient by forward Operator and use it as a reference of the backward Operator’s result. It has several advantages:<ul>
<li><olclass="first">
<li>numeric gradient checker only need forward operator.</li>
</ol>
</li>
<li><olclass="first">
<li>user only need to prepare the input data for forward Operator.</li>
</ol>
</li>
</ul>
</li>
</ul>
</div>
<divclass="section"id="mathematical-theory">
<spanid="mathematical-theory"></span><h1>Mathematical Theory<aclass="headerlink"href="#mathematical-theory"title="Permalink to this headline">¶</a></h1>
<p>The following two document from stanford has a detailed explanation of how to get numeric gradient and why it’s useful.</p>
<divclass="toctree-wrapper compound">
<ul>
<liclass="toctree-l1"><aclass="reference external"href="http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization">Gradient checking and advanced optimization(en)</a></li>
<liclass="toctree-l1"><aclass="reference external"href="http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96">Gradient checking and advanced optimization(cn)</a></li>
<spanid="numeric-gradient-implementation"></span><h1>Numeric Gradient Implementation<aclass="headerlink"href="#numeric-gradient-implementation"title="Permalink to this headline">¶</a></h1>
<divclass="section"id="python-interface">
<spanid="python-interface"></span><h2>Python Interface<aclass="headerlink"href="#python-interface"title="Permalink to this headline">¶</a></h2>
<spanclass="sd"> Get Numeric Gradient for an operator's input.</span>
<spanclass="sd"> :param op: C++ operator instance, could be an network</span>
<spanclass="sd"> :param input_values: The input variables. Should be an dictionary, key is</span>
<spanclass="sd"> variable name. Value is numpy array.</span>
<spanclass="sd"> :param output_name: The final output variable name.</span>
<spanclass="sd"> :param input_to_check: The input variable need to get gradient.</span>
<spanclass="sd"> :param delta: The perturbation value for numeric gradient method. The</span>
<spanclass="sd"> smaller delta is, the more accurate result will get. But if that delta is</span>
<spanclass="sd"> too small, it could occur numerical stability problem.</span>
<spanclass="sd"> :param local_scope: The local scope used for get_numeric_gradient.</span>
<spanclass="sd"> :return: The gradient array in numpy format.</span>
<spanclass="sd">"""</span>
</pre></div>
</div>
</div>
<divclass="section"id="explaination">
<spanid="explaination"></span><h2>Explaination:<aclass="headerlink"href="#explaination"title="Permalink to this headline">¶</a></h2>
<ulclass="simple">
<li>Why need <codeclass="docutils literal"><spanclass="pre">output_name</span></code><ul>
<li>One Operator may have multiple Output, you can get independent gradient from each Output. So user should set one output to calculate.</li>
</ul>
</li>
<li>Why need <codeclass="docutils literal"><spanclass="pre">input_to_check</span></code><ul>
<li>One operator may have multiple inputs. Gradient Op can calculate the gradient of these Inputs at the same time. But Numeric Gradient needs to calculate them one by one. So <codeclass="docutils literal"><spanclass="pre">get_numeric_gradient</span></code> is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call <codeclass="docutils literal"><spanclass="pre">get_numeric_gradient</span></code> multiple times.</li>
<spanid="core-algorithm-implementation"></span><h2>Core Algorithm Implementation<aclass="headerlink"href="#core-algorithm-implementation"title="Permalink to this headline">¶</a></h2>
<divclass="highlight-python"><divclass="highlight"><pre><span></span><spanclass="c1"># we only compute gradient of one element each time.</span>
<spanclass="c1"># we use a for loop to compute the gradient of every element.</span>
<spanid="auto-graident-checker-framework"></span><h1>Auto Graident Checker Framework<aclass="headerlink"href="#auto-graident-checker-framework"title="Permalink to this headline">¶</a></h1>
<p>Each Operator Kernel has three kinds of Gradient:</p>
<ulclass="simple">
<li><olclass="first">
<li>Numeric Gradient</li>
</ol>
</li>
<li><olclass="first">
<li>CPU Operator Gradient</li>
</ol>
</li>
<li><olclass="first">
<li>GPU Operator Gradient(if supported)</li>
</ol>
</li>
</ul>
<p>Numeric Gradient Only relies on forward Operator. So we use Numeric Gradient as the reference value.</p>
<ulclass="simple">
<li><olclass="first">
<li>calculate the numeric gradient.</li>
</ol>
</li>
<li><olclass="first">
<li>calculate CPU kernel Gradient with the backward Operator and compare it with the numeric gradient.</li>
</ol>
</li>
<li><olclass="first">
<li>calculate GPU kernel Gradient with the backward Operator and compare it with the numeric gradient.(if support GPU)</li>
</ol>
</li>
</ul>
<divclass="section"id="python-interface">
<spanid="id1"></span><h2>Python Interface<aclass="headerlink"href="#python-interface"title="Permalink to this headline">¶</a></h2>
<spanid="how-to-check-if-two-numpy-array-is-close-enough"></span><h2>How to check if two numpy array is close enough?<aclass="headerlink"href="#how-to-check-if-two-numpy-array-is-close-enough"title="Permalink to this headline">¶</a></h2>
<p>if <codeclass="docutils literal"><spanclass="pre">abs_numeric_grad</span></code> is nearly zero, then use abs error for numeric_grad, not relative</p>
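A minimal NumPy sketch of this rule, assuming an illustrative `max_relative_error` of 0.005 and a 1e-3 near-zero threshold (not the framework's actual defaults):

```python
import numpy as np

def is_close(numeric_grad, operator_grad, max_relative_error=0.005):
    abs_numeric_grad = np.abs(numeric_grad)
    # Where the numeric gradient is nearly zero, a relative error would divide
    # by (almost) zero, so fall back to the absolute error there.
    denominator = np.where(abs_numeric_grad > 1e-3, abs_numeric_grad, 1.0)
    diff = np.abs(numeric_grad - operator_grad) / denominator
    return bool(np.all(diff < max_relative_error))
```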