Deploy to GitHub Pages: b9024492

d451b098 · Travis CI · 99326a46 · d451b098 · d451b098 · d451b098
8 changed files
--- a/develop/doc/_sources/design/auto_gradient_check.md.txt
+++ b/develop/doc/_sources/design/auto_gradient_check.md.txt
-## Auto Gradient Checker Design
+## Auto Gradient Check Design
-## Backgraound：
+## Background：
- Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right:
+- Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right because of the following challenges:
-  1. you should get the right backpropagation formula according to the forward computation.
+  1. The formula for backpropagation formula should be correct according to the forward computation.
-  2. you should implement it right in CPP.
+  2. The Implementation of the above shoule be correct in CPP.
-  3. it's difficult to prepare test data.
+  3. It is difficult to prepare an unbiased test data.
- Auto gradient checking gets a numerical gradient by forward Operator and use it as a reference of the backward Operator's result. It has several advantages:
+- Auto gradient checking gets a numerical gradient using forward Operator and uses it as a reference for the backward Operator's result. It has several advantages:
-  1. numerical gradient checker only need forward operator.
+  1. Numerical gradient checker only needs the forward operator.
-  2. user only need to prepare the input data for forward Operator.
+  2. The user only needs to prepare the input data for forward Operator and not worry about the backward Operator.
 ## Mathematical Theory
-The following two document from Stanford has a detailed explanation of how to get numerical gradient and why it's useful.
+The following documents from Stanford have a detailed explanation of how to compute the numerical gradient and why it is useful.
 - [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization)
 - [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96)
-## Numeric Gradient Implementation
+## Numerical Gradient Implementation
 ### Python Interface
 ```python
 def get_numerical_gradient(op,
@@ -27,73 +27,76 @@ def get_numerical_gradient(op,
                         delta=0.005,
                         local_scope=None):
    """
-    Get Numeric Gradient for an operator's input.
+    Get Numerical Gradient for the input of an operator.
-    :param op: C++ operator instance, could be an network
+    :param op: C++ operator instance, could be an network.
    :param input_values: The input variables. Should be an dictionary, whose key is
-    variable name, and value is numpy array.
+    variable name, and value is a numpy array.
    :param output_name: The final output variable name.
-    :param input_to_check: The input variable with respect to which to compute the gradient.
+    :param input_to_check: The input variable with respect to which the gradient has to be computed.
-    :param delta: The perturbation value for numeric gradient method. The
+    :param delta: The perturbation value for numerical gradient method. The
-    smaller delta is, the more accurate result will get. But if that delta is
+    smaller the delta, the more accurate the result. But if the delta is too
-     too small, it will suffer from numerical stability problem.
+    small, it will suffer from the numerical stability problem.
    :param local_scope: The local scope used for get_numeric_gradient.
    :return: The gradient array in numpy format.
    """
 ```
-### Explaination:
+### Explanation:
- Why need `output_name`
+- Why do we need an `output_name`
-  - An Operator may have multiple Output, one can get independent gradient from each Output. So caller should specify the name of the output variable.
+  - An Operator may have multiple Outputs, one can compute an independent gradient from each Output. So the caller should specify the name of the output variable.
- Why need `input_to_check`
+- Why do we need `input_to_check`
-  - One operator may have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numeric Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times.
+  - One operator can have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numerical Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times each with a different input.
 ### Core Algorithm Implementation
 ```python
-    # we only compute gradient of one element a time.
+    # we only compute the gradient of one element a time.
    # we use a for loop to compute the gradient of each element.
    for i in xrange(tensor_size):
-        # get one input element by its index i.
+        # get one input element using the index i.
-        origin = tensor_to_check.get_float_element(i)
+        original = tensor_to_check.get_float_element(i)
-        # add delta to it, run op and then get the new value of the result tensor.
+        # add delta to it, run the forward op and then
-        x_pos = origin + delta
+        # get the new value of the result tensor.
+        x_pos = original + delta
        tensor_to_check.set_float_element(i, x_pos)
        y_pos = get_output()
-        # plus delta to this element, run op and get the new value of the result tensor.
+        # Subtract delta from this element, run the op again
-        x_neg = origin - delta
+        # and get the new value of the result tensor.
+        x_neg = original - delta
        tensor_to_check.set_float_element(i, x_neg)
        y_neg = get_output()
        # restore old value
-        tensor_to_check.set_float_element(i, origin)
+        tensor_to_check.set_float_element(i, original)
-        # compute the gradient of this element and store it into a numpy array.
+        # compute the gradient of this element and store
+        # it into a numpy array.
        gradient_flat[i] = (y_pos - y_neg) / delta / 2
    # reshape the gradient result to the shape of the source tensor.
    return gradient_flat.reshape(tensor_to_check.get_dims())
 ```
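For readers who want to try the loop above outside the framework, here is a minimal NumPy-only sketch of the same central-difference scheme. The names `numerical_gradient` and `forward` are hypothetical stand-ins for `get_numerical_gradient` and `get_output()`; they are not part of the PaddlePaddle API, and the sketch assumes the forward computation returns a single scalar.

```python
import numpy as np


def numerical_gradient(forward, x, delta=0.005):
    """Central-difference gradient of a scalar-valued `forward` at `x`.

    Hypothetical helper: `forward` plays the role of get_output() and
    `x` the role of tensor_to_check in the loop above.
    """
    x = np.array(x, dtype=np.float64)  # work on a private copy of the input
    flat = x.reshape(-1)               # view into x, so writes perturb x
    grad_flat = np.zeros_like(flat)
    for i in range(flat.size):
        original = flat[i]
        flat[i] = original + delta     # perturb upward and run forward
        y_pos = forward(x)
        flat[i] = original - delta     # perturb downward and run forward
        y_neg = forward(x)
        flat[i] = original             # restore before the next element
        grad_flat[i] = (y_pos - y_neg) / (2.0 * delta)
    return grad_flat.reshape(x.shape)


# Example: f(x) = sum(x ** 2) has the exact gradient 2 * x.
x0 = np.array([[1.0, -2.0], [0.5, 3.0]])
grad = numerical_gradient(lambda t: float(np.sum(t ** 2)), x0)
```

Each element costs two forward runs, so the loop is only practical for the small tensors typically used in unit tests.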
-## Auto Graident Checker Framework
+## Auto Gradient Check Framework
 Each Operator Kernel has three kinds of Gradient:
 1. Numerical gradient
 2. CPU kernel gradient
-3. GPU kernel gradient (if supported)
+3. GPU kernel gradient (if supported by the device)
-The numerical gradient only relies on forward Operator. So we use the numerical gradient as the reference value. And the gradient checking is performed in the following three steps:
+The numerical gradient only relies on the forward Operator, so we use the numerical gradient as the reference value. The gradient checking is performed in the following three steps:
-1. calculate the numerical gradient
+1. Calculate the numerical gradient
-2. calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient
+2. Calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient.
-3. calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient (if supported)
+3. Calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient. (if supported)
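The steps above can be sketched roughly as follows for the CPU case. Everything here is illustrative: the helper `max_relative_error`, its `zero_threshold` argument, and the tolerance of `0.005` are assumptions, and a hand-written analytic gradient stands in for the result of the backward Operator's CPU kernel.

```python
import numpy as np


def max_relative_error(numerical_grad, operator_grad, zero_threshold=1e-3):
    """Largest element-wise error between a reference and a tested gradient.

    Where the reference is nearly zero, the denominator is set to 1,
    which turns the relative error into an absolute error there.
    """
    numerical_grad = np.asarray(numerical_grad, dtype=np.float64)
    operator_grad = np.asarray(operator_grad, dtype=np.float64)
    denom = np.abs(numerical_grad)      # new array, safe to modify
    denom[denom < zero_threshold] = 1.0
    return float(np.max(np.abs(numerical_grad - operator_grad) / denom))


# Forward computation (stand-in): f(x) = sum(sin(x)).
x = np.array([0.0, 0.5, 1.0, np.pi / 2])
delta = 0.005

# Step 1: reference gradient obtained from the forward computation only.
numerical = (np.sin(x + delta) - np.sin(x - delta)) / (2 * delta)

# Step 2: gradient from the (stand-in) backward implementation.
analytic = np.cos(x)

err = max_relative_error(numerical, analytic)
passed = err <= 0.005  # hypothetical tolerance, not a framework default
```

The sketch subtracts the signed reference gradient and uses its absolute value only in the denominator, so negative gradients compare correctly. A GPU check would repeat step 2 with the GPU kernel's output against the same reference.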
 #### Python Interface
@@ -109,26 +112,27 @@ The numerical gradient only relies on forward Operator. So we use the numerical
        """
        :param forward_op: used to create backward_op
        :param input_vars: numpy value of input variable. The following
-            computation will use these variables.
+          computation will use these variables.
-        :param inputs_to_check: the input variable with respect to which to compute the gradient.
+        :param inputs_to_check: the input variable with respect to which the
+          gradient will be computed.
        :param output_name: The final output variable name.
        :param max_relative_error: The relative tolerance parameter.
-        :param no_grad_set: used when create backward ops
+        :param no_grad_set: used to create backward ops
        :param only_cpu: only compute and check gradient on cpu kernel.
        :return:
        """
 ```
-### How to check if two numpy array is close enough?
+### How to check if two numpy arrays are close enough?
-if `abs_numerical_grad` is nearly zero, then use abs error for numerical_grad
+if `abs_numerical_grad` is nearly zero, then use absolute error for numerical_grad.
 ```python
 numerical_grad = ...
 operator_grad = numpy.array(scope.find_var(grad_var_name(name)).get_tensor())
 abs_numerical_grad = numpy.abs(numerical_grad)
-# if abs_numerical_grad is nearly zero, then use abs error for numeric_grad, not relative
+# if abs_numerical_grad is nearly zero, then use abs error for
-# error.
+# numeric_grad, instead of relative error.
 abs_numerical_grad[abs_numerical_grad < 1e-3] = 1
 diff_mat = numpy.abs(abs_numerical_grad - operator_grad) / abs_numerical_grad
@@ -137,10 +141,10 @@ max_diff = numpy.max(diff_mat)
 #### Notes：
-The Input data for auto gradient checker should be reasonable to avoid numerical  stability problem.
+The Input data for auto gradient checker should be reasonable to avoid numerical stability problem.
-#### Refs:
+#### References:
 - [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization)
 - [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96)
--- a/develop/doc/design/auto_gradient_check.html
+++ b/develop/doc/design/auto_gradient_check.html
--- a/develop/doc/objects.inv
+++ b/develop/doc/objects.inv
--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/_sources/design/auto_gradient_check.md.txt
+++ b/develop/doc_cn/_sources/design/auto_gradient_check.md.txt
-## Auto Gradient Checker Design
+## Auto Gradient Check Design
-## Backgraound：
+## Background：
- Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right:
+- Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right because of the following challenges:
-  1. you should get the right backpropagation formula according to the forward computation.
+  1. The formula for backpropagation formula should be correct according to the forward computation.
-  2. you should implement it right in CPP.
+  2. The Implementation of the above shoule be correct in CPP.
-  3. it's difficult to prepare test data.
+  3. It is difficult to prepare an unbiased test data.
- Auto gradient checking gets a numerical gradient by forward Operator and use it as a reference of the backward Operator's result. It has several advantages:
+- Auto gradient checking gets a numerical gradient using forward Operator and uses it as a reference for the backward Operator's result. It has several advantages:
-  1. numerical gradient checker only need forward operator.
+  1. Numerical gradient checker only needs the forward operator.
-  2. user only need to prepare the input data for forward Operator.
+  2. The user only needs to prepare the input data for forward Operator and not worry about the backward Operator.
 ## Mathematical Theory
-The following two document from Stanford has a detailed explanation of how to get numerical gradient and why it's useful.
+The following documents from Stanford have a detailed explanation of how to compute the numerical gradient and why it is useful.
 - [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization)
 - [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96)
-## Numeric Gradient Implementation
+## Numerical Gradient Implementation
 ### Python Interface
 ```python
 def get_numerical_gradient(op,
@@ -27,73 +27,76 @@ def get_numerical_gradient(op,
                         delta=0.005,
                         local_scope=None):
    """
-    Get Numeric Gradient for an operator's input.
+    Get Numerical Gradient for the input of an operator.
-    :param op: C++ operator instance, could be an network
+    :param op: C++ operator instance, could be an network.
    :param input_values: The input variables. Should be an dictionary, whose key is
-    variable name, and value is numpy array.
+    variable name, and value is a numpy array.
    :param output_name: The final output variable name.
-    :param input_to_check: The input variable with respect to which to compute the gradient.
+    :param input_to_check: The input variable with respect to which the gradient has to be computed.
-    :param delta: The perturbation value for numeric gradient method. The
+    :param delta: The perturbation value for numerical gradient method. The
-    smaller delta is, the more accurate result will get. But if that delta is
+    smaller the delta, the more accurate the result. But if the delta is too
-     too small, it will suffer from numerical stability problem.
+    small, it will suffer from the numerical stability problem.
    :param local_scope: The local scope used for get_numeric_gradient.
    :return: The gradient array in numpy format.
    """
 ```
-### Explaination:
+### Explanation:
- Why need `output_name`
+- Why do we need an `output_name`
-  - An Operator may have multiple Output, one can get independent gradient from each Output. So caller should specify the name of the output variable.
+  - An Operator may have multiple Outputs, one can compute an independent gradient from each Output. So the caller should specify the name of the output variable.
- Why need `input_to_check`
+- Why do we need `input_to_check`
-  - One operator may have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numeric Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times.
+  - One operator can have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numerical Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times each with a different input.
 ### Core Algorithm Implementation
 ```python
-    # we only compute gradient of one element a time.
+    # we only compute the gradient of one element a time.
    # we use a for loop to compute the gradient of each element.
    for i in xrange(tensor_size):
-        # get one input element by its index i.
+        # get one input element using the index i.
-        origin = tensor_to_check.get_float_element(i)
+        original = tensor_to_check.get_float_element(i)
-        # add delta to it, run op and then get the new value of the result tensor.
+        # add delta to it, run the forward op and then
-        x_pos = origin + delta
+        # get the new value of the result tensor.
+        x_pos = original + delta
        tensor_to_check.set_float_element(i, x_pos)
        y_pos = get_output()
-        # plus delta to this element, run op and get the new value of the result tensor.
+        # Subtract delta from this element, run the op again
-        x_neg = origin - delta
+        # and get the new value of the result tensor.
+        x_neg = original - delta
        tensor_to_check.set_float_element(i, x_neg)
        y_neg = get_output()
        # restore old value
-        tensor_to_check.set_float_element(i, origin)
+        tensor_to_check.set_float_element(i, original)
-        # compute the gradient of this element and store it into a numpy array.
+        # compute the gradient of this element and store
+        # it into a numpy array.
        gradient_flat[i] = (y_pos - y_neg) / delta / 2
    # reshape the gradient result to the shape of the source tensor.
    return gradient_flat.reshape(tensor_to_check.get_dims())
 ```
-## Auto Graident Checker Framework
+## Auto Gradient Check Framework
 Each Operator Kernel has three kinds of Gradient:
 1. Numerical gradient
 2. CPU kernel gradient
-3. GPU kernel gradient (if supported)
+3. GPU kernel gradient (if supported by the device)
-The numerical gradient only relies on forward Operator. So we use the numerical gradient as the reference value. And the gradient checking is performed in the following three steps:
+The numerical gradient only relies on the forward Operator, so we use the numerical gradient as the reference value. The gradient checking is performed in the following three steps:
-1. calculate the numerical gradient
+1. Calculate the numerical gradient
-2. calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient
+2. Calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient.
-3. calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient (if supported)
+3. Calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient. (if supported)
 #### Python Interface
@@ -109,26 +112,27 @@ The numerical gradient only relies on forward Operator. So we use the numerical
        """
        :param forward_op: used to create backward_op
        :param input_vars: numpy value of input variable. The following
-            computation will use these variables.
+          computation will use these variables.
-        :param inputs_to_check: the input variable with respect to which to compute the gradient.
+        :param inputs_to_check: the input variable with respect to which the
+          gradient will be computed.
        :param output_name: The final output variable name.
        :param max_relative_error: The relative tolerance parameter.
-        :param no_grad_set: used when create backward ops
+        :param no_grad_set: used to create backward ops
        :param only_cpu: only compute and check gradient on cpu kernel.
        :return:
        """
 ```
-### How to check if two numpy array is close enough?
+### How to check if two numpy arrays are close enough?
-if `abs_numerical_grad` is nearly zero, then use abs error for numerical_grad
+if `abs_numerical_grad` is nearly zero, then use absolute error for numerical_grad.
 ```python
 numerical_grad = ...
 operator_grad = numpy.array(scope.find_var(grad_var_name(name)).get_tensor())
 abs_numerical_grad = numpy.abs(numerical_grad)
-# if abs_numerical_grad is nearly zero, then use abs error for numeric_grad, not relative
+# if abs_numerical_grad is nearly zero, then use abs error for
-# error.
+# numeric_grad, instead of relative error.
 abs_numerical_grad[abs_numerical_grad < 1e-3] = 1
 diff_mat = numpy.abs(abs_numerical_grad - operator_grad) / abs_numerical_grad
@@ -137,10 +141,10 @@ max_diff = numpy.max(diff_mat)
 #### Notes：
-The Input data for auto gradient checker should be reasonable to avoid numerical  stability problem.
+The Input data for auto gradient checker should be reasonable to avoid numerical stability problem.
-#### Refs:
+#### References:
 - [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization)
 - [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96)
--- a/develop/doc_cn/design/auto_gradient_check.html
+++ b/develop/doc_cn/design/auto_gradient_check.html
@@ -8,7 +8,7 @@
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
-  <title>Auto Gradient Checker Design &mdash; PaddlePaddle  文档</title>
+  <title>Auto Gradient Check Design &mdash; PaddlePaddle  文档</title>
@@ -212,7 +212,7 @@
 <div role="navigation" aria-label="breadcrumbs navigation">
  <ul class="wy-breadcrumbs">
-    <li>Auto Gradient Checker Design</li>
+    <li>Auto Gradient Check Design</li>
  </ul>
 </div>
@@ -221,28 +221,28 @@
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
-  <div class="section" id="auto-gradient-checker-design">
+  <div class="section" id="auto-gradient-check-design">
-<span id="auto-gradient-checker-design"></span><h1>Auto Gradient Checker Design<a class="headerlink" href="#auto-gradient-checker-design" title="永久链接至标题">¶</a></h1>
+<span id="auto-gradient-check-design"></span><h1>Auto Gradient Check Design<a class="headerlink" href="#auto-gradient-check-design" title="永久链接至标题">¶</a></h1>
 </div>
-<div class="section" id="backgraound">
+<div class="section" id="background">
-<span id="backgraound"></span><h1>Backgraound：<a class="headerlink" href="#backgraound" title="永久链接至标题">¶</a></h1>
+<span id="background"></span><h1>Background：<a class="headerlink" href="#background" title="永久链接至标题">¶</a></h1>
 <ul class="simple">
-<li>Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right:<ol>
+<li>Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right because of the following challenges:<ol>
-<li>you should get the right backpropagation formula according to the forward computation.</li>
+<li>The formula for backpropagation formula should be correct according to the forward computation.</li>
-<li>you should implement it right in CPP.</li>
+<li>The Implementation of the above shoule be correct in CPP.</li>
-<li>it&#8217;s difficult to prepare test data.</li>
+<li>It is difficult to prepare an unbiased test data.</li>
 </ol>
 </li>
-<li>Auto gradient checking gets a numerical gradient by forward Operator and use it as a reference of the backward Operator&#8217;s result. It has several advantages:<ol>
+<li>Auto gradient checking gets a numerical gradient using forward Operator and uses it as a reference for the backward Operator&#8217;s result. It has several advantages:<ol>
-<li>numerical gradient checker only need forward operator.</li>
+<li>Numerical gradient checker only needs the forward operator.</li>
-<li>user only need to prepare the input data for forward Operator.</li>
+<li>The user only needs to prepare the input data for forward Operator and not worry about the backward Operator.</li>
 </ol>
 </li>
 </ul>
 </div>
 <div class="section" id="mathematical-theory">
 <span id="mathematical-theory"></span><h1>Mathematical Theory<a class="headerlink" href="#mathematical-theory" title="永久链接至标题">¶</a></h1>
-<p>The following two document from Stanford has a detailed explanation of how to get numerical gradient and why it&#8217;s useful.</p>
+<p>The following documents from Stanford have a detailed explanation of how to compute the numerical gradient and why it is useful.</p>
 <div class="toctree-wrapper compound">
 <ul>
 <li class="toctree-l1"><a class="reference external" href="http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization">Gradient checking and advanced optimization(en)</a></li>
@@ -250,8 +250,8 @@
 </ul>
 </div>
 </div>
-<div class="section" id="numeric-gradient-implementation">
+<div class="section" id="numerical-gradient-implementation">
-<span id="numeric-gradient-implementation"></span><h1>Numeric Gradient Implementation<a class="headerlink" href="#numeric-gradient-implementation" title="永久链接至标题">¶</a></h1>
+<span id="numerical-gradient-implementation"></span><h1>Numerical Gradient Implementation<a class="headerlink" href="#numerical-gradient-implementation" title="永久链接至标题">¶</a></h1>
 <div class="section" id="python-interface">
 <span id="python-interface"></span><h2>Python Interface<a class="headerlink" href="#python-interface" title="永久链接至标题">¶</a></h2>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">get_numerical_gradient</span><span class="p">(</span><span class="n">op</span><span class="p">,</span>
@@ -261,57 +261,60 @@
                         <span class="n">delta</span><span class="o">=</span><span class="mf">0.005</span><span class="p">,</span>
                         <span class="n">local_scope</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;</span>
-<span class="sd">    Get Numeric Gradient for an operator&#39;s input.</span>
+<span class="sd">    Get Numerical Gradient for the input of an operator.</span>
-<span class="sd">    :param op: C++ operator instance, could be an network</span>
+<span class="sd">    :param op: C++ operator instance, could be an network.</span>
 <span class="sd">    :param input_values: The input variables. Should be an dictionary, whose key is</span>
-<span class="sd">    variable name, and value is numpy array.</span>
+<span class="sd">    variable name, and value is a numpy array.</span>
 <span class="sd">    :param output_name: The final output variable name.</span>
-<span class="sd">    :param input_to_check: The input variable with respect to which to compute the gradient.</span>
+<span class="sd">    :param input_to_check: The input variable with respect to which the gradient has to be computed.</span>
-<span class="sd">    :param delta: The perturbation value for numeric gradient method. The</span>
+<span class="sd">    :param delta: The perturbation value for numerical gradient method. The</span>
-<span class="sd">    smaller delta is, the more accurate result will get. But if that delta is</span>
+<span class="sd">    smaller the delta, the more accurate the result. But if the delta is too</span>
-<span class="sd">     too small, it will suffer from numerical stability problem.</span>
+<span class="sd">    small, it will suffer from the numerical stability problem.</span>
 <span class="sd">    :param local_scope: The local scope used for get_numeric_gradient.</span>
 <span class="sd">    :return: The gradient array in numpy format.</span>
 <span class="sd">    &quot;&quot;&quot;</span>
 </pre></div>
 </div>
 </div>
-<div class="section" id="explaination">
+<div class="section" id="explanation">
-<span id="explaination"></span><h2>Explaination:<a class="headerlink" href="#explaination" title="永久链接至标题">¶</a></h2>
+<span id="explanation"></span><h2>Explanation:<a class="headerlink" href="#explanation" title="永久链接至标题">¶</a></h2>
 <ul class="simple">
-<li>Why need <code class="docutils literal"><span class="pre">output_name</span></code><ul>
+<li>Why do we need an <code class="docutils literal"><span class="pre">output_name</span></code><ul>
-<li>An Operator may have multiple Output, one can get independent gradient from each Output. So caller should specify the name of the output variable.</li>
+<li>An Operator may have multiple Outputs, one can compute an independent gradient from each Output. So the caller should specify the name of the output variable.</li>
 </ul>
 </li>
-<li>Why need <code class="docutils literal"><span class="pre">input_to_check</span></code><ul>
+<li>Why do we need <code class="docutils literal"><span class="pre">input_to_check</span></code><ul>
-<li>One operator may have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numeric Gradient needs to calculate them one by one. So <code class="docutils literal"><span class="pre">get_numeric_gradient</span></code> is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call <code class="docutils literal"><span class="pre">get_numeric_gradient</span></code> multiple times.</li>
+<li>One operator can have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numerical Gradient needs to calculate them one by one. So <code class="docutils literal"><span class="pre">get_numeric_gradient</span></code> is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call <code class="docutils literal"><span class="pre">get_numeric_gradient</span></code> multiple times each with a different input.</li>
 </ul>
 </li>
 </ul>
 </div>
 <div class="section" id="core-algorithm-implementation">
 <span id="core-algorithm-implementation"></span><h2>Core Algorithm Implementation<a class="headerlink" href="#core-algorithm-implementation" title="永久链接至标题">¶</a></h2>
-<div class="highlight-python"><div class="highlight"><pre><span></span>    <span class="c1"># we only compute gradient of one element a time.</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span>    <span class="c1"># we only compute the gradient of one element a time.</span>
    <span class="c1"># we use a for loop to compute the gradient of each element.</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">tensor_size</span><span class="p">):</span>
-        <span class="c1"># get one input element by its index i.</span>
+        <span class="c1"># get one input element using the index i.</span>
-        <span class="n">origin</span> <span class="o">=</span> <span class="n">tensor_to_check</span><span class="o">.</span><span class="n">get_float_element</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
+        <span class="n">original</span> <span class="o">=</span> <span class="n">tensor_to_check</span><span class="o">.</span><span class="n">get_float_element</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
-        <span class="c1"># add delta to it, run op and then get the new value of the result tensor.</span>
+        <span class="c1"># add delta to it, run the forward op and then</span>
-        <span class="n">x_pos</span> <span class="o">=</span> <span class="n">origin</span> <span class="o">+</span> <span class="n">delta</span>
+        <span class="c1"># get the new value of the result tensor.</span>
+        <span class="n">x_pos</span> <span class="o">=</span> <span class="n">original</span> <span class="o">+</span> <span class="n">delta</span>
        <span class="n">tensor_to_check</span><span class="o">.</span><span class="n">set_float_element</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">x_pos</span><span class="p">)</span>
        <span class="n">y_pos</span> <span class="o">=</span> <span class="n">get_output</span><span class="p">()</span>
-        <span class="c1"># plus delta to this element, run op and get the new value of the result tensor.</span>
+        <span class="c1"># Subtract delta from this element, run the op again</span>
-        <span class="n">x_neg</span> <span class="o">=</span> <span class="n">origin</span> <span class="o">-</span> <span class="n">delta</span>
+        <span class="c1"># and get the new value of the result tensor.</span>
+        <span class="n">x_neg</span> <span class="o">=</span> <span class="n">original</span> <span class="o">-</span> <span class="n">delta</span>
        <span class="n">tensor_to_check</span><span class="o">.</span><span class="n">set_float_element</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">x_neg</span><span class="p">)</span>
        <span class="n">y_neg</span> <span class="o">=</span> <span class="n">get_output</span><span class="p">()</span>
        <span class="c1"># restore old value</span>
-        <span class="n">tensor_to_check</span><span class="o">.</span><span class="n">set_float_element</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">origin</span><span class="p">)</span>
+        <span class="n">tensor_to_check</span><span class="o">.</span><span class="n">set_float_element</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">original</span><span class="p">)</span>
-        <span class="c1"># compute the gradient of this element and store it into a numpy array.</span>
+        <span class="c1"># compute the gradient of this element and store</span>
+        <span class="c1"># it into a numpy array.</span>
        <span class="n">gradient_flat</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">y_pos</span> <span class="o">-</span> <span class="n">y_neg</span><span class="p">)</span> <span class="o">/</span> <span class="n">delta</span> <span class="o">/</span> <span class="mi">2</span>
    <span class="c1"># reshape the gradient result to the shape of the source tensor.</span>
@@ -320,19 +323,19 @@
 </div>
 </div>
 </div>
-<div class="section" id="auto-graident-checker-framework">
+<div class="section" id="auto-gradient-check-framework">
-<span id="auto-graident-checker-framework"></span><h1>Auto Graident Checker Framework<a class="headerlink" href="#auto-graident-checker-framework" title="永久链接至标题">¶</a></h1>
+<span id="auto-gradient-check-framework"></span><h1>Auto Gradient Check Framework<a class="headerlink" href="#auto-gradient-check-framework" title="永久链接至标题">¶</a></h1>
 <p>Each Operator Kernel has three kinds of Gradient:</p>
 <ol class="simple">
 <li>Numerical gradient</li>
 <li>CPU kernel gradient</li>
-<li>GPU kernel gradient (if supported)</li>
+<li>GPU kernel gradient (if supported by the device)</li>
 </ol>
-<p>The numerical gradient only relies on forward Operator. So we use the numerical gradient as the reference value. And the gradient checking is performed in the following three steps:</p>
+<p>The numerical gradient only relies on the forward Operator, so we use the numerical gradient as the reference value. The gradient checking is performed in the following three steps:</p>
 <ol class="simple">
-<li>calculate the numerical gradient</li>
+<li>Calculate the numerical gradient.</li>
-<li>calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient</li>
+<li>Calculate the CPU kernel gradient with the backward Operator and compare it with the numerical gradient.</li>
-<li>calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient (if supported)</li>
+<li>Calculate the GPU kernel gradient with the backward Operator and compare it with the numerical gradient (if supported).</li>
 </ol>
 <div class="section" id="python-interface">
 <span id="id1"></span><h2>Python Interface<a class="headerlink" href="#python-interface" title="永久链接至标题">¶</a></h2>
@@ -347,26 +350,27 @@
        <span class="sd">&quot;&quot;&quot;</span>
 <span class="sd">        :param forward_op: used to create backward_op</span>
 <span class="sd">        :param input_vars: numpy value of input variable. The following</span>
-<span class="sd">            computation will use these variables.</span>
+<span class="sd">          computation will use these variables.</span>
-<span class="sd">        :param inputs_to_check: the input variable with respect to which to compute the gradient.</span>
+<span class="sd">        :param inputs_to_check: the input variable with respect to which the</span>
+<span class="sd">          gradient will be computed.</span>
 <span class="sd">        :param output_name: The final output variable name.</span>
 <span class="sd">        :param max_relative_error: The relative tolerance parameter.</span>
-<span class="sd">        :param no_grad_set: used when create backward ops</span>
+<span class="sd">        :param no_grad_set: used to create backward ops</span>
 <span class="sd">        :param only_cpu: only compute and check gradient on cpu kernel.</span>
 <span class="sd">        :return:</span>
 <span class="sd">        &quot;&quot;&quot;</span>
 </pre></div>
 </div>
 </div>
-<div class="section" id="how-to-check-if-two-numpy-array-is-close-enough">
+<div class="section" id="how-to-check-if-two-numpy-arrays-are-close-enough">
-<span id="how-to-check-if-two-numpy-array-is-close-enough"></span><h2>How to check if two numpy array is close enough?<a class="headerlink" href="#how-to-check-if-two-numpy-array-is-close-enough" title="永久链接至标题">¶</a></h2>
+<span id="how-to-check-if-two-numpy-arrays-are-close-enough"></span><h2>How to check if two numpy arrays are close enough?<a class="headerlink" href="#how-to-check-if-two-numpy-arrays-are-close-enough" title="永久链接至标题">¶</a></h2>
-<p>if <code class="docutils literal"><span class="pre">abs_numerical_grad</span></code> is nearly zero, then use abs error for numerical_grad</p>
+<p>If <code class="docutils literal"><span class="pre">abs_numerical_grad</span></code> is nearly zero, use the absolute error for numerical_grad instead of the relative error.</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">numerical_grad</span> <span class="o">=</span> <span class="o">...</span>
 <span class="n">operator_grad</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">scope</span><span class="o">.</span><span class="n">find_var</span><span class="p">(</span><span class="n">grad_var_name</span><span class="p">(</span><span class="n">name</span><span class="p">))</span><span class="o">.</span><span class="n">get_tensor</span><span class="p">())</span>
 <span class="n">abs_numerical_grad</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">numerical_grad</span><span class="p">)</span>
-<span class="c1"># if abs_numerical_grad is nearly zero, then use abs error for numeric_grad, not relative</span>
+<span class="c1"># if abs_numerical_grad is nearly zero, then use abs error for</span>
-<span class="c1"># error.</span>
+<span class="c1"># numerical_grad, instead of relative error.</span>
 <span class="n">abs_numerical_grad</span><span class="p">[</span><span class="n">abs_numerical_grad</span> <span class="o">&lt;</span> <span class="mf">1e-3</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
 <span class="n">diff_mat</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">numerical_grad</span> <span class="o">-</span> <span class="n">operator_grad</span><span class="p">)</span> <span class="o">/</span> <span class="n">abs_numerical_grad</span>
@@ -375,10 +379,10 @@
 </div>
 <div class="section" id="notes">
 <span id="notes"></span><h3>Notes:<a class="headerlink" href="#notes" title="永久链接至标题">¶</a></h3>
-<p>The Input data for auto gradient checker should be reasonable to avoid numerical  stability problem.</p>
+<p>The input data for the auto gradient checker should be reasonable, to avoid numerical stability problems.</p>
 </div>
-<div class="section" id="refs">
+<div class="section" id="references">
-<span id="refs"></span><h3>Refs:<a class="headerlink" href="#refs" title="永久链接至标题">¶</a></h3>
+<span id="references"></span><h3>References:<a class="headerlink" href="#references" title="永久链接至标题">¶</a></h3>
 <div class="toctree-wrapper compound">
 <ul>
 <li class="toctree-l1"><a class="reference external" href="http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization">Gradient checking and advanced optimization(en)</a></li>

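Note on the hunks above: the central-difference loop and the closeness check documented in this file can be tried outside the framework with plain NumPy. The following is only a minimal sketch under stated assumptions, not the framework's implementation. The names `numerical_gradient` and `max_relative_error` are hypothetical, the callable `f` stands in for the documented `get_output()`, and the difference is taken against the signed `numerical_grad` (an assumption about the intended check), with `abs_numerical_grad` used only as the denominator.

```python
import numpy as np


def numerical_gradient(f, x, delta=1e-3):
    """Central-difference estimate of the gradient of a scalar-valued f at x.

    Hypothetical helper: f plays the role of get_output() in the document.
    """
    x = np.asarray(x, dtype=np.float64).copy()  # work on a private copy
    grad = np.zeros_like(x)
    flat_x = x.reshape(-1)        # views, so writes reach x and grad
    flat_grad = grad.reshape(-1)
    for i in range(flat_x.size):
        original = flat_x[i]
        # add delta, run the forward computation
        flat_x[i] = original + delta
        y_pos = f(x)
        # subtract delta, run the forward computation again
        flat_x[i] = original - delta
        y_neg = f(x)
        # restore the old value before moving to the next element
        flat_x[i] = original
        flat_grad[i] = (y_pos - y_neg) / delta / 2
    return grad


def max_relative_error(numerical_grad, operator_grad, zero_threshold=1e-3):
    """Largest element-wise error between the two gradients.

    Where |numerical_grad| is below zero_threshold the denominator is set
    to 1, so the absolute error is used instead of the relative error.
    """
    numerical_grad = np.asarray(numerical_grad, dtype=np.float64)
    operator_grad = np.asarray(operator_grad, dtype=np.float64)
    abs_numerical_grad = np.abs(numerical_grad)  # new array, safe to modify
    abs_numerical_grad[abs_numerical_grad < zero_threshold] = 1
    diff_mat = np.abs(numerical_grad - operator_grad) / abs_numerical_grad
    return float(np.max(diff_mat))


if __name__ == "__main__":
    x = np.array([[1.0, -2.0], [0.5, 3.0]])
    # f(x) = sum(x^2) has the analytic gradient 2x, used here as the
    # stand-in for a backward Operator's result.
    numeric = numerical_gradient(lambda v: float(np.sum(v ** 2)), x)
    print(max_relative_error(numeric, 2 * x))
```

A caller would compare the returned value against a tolerance such as the documented `max_relative_error` parameter. The threshold `1e-3` mirrors the value shown in the documented snippet.
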
--- a/develop/doc_cn/objects.inv
+++ b/develop/doc_cn/objects.inv
--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js