diff --git a/develop/doc/_sources/howto/dev/new_layer_en.rst.txt b/develop/doc/_sources/howto/dev/new_layer_en.rst.txt
index 46481f5ead33dc6a26507e021fd9ae0f8316e940..110a9fb38f890a766bb4480e91feb22d3b0838a5 100644
--- a/develop/doc/_sources/howto/dev/new_layer_en.rst.txt
+++ b/develop/doc/_sources/howto/dev/new_layer_en.rst.txt
@@ -29,7 +29,7 @@ Fully connected layer takes a dense input vector with dimension :math:`D_i`. It
 
 where :math:`f(.)` is a nonlinear *activation* function, such as sigmoid, tanh, and Relu.
 
-The transformation matrix :math:`W` and bias vector :math:`b` are the *parameters* of the layer. The *parameters* of a layer are learned during training in the *backward pass*. The backward pass computes the gradients of the output function with respect to all parameters and inputs. The optimizer can use the chain rule to compute the gradients of the loss function with respect to each parameter. 
+The transformation matrix :math:`W` and bias vector :math:`b` are the *parameters* of the layer. The *parameters* of a layer are learned during training in the *backward pass*. The backward pass computes the gradients of the output function with respect to all parameters and inputs. The optimizer can use the chain rule to compute the gradients of the loss function with respect to each parameter.
 
 Suppose our loss function is :math:`c(y)`, then
 
@@ -37,7 +37,7 @@ Suppose our loss function is :math:`c(y)`, then
 
    \frac{\partial c(y)}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial y}{\partial x}
 
-Suppose :math:`z = f(W^T x + b)`, then
+Suppose :math:`z = W^T x + b`, then
 
 .. math::
 
@@ -48,7 +48,7 @@ This derivative can be automatically computed by our base layer class.
 Then, for fully connected layer, we need to compute:
 
 .. math::
-
+
    \frac{\partial z}{\partial x} = W, \frac{\partial z_j}{\partial W_{ij}} = x_i, \frac{\partial z}{\partial b} = \mathbf 1
 
 where :math:`\mathbf 1` is an all-one vector, :math:`W_{ij}` is the number at the i-th row and j-th column of the matrix :math:`W`, :math:`z_j` is the j-th component of the vector :math:`z`, and :math:`x_i` is the i-th component of the vector :math:`x`.
@@ -322,7 +322,7 @@ All the gradient check unit tests are located in :code:`paddle/gserver/tests/tes
                   /* weight */ true);
   }
 }
-
+
 If you are creating a new file for the test, such as :code:`paddle/gserver/tests/testFCGrad.cpp`, you need to add the file to :code:`paddle/gserver/tests/CMakeLists.txt`. An example is given below. All the unit tests will run when you execute the command :code:`make tests`. Notice that some layers might need high accuracy for the gradient check unit tests to work well. You need to configure :code:`WITH_DOUBLE` to `ON` when configuring cmake.
 
 .. code-block:: bash
diff --git a/develop/doc/howto/dev/new_layer_en.html b/develop/doc/howto/dev/new_layer_en.html
index 12b91ebdd2786b094ff41a2d4760b29b2221e454..e715d6fa79502c21c6070424f82b58b1595650fe 100644
--- a/develop/doc/howto/dev/new_layer_en.html
+++ b/develop/doc/howto/dev/new_layer_en.html
@@ -207,7 +207,7 @@ Fully connected layer takes a dense input vector with dimension
 Suppose our loss function is \(c(y)\), then
 \[\frac{\partial c(y)}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial y}{\partial x}\]
-Suppose \(z = f(W^T x + b)\), then
+Suppose \(z = W^T x + b\), then
 \[\frac{\partial y}{\partial z} = \frac{\partial f(z)}{\partial z}\]
 This derivative can be automatically computed by our base layer class.
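The substance of the change, made explicit: under the old definition :math:`z = f(W^T x + b)` the output :math:`y` coincides with :math:`z`, so the derivative :math:`\frac{\partial y}{\partial z} = \frac{\partial f(z)}{\partial z}` quoted in the hunks above could not hold. With the corrected definition the quantities compose exactly as the surrounding text claims:

.. math::

   z = W^T x + b, \qquad y = f(z), \qquad
   \frac{\partial c(y)}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial f(z)}{\partial z} \frac{\partial z}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial f(z)}{\partial z} W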
diff --git a/develop/doc_cn/_sources/howto/dev/new_layer_cn.rst.txt b/develop/doc_cn/_sources/howto/dev/new_layer_cn.rst.txt
index 9489a921c70ad6ee5709f46445554f5d9640162c..75037e693b32f923ee7dc9dfec322495fe4ce10a 100644
--- a/develop/doc_cn/_sources/howto/dev/new_layer_cn.rst.txt
+++ b/develop/doc_cn/_sources/howto/dev/new_layer_cn.rst.txt
@@ -37,7 +37,7 @@
 
    \frac{\partial c(y)}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial y}{\partial x}
 
-Suppose :math:`z = f(W^T x + b)`, then
+Suppose :math:`z = W^T x + b`, then
 
 .. math::
 
diff --git a/develop/doc_cn/howto/dev/new_layer_cn.html b/develop/doc_cn/howto/dev/new_layer_cn.html
index e49c69d3d0c0e4084cf1264771d00c07d6668a38..cdacb6f4a8d232660aceefe7c208c0360b874a0b 100644
--- a/develop/doc_cn/howto/dev/new_layer_cn.html
+++ b/develop/doc_cn/howto/dev/new_layer_cn.html
@@ -207,7 +207,7 @@
 Suppose the loss function is \(c(y)\), then
 \[\frac{\partial c(y)}{\partial x} = \frac{\partial c(y)}{\partial y} \frac{\partial y}{\partial x}\]
-Suppose \(z = f(W^T x + b)\), then
+Suppose \(z = W^T x + b\), then
 \[\frac{\partial y}{\partial z} = \frac{\partial f(z)}{\partial z}\]
 PaddlePaddle's base layer class can automatically compute the derivative above.
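
The first file's :code:`@@ -322` hunk touches the section on gradient-check unit tests. As a companion illustration only, and not PaddlePaddle's actual :code:`testLayerGrad` machinery, here is a minimal numpy sketch of the same finite-difference check applied to the corrected fully connected layer; the helper names (:code:`forward`, :code:`analytic_grads`, :code:`numeric_grad`) are made up for this sketch.

.. code-block:: python

   # Finite-difference gradient check for z = W^T x + b, y = f(z), c(y) = sum(y).
   # Illustrative numpy sketch only; not PaddlePaddle's testLayerGrad.
   import numpy as np

   def forward(W, x, b):
       return np.tanh(W.T @ x + b)          # y = f(z) with f = tanh

   def analytic_grads(W, x, b):
       z = W.T @ x + b
       dz = 1.0 - np.tanh(z) ** 2           # dc/dz = dc/dy * f'(z), with dc/dy = 1
       dx = W @ dz                          # from  dz/dx      = W
       dW = np.outer(x, dz)                 # from  dz_j/dW_ij = x_i
       db = dz                              # from  dz/db      = 1
       return dx, dW, db

   def numeric_grad(loss, param, eps=1e-6):
       # Central differences, perturbing one entry of `param` at a time.
       g = np.zeros_like(param)
       for idx in np.ndindex(param.shape):
           old = param[idx]
           param[idx] = old + eps
           hi = loss()
           param[idx] = old - eps
           lo = loss()
           param[idx] = old
           g[idx] = (hi - lo) / (2 * eps)
       return g

   rng = np.random.default_rng(0)
   W = rng.standard_normal((4, 3))          # D_i = 4 inputs, D_o = 3 outputs
   x = rng.standard_normal(4)
   b = rng.standard_normal(3)

   def loss():
       return forward(W, x, b).sum()

   dx, dW, db = analytic_grads(W, x, b)
   assert np.allclose(dx, numeric_grad(loss, x), atol=1e-5)
   assert np.allclose(dW, numeric_grad(loss, W), atol=1e-5)
   assert np.allclose(db, numeric_grad(loss, b), atol=1e-5)

As in the docs' own advice about :code:`WITH_DOUBLE`, the check relies on double precision: with float32 the cancellation error in the central difference would swamp the :code:`atol` used here.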