Deploy to GitHub Pages: fe84517b

6cdb184c · Travis CI · 26354f19
6 changed files
--- a/develop/doc/api/v2/config/layer.html
+++ b/develop/doc/api/v2/config/layer.html
@@ -223,14 +223,15 @@
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) &#8211; The input layer. Could be a list/tuple of input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The layer dimension.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is tanh.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute|list.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Any</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -264,7 +265,7 @@ specified, selective_fc acts exactly like fc.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) &#8211; The input layer.</li>
 <li><strong>select</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The select layer. The output of select layer should be a
 sparse binary matrix, and treat as the mask of selective fc.
@@ -272,9 +273,10 @@ If is None, acts exactly like fc.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The layer dimension.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is tanh.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Any</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -424,7 +426,7 @@ the right size (which is the end of array) to the left.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer a.</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer b.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; layer&#8217;s extra attribute.</li>
@@ -478,7 +480,7 @@ rest channels will be processed by rest group of filters.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Layer Input.</li>
 <li><strong>filter_size</strong> (<em>int|tuple|list</em>) &#8211; The x dimension of a filter kernel. Or input a tuple for
 two image dimension.</li>
@@ -497,8 +499,10 @@ image dimension</li>
 <li><strong>dilation</strong> (<em>int|tuple|list</em>) &#8211; The x dimension of the dilation. Or input a tuple for two
 image dimension</li>
 <li><strong>dilation_y</strong> (<em>int</em>) &#8211; The y dimension of the dilation.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|False</em>) &#8211; Convolution bias attribute. None means default bias.
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-False means no bias.</li>
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; number of input channels. If None will be set
 automatically from previous output.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Convolution param attribute. None means default attribute</li>
@@ -569,15 +573,15 @@ parameter attribute is set by this parameter.</li>
 <dt>
 <em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">row_conv</code></dt>
 <dd><p>The row convolution is called lookahead convolution. It is firstly
-introduced in paper of <a class="reference external" href="https://arxiv.org/pdf/1512.02595v1.pdf">Deep Speech 2: End-toEnd Speech Recognition
+introduced in paper of <a class="reference external" href="https://arxiv.org/pdf/1512.02595v1.pdf">Deep Speech 2: End-to-End Speech Recognition
 in English and Mandarin</a> .</p>
 <p>The bidirectional RNN that learns representation for a sequence by
 performing a forward and a backward pass through the entire sequence.
 However, unlike unidirectional RNNs, bidirectional RNNs are challenging
 to deploy in an online and low-latency setting. The lookahead convolution
 incorporates information from future subsequences in a computationally
-efficient manner to improve unidirectional recurrent neural networks.</p>
+efficient manner to improve unidirectional RNNs.</p>
-<p>The connection of row convolution is different form the 1D sequence
+<p>The connection of row convolution is different from the 1D sequence
 convolution. Assumed that, the future context-length is k, that is to say,
 it can get the output at timestep t by using the the input feature from t-th
 timestep to (t+k+1)-th timestep. Assumed that the hidden dim of input
@@ -603,7 +607,7 @@ number plus one equals context_len.</p>
 plus one.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is linear activation.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute. If None, the parameter will be
-initialized smartly. It&#8217;s better set it by yourself.</li>
+initialized smartly. It&#8217;s better to set it by yourself.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -706,7 +710,7 @@ The details please refer to
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; number of input channel.</li>
 <li><strong>pool_type</strong> &#8211; Pooling type. MaxPooling or AveragePooling. Default is MaxPooling.</li>
@@ -771,7 +775,7 @@ s = input.size / num_channels
 <li><strong>num_channels</strong> (<em>int|None</em>) &#8211; The channel number of input layer. If None will be set
 automatically from previous output.</li>
 <li><strong>groups</strong> (<em>int</em>) &#8211; The group number of input layer.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer, which can not specify.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer attribute.</li>
 </ul>
 </td>
@@ -807,7 +811,7 @@ The details please refer to
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; Normalize in number of <span class="math">\(size\)</span> feature maps.</li>
 <li><strong>scale</strong> (<em>float</em>) &#8211; The hyper-parameter.</li>
@@ -855,7 +859,7 @@ y_i &amp;\gets \gamma \hat{x_i} + \beta \qquad &amp;//\ scale\ and\ shift\end{sp
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; batch normalization input. Better be linear activation.
 Because there is an activation inside batch_normalization.</li>
 <li><strong>batch_norm_type</strong> (<em>None|string</em><em>, </em><em>None</em><em> or </em><em>&quot;batch_norm&quot;</em><em> or </em><em>&quot;cudnn_batch_norm&quot;</em>) &#8211; We have batch_norm and cudnn_batch_norm. batch_norm
@@ -872,7 +876,7 @@ normalization will normalize input near zero.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; num of image channels or previous layer&#8217;s number of
 filters. None will automatically get from layer&#8217;s
 input.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; <span class="math">\(\beta\)</span>, better be zero when initialize. So the
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; <span class="math">\(\beta\)</span>, better be zero when initialize. So the
 initial_std=0, initial_mean=1 is best practice.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; <span class="math">\(\gamma\)</span>, better be one when initialize. So the
 initial_std=0, initial_mean=1 is best practice.</li>
@@ -923,7 +927,7 @@ and <span class="math">\(out\)</span> is a (batchSize x dataDim) output vector.<
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -953,7 +957,7 @@ factors which dimensions equal to the channel&#8217;s number.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute|list.</li>
 </ul>
@@ -993,7 +997,7 @@ and the size of <span class="math">\(out\)</span> is a (batchSize x dataDim) .</
 </tr>
 <tr class="field-even field"><th class="field-name">type input:</th><td class="field-body">paddle.v2.config_base.Layer</td>
 </tr>
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">Layer name.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">basestring</td>
 </tr>
@@ -1038,9 +1042,12 @@ out_{i} = act(in_{i} + out_{i+1} * W) \ \ \text{for} \ start &lt;= i &lt; end\en
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input Layer</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; bias attribute.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; parameter attribute.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the layer</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Layer Attribute.</li>
 </ul>
 </td>
@@ -1088,8 +1095,10 @@ more details about LSTM.</p>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. <span class="math">\(h_t\)</span></li>
 <li><strong>gate_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; gate activation type, paddle.v2.activation.Sigmoid by default.</li>
 <li><strong>state_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; state activation type, paddle.v2.activation.Tanh by default.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias attribute. None means default bias. False means no
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li>
 </ul>
@@ -1156,8 +1165,10 @@ affects the <span class="math">\({\tilde{h_t}}\)</span>.</li>
 <li><strong>gate_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; gate activation type, paddle.v2.activation.Sigmoid by default.
 This activation affects the <span class="math">\(z_t\)</span> and <span class="math">\(r_t\)</span>. It is the
 <span class="math">\(\sigma\)</span> in the above formula.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias attribute. None means default bias. False means no
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li>
 </ul>
@@ -1329,7 +1340,7 @@ output is <span class="math">\(o_t\)</span>, whose name is &#8216;state&#8217; a
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; Layer&#8217;s size. NOTE: lstm layer&#8217;s size, should be equal to
 <code class="code docutils literal"><span class="pre">input.size/4</span></code>, and should be equal to
 <code class="code docutils literal"><span class="pre">state.size</span></code>.</li>
@@ -1340,7 +1351,10 @@ output is <span class="math">\(o_t\)</span>, whose name is &#8216;state&#8217; a
 be sigmoid only.</li>
 <li><strong>state_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; State Activation Type. Default is sigmoid, and should
 be sigmoid only.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Bias Attribute.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; layer&#8217;s extra attribute.</li>
 </ul>
 </td>
@@ -1370,9 +1384,12 @@ be sigmoid only.</li>
 <li><strong>output_mem</strong> &#8211; </li>
 <li><strong>size</strong> &#8211; </li>
 <li><strong>act</strong> &#8211; </li>
-<li><strong>name</strong> &#8211; </li>
+<li><strong>name</strong> &#8211; The name of this layer. It is optional.</li>
 <li><strong>gate_act</strong> &#8211; </li>
-<li><strong>bias_attr</strong> &#8211; </li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> &#8211; the parameter_attribute for transforming the output_mem
 from previous step.</li>
 <li><strong>layer_attr</strong> &#8211; </li>
@@ -1486,7 +1503,7 @@ the output from input.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; get output layer&#8217;s input. And this layer should contains
 multiple outputs.</li>
 <li><strong>arg_name</strong> (<em>basestring</em>) &#8211; Output name from input.</li>
@@ -1542,9 +1559,10 @@ Each inputs is a projection or operator.</p>
 <li><strong>input</strong> &#8211; inputs layer. It is an optional parameter. If set,
 then this function will just return layer&#8217;s name.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; The extra layer config. Default is None.</li>
 </ul>
 </td>
@@ -1571,7 +1589,7 @@ default Bias.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Name of this embedding layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer for this embedding. NOTE: must be Index Data.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The embedding dimension.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None</em>) &#8211; The embedding parameter attribute. See paddle.v2.attr.ParameterAttribute
@@ -1967,12 +1985,15 @@ of stride is -1.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>agg_level</strong> (<em>AggregateLevel</em>) &#8211; AggregateLevel.TO_NO_SEQUENCE or
 AggregateLevel.TO_SEQUENCE</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer name.</li>
 <li><strong>pooling_type</strong> (<em>BasePoolingType|None</em>) &#8211; Type of pooling, MaxPooling(default), AvgPooling,
 SumPooling, SquareRootNPooling.</li>
 <li><strong>stride</strong> (<em>Int</em>) &#8211; The step size between successive pooling regions.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias parameter attribute. False if no bias.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; The Extra Attributes for layer, such as dropout.</li>
 </ul>
 </td>
@@ -2008,7 +2029,7 @@ of stride is -1.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>agg_level</strong> &#8211; Aggregated level</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
 <li><strong>stride</strong> (<em>Int</em>) &#8211; The step size between successive pooling regions.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
@@ -2046,7 +2067,7 @@ of stride is -1.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>agg_level</strong> &#8211; aggregation level</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
 <li><strong>stride</strong> (<em>Int</em>) &#8211; The step size between successive pooling regions.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
@@ -2080,7 +2101,7 @@ Inputs can be list of paddle.v2.config_base.Layer or list of projection.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>list|tuple|collections.Sequence</em>) &#8211; input layers or projections</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation type.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
@@ -2124,14 +2145,15 @@ processed in one batch.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input sequence layer</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input sequence layer</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation type.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 </ul>
 </td>
 </tr>
@@ -2176,7 +2198,7 @@ will be sliced for multiple times.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input for this layer, it should be a sequence.</li>
 <li><strong>starts</strong> (<em>paddle.v2.config_base.Layer|None</em>) &#8211; start indices to slice the input sequence.</li>
 <li><strong>ends</strong> (<em>paddle.v2.config_base.Layer|None</em>) &#8211; end indices to slice the input sequence.</li>
@@ -2218,7 +2240,7 @@ beam training.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; A nested sequence.</li>
 <li><strong>selected_indices</strong> &#8211; a set of sequence indices in the nested sequence.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 </ul>
 </td>
 </tr>
@@ -2278,7 +2300,7 @@ convolution neural network, and before recurrent neural network.</p>
 <li><strong>stride_y</strong> (<em>int</em>) &#8211; The stride size in vertical direction.</li>
 <li><strong>padding_x</strong> (<em>int</em>) &#8211; The padding size in horizontal direction.</li>
 <li><strong>padding_y</strong> (<em>int</em>) &#8211; The padding size in vertical direction.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer, which can not specify.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -2332,9 +2354,11 @@ sequence is one) to sequence data.&#8221;</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer</li>
 <li><strong>expand_as</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Expand as this layer&#8217;s sequence info.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias attribute. None means default bias. False means no
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>expand_level</strong> (<em>ExpandLevel</em>) &#8211; whether input layer is timestep(default) or sequence.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
@@ -2378,7 +2402,7 @@ bias.</li>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer</li>
 <li><strong>num_repeats</strong> (<em>int</em>) &#8211; Repeat the input so many times</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>as_row_vector</strong> (<em>bool</em>) &#8211; True for treating input as row vector and repeating
 in the column direction.  This is equivalent to apply
 concat() with num_repeats same input.
@@ -2423,7 +2447,7 @@ usually used when the input sample is some image or feature map.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>height</strong> (<em>int</em>) &#8211; The height of the sample matrix</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2459,12 +2483,13 @@ output sequence has T*M/N instances, the dimension of each instance is N.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>reshape_size</strong> (<em>int</em>) &#8211; the size of reshaped sequence.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation type.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or to anything that is not a paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 </ul>
 </td>
 </tr>
@@ -2513,12 +2538,14 @@ Please refer to dropout for details.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) &#8211; Input layers. It could be a paddle.v2.config_base.Layer or list/tuple of
 paddle.v2.config_base.Layer.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type, default is tanh.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|bool</em>) &#8211; Bias attribute. If False, means no bias. None is default
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or to anything that is not a paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer attribute.</li>
 </ul>
 </td>
@@ -2581,7 +2608,7 @@ processed in one batch.</p>
 <li><strong>weights</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The weight layer.</li>
 <li><strong>vectors</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The vector layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; the dimension of this layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -2620,7 +2647,7 @@ which is used in NEURAL TURING MACHINE.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>list|tuple</em>) &#8211; Input layer.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Weight layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2693,7 +2720,7 @@ and <span class="math">\(y\)</span> is a output vector.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Weight layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2732,7 +2759,7 @@ processed in one batch.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Weight layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2768,7 +2795,7 @@ ight)</p>
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The Layer Name.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">basestring</td>
 </tr>
@@ -2813,7 +2840,7 @@ element-wise. There is no activation and weight.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>slope</strong> (<em>float.</em>) &#8211; the scale factor.</li>
 <li><strong>intercept</strong> (<em>float.</em>) &#8211; the offset.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
@@ -2860,15 +2887,16 @@ For example, each sample:</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer a.</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer b.</li>
 <li><strong>size</strong> (<em>int.</em>) &#8211; the layer dimension.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is tanh.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Any</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or to anything that is not a paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -2907,7 +2935,7 @@ processed in one batch.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer a</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer b</li>
 <li><strong>scale</strong> (<em>float</em>) &#8211; scale for cosine value. default is 5.</li>
@@ -2946,7 +2974,7 @@ processed in one batch.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2982,10 +3010,13 @@ bias are trainable.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The input layer.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The parameter attribute of scaling.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The parameter attribute of shifting.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or to anything that is not a paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 </ul>
 </td>
 </tr>
@@ -3020,7 +3051,7 @@ The result is stored in output.ids.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -3053,7 +3084,7 @@ Sampling one id for one sample.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -3097,7 +3128,7 @@ For each index i from 0 to batchSize -1, the output is the i-th row of the
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>list of paddle.v2.config_base.Layer</em>) &#8211; Input layers.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -3167,7 +3198,7 @@ in width dimension.</p>
 <li><strong>pad_h</strong> (<em>list|None</em>) &#8211; padding size in height dimension.</li>
 <li><strong>pad_w</strong> (<em>list|None</em>) &#8211; padding size in width dimension.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 </ul>
 </td>
 </tr>
@@ -3203,7 +3234,7 @@ in width dimension.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The first input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float.</em>) &#8211; The cost is multiplied with coeff.
 The coefficient affects the gradient in the backward.</li>
 <li><strong>weight</strong> (<em>LayerOutout</em>) &#8211; The cost of each sample is multiplied with each weight.
@@ -3243,7 +3274,7 @@ Input should be a vector of positive numbers, without normalization.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The first input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float.</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>softmax_selfnorm_alpha</strong> (<em>float.</em>) &#8211; The scale factor affects the cost.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
@@ -3279,7 +3310,7 @@ Input should be a vector of positive numbers, without normalization.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The first input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3328,7 +3359,7 @@ ight <a href="#id2"><span class="problematic" id="id3">|</span></a>leq delta</p>
 </tr>
 <tr class="field-even field"><th class="field-name">type input:</th><td class="field-body">paddle.v2.config_base.Layer.</td>
 </tr>
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layers. It is not necessary.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">None|basestring.</td>
 </tr>
@@ -3387,7 +3418,7 @@ a true binary class label :math:<a href="#id6"><span class="problematic" id="id7
 </tr>
 <tr class="field-even field"><th class="field-name">type input:</th><td class="field-body">paddle.v2.config_base.Layer.</td>
 </tr>
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layers. It is not necessary.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">None|basestring.</td>
 </tr>
@@ -3433,7 +3464,7 @@ a true binary class label :math:<a href="#id6"><span class="problematic" id="id7
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Samples of the same query should be loaded as sequence.</li>
 <li><strong>score</strong> &#8211; The 2nd input. Score of each sample.</li>
 <li><strong>NDCG_num</strong> (<em>int</em>) &#8211; The size of NDCG (Normalized Discounted Cumulative Gain),
-e.g., 5 for NDCG&#64;5. It must be less than for equal to the
+e.g., 5 for NDCG&#64;5. It must be less than or equal to the
 minimum size of lists.</li>
 <li><strong>max_sort_size</strong> (<em>int</em>) &#8211; The size of partial sorting in calculating gradient.
 If max_sort_size = -1, then for each list, the
@@ -3442,7 +3473,7 @@ In other cases, max_sort_size must be greater than or
 equal to NDCG_num. And if max_sort_size is greater
 than the size of a list, the algorithm will sort the
 entire list of get gradient.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
@@ -3471,7 +3502,7 @@ entire list of get gradient.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Network prediction.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Data label.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The weight affects the cost, namely the scale of cost.
@@ -3530,7 +3561,7 @@ Their dimension is one.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Label is 1 or 0, means positive order and reverse order.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The weight affects the cost, namely the scale of cost.
 It is an optional argument.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3563,7 +3594,7 @@ It is an optional argument.</li>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The first input layer.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
@@ -3603,7 +3634,7 @@ field model.</p>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The third layer is &#8220;weight&#8221; of each sample, which is an
 optional argument.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Parameter attribute. None means default attribute</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
@@ -3644,7 +3675,7 @@ decoding or 0 for correct decoding.</p>
 <li><strong>size</strong> (<em>int</em>) &#8211; size of this layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em><em> or </em><em>None</em>) &#8211; None or ground-truth label.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Parameter attribute. None means default attribute</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -3694,7 +3725,7 @@ should also be num_classes + 1.</p>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The data layer of label with variable length.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; category numbers + 1.</li>
-<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer</li>
+<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>norm_by_times</strong> (<em>bool</em>) &#8211; Whether to normalize by times. False by default.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
@@ -3754,7 +3785,7 @@ should be consistent as that used in your labels.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The data layer of label with variable length.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; category numbers + 1.</li>
-<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer, which can not specify.</li>
+<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>blank</strong> (<em>int</em>) &#8211; the &#8216;blank&#8217; label used in ctc</li>
 <li><strong>norm_by_times</strong> (<em>bool</em>) &#8211; Whether to normalize by times. False by default.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
@@ -3791,8 +3822,8 @@ A fast and simple algorithm for training neural probabilistic language models.</
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple|collections.Sequence</em>) &#8211; input layers. It could be a paddle.v2.config_base.Layer of list/tuple of paddle.v2.config_base.Layer.</li>
+<li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple|collections.Sequence</em>) &#8211; The input layers. It could be a paddle.v2.config_base.Layer or a list/tuple of paddle.v2.config_base.Layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; label layer</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; weight layer, can be None(default)</li>
 <li><strong>num_classes</strong> (<em>int</em>) &#8211; number of classes.</li>
@@ -3802,7 +3833,10 @@ A fast and simple algorithm for training neural probabilistic language models.</
 <li><strong>neg_distribution</strong> (<em>list|tuple|collections.Sequence|None</em>) &#8211; The distribution for generating the random negative labels.
 A uniform distribution will be used if not provided.
 If not None, its length must be equal to num_classes.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias parameter attribute. True if no bias.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or to anything that is not a paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
@@ -3841,9 +3875,11 @@ Hierarchical Probabilistic Neural Network Language Model.&#8221;</p>
 paddle.v2.config_base.Layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Label layer.</li>
 <li><strong>num_classes</strong> (<em>int|None</em>) &#8211; number of classes.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|False</em>) &#8211; Bias attribute. None means default bias.
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-False means no bias.</li>
+False or to anything that is not a paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None</em>) &#8211; Parameter Attribute. None means default parameter.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3885,7 +3921,7 @@ size of input and label are equal. The formula is as follows,</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3913,7 +3949,7 @@ size of input and label are equal. The formula is as follows,</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input_loc</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer</em>) &#8211; The input predict locations.</li>
 <li><strong>input_conf</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer</em>) &#8211; The input priorbox confidence.</li>
 <li><strong>priorbox</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input priorbox location and the variance.</li>
@@ -3955,7 +3991,7 @@ It is used by recurrent layer group.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
 <li><strong>eos_id</strong> (<em>int</em>) &#8211; end id of sequence</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
@@ -3981,19 +4017,25 @@ It is used by recurrent layer group.</p>
 <dl class="class">
 <dt>
 <em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">dropout</code></dt>
-<dd><p>&#64;TODO(yuyang18): Add comments.</p>
+<dd><p>The example usage is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">dropout</span> <span class="o">=</span> <span class="n">dropout</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="nb">input</span><span class="p">,</span> <span class="n">dropout_rate</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
+</pre></div>
+</div>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> &#8211; </li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>input</strong> &#8211; </li>
+<li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
-<li><strong>dropout_rate</strong> &#8211; </li>
+<li><strong>dropout_rate</strong> (<em>float</em>) &#8211; The probability of dropout.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">paddle.v2.config_base.Layer object.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">paddle.v2.config_base.Layer</p>
 </td>
 </tr>
 </tbody>
@@ -4027,7 +4069,7 @@ a_i * z_i  &amp;\quad \mathrm{otherwise}\end{split}\]</div>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>partial_sum</strong> (<em>int</em>) &#8211; <p>this parameter makes a group of inputs share a same weight.</p>
 <ul>
@@ -4060,7 +4102,7 @@ a_i * z_i  &amp;\quad \mathrm{otherwise}\end{split}\]</div>
 <dd><p>The gated unit layer implements a simple gating mechanism over the input.
 The input <span class="math">\(X\)</span> is first projected into a new space <span class="math">\(X'\)</span>, and
 it is also used to produce a gate weight <span class="math">\(\sigma\)</span>. Element-wise
-prodict between <a href="#id12"><span class="problematic" id="id13">:match:`X&#8217;`</span></a> and <span class="math">\(\sigma\)</span> is finally returned.</p>
+product between <a href="#id11"><span class="problematic" id="id12">:match:`X&#8217;`</span></a> and <span class="math">\(\sigma\)</span> is finally returned.</p>
 <dl class="docutils">
 <dt>Reference:</dt>
 <dd>Language Modeling with Gated Convolutional Networks
@@ -4077,7 +4119,7 @@ prodict between <a href="#id12"><span class="problematic" id="id13">:match:`X&#8
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input for this layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; output size of the gated unit.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type of the projected input.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>gate_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Attributes to tune the gate output, for example, error
 clipping threshold, dropout and so on. See paddle.v2.attr.ExtraAttribute for
 more details.</li>
@@ -4124,7 +4166,7 @@ no valid bounding box.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input_loc</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer.</em>) &#8211; The input predict locations.</li>
 <li><strong>input_conf</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer.</em>) &#8211; The input priorbox confidence.</li>
 <li><strong>priorbox</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input priorbox location and the variance.</li>

--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/_sources/faq/index_cn.rst.txt
+++ b/develop/doc_cn/_sources/faq/index_cn.rst.txt
@@ -321,3 +321,55 @@ pip uninstall py_paddle paddle
 然后安装paddle的python环境, 在build目录下执行
 pip install python/dist/paddle*.whl && pip install ../paddle/dist/py_paddle*.whl
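+16. What format does PaddlePaddle use to store parameters, and how can it be converted to and from plain text?
+--------------------------------------------------------------------------------------------------------------
+The model parameter file saved by PaddlePaddle consists of two parts: a 16-byte header and the network parameters. In the header, bytes 1-4 hold the PaddlePaddle version information and should simply be filled with 0; bytes 5-8 hold the number of bytes occupied by each parameter value, which is 4 when the saved network parameters are of type float and 8 when they are of type double; bytes 9-16 hold the total number of parameters saved.
+To restore the model parameters saved by PaddlePaddle to plain text, load the network parameters into a :code:`numpy.array` of the corresponding data type, skipping the header of the parameter file. If PaddlePaddle was not compiled with double precision, it computes in float precision by default, and the saved parameters are also of type float. In that case, :code:`dtype=float32` is normally used with :code:`numpy.array`. For example: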
+..  code-block:: python
+    import numpy as np
+    def read_parameter(fname, width):
+        # the parameter file is binary, so read it as bytes
+        with open(fname, "rb") as f:
+            s = f.read()
+        # skip the 16-byte header
+        vec = np.frombuffer(s[16:], dtype=np.float32)
+        # width is the size of the corresponding layer
+        np.savetxt(fname + ".csv", vec.reshape(width, -1),
+                fmt="%.6f", delimiter=",")
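+To convert plain-text parameters into model parameters that PaddlePaddle can load, first build the header, then write the network parameters. The code below converts a randomly generated matrix into model parameters that PaddlePaddle can load.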
+..  code-block:: python
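+    import struct
+    import numpy as np
+    def gen_rand_param(param_file, width, height, need_trans):
+        # need_trans is not used in this example
+        np.random.seed()
+        # header: version (0), bytes per value (4), number of values (8-byte integer)
+        header = struct.pack("iiq", 0, 4, height * width)
+        param = np.float32(np.random.rand(height, width))
+        with open(param_file, "wb") as fparam:
+            fparam.write(header + param.tobytes())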
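+17. How to load pre-trained parameters
+--------------------------------------
+* For a layer that loads pre-trained parameters, set its parameter attribute :code:`is_static=True` so that the layer's parameters stay unchanged during training. Taking the embedding layer as an example: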
+..  code-block:: python
+    emb_para = paddle.attr.Param(name='emb', is_static=True)
+    paddle.layer.embedding(size=word_dim, input=x, param_attr=emb_para)
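+* Load the pre-trained parameters from the model file into a :code:`numpy.array`. After creating the parameters, use :code:`parameters.set()` to load the pre-trained parameters. The first 16 bytes of a model parameter file saved by PaddlePaddle are the header, so reading into a :code:`numpy.array` must start from the 17th byte. Taking the embedding layer as an example: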
+..  code-block:: python
+    def load_parameter(file_name, h, w):
+        with open(file_name, 'rb') as f:
+            f.read(16)  # skip header.
+            return np.fromfile(f, dtype=np.float32).reshape(h, w)
+    parameters = paddle.parameters.create(my_cost)
+    parameters.set('emb', load_parameter(emb_param_file, 30000, 256))
--- a/develop/doc_cn/api/v2/config/layer.html
+++ b/develop/doc_cn/api/v2/config/layer.html
@@ -230,14 +230,15 @@
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) &#8211; The input layer. Could be a list/tuple of input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The layer dimension.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is tanh.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute|list.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Any</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -271,7 +272,7 @@ specified, selective_fc acts exactly like fc.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) &#8211; The input layer.</li>
 <li><strong>select</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The select layer. The output of select layer should be a
 sparse binary matrix, and treat as the mask of selective fc.
@@ -279,9 +280,10 @@ If is None, acts exactly like fc.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The layer dimension.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is tanh.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Any</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -431,7 +433,7 @@ the right size (which is the end of array) to the left.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer a.</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer b.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; layer&#8217;s extra attribute.</li>
@@ -485,7 +487,7 @@ rest channels will be processed by rest group of filters.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Layer Input.</li>
 <li><strong>filter_size</strong> (<em>int|tuple|list</em>) &#8211; The x dimension of a filter kernel. Or input a tuple for
 two image dimension.</li>
@@ -504,8 +506,10 @@ image dimension</li>
 <li><strong>dilation</strong> (<em>int|tuple|list</em>) &#8211; The x dimension of the dilation. Or input a tuple for two
 image dimension</li>
 <li><strong>dilation_y</strong> (<em>int</em>) &#8211; The y dimension of the dilation.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|False</em>) &#8211; Convolution bias attribute. None means default bias.
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-False means no bias.</li>
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; number of input channels. If None will be set
 automatically from previous output.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Convolution param attribute. None means default attribute</li>
@@ -576,15 +580,15 @@ parameter attribute is set by this parameter.</li>
 <dt>
 <em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">row_conv</code></dt>
 <dd><p>The row convolution is called lookahead convolution. It is firstly
-introduced in paper of <a class="reference external" href="https://arxiv.org/pdf/1512.02595v1.pdf">Deep Speech 2: End-toEnd Speech Recognition
+introduced in paper of <a class="reference external" href="https://arxiv.org/pdf/1512.02595v1.pdf">Deep Speech 2: End-to-End Speech Recognition
 in English and Mandarin</a> .</p>
 <p>The bidirectional RNN that learns representation for a sequence by
 performing a forward and a backward pass through the entire sequence.
 However, unlike unidirectional RNNs, bidirectional RNNs are challenging
 to deploy in an online and low-latency setting. The lookahead convolution
 incorporates information from future subsequences in a computationally
-efficient manner to improve unidirectional recurrent neural networks.</p>
+efficient manner to improve unidirectional RNNs.</p>
-<p>The connection of row convolution is different form the 1D sequence
+<p>The connection of row convolution is different from the 1D sequence
 convolution. Assumed that, the future context-length is k, that is to say,
 it can get the output at timestep t by using the input feature from t-th
 timestep to (t+k+1)-th timestep. Assumed that the hidden dim of input
@@ -610,7 +614,7 @@ number plus one equals context_len.</p>
 plus one.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is linear activation.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute. If None, the parameter will be
-initialized smartly. It&#8217;s better set it by yourself.</li>
+initialized smartly. It&#8217;s better to set it by yourself.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -713,7 +717,7 @@ The details please refer to
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; number of input channel.</li>
 <li><strong>pool_type</strong> &#8211; Pooling type. MaxPooling or AveragePooling. Default is MaxPooling.</li>
@@ -778,7 +782,7 @@ s = input.size / num_channels
 <li><strong>num_channels</strong> (<em>int|None</em>) &#8211; The channel number of input layer. If None will be set
 automatically from previous output.</li>
 <li><strong>groups</strong> (<em>int</em>) &#8211; The group number of input layer.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer, which can not specify.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer attribute.</li>
 </ul>
 </td>
@@ -814,7 +818,7 @@ The details please refer to
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; Normalize in number of <span class="math">\(size\)</span> feature maps.</li>
 <li><strong>scale</strong> (<em>float</em>) &#8211; The hyper-parameter.</li>
@@ -862,7 +866,7 @@ y_i &amp;\gets \gamma \hat{x_i} + \beta \qquad &amp;//\ scale\ and\ shift\end{sp
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; batch normalization input. Better be linear activation.
 Because there is an activation inside batch_normalization.</li>
 <li><strong>batch_norm_type</strong> (<em>None|string</em><em>, </em><em>None</em><em> or </em><em>&quot;batch_norm&quot;</em><em> or </em><em>&quot;cudnn_batch_norm&quot;</em>) &#8211; We have batch_norm and cudnn_batch_norm. batch_norm
@@ -879,7 +883,7 @@ normalization will normalize input near zero.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; num of image channels or previous layer&#8217;s number of
 filters. None will automatically get from layer&#8217;s
 input.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; <span class="math">\(\beta\)</span>, better be zero when initialize. So the
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; <span class="math">\(\beta\)</span>, better be zero when initialize. So the
 initial_std=0, initial_mean=1 is best practice.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; <span class="math">\(\gamma\)</span>, better be one when initialize. So the
 initial_std=0, initial_mean=1 is best practice.</li>
@@ -930,7 +934,7 @@ and <span class="math">\(out\)</span> is a (batchSize x dataDim) output vector.<
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -960,7 +964,7 @@ factors which dimensions equal to the channel&#8217;s number.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute|list.</li>
 </ul>
@@ -1000,7 +1004,7 @@ and the size of <span class="math">\(out\)</span> is a (batchSize x dataDim) .</
 </tr>
 <tr class="field-even field"><th class="field-name">type input:</th><td class="field-body">paddle.v2.config_base.Layer</td>
 </tr>
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">Layer name.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">basestring</td>
 </tr>
@@ -1045,9 +1049,12 @@ out_{i} = act(in_{i} + out_{i+1} * W) \ \ \text{for} \ start &lt;= i &lt; end\en
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input Layer</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; bias attribute.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; parameter attribute.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the layer</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Layer Attribute.</li>
 </ul>
 </td>
@@ -1095,8 +1102,10 @@ more details about LSTM.</p>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type, paddle.v2.activation.Tanh by default. <span class="math">\(h_t\)</span></li>
 <li><strong>gate_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; gate activation type, paddle.v2.activation.Sigmoid by default.</li>
 <li><strong>state_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; state activation type, paddle.v2.activation.Tanh by default.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias attribute. None means default bias. False means no
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li>
 </ul>
@@ -1163,8 +1172,10 @@ affects the <span class="math">\({\tilde{h_t}}\)</span>.</li>
 <li><strong>gate_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; gate activation type, paddle.v2.activation.Sigmoid by default.
 This activation affects the <span class="math">\(z_t\)</span> and <span class="math">\(r_t\)</span>. It is the
 <span class="math">\(\sigma\)</span> in the above formula.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias attribute. None means default bias. False means no
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer attribute</li>
 </ul>
@@ -1336,7 +1347,7 @@ output is <span class="math">\(o_t\)</span>, whose name is &#8216;state&#8217; a
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; Layer&#8217;s size. NOTE: lstm layer&#8217;s size, should be equal to
 <code class="code docutils literal"><span class="pre">input.size/4</span></code>, and should be equal to
 <code class="code docutils literal"><span class="pre">state.size</span></code>.</li>
@@ -1347,7 +1358,10 @@ output is <span class="math">\(o_t\)</span>, whose name is &#8216;state&#8217; a
 be sigmoid only.</li>
 <li><strong>state_act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; State Activation Type. Default is sigmoid, and should
 be sigmoid only.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Bias Attribute.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; layer&#8217;s extra attribute.</li>
 </ul>
 </td>
@@ -1377,9 +1391,12 @@ be sigmoid only.</li>
 <li><strong>output_mem</strong> &#8211; </li>
 <li><strong>size</strong> &#8211; </li>
 <li><strong>act</strong> &#8211; </li>
-<li><strong>name</strong> &#8211; </li>
+<li><strong>name</strong> &#8211; The name of this layer. It is optional.</li>
 <li><strong>gate_act</strong> &#8211; </li>
-<li><strong>bias_attr</strong> &#8211; </li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> &#8211; the parameter_attribute for transforming the output_mem
 from previous step.</li>
 <li><strong>layer_attr</strong> &#8211; </li>
@@ -1493,7 +1510,7 @@ the output from input.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer&#8217;s name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; get output layer&#8217;s input. And this layer should contains
 multiple outputs.</li>
 <li><strong>arg_name</strong> (<em>basestring</em>) &#8211; Output name from input.</li>
@@ -1549,9 +1566,10 @@ Each inputs is a projection or operator.</p>
 <li><strong>input</strong> &#8211; inputs layer. It is an optional parameter. If set,
 then this function will just return layer&#8217;s name.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; The extra layer config. Default is None.</li>
 </ul>
 </td>
@@ -1578,7 +1596,7 @@ default Bias.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Name of this embedding layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer for this embedding. NOTE: must be Index Data.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The embedding dimension.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None</em>) &#8211; The embedding parameter attribute. See paddle.v2.attr.ParameterAttribute
@@ -1974,12 +1992,15 @@ of stride is -1.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>agg_level</strong> (<em>AggregateLevel</em>) &#8211; AggregateLevel.TO_NO_SEQUENCE or
 AggregateLevel.TO_SEQUENCE</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer name.</li>
 <li><strong>pooling_type</strong> (<em>BasePoolingType|None</em>) &#8211; Type of pooling, MaxPooling(default), AvgPooling,
 SumPooling, SquareRootNPooling.</li>
 <li><strong>stride</strong> (<em>Int</em>) &#8211; The step size between successive pooling regions.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias parameter attribute. False if no bias.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or something not type of paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; The Extra Attributes for layer, such as dropout.</li>
 </ul>
 </td>
@@ -2015,7 +2036,7 @@ of stride is -1.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>agg_level</strong> &#8211; Aggregated level</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
 <li><strong>stride</strong> (<em>Int</em>) &#8211; The step size between successive pooling regions.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
@@ -2053,7 +2074,7 @@ of stride is -1.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>agg_level</strong> &#8211; aggregation level</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
 <li><strong>stride</strong> (<em>Int</em>) &#8211; The step size between successive pooling regions.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
@@ -2087,7 +2108,7 @@ Inputs can be list of paddle.v2.config_base.Layer or list of projection.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>list|tuple|collections.Sequence</em>) &#8211; input layers or projections</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation type.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
@@ -2131,14 +2152,15 @@ processed in one batch.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input sequence layer</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input sequence layer</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation type.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or something not type of paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 </ul>
 </td>
 </tr>
@@ -2183,7 +2205,7 @@ will be sliced for multiple times.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input for this layer, it should be a sequence.</li>
 <li><strong>starts</strong> (<em>paddle.v2.config_base.Layer|None</em>) &#8211; start indices to slice the input sequence.</li>
 <li><strong>ends</strong> (<em>paddle.v2.config_base.Layer|None</em>) &#8211; end indices to slice the input sequence.</li>
@@ -2225,7 +2247,7 @@ beam training.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; A nested sequence.</li>
 <li><strong>selected_indices</strong> &#8211; a set of sequence indices in the nested sequence.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 </ul>
 </td>
 </tr>
@@ -2285,7 +2307,7 @@ convolution neural network, and before recurrent neural network.</p>
 <li><strong>stride_y</strong> (<em>int</em>) &#8211; The stride size in vertical direction.</li>
 <li><strong>padding_x</strong> (<em>int</em>) &#8211; The padding size in horizontal direction.</li>
 <li><strong>padding_y</strong> (<em>int</em>) &#8211; The padding size in vertical direction.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer, which can not specify.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -2339,9 +2361,11 @@ sequence is one) to sequence data.&#8221;</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer</li>
 <li><strong>expand_as</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Expand as this layer&#8217;s sequence info.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias attribute. None means default bias. False means no
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or to something that is not of type paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>expand_level</strong> (<em>ExpandLevel</em>) &#8211; whether input layer is timestep(default) or sequence.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
@@ -2385,7 +2409,7 @@ bias.</li>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer</li>
 <li><strong>num_repeats</strong> (<em>int</em>) &#8211; Repeat the input so many times</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>as_row_vector</strong> (<em>bool</em>) &#8211; True for treating input as row vector and repeating
 in the column direction.  This is equivalent to applying
 concat() to num_repeats copies of the same input.
@@ -2430,7 +2454,7 @@ usually used when the input sample is some image or feature map.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>height</strong> (<em>int</em>) &#8211; The height of the sample matrix</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2466,12 +2490,13 @@ output sequence has T*M/N instances, the dimension of each instance is N.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>reshape_size</strong> (<em>int</em>) &#8211; the size of reshaped sequence.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation type.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em><em> or </em><em>None</em><em> or </em><em>bool</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or to something that is not of type paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 </ul>
 </td>
 </tr>
@@ -2520,12 +2545,14 @@ Please refer to dropout for details.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple</em>) &#8211; Input layers. It could be a paddle.v2.config_base.Layer or list/tuple of
 paddle.v2.config_base.Layer.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type, default is tanh.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|bool</em>) &#8211; Bias attribute. If False, means no bias. None is default
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-bias.</li>
+False or to something that is not of type paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer attribute.</li>
 </ul>
 </td>
@@ -2588,7 +2615,7 @@ processed in one batch.</p>
 <li><strong>weights</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The weight layer.</li>
 <li><strong>vectors</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The vector layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; the dimension of this layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -2627,7 +2654,7 @@ which is used in NEURAL TURING MACHINE.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>list|tuple</em>) &#8211; Input layer.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Weight layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2700,7 +2727,7 @@ and <span class="math">\(y\)</span> is a output vector.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Weight layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2739,7 +2766,7 @@ processed in one batch.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Weight layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2775,7 +2802,7 @@ ight)</p>
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The Layer Name.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">basestring</td>
 </tr>
@@ -2820,7 +2847,7 @@ element-wise. There is no activation and weight.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>slope</strong> (<em>float.</em>) &#8211; the scale factor.</li>
 <li><strong>intercept</strong> (<em>float.</em>) &#8211; the offset.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
@@ -2867,15 +2894,16 @@ For example, each sample:</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer a.</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer b.</li>
 <li><strong>size</strong> (<em>int.</em>) &#8211; the layer dimension.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; Activation Type. Default is tanh.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The Parameter Attribute.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Any</em>) &#8211; The Bias Attribute. If no bias, then pass False or
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-something not type of paddle.v2.attr.ParameterAttribute. None will get a
+False or to something that is not of type paddle.v2.attr.ParameterAttribute,
-default Bias.</li>
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -2914,7 +2942,7 @@ processed in one batch.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>a</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer a</li>
 <li><strong>b</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input layer b</li>
 <li><strong>scale</strong> (<em>float</em>) &#8211; scale for cosine value. default is 5.</li>
@@ -2953,7 +2981,7 @@ processed in one batch.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -2989,10 +3017,13 @@ bias are trainable.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The input layer.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The parameter attribute of scaling.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; The parameter attribute of shifting.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or to something that is not of type paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 </ul>
 </td>
 </tr>
@@ -3027,7 +3058,7 @@ The result is stored in output.ids.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -3060,7 +3091,7 @@ Sampling one id for one sample.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -3104,7 +3135,7 @@ For each index i from 0 to batchSize -1, the output is the i-th row of the
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>list of paddle.v2.config_base.Layer</em>) &#8211; Input layers.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
 </ul>
 </td>
@@ -3174,7 +3205,7 @@ in width dimension.</p>
 <li><strong>pad_h</strong> (<em>list|None</em>) &#8211; padding size in height dimension.</li>
 <li><strong>pad_w</strong> (<em>list|None</em>) &#8211; padding size in width dimension.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 </ul>
 </td>
 </tr>
@@ -3210,7 +3241,7 @@ in width dimension.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The first input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float.</em>) &#8211; The cost is multiplied with coeff.
 The coefficient affects the gradient in the backward.</li>
 <li><strong>weight</strong> (<em>LayerOutput</em>) &#8211; The cost of each sample is multiplied with each weight.
@@ -3250,7 +3281,7 @@ Input should be a vector of positive numbers, without normalization.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The first input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float.</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>softmax_selfnorm_alpha</strong> (<em>float.</em>) &#8211; The scale factor affects the cost.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
@@ -3286,7 +3317,7 @@ Input should be a vector of positive numbers, without normalization.</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The first input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3335,7 +3366,7 @@ ight <a href="#id2"><span class="problematic" id="id3">|</span></a>leq delta</p>
 </tr>
 <tr class="field-even field"><th class="field-name">type input:</th><td class="field-body">paddle.v2.config_base.Layer.</td>
 </tr>
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layers. It is not necessary.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">None|basestring.</td>
 </tr>
@@ -3394,7 +3425,7 @@ a true binary class label :math:<a href="#id6"><span class="problematic" id="id7
 </tr>
 <tr class="field-even field"><th class="field-name">type input:</th><td class="field-body">paddle.v2.config_base.Layer.</td>
 </tr>
-<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layers. It is not necessary.</td>
+<tr class="field-odd field"><th class="field-name">param name:</th><td class="field-body">The name of this layer. It is optional.</td>
 </tr>
 <tr class="field-even field"><th class="field-name">type name:</th><td class="field-body">None|basestring.</td>
 </tr>
@@ -3440,7 +3471,7 @@ a true binary class label :math:<a href="#id6"><span class="problematic" id="id7
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Samples of the same query should be loaded as sequence.</li>
 <li><strong>score</strong> &#8211; The 2nd input. Score of each sample.</li>
 <li><strong>NDCG_num</strong> (<em>int</em>) &#8211; The size of NDCG (Normalized Discounted Cumulative Gain),
-e.g., 5 for NDCG&#64;5. It must be less than for equal to the
+e.g., 5 for NDCG&#64;5. It must be less than or equal to the
 minimum size of lists.</li>
 <li><strong>max_sort_size</strong> (<em>int</em>) &#8211; The size of partial sorting in calculating gradient.
 If max_sort_size = -1, then for each list, the
@@ -3449,7 +3480,7 @@ In other cases, max_sort_size must be greater than or
 equal to NDCG_num. And if max_sort_size is greater
 than the size of a list, the algorithm will sort the
 entire list to get the gradient.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
@@ -3478,7 +3509,7 @@ entire list of get gradient.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Network prediction.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Data label.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The weight affects the cost, namely the scale of cost.
@@ -3537,7 +3568,7 @@ Their dimension is one.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Label is 1 or 0, means positive order and reverse order.</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The weight affects the cost, namely the scale of cost.
 It is an optional argument.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3570,7 +3601,7 @@ It is an optional argument.</li>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer.</em>) &#8211; The first input layer.</li>
-<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring.</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
@@ -3610,7 +3641,7 @@ field model.</p>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The third layer is &#8220;weight&#8221; of each sample, which is an
 optional argument.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Parameter attribute. None means default attribute</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
@@ -3651,7 +3682,7 @@ decoding or 0 for correct decoding.</p>
 <li><strong>size</strong> (<em>int</em>) &#8211; size of this layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em><em> or </em><em>None</em>) &#8211; None or ground-truth label.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute</em>) &#8211; Parameter attribute. None means default attribute</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
 </td>
@@ -3701,7 +3732,7 @@ should also be num_classes + 1.</p>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The data layer of label with variable length.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; category numbers + 1.</li>
-<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer</li>
+<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>norm_by_times</strong> (<em>bool</em>) &#8211; Whether to normalize by times. False by default.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
 </ul>
@@ -3761,7 +3792,7 @@ should be consistent as that used in your labels.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The data layer of label with variable length.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; category numbers + 1.</li>
-<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer, which can not specify.</li>
+<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>blank</strong> (<em>int</em>) &#8211; the &#8216;blank&#8217; label used in ctc</li>
 <li><strong>norm_by_times</strong> (<em>bool</em>) &#8211; Whether to normalize by times. False by default.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttributeNone</em>) &#8211; Extra Layer config.</li>
@@ -3798,8 +3829,8 @@ A fast and simple algorithm for training neural probabilistic language models.</
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple|collections.Sequence</em>) &#8211; input layers. It could be a paddle.v2.config_base.Layer of list/tuple of paddle.v2.config_base.Layer.</li>
+<li><strong>input</strong> (<em>paddle.v2.config_base.Layer|list|tuple|collections.Sequence</em>) &#8211; The input layers. It could be a paddle.v2.config_base.Layer or a list/tuple of paddle.v2.config_base.Layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; label layer</li>
 <li><strong>weight</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; weight layer, can be None(default)</li>
 <li><strong>num_classes</strong> (<em>int</em>) &#8211; number of classes.</li>
@@ -3809,7 +3840,10 @@ A fast and simple algorithm for training neural probabilistic language models.</
 <li><strong>neg_distribution</strong> (<em>list|tuple|collections.Sequence|None</em>) &#8211; The distribution for generating the random negative labels.
 A uniform distribution will be used if not provided.
 If not None, its length must be equal to num_classes.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|False</em>) &#8211; Bias parameter attribute. True if no bias.</li>
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
+False or to something that is not of type paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
@@ -3848,9 +3882,11 @@ Hierarchical Probabilistic Neural Network Language Model.&#8221;</p>
 paddle.v2.config_base.Layer.</li>
 <li><strong>label</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Label layer.</li>
 <li><strong>num_classes</strong> (<em>int|None</em>) &#8211; number of classes.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|False</em>) &#8211; Bias attribute. None means default bias.
+<li><strong>bias_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None|Bool|Any</em>) &#8211; The Bias Attribute. If the parameter is set to
-False means no bias.</li>
+False or to something that is not of type paddle.v2.attr.ParameterAttribute,
+no bias is defined. If the parameter is set to
+True, the bias is initialized to zero.</li>
 <li><strong>param_attr</strong> (<em>paddle.v2.attr.ParameterAttribute|None</em>) &#8211; Parameter Attribute. None means default parameter.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3892,7 +3928,7 @@ size of input and label are equal. The formula is as follows,</p>
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layers. It is not necessary.</li>
+<li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -3920,7 +3956,7 @@ size of input and label are equal. The formula is as follows,</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input_loc</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer</em>) &#8211; The input predicted locations.</li>
 <li><strong>input_conf</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer</em>) &#8211; The input priorbox confidence.</li>
 <li><strong>priorbox</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input priorbox location and the variance.</li>
@@ -3962,7 +3998,7 @@ It is used by recurrent layer group.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; Input layer name.</li>
 <li><strong>eos_id</strong> (<em>int</em>) &#8211; end id of sequence</li>
 <li><strong>layer_attr</strong> (<em>paddle.v2.attr.ExtraAttribute</em>) &#8211; extra layer attributes.</li>
@@ -3988,19 +4024,25 @@ It is used by recurrent layer group.</p>
 <dl class="class">
 <dt>
 <em class="property">class </em><code class="descclassname">paddle.v2.layer.</code><code class="descname">dropout</code></dt>
-<dd><p>&#64;TODO(yuyang18): Add comments.</p>
+<dd><p>The example usage is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">dropout</span> <span class="o">=</span> <span class="n">dropout</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="nb">input</span><span class="p">,</span> <span class="n">dropout_rate</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
+</pre></div>
+</div>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> &#8211; </li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
-<li><strong>input</strong> &#8211; </li>
+<li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
-<li><strong>dropout_rate</strong> &#8211; </li>
+<li><strong>dropout_rate</strong> (<em>float</em>) &#8211; The probability of dropout.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">paddle.v2.config_base.Layer object.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">paddle.v2.config_base.Layer</p>
 </td>
 </tr>
 </tbody>
@@ -4034,7 +4076,7 @@ a_i * z_i  &amp;\quad \mathrm{otherwise}\end{split}\]</div>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; Name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input layer.</li>
 <li><strong>partial_sum</strong> (<em>int</em>) &#8211; <p>This parameter makes a group of inputs share the same weight.</p>
 <ul>
@@ -4067,7 +4109,7 @@ a_i * z_i  &amp;\quad \mathrm{otherwise}\end{split}\]</div>
 <dd><p>The gated unit layer implements a simple gating mechanism over the input.
 The input <span class="math">\(X\)</span> is first projected into a new space <span class="math">\(X'\)</span>, and
 it is also used to produce a gate weight <span class="math">\(\sigma\)</span>. Element-wise
-prodict between <a href="#id12"><span class="problematic" id="id13">:match:`X&#8217;`</span></a> and <span class="math">\(\sigma\)</span> is finally returned.</p>
+product between <span class="math">\(X'\)</span> and <span class="math">\(\sigma\)</span> is finally returned.</p>
 <dl class="docutils">
 <dt>Reference:</dt>
 <dd>Language Modeling with Gated Convolutional Networks
@@ -4084,7 +4126,7 @@ prodict between <a href="#id12"><span class="problematic" id="id13">:match:`X&#8
 <li><strong>input</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; input for this layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; output size of the gated unit.</li>
 <li><strong>act</strong> (<em>paddle.v2.activation.Base</em>) &#8211; activation type of the projected input.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of this layer.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>gate_attr</strong> (<em>paddle.v2.attr.ExtraAttribute|None</em>) &#8211; Attributes to tune the gate output, for example, error
 clipping threshold, dropout and so on. See paddle.v2.attr.ExtraAttribute for
 more details.</li>
@@ -4131,7 +4173,7 @@ no valid bounding box.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; The name of this layer. It is optional.</li>
 <li><strong>input_loc</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer.</em>) &#8211; The input predicted locations.</li>
 <li><strong>input_conf</strong> (<em>paddle.v2.config_base.Layer | List of paddle.v2.config_base.Layer.</em>) &#8211; The input priorbox confidence.</li>
 <li><strong>priorbox</strong> (<em>paddle.v2.config_base.Layer</em>) &#8211; The input priorbox location and the variance.</li>

--- a/develop/doc_cn/faq/index_cn.html
+++ b/develop/doc_cn/faq/index_cn.html
@@ -186,42 +186,44 @@
           <div itemprop="articleBody">
  <div class="section" id="faq">
-<h1><a class="toc-backref" href="#id9">FAQ</a><a class="headerlink" href="#faq" title="永久链接至标题">¶</a></h1>
+<h1><a class="toc-backref" href="#id11">FAQ</a><a class="headerlink" href="#faq" title="永久链接至标题">¶</a></h1>
 <div class="contents topic" id="contents">
 <p class="topic-title first">Contents</p>
 <ul class="simple">
-<li><a class="reference internal" href="#faq" id="id9">FAQ</a><ul>
+<li><a class="reference internal" href="#faq" id="id11">FAQ</a><ul>
-<li><a class="reference internal" href="#id1" id="id10">1. How to reduce memory usage</a><ul>
+<li><a class="reference internal" href="#id1" id="id12">1. How to reduce memory usage</a><ul>
-<li><a class="reference internal" href="#dataprovider" id="id11">Reduce DataProvider buffer pool memory</a></li>
+<li><a class="reference internal" href="#dataprovider" id="id13">Reduce DataProvider buffer pool memory</a></li>
-<li><a class="reference internal" href="#id2" id="id12">Neuron activation memory</a></li>
+<li><a class="reference internal" href="#id2" id="id14">Neuron activation memory</a></li>
-<li><a class="reference internal" href="#id3" id="id13">Parameter memory</a></li>
+<li><a class="reference internal" href="#id3" id="id15">Parameter memory</a></li>
 </ul>
 </li>
-<li><a class="reference internal" href="#paddlepaddle" id="id14">2. How to speed up PaddlePaddle training</a><ul>
+<li><a class="reference internal" href="#paddlepaddle" id="id16">2. How to speed up PaddlePaddle training</a><ul>
-<li><a class="reference internal" href="#id4" id="id15">Reduce data loading time</a></li>
+<li><a class="reference internal" href="#id4" id="id17">Reduce data loading time</a></li>
-<li><a class="reference internal" href="#id5" id="id16">Speed up training</a></li>
+<li><a class="reference internal" href="#id5" id="id18">Speed up training</a></li>
-<li><a class="reference internal" href="#id6" id="id17">Use more computing resources</a></li>
+<li><a class="reference internal" href="#id6" id="id19">Use more computing resources</a></li>
 </ul>
 </li>
-<li><a class="reference internal" href="#illegal-instruction" id="id18">3. Encountering the “非法指令” or “illegal instruction” error</a></li>
+<li><a class="reference internal" href="#illegal-instruction" id="id20">3. Encountering the “非法指令” or “illegal instruction” error</a></li>
-<li><a class="reference internal" href="#sgd" id="id19">4. How to choose the learning rate for SGD</a></li>
+<li><a class="reference internal" href="#sgd" id="id21">4. How to choose the learning rate for SGD</a></li>
-<li><a class="reference internal" href="#id7" id="id20">5. How to initialize parameters</a></li>
+<li><a class="reference internal" href="#id7" id="id22">5. How to initialize parameters</a></li>
-<li><a class="reference internal" href="#id8" id="id21">6. How to share parameters</a></li>
+<li><a class="reference internal" href="#id8" id="id23">6. How to share parameters</a></li>
-<li><a class="reference internal" href="#cp27mu-linux-x86-64-whl-is-not-a-supported-wheel-on-this-platform" id="id22">7. *-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.</a></li>
+<li><a class="reference internal" href="#cp27mu-linux-x86-64-whl-is-not-a-supported-wheel-on-this-platform" id="id24">7. *-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.</a></li>
-<li><a class="reference internal" href="#python" id="id23">8. All Python-related unit tests fail</a></li>
+<li><a class="reference internal" href="#python" id="id25">8. All Python-related unit tests fail</a></li>
-<li><a class="reference internal" href="#docker-gpu-cuda-driver-version-is-insufficient" id="id24">9. Running the Docker GPU image reports &#8220;CUDA driver version is insufficient&#8221;</a></li>
+<li><a class="reference internal" href="#docker-gpu-cuda-driver-version-is-insufficient" id="id26">9. Running the Docker GPU image reports &#8220;CUDA driver version is insufficient&#8221;</a></li>
-<li><a class="reference internal" href="#cmake-pythonlibspythoninterp" id="id25">10. Building from source with CMake finds mismatched PythonLibs and PythonInterp versions</a></li>
+<li><a class="reference internal" href="#cmake-pythonlibspythoninterp" id="id27">10. Building from source with CMake finds mismatched PythonLibs and PythonInterp versions</a></li>
-<li><a class="reference internal" href="#cmake-paddle0-0-0" id="id26">11. Building from source with CMake yields Paddle version 0.0.0</a></li>
+<li><a class="reference internal" href="#cmake-paddle0-0-0" id="id28">11. Building from source with CMake yields Paddle version 0.0.0</a></li>
-<li><a class="reference internal" href="#a-protocol-message-was-rejected-because-it-was-too-big" id="id27">12. A protocol message was rejected because it was too big</a></li>
+<li><a class="reference internal" href="#a-protocol-message-was-rejected-because-it-was-too-big" id="id29">12. A protocol message was rejected because it was too big</a></li>
-<li><a class="reference internal" href="#gpu" id="id28">13. How to specify GPU devices</a></li>
+<li><a class="reference internal" href="#gpu" id="id30">13. How to specify GPU devices</a></li>
-<li><a class="reference internal" href="#floating-point-exception" id="id29">14. What to do when training exits with a <code class="code docutils literal"><span class="pre">Floating</span> <span class="pre">point</span> <span class="pre">exception</span></code>?</a></li>
+<li><a class="reference internal" href="#floating-point-exception" id="id31">14. What to do when training exits with a <code class="code docutils literal"><span class="pre">Floating</span> <span class="pre">point</span> <span class="pre">exception</span></code>?</a></li>
-<li><a class="reference internal" href="#import-paddle-v2-as-paddle-importerror-no-module-named-v2" id="id30">15. After building and installing, import paddle.v2 as paddle raises ImportError: No module named v2</a></li>
+<li><a class="reference internal" href="#import-paddle-v2-as-paddle-importerror-no-module-named-v2" id="id32">15. After building and installing, import paddle.v2 as paddle raises ImportError: No module named v2</a></li>
+<li><a class="reference internal" href="#id9" id="id33">16. What format does PaddlePaddle use to store parameters, and how to convert it to and from plain text</a></li>
+<li><a class="reference internal" href="#id10" id="id34">17. How to load pre-trained parameters</a></li>
 </ul>
 </li>
 </ul>
 </div>
 <div class="section" id="id1">
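-<h2><a class="toc-backref" href="#id10">1. How to reduce memory usage</a><a class="headerlink" href="#id1" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id12">1. How to reduce memory usage</a><a class="headerlink" href="#id1" title="永久链接至标题">¶</a></h2>
 <p>Training a neural network is inherently demanding on host and GPU memory, often consuming tens of GB of host memory and several GB of GPU memory.
 PaddlePaddle's memory usage falls into the following categories:</p>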
 <ul class="simple">
@@ -232,7 +234,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 </ul>
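 <p>Here, miscellaneous memory refers to memory used by PaddlePaddle itself, including string allocation, temporary variables and so on. It is not considered further.</p>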
 <div class="section" id="dataprovider">
-<h3><a class="toc-backref" href="#id11">Reduce DataProvider buffer pool memory</a><a class="headerlink" href="#dataprovider" title="永久链接至标题">¶</a></h3>
+<h3><a class="toc-backref" href="#id13">Reduce DataProvider buffer pool memory</a><a class="headerlink" href="#dataprovider" title="永久链接至标题">¶</a></h3>
 <p>PyDataProvider loads data asynchronously and shuffles by randomly selecting data directly in memory. That is:</p>
 <img src="../_images/graphviz-9be6aad37f57c60f4b971dde0ef44ce27179cf9a.png" alt="digraph {
    rankdir=LR;
@@ -252,7 +254,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 <p>This can greatly reduce memory usage and may also speed up training. For details, see <a class="reference internal" href="../api/v1/data_provider/pydataprovider2_cn.html#api-pydataprovider2"><span class="std std-ref">Using PyDataProvider2</span></a>.</p>
 </div>
 <div class="section" id="id2">
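-<h3><a class="toc-backref" href="#id12">Neuron activation memory</a><a class="headerlink" href="#id2" title="永久链接至标题">¶</a></h3>
+<h3><a class="toc-backref" href="#id14">Neuron activation memory</a><a class="headerlink" href="#id2" title="永久链接至标题">¶</a></h3>
 <p>During training, a neural network temporarily stores some data for each activation, such as neuron activation values.
 In the backward pass, these data are used to update the parameters. The memory they use depends mainly on two factors:
 one is the batch size, the other is the length of each sequence. So it is in fact also related to what is contained in each mini-batch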
@@ -265,7 +267,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 </ul>
 </div>
 <div class="section" id="id3">
-<h3><a class="toc-backref" href="#id13">Parameter memory</a><a class="headerlink" href="#id3" title="永久链接至标题">¶</a></h3>
+<h3><a class="toc-backref" href="#id15">Parameter memory</a><a class="headerlink" href="#id3" title="永久链接至标题">¶</a></h3>
 <p>PaddlePaddle supports many optimization algorithms (optimizers), and different optimizers need different amounts of memory.
 For example, the <code class="code docutils literal"><span class="pre">adadelta</span></code> algorithm needs roughly 5 times the memory of the weight parameters. For instance, if the saved model directory
 is <code class="code docutils literal"><span class="pre">100M</span></code> in size, this optimizer needs at least <code class="code docutils literal"><span class="pre">500M</span></code> of memory.</p>
@@ -273,7 +275,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 </div>
 </div>
 <div class="section" id="paddlepaddle">
-<h2><a class="toc-backref" href="#id14">2. How to speed up PaddlePaddle training</a><a class="headerlink" href="#paddlepaddle" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id16">2. How to speed up PaddlePaddle training</a><a class="headerlink" href="#paddlepaddle" title="永久链接至标题">¶</a></h2>
 <p>PaddlePaddle training can be sped up in the following ways:</p>
 <ul class="simple">
 <li>Reduce data loading time</li>
@@ -281,7 +283,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 <li>Use distributed training to harness more computing resources</li>
 </ul>
 <div class="section" id="id4">
-<h3><a class="toc-backref" href="#id15">Reduce data loading time</a><a class="headerlink" href="#id4" title="永久链接至标题">¶</a></h3>
+<h3><a class="toc-backref" href="#id17">Reduce data loading time</a><a class="headerlink" href="#id4" title="永久链接至标题">¶</a></h3>
 <p>When using <code class="code docutils literal"><span class="pre">pydataprovider</span></code>, you can reduce the size of the buffer pool and enable in-memory caching, which greatly speeds up data loading.
 Shrinking the <code class="code docutils literal"><span class="pre">DataProvider</span></code> buffer pool works on the same principle as the buffer pool reduction described earlier for lowering memory usage.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="nd">@provider</span><span class="p">(</span><span class="n">min_pool_size</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
@@ -295,7 +297,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 <p>In addition, the <code class="code docutils literal"><span class="pre">&#64;provider</span></code> interface has a <code class="code docutils literal"><span class="pre">cache</span></code> parameter that controls caching. Setting it to <code class="code docutils literal"><span class="pre">CacheType.CACHE_PASS_IN_MEM</span></code> caches the data generated in the first <code class="code docutils literal"><span class="pre">pass</span></code> (one pass means going through all the training data once) in memory. In later <code class="code docutils literal"><span class="pre">pass</span></code> iterations, data is no longer read from the <code class="code docutils literal"><span class="pre">python</span></code> side but directly from the in-memory cache. This also greatly reduces data loading time.</p>
 </div>
 <div class="section" id="id5">
-<h3><a class="toc-backref" href="#id16">Speed up training</a><a class="headerlink" href="#id5" title="永久链接至标题">¶</a></h3>
+<h3><a class="toc-backref" href="#id18">Speed up training</a><a class="headerlink" href="#id5" title="永久链接至标题">¶</a></h3>
 <p>PaddlePaddle supports sparse training. Sparse training requires the training features to be one of <code class="code docutils literal"><span class="pre">sparse_binary_vector</span></code>, <code class="code docutils literal"><span class="pre">sparse_vector</span></code>, or <code class="code docutils literal"><span class="pre">integer_value</span></code>. In addition, the layer that interacts with this training data must set its parameter to sparse update mode, i.e. set <code class="code docutils literal"><span class="pre">sparse_update=True</span></code></p>
 <p>Here, training a simple <code class="code docutils literal"><span class="pre">word2vec</span></code> language model serves as an example. The usage is as follows:</p>
 <p>The two words before and the two words after a word are used to predict that middle word. The DataProvider for this task is:</p>
@@ -328,7 +330,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 </div>
 </div>
 <div class="section" id="id6">
-<h3><a class="toc-backref" href="#id17">Use more computing resources</a><a class="headerlink" href="#id6" title="永久链接至标题">¶</a></h3>
+<h3><a class="toc-backref" href="#id19">Use more computing resources</a><a class="headerlink" href="#id6" title="永久链接至标题">¶</a></h3>
 <p>More computing resources can be used in the following ways:</p>
 <ul class="simple">
 <li>Single-machine CPU training<ul>
@@ -348,17 +350,17 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 </div>
 </div>
 <div class="section" id="illegal-instruction">
-<h2><a class="toc-backref" href="#id18">3. Encountering the “非法指令” or “illegal instruction” error</a><a class="headerlink" href="#illegal-instruction" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id20">3. Encountering the “非法指令” or “illegal instruction” error</a><a class="headerlink" href="#illegal-instruction" title="永久链接至标题">¶</a></h2>
 <p>PaddlePaddle uses AVX SIMD instructions to improve CPU efficiency, so using the wrong binary release can cause this error. Please choose the correct version.</p>
 </div>
 <div class="section" id="sgd">
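-<h2><a class="toc-backref" href="#id19">4. How to choose the learning rate for SGD</a><a class="headerlink" href="#sgd" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id21">4. How to choose the learning rate for SGD</a><a class="headerlink" href="#sgd" title="永久链接至标题">¶</a></h2>
 <p>When training with sgd/async_sgd, an important issue is choosing the right learning_rate. If learning_rate is too large, training may not converge. If it is too small, convergence may be very slow and training takes too long.</p>
 <p>The usual practice is to start with a fairly large learning_rate. If training does not converge, reduce the learning rate by a factor of 10 and try again, until training converges. How can you tell that training is not converging? Estimate the minimum cost, cost0, that the model could achieve with a constant output.</p>
 <p>If the cost during training is clearly higher than this constant-output cost, training can be judged as not converging. For example, suppose we have a three-class problem with multi-class-cross-entropy as the cost, and the proportions of classes 0, 1, 2 in the data are <code class="code docutils literal"><span class="pre">0.2,</span> <span class="pre">0.5,</span> <span class="pre">0.3</span></code>. Then the minimum cost a constant output can reach is <code class="code docutils literal"><span class="pre">-(0.2*log(0.2)+0.5*log(0.5)+0.3*log(0.3))=1.03</span></code>. If after one pass (or earlier) the cost is still greater than this number, training can be considered not converging and the learning rate should be lowered.</p>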
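 <p>The constant-output baseline from the three-class example can be checked with a few lines of Python. This is only a sketch: the helper name <code>constant_output_cost</code> is hypothetical and not part of PaddlePaddle, and it assumes the natural logarithm, which is what reproduces the 1.03 figure quoted above.</p>

```python
import math


def constant_output_cost(class_ratios):
    """Cross-entropy reached by always predicting the class frequencies.

    Hypothetical helper for illustration; uses the natural logarithm.
    """
    return -sum(p * math.log(p) for p in class_ratios if p > 0)


# Class proportions taken from the three-class example in the text.
baseline = constant_output_cost([0.2, 0.5, 0.3])
print(round(baseline, 2))  # 1.03
```

 <p>A training cost that stays above this value after a full pass suggests the learning rate is too high.</p>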
 </div>
 <div class="section" id="id7">
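-<h2><a class="toc-backref" href="#id20">5. How to initialize parameters</a><a class="headerlink" href="#id7" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id22">5. How to initialize parameters</a><a class="headerlink" href="#id7" title="永久链接至标题">¶</a></h2>
 <p>By default, PaddlePaddle initializes parameters with mean 0 and standard deviation <span class="math">\(\frac{1}{\sqrt{d}}\)</span>, where <span class="math">\(d\)</span> is the width of the parameter matrix. This initialization generally does not produce poor results. For users who want to customize initialization, PaddlePaddle currently provides two ways to initialize parameters:</p>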
 <ul class="simple">
 <li>Gaussian distribution. Set <code class="code docutils literal"><span class="pre">param_attr</span></code> to <code class="code docutils literal"><span class="pre">param_attr=ParamAttr(initial_mean=0.0,</span> <span class="pre">initial_std=1.0)</span></code></li>
@@ -372,7 +374,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 <p>The code above initializes all biases to 1.0 and initializes the parameters from a uniform distribution over <code class="code docutils literal"><span class="pre">[1.0,</span> <span class="pre">-1.0]</span></code>.</p>
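 <p>For intuition about the default rule (mean 0, standard deviation 1/sqrt(d) with d the matrix width), the following standalone sketch draws such a matrix with the Python standard library. It does not call PaddlePaddle. The function name <code>default_init</code> and the choice of a Gaussian distribution are assumptions made for illustration.</p>

```python
import math
import random


def default_init(height, width, seed=0):
    """Draw a height x width matrix with mean 0 and std 1/sqrt(width).

    Assumption: values are Gaussian; the FAQ only states mean and std.
    """
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(width)
    return [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(height)]


weights = default_init(height=64, width=256)
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
sample_std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
# For width 256 the target standard deviation is 1/16 = 0.0625.
print(mean, sample_std)
```

 <p>Wider matrices therefore start with smaller weights, which keeps the scale of each output roughly independent of the input width.</p>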
 </div>
 <div class="section" id="id8">
-<h2><a class="toc-backref" href="#id21">6. How to share parameters</a><a class="headerlink" href="#id8" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id23">6. How to share parameters</a><a class="headerlink" href="#id8" title="永久链接至标题">¶</a></h2>
 <p>PaddlePaddle uses a parameter's <code class="code docutils literal"><span class="pre">name</span></code> as its ID, and parameters with the same name are shared. A parameter's name can be set with <code class="code docutils literal"><span class="pre">ParamAttr(name=&quot;YOUR_PARAM_NAME&quot;)</span></code>. A more convenient way is to have the parameters to be shared use the same <code class="code docutils literal"><span class="pre">ParamAttr</span></code> object.</p>
 <p>A configuration example of parameter sharing in a simple fully connected network:</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">paddle.trainer_config_helpers</span> <span class="k">import</span> <span class="o">*</span>
@@ -409,7 +411,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 <p>Here <code class="code docutils literal"><span class="pre">hidden_a</span></code> and <code class="code docutils literal"><span class="pre">hidden_b</span></code> use the same parameter and bias. The two inputs of the softmax layer also use the same parameter, <code class="code docutils literal"><span class="pre">softmax_param</span></code>.</p>
 </div>
 <div class="section" id="cp27mu-linux-x86-64-whl-is-not-a-supported-wheel-on-this-platform">
-<h2><a class="toc-backref" href="#id22">7. *-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.</a><a class="headerlink" href="#cp27mu-linux-x86-64-whl-is-not-a-supported-wheel-on-this-platform" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id24">7. *-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.</a><a class="headerlink" href="#cp27mu-linux-x86-64-whl-is-not-a-supported-wheel-on-this-platform" title="永久链接至标题">¶</a></h2>
 <p>The main cause of this problem is that the <code class="code docutils literal"><span class="pre">wheel</span></code> package used when building the wheel is the latest version,
 while the <code class="code docutils literal"><span class="pre">pip</span></code> package on the system is older. The fix is to upgrade <code class="code docutils literal"><span class="pre">pip</span></code> and rebuild PaddlePaddle.
 To upgrade <code class="code docutils literal"><span class="pre">pip</span></code>:</p>
@@ -418,7 +420,7 @@ PaddlePaddle的内存占用主要分为如下几个方面:</p>
 </div>
 </div>
 <div class="section" id="python">
-<h2><a class="toc-backref" href="#id23">8. All Python-related unit tests fail</a><a class="headerlink" href="#python" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id25">8. All Python-related unit tests fail</a><a class="headerlink" href="#python" title="永久链接至标题">¶</a></h2>
 <p>If all Python-related unit tests fail as shown below:</p>
 <div class="highlight-bash"><div class="highlight"><pre><span></span><span class="m">24</span> - test_PyDataProvider <span class="o">(</span>Failed<span class="o">)</span>
 <span class="m">26</span> - test_RecurrentGradientMachine <span class="o">(</span>Failed<span class="o">)</span>
@@ -449,7 +451,7 @@ Please uninstall paddle package before start unittest. Try to <span class="s1">&
 </ul>
 </div>
 <div class="section" id="docker-gpu-cuda-driver-version-is-insufficient">
-<h2><a class="toc-backref" href="#id24">9. Running the Docker GPU image reports &#8220;CUDA driver version is insufficient&#8221;</a><a class="headerlink" href="#docker-gpu-cuda-driver-version-is-insufficient" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id26">9. Running the Docker GPU image reports &#8220;CUDA driver version is insufficient&#8221;</a><a class="headerlink" href="#docker-gpu-cuda-driver-version-is-insufficient" title="永久链接至标题">¶</a></h2>
 <p>When using the PaddlePaddle GPU Docker image, users often see <cite>Cuda Error: CUDA driver version is insufficient for CUDA runtime version</cite>. The cause is that the CUDA drivers and libraries on the host are not mapped into the container.
 The fix is:</p>
 <div class="highlight-bash"><div class="highlight"><pre><span></span>$ <span class="nb">export</span> <span class="nv">CUDA_SO</span><span class="o">=</span><span class="s2">&quot;</span><span class="k">$(</span><span class="se">\l</span>s /usr/lib64/libcuda* <span class="p">|</span> xargs -I<span class="o">{}</span> <span class="nb">echo</span> <span class="s1">&#39;-v {}:{}&#39;</span><span class="k">)</span><span class="s2"> </span><span class="k">$(</span><span class="se">\l</span>s /usr/lib64/libnvidia* <span class="p">|</span> xargs -I<span class="o">{}</span> <span class="nb">echo</span> <span class="s1">&#39;-v {}:{}&#39;</span><span class="k">)</span><span class="s2">&quot;</span>
@@ -460,7 +462,7 @@ $ docker run <span class="si">${</span><span class="nv">CUDA_SO</span><span clas
 <p>For more on installing and using Docker, see the <a class="reference external" href="http://www.paddlepaddle.org/doc_cn/build_and_install/install/docker_install.html">PaddlePaddle Docker documentation</a>.</p>
 </div>
 <div class="section" id="cmake-pythonlibspythoninterp">
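-<h2><a class="toc-backref" href="#id25">10. Building from source with CMake finds mismatched PythonLibs and PythonInterp versions</a><a class="headerlink" href="#cmake-pythonlibspythoninterp" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id27">10. Building from source with CMake finds mismatched PythonLibs and PythonInterp versions</a><a class="headerlink" href="#cmake-pythonlibspythoninterp" title="永久链接至标题">¶</a></h2>
 <p>This is due to a flaw in CMake's current logic for finding Python. If several Python versions are installed on the system, the Python library and the Python interpreter that CMake finds may differ in version, causing the PaddlePaddle build to fail. The correct fix is
 for the user to explicitly specify a particular Python version, as follows:</p>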
 <blockquote>
@@ -471,7 +473,7 @@ $ docker run <span class="si">${</span><span class="nv">CUDA_SO</span><span clas
 <p>Users need to specify the Python paths on the local machine: <code class="docutils literal"><span class="pre">&lt;exc_path&gt;</span></code>, <code class="docutils literal"><span class="pre">&lt;lib_path&gt;</span></code>, <code class="docutils literal"><span class="pre">&lt;inc_path&gt;</span></code></p>
 </div>
 <div class="section" id="cmake-paddle0-0-0">
-<h2><a class="toc-backref" href="#id26">11. Building from source with CMake yields Paddle version 0.0.0</a><a class="headerlink" href="#cmake-paddle0-0-0" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id28">11. Building from source with CMake yields Paddle version 0.0.0</a><a class="headerlink" href="#cmake-paddle0-0-0" title="永久链接至标题">¶</a></h2>
 <p>If running <code class="code docutils literal"><span class="pre">paddle</span> <span class="pre">version</span></code> prints <code class="code docutils literal"><span class="pre">PaddlePaddle</span> <span class="pre">0.0.0</span></code>, or running <code class="code docutils literal"><span class="pre">cmake</span> <span class="pre">..</span></code> shows</p>
 <div class="highlight-bash"><div class="highlight"><pre><span></span>CMake Warning at cmake/version.cmake:20 <span class="o">(</span>message<span class="o">)</span>:
  Cannot add paddle version from git tag
@@ -480,7 +482,7 @@ $ docker run <span class="si">${</span><span class="nv">CUDA_SO</span><span clas
 <p>then fetch all remote branches to the local machine with <code class="code docutils literal"><span class="pre">git</span> <span class="pre">fetch</span> <span class="pre">upstream</span></code> and re-run cmake.</p>
 </div>
 <div class="section" id="a-protocol-message-was-rejected-because-it-was-too-big">
-<h2><a class="toc-backref" href="#id27">12. A protocol message was rejected because it was too big</a><a class="headerlink" href="#a-protocol-message-was-rejected-because-it-was-too-big" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id29">12. A protocol message was rejected because it was too big</a><a class="headerlink" href="#a-protocol-message-was-rejected-because-it-was-too-big" title="永久链接至标题">¶</a></h2>
 <p>If the following error occurs when training an NLP model:</p>
 <div class="highlight-bash"><div class="highlight"><pre><span></span><span class="o">[</span>libprotobuf ERROR google/protobuf/io/coded_stream.cc:171<span class="o">]</span> A protocol message was rejected because it was too big <span class="o">(</span>more than <span class="m">67108864</span> bytes<span class="o">)</span>.  To increase the limit <span class="o">(</span>or to disable these warnings<span class="o">)</span>, see CodedInputStream::SetTotalBytesLimit<span class="o">()</span> in google/protobuf/io/coded_stream.h.
 F1205 <span class="m">14</span>:59:50.295174 <span class="m">14703</span> TrainerConfigHelper.cpp:59<span class="o">]</span> Check failed: m-&gt;conf.ParseFromString<span class="o">(</span>configProtoStr<span class="o">)</span>
@@ -511,7 +513,7 @@ F1205 <span class="m">14</span>:59:50.295174 <span class="m">14703</span> Traine
 <p>For the complete source code, see the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/tree/develop/demo/seqToseq">seqToseq</a> example.</p>
 </div>
 <div class="section" id="gpu">
-<h2><a class="toc-backref" href="#id28">13. How to specify GPU devices</a><a class="headerlink" href="#gpu" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id30">13. How to specify GPU devices</a><a class="headerlink" href="#gpu" title="永久链接至标题">¶</a></h2>
 <p>For example, on a machine with 4 GPUs numbered from 0, to use GPUs 2 and 3:</p>
 <ul class="simple">
 <li>Method 1: specify particular GPUs through the <a class="reference external" href="http://www.acceleware.com/blog/cudavisibledevices-masking-gpus">CUDA_VISIBLE_DEVICES</a> environment variable.</li>
@@ -527,7 +529,7 @@ F1205 <span class="m">14</span>:59:50.295174 <span class="m">14703</span> Traine
 </div>
 </div>
 <div class="section" id="floating-point-exception">
-<h2><a class="toc-backref" href="#id29">14. What to do when training exits with a <code class="code docutils literal"><span class="pre">Floating</span> <span class="pre">point</span> <span class="pre">exception</span></code>?</a><a class="headerlink" href="#floating-point-exception" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id31">14. What to do when training exits with a <code class="code docutils literal"><span class="pre">Floating</span> <span class="pre">point</span> <span class="pre">exception</span></code>?</a><a class="headerlink" href="#floating-point-exception" title="永久链接至标题">¶</a></h2>
 <p>The Paddle binary traps floating-point exceptions at runtime and exits immediately whenever one occurs (that is, when NaN or Inf appears during training). Floating-point exceptions are usually caused by overflow, division by zero and similar problems.
 The main causes fall into two categories:</p>
 <ul class="simple">
@@ -538,12 +540,57 @@ F1205 <span class="m">14</span>:59:50.295174 <span class="m">14703</span> Traine
 <p>The main remedy is to reduce the learning rate or to normalize the data.</p>
 </div>
 <div class="section" id="import-paddle-v2-as-paddle-importerror-no-module-named-v2">
-<h2><a class="toc-backref" href="#id30">15. After building and installing, import paddle.v2 as paddle raises ImportError: No module named v2</a><a class="headerlink" href="#import-paddle-v2-as-paddle-importerror-no-module-named-v2" title="永久链接至标题">¶</a></h2>
+<h2><a class="toc-backref" href="#id32">15. After building and installing, import paddle.v2 as paddle raises ImportError: No module named v2</a><a class="headerlink" href="#import-paddle-v2-as-paddle-importerror-no-module-named-v2" title="永久链接至标题">¶</a></h2>
 <p>First check whether paddle v1 was installed previously. If so, uninstall it first:</p>
 <p>pip uninstall py_paddle paddle</p>
 <p>Then install paddle's Python environment by running the following in the build directory:</p>
 <p>pip install python/dist/paddle*.whl &amp;&amp; pip install ../paddle/dist/py_paddle*.whl</p>
 </div>
+<div class="section" id="id9">
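+<h2><a class="toc-backref" href="#id33">16. What format does PaddlePaddle use to store parameters, and how to convert it to and from plain text</a><a class="headerlink" href="#id9" title="永久链接至标题">¶</a></h2>
+<p>A model parameter file saved by PaddlePaddle consists of a 16-byte header followed by the network parameters. In the header, bytes 1~4 hold the PaddlePaddle version information and should simply be filled with 0; bytes 5~8 hold the number of bytes occupied by each parameter value, which is 4 when the parameters are saved as float and 8 when they are saved as double; bytes 9~16 hold the total number of saved parameters.</p>
+<p>To convert saved model parameters back to plain text, load the parameter values with a <code class="code docutils literal"><span class="pre">numpy.array</span></code> of the matching data type, skipping the header of the parameter file. Unless PaddlePaddle was compiled with double precision, it computes in float precision by default and the saved parameters are float as well. In that case, <code class="code docutils literal"><span class="pre">dtype=float32</span></code> is generally used with <code class="code docutils literal"><span class="pre">numpy.array</span></code>. For example:</p>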
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
+
+<span class="k">def</span> <span class="nf">read_parameter</span><span class="p">(</span><span class="n">fname</span><span class="p">,</span> <span class="n">width</span><span class="p">):</span>
+    <span class="c1"># read raw bytes; binary mode is required on Python 3</span>
+    <span class="n">s</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">fname</span><span class="p">,</span> <span class="s2">&quot;rb&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
+    <span class="c1"># skip the 16-byte header</span>
+    <span class="n">vec</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">frombuffer</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">16</span><span class="p">:],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
+    <span class="c1"># width is the size of the corresponding layer</span>
+    <span class="n">np</span><span class="o">.</span><span class="n">savetxt</span><span class="p">(</span><span class="n">fname</span> <span class="o">+</span> <span class="s2">&quot;.csv&quot;</span><span class="p">,</span> <span class="n">vec</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">width</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span>
+            <span class="n">fmt</span><span class="o">=</span><span class="s2">&quot;</span><span class="si">%.6f</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s2">&quot;,&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
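+<p>To convert plain-text parameters into a parameter file that PaddlePaddle can load, first build the header and then write the network parameters. The code below converts a randomly generated matrix into model parameters that PaddlePaddle can load.</p>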
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">struct</span>
+<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
+
+
+<span class="k">def</span> <span class="nf">gen_rand_param</span><span class="p">(</span><span class="n">param_file</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span> <span class="n">need_trans</span><span class="p">):</span>
+    <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">()</span>
+    <span class="c1"># header: version 0, 4 bytes per float32 value, value count (&quot;q&quot; is 8 bytes on every platform)</span>
+    <span class="n">header</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s2">&quot;iiq&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">height</span> <span class="o">*</span> <span class="n">width</span><span class="p">)</span>
+    <span class="n">param</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="n">height</span><span class="p">,</span> <span class="n">width</span><span class="p">))</span>
+    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">param_file</span><span class="p">,</span> <span class="s2">&quot;wb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">fparam</span><span class="p">:</span>
+        <span class="n">fparam</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">header</span> <span class="o">+</span> <span class="n">param</span><span class="o">.</span><span class="n">tobytes</span><span class="p">())</span>
+</pre></div>
+</div>
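+<p>The same header-then-data layout can be applied to an existing plain-text matrix instead of a random one. The following is a rough sketch under the assumption that the matrix is stored as comma-separated float values; the function name <code class="code docutils literal"><span class="pre">csv_to_parameter</span></code> is hypothetical and not a PaddlePaddle API:</p>

```python
import struct

import numpy as np


def csv_to_parameter(csv_file, param_file):
    """Write a matrix stored as CSV text into the 16-byte-header float32 format."""
    # ndmin=2 keeps a single-row file as a 2-D matrix.
    mat = np.loadtxt(csv_file, delimiter=",", dtype=np.float32, ndmin=2)
    # Header: version 0, 4 bytes per float32 value, total number of values.
    header = struct.pack("iiq", 0, 4, mat.size)
    with open(param_file, "wb") as f:
        f.write(header + mat.tobytes())
    return mat.shape
```

+<p>Because the file does not record the matrix shape, only the value count, the caller still has to know the layer dimensions when reading the file back.</p>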
+</div>
+<div class="section" id="id10">
+<h2><a class="toc-backref" href="#id34">17. How to load pre-trained parameters</a><a class="headerlink" href="#id10" title="Permalink to this headline">¶</a></h2>
+<ul class="simple">
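+<li>For each layer that loads pre-trained parameters, set the parameter attribute <code class="code docutils literal"><span class="pre">is_static=True</span></code> so that the layer's parameters stay fixed during training. Taking the embedding layer as an example:</li>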
+</ul>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">emb_para</span> <span class="o">=</span> <span class="n">paddle</span><span class="o">.</span><span class="n">attr</span><span class="o">.</span><span class="n">Param</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;emb&#39;</span><span class="p">,</span> <span class="n">is_static</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
+<span class="n">paddle</span><span class="o">.</span><span class="n">layer</span><span class="o">.</span><span class="n">embedding</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">word_dim</span><span class="p">,</span> <span class="nb">input</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span class="n">param_attr</span><span class="o">=</span><span class="n">emb_para</span><span class="p">)</span>
+</pre></div>
+</div>
+<ul class="simple">
+<li>Load the pre-trained parameters from the model file into a <code class="code docutils literal"><span class="pre">numpy.array</span></code>, then, after creating the parameters, call <code class="code docutils literal"><span class="pre">parameters.set()</span></code> to load them. The first 16 bytes of a parameter file saved by PaddlePaddle are the header, so reading into a <code class="code docutils literal"><span class="pre">numpy.array</span></code> must start at byte 17. Taking the embedding layer as an example:</li>
+</ul>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">load_parameter</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">):</span>
+    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span> <span class="s1">&#39;rb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
+        <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span>  <span class="c1"># skip header.</span>
+        <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span>
+<span class="n">parameters</span> <span class="o">=</span> <span class="n">paddle</span><span class="o">.</span><span class="n">parameters</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">my_cost</span><span class="p">)</span>
+<span class="n">parameters</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s1">&#39;emb&#39;</span><span class="p">,</span> <span class="n">load_parameter</span><span class="p">(</span><span class="n">emb_param_file</span><span class="p">,</span> <span class="mi">30000</span><span class="p">,</span> <span class="mi">256</span><span class="p">))</span>
+</pre></div>
+</div>
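+<p>Instead of discarding the header, a loader can use it to catch a mismatched shape or precision before the values reach <code class="code docutils literal"><span class="pre">parameters.set()</span></code>. This is only a sketch: <code class="code docutils literal"><span class="pre">load_parameter_checked</span></code> is a hypothetical helper, and it assumes the header fields are a native-endian int32, int32 and int64:</p>

```python
import struct

import numpy as np


def load_parameter_checked(file_name, h, w):
    """Load an h x w float32 matrix, verifying the header against the shape."""
    with open(file_name, "rb") as f:
        _version, value_size, count = struct.unpack("iiq", f.read(16))
        if value_size != 4:
            raise ValueError("expected 4-byte float values, header says %d" % value_size)
        if count != h * w:
            raise ValueError("header holds %d values, expected %d" % (count, h * w))
        # The file position is now just past the header, at byte 17.
        return np.fromfile(f, dtype=np.float32, count=count).reshape(h, w)
```

+<p>Failing early with a clear message is usually easier to debug than a reshape error or silently wrong embedding values.</p>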
+</div>
 </div>

--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js