Deploy to GitHub Pages: 8c71e093

2e94fbcf · Travis CI · 9bef9f31 · 2e94fbcf · 2e94fbcf · 2e94fbcf
4 changed files
--- a/develop/doc/api/v2/config/networks.html
+++ b/develop/doc/api/v2/config/networks.html
@@ -194,39 +194,39 @@
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">sequence_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
-None if user don&#8217;t care.</li>
+If False, the padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if user don&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if user don&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -242,39 +242,39 @@ False if no bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">text_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
-None if user don&#8217;t care.</li>
+If False, the padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if user don&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if user don&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -294,36 +294,37 @@ False if no bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">img_conv_bn_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Convolution, batch normalization, pooling group.</p>
+<p>Img input =&gt; Conv =&gt; BN =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerOutput</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>bn_param_attr</strong> (<em>ParameterAttribute.</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
+<li><strong>bn_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>bn_bias_attr</strong> &#8211; see batch_norm_layer&#8217;s document.</li>
+<li><strong>bn_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>bn_layer_attr</strong> &#8211; ParameterAttribute.</li>
+<li><strong>bn_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer groups output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -347,26 +348,26 @@ False if no bias.</li>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>conv_batchnorm_drop_rate</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
 conv_batchnorm_drop_rate[i] represents the drop rate of each batch norm.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>conv_num_filter</strong> (<em>int</em>) &#8211; output channels num.</li>
+<li><strong>conv_num_filter</strong> (<em>list|tuple</em>) &#8211; list of output channel numbers.</li>
 <li><strong>pool_size</strong> (<em>int</em>) &#8211; pooling filter size.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 <li><strong>conv_padding</strong> (<em>int</em>) &#8211; convolution padding size.</li>
 <li><strong>conv_filter_size</strong> (<em>int</em>) &#8211; convolution filter size.</li>
 <li><strong>conv_act</strong> (<em>BaseActivation</em>) &#8211; activation funciton after convolution.</li>
-<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; conv_with_batchnorm[i] represents
+<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
-if there is a batch normalization after each convolution.</li>
+there is a batch normalization operation after each convolution.</li>
 <li><strong>pool_stride</strong> (<em>int</em>) &#8211; pooling stride size.</li>
 <li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling type.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Convolution param attribute.
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; param attribute of convolution layer,
 None means default attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
-<tr class="field-odd field"><th class="field-name">Type:</th><td class="field-body"><p class="first last">LayerOutput</p>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -380,34 +381,34 @@ None means default attribute.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_img_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple image convolution and pooling group.</p>
-<p>Input =&gt; conv =&gt; pooling</p>
+<p>Img input =&gt; Conv =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -432,13 +433,16 @@ None means default attribute.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>num_classes</strong> &#8211; </li>
+<li><strong>num_classes</strong> (<em>int</em>) &#8211; number of classes.</li>
-<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; </li>
+<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>num_channels</strong> (<em>int</em>) &#8211; </li>
+<li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -456,9 +460,9 @@ None means default attribute.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a LSTM unit performs during a single time step.
+<dd><p>lstmemory_unit defines the calculation process of an LSTM unit during a
-This function itself is not a recurrent layer, so it can not be
+single time step. This function is not a recurrent layer, so it cannot be
-directly used to process sequence inputs. This function is always used in
+directly used to process sequence input. This function is always used in
 recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please refer to  <strong>Generating Sequences With Recurrent Neural Networks</strong>
@@ -479,21 +483,21 @@ for more details about LSTM. The link goes as follows:
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm layer.</li>
 </ul>
 </td>
 </tr>
@@ -516,9 +520,9 @@ False means no bias, None means default bias.</li>
 <dd><p>lstm_group is a recurrent_group version of Long Short Term Memory. It
 does exactly the same calculation as the lstmemory layer (see lstmemory in
 layers.py for the maths) does. A promising benefit is that LSTM memory
-cell states, or hidden states in every time step are accessible to the
+cell states (or hidden states) in every time step are accessible to the
 user. This is especially useful in attention model. If you do not need to
-access the internal states of the lstm, but merely use its outputs,
+access the internal states of the lstm and merely use its outputs,
 it is recommended to use the lstmemory, which is relatively faster than
 lstmemory_group.</p>
 <p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
@@ -540,18 +544,18 @@ full_matrix_projection must be included before lstmemory_unit is called.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the lstmemory group.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of lstmemory group.</li>
-<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
+<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
@@ -576,34 +580,34 @@ projection of the LSTM unit, such as dropout, error clipping.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple LSTM Cell.</p>
-<p>It just combine a mixed layer with fully_matrix_projection and a lstmemory
+<p>It just combines a mixed layer with full_matrix_projection and a lstmemory
-layer. The simple lstm cell was implemented as follow equations.</p>
+layer. The simple lstm cell is implemented with the following equations.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
-<p>Please refer <strong>Generating Sequences With Recurrent Neural Networks</strong> if you
+<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong> for more
-want to know what lstm is. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
+details about lstm. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstm layer name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
-<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; mixed layer&#8217;s matrix projection parameter attribute.</li>
+<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of matrix projection in mixed layer.</li>
 <li><strong>bias_param_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute. False means no bias, None
 means default bias.</li>
-<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; lstm cell parameter attribute.</li>
+<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of lstm cell.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
-<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; mixed layer&#8217;s extra attribute.</li>
+<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of mixed layer.</li>
-<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">lstm layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -620,8 +624,8 @@ means default bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_lstm is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
+sequence both in forward and backward orders, and then concatenates the two
-outputs form a final output. However, concatenation of two outputs
+outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
 <p>Please refer to  <strong>Neural Machine Translation by Jointly Learning to Align
@@ -640,15 +644,14 @@ The link goes as follows:
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional lstm layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
+If set True, the entire output sequences in forward
-processed in forward and backward directions are
+and backward directions are concatenated and returned.</li>
-concatenated and returned.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object accroding to the return_seq.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -667,9 +670,9 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a gated recurrent unit performs in a single time
+<dd><p>gru_unit defines the calculation process of a gated recurrent unit during a single
-step. This function itself is not a recurrent layer, so it can not be
+time step. This function is not a recurrent layer, so it cannot be
-directly used to process sequence inputs. This function is always used in
+directly used to process sequence input. This function is always used in
 the recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please see grumemory in layers.py for the details about the maths.</p>
@@ -678,13 +681,13 @@ mechanism.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -708,11 +711,11 @@ mechanism.</p>
 does exactly the same calculation as the grumemory layer does. A promising
 benefit is that gru hidden states are accessible to the user. This is
 especially useful in attention model. If you do not need to access
-any internal state, but merely use the outputs of a GRU, it is recommended
+any internal state and merely use the outputs of a GRU, it is recommended
 to use the grumemory, which is relatively faster.</p>
 <p>Please see grumemory in layers.py for more detail about the maths.</p>
 <p>The example usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gur_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gru_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
                <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
                <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
                <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">())</span>
@@ -723,15 +726,16 @@ to use the grumemory, which is relatively faster.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -751,11 +755,11 @@ to use the grumemory, which is relatively faster.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>You maybe see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
+<dd><p>You may see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
 simple_gru in network.py. The reason why there are so many interfaces is
 that we have two ways to implement recurrent neural network. One way is to
 use one complete layer to implement rnn (including simple rnn, gru and lstm)
-with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But,
+with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But
 the multiplication operation <span class="math">\(W x_t\)</span> is not computed in these layers.
 See details in their interfaces in layers.py.
 The other implementation is to use a recurrent group which can ensemble a
@@ -785,14 +789,15 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -812,8 +817,8 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru2</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>simple_gru2 is the same with simple_gru, but using grumemory instead
+<dd><p>simple_gru2 is the same as simple_gru, but uses grumemory instead.
-Please see grumemory in layers.py for more detail about the maths.
+Please refer to grumemory in layers.py for more detail about the math.
 simple_gru2 is faster than simple_gru.</p>
 <p>The example usage is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">simple_gru2</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span> <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
@@ -824,14 +829,15 @@ simple_gru2 is faster than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -852,7 +858,7 @@ simple_gru2 is faster than simple_gru.</p>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_gru is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
+sequence both in forward and backward orders, and then concatenates the two
 outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
@@ -868,11 +874,10 @@ just add them together.</p>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional gru layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; gru layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
+If set True, the entire output sequences in forward
-processed in forward and backward directions are
+and backward directions are concatenated and returned.</li>
-concatenated and returned.</li>
 </ul>
 </td>
 </tr>
@@ -893,7 +898,7 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_attention</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Calculate and then return a context vector by attention machanism.
+<dd><p>Calculate and return a context vector with attention mechanism.
 The size of the context vector equals the size of the encoded_sequence.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}a(s_{i-1},h_{j}) &amp; = v_{a}f(W_{a}s_{i-1} + U_{a}h_{j})\\e_{i,j} &amp; = a(s_{i-1}, h_{j})\\a_{i,j} &amp; = \frac{exp(e_{i,j})}{\sum_{k=1}^{T_x}{exp(e_{i,k})}}\\c_{i} &amp; = \sum_{j=1}^{T_{x}}a_{i,j}h_{j}\end{aligned}\end{align} \]</div>
@@ -917,8 +922,8 @@ Align and Translate</strong> for more details. The link is as follows:
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the attention model.</li>
 <li><strong>softmax_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of sequence softmax
-that is used to produce attention weight</li>
+that is used to produce attention weight.</li>
-<li><strong>weight_act</strong> (<em>Activation</em>) &#8211; activation of the attention model</li>
+<li><strong>weight_act</strong> (<em>BaseActivation</em>) &#8211; activation of the attention model.</li>
 <li><strong>encoded_sequence</strong> (<em>LayerOutput</em>) &#8211; output of the encoder</li>
 <li><strong>encoded_proj</strong> (<em>LayerOutput</em>) &#8211; attention weight is computed by a feed forward neural
 network which has two inputs : decoder&#8217;s hidden state

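The simple_attention formulas quoted in the hunk above reduce to a softmax over the scores e_{i,j} followed by a weighted sum of the encoder outputs h_j. The following is a minimal plain-Python sketch of those last two equations only. The helper name `attention_context` is hypothetical and is not part of paddle.v2.networks, and the scores are assumed to have been computed already by the scoring network a(s_{i-1}, h_j).

```python
import math


def attention_context(scores, encoded_sequence):
    """Hypothetical helper: softmax the scores, then average the encoder outputs.

    scores           -- e_{i,j} for one decoder step i, one float per position j
    encoded_sequence -- list of encoder output vectors h_j (lists of floats)
    """
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the softmax result.
    peak = max(scores)
    exps = [math.exp(e - peak) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]  # a_{i,j}

    # c_i = sum_j a_{i,j} * h_j, computed per dimension.
    dim = len(encoded_sequence[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, encoded_sequence))
        for d in range(dim)
    ]
    return weights, context
```

The returned context vector has the same length as each h_j, which matches the statement that the context vector size equals the size of the encoded_sequence.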
--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/api/v2/config/networks.html
+++ b/develop/doc_cn/api/v2/config/networks.html
@@ -201,39 +201,39 @@
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">sequence_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
-None if user don&#8217;t care.</li>
+If False, the padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if user don&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if user don&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -249,39 +249,39 @@ False if no bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">text_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
-None if user don&#8217;t care.</li>
+If False, the padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if user don&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if user don&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -301,36 +301,37 @@ False if no bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">img_conv_bn_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Convolution, batch normalization, pooling group.</p>
+<p>Img input =&gt; Conv =&gt; BN =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerOutput</em>) &#8211; see img_conv_layer&#8217;s document.</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>bn_param_attr</strong> (<em>ParameterAttribute.</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
+<li><strong>bn_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>bn_bias_attr</strong> &#8211; see batch_norm_layer&#8217;s document.</li>
+<li><strong>bn_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>bn_layer_attr</strong> &#8211; ParameterAttribute.</li>
+<li><strong>bn_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see batch_norm_layer for details.</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer groups output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -354,26 +355,26 @@ False if no bias.</li>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>conv_batchnorm_drop_rate</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
 conv_batchnorm_drop_rate[i] represents the drop rate of each batch norm.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>conv_num_filter</strong> (<em>int</em>) &#8211; output channels num.</li>
+<li><strong>conv_num_filter</strong> (<em>list|tuple</em>) &#8211; list of output channel numbers.</li>
 <li><strong>pool_size</strong> (<em>int</em>) &#8211; pooling filter size.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 <li><strong>conv_padding</strong> (<em>int</em>) &#8211; convolution padding size.</li>
 <li><strong>conv_filter_size</strong> (<em>int</em>) &#8211; convolution filter size.</li>
 <li><strong>conv_act</strong> (<em>BaseActivation</em>) &#8211; activation function after convolution.</li>
-<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; conv_with_batchnorm[i] represents
+<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
-if there is a batch normalization after each convolution.</li>
+there is a batch normalization operation after the i-th convolution.</li>
 <li><strong>pool_stride</strong> (<em>int</em>) &#8211; pooling stride size.</li>
 <li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling type.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Convolution param attribute.
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; param attribute of convolution layer,
 None means default attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
-<tr class="field-odd field"><th class="field-name">Type:</th><td class="field-body"><p class="first last">LayerOutput</p>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -387,34 +388,34 @@ None means default attribute.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_img_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple image convolution and pooling group.</p>
-<p>Input =&gt; conv =&gt; pooling</p>
+<p>Img input =&gt; Conv =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details.</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -439,13 +440,16 @@ None means default attribute.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>num_classes</strong> &#8211; </li>
+<li><strong>num_classes</strong> (<em>int</em>) &#8211; number of classes.</li>
-<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; </li>
+<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>num_channels</strong> (<em>int</em>) &#8211; </li>
+<li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -463,9 +467,9 @@ None means default attribute.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a LSTM unit performs during a single time step.
+<dd><p>lstmemory_unit defines the calculation process of an LSTM unit during a
-This function itself is not a recurrent layer, so it can not be
+single time step. This function is not a recurrent layer, so it can not be
-directly used to process sequence inputs. This function is always used in
+directly used to process sequence input. This function is always used in
 recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please refer to  <strong>Generating Sequences With Recurrent Neural Networks</strong>
@@ -486,21 +490,21 @@ for more details about LSTM. The link goes as follows:
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm layer.</li>
 </ul>
 </td>
 </tr>
@@ -523,9 +527,9 @@ False means no bias, None means default bias.</li>
 <dd><p>lstm_group is a recurrent_group version of Long Short Term Memory. It
 does exactly the same calculation as the lstmemory layer (see lstmemory in
 layers.py for the maths) does. A promising benefit is that LSTM memory
-cell states, or hidden states in every time step are accessible to the
+cell states (or hidden states) in every time step are accessible to the
 user. This is especially useful in attention model. If you do not need to
-access the internal states of the lstm, but merely use its outputs,
+access the internal states of the lstm and merely use its outputs,
 it is recommended to use the lstmemory, which is relatively faster than
 lstmemory_group.</p>
 <p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
@@ -547,18 +551,18 @@ full_matrix_projection must be included before lstmemory_unit is called.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the lstmemory group.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of lstmemory group.</li>
-<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
+<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
@@ -583,34 +587,34 @@ projection of the LSTM unit, such as dropout, error clipping.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple LSTM Cell.</p>
-<p>It just combine a mixed layer with fully_matrix_projection and a lstmemory
+<p>It just combines a mixed layer with full_matrix_projection and an lstmemory
-layer. The simple lstm cell was implemented as follow equations.</p>
+layer. The simple lstm cell is implemented with the following equations.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
-<p>Please refer <strong>Generating Sequences With Recurrent Neural Networks</strong> if you
+<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong> for more
-want to know what lstm is. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
+details about lstm. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstm layer name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
-<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; mixed layer&#8217;s matrix projection parameter attribute.</li>
+<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of matrix projection in mixed layer.</li>
 <li><strong>bias_param_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute. False means no bias, None
 means default bias.</li>
-<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; lstm cell parameter attribute.</li>
+<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of lstm cell.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
-<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; mixed layer&#8217;s extra attribute.</li>
+<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of mixed layer.</li>
-<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">lstm layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -627,8 +631,8 @@ means default bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_lstm is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
+sequence both in forward and backward orders, and then concatenates the two
-outputs form a final output. However, concatenation of two outputs
+outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
 <p>Please refer to  <strong>Neural Machine Translation by Jointly Learning to Align
@@ -647,15 +651,14 @@ The link goes as follows:
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional lstm layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
+If set to True, the entire output sequences in the forward
-processed in forward and backward directions are
+and backward directions are concatenated and returned.</li>
-concatenated and returned.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object accroding to the return_seq.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -674,9 +677,9 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a gated recurrent unit performs in a single time
+<dd><p>gru_unit defines the calculation process of a gated recurrent unit during a single
-step. This function itself is not a recurrent layer, so it can not be
+time step. This function is not a recurrent layer, so it can not be
-directly used to process sequence inputs. This function is always used in
+directly used to process sequence input. This function is always used in
 the recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please see grumemory in layers.py for the details about the maths.</p>
@@ -685,13 +688,13 @@ mechanism.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru.</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -715,11 +718,11 @@ mechanism.</p>
 does exactly the same calculation as the grumemory layer does. A promising
 benefit is that gru hidden states are accessible to the user. This is
 especially useful in attention model. If you do not need to access
-any internal state, but merely use the outputs of a GRU, it is recommended
+any internal state and merely use the outputs of a GRU, it is recommended
 to use the grumemory, which is relatively faster.</p>
 <p>Please see grumemory in layers.py for more detail about the maths.</p>
 <p>The example usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gur_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gru_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
                <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
                <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
                <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">())</span>
@@ -730,15 +733,16 @@ to use the grumemory, which is relatively faster.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru.</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -758,11 +762,11 @@ to use the grumemory, which is relatively faster.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>You maybe see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
+<dd><p>You may see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
 simple_gru in network.py. The reason why there are so many interfaces is
 that we have two ways to implement recurrent neural network. One way is to
 use one complete layer to implement rnn (including simple rnn, gru and lstm)
-with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But,
+with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But
 the multiplication operation <span class="math">\(W x_t\)</span> is not computed in these layers.
 See details in their interfaces in layers.py.
 The other implementation is to use a recurrent group which can ensemble a
@@ -792,14 +796,15 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru.</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -819,8 +824,8 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru2</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>simple_gru2 is the same with simple_gru, but using grumemory instead
+<dd><p>simple_gru2 is the same as simple_gru, but uses grumemory instead.
-Please see grumemory in layers.py for more detail about the maths.
+Please refer to grumemory in layers.py for more details about the math.
 simple_gru2 is faster than simple_gru.</p>
 <p>The example usage is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">simple_gru2</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span> <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
@@ -831,14 +836,15 @@ simple_gru2 is faster than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru.</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of the gru.</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of the gru layer.
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -859,7 +865,7 @@ simple_gru2 is faster than simple_gru.</p>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_gru is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
+sequence in both forward and backward orders, and then concatenates the two
 outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
@@ -875,11 +881,10 @@ just add them together.</p>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional gru layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; gru layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
+If set to True, the entire output sequences in the forward
-processed in forward and backward directions are
+and backward directions are concatenated and returned.</li>
-concatenated and returned.</li>
 </ul>
 </td>
 </tr>
@@ -900,7 +905,7 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_attention</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Calculate and then return a context vector by attention machanism.
+<dd><p>Calculate and return a context vector using the attention mechanism.
 Size of the context vector equals to size of the encoded_sequence.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}a(s_{i-1},h_{j}) &amp; = v_{a}f(W_{a}s_{i-1} + U_{a}h_{j})\\e_{i,j} &amp; = a(s_{i-1}, h_{j})\\a_{i,j} &amp; = \frac{exp(e_{i,j})}{\sum_{k=1}^{T_x}{exp(e_{i,k})}}\\c_{i} &amp; = \sum_{j=1}^{T_{x}}a_{i,j}h_{j}\end{aligned}\end{align} \]</div>
@@ -924,8 +929,8 @@ Align and Translate</strong> for more details. The link is as follows:
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the attention model.</li>
 <li><strong>softmax_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of sequence softmax
-that is used to produce attention weight</li>
+that is used to produce attention weight.</li>
-<li><strong>weight_act</strong> (<em>Activation</em>) &#8211; activation of the attention model</li>
+<li><strong>weight_act</strong> (<em>BaseActivation</em>) &#8211; activation of the attention model.</li>
 <li><strong>encoded_sequence</strong> (<em>LayerOutput</em>) &#8211; output of the encoder</li>
 <li><strong>encoded_proj</strong> (<em>LayerOutput</em>) &#8211; attention weight is computed by a feed forward neural
 network which has two inputs : decoder&#8217;s hidden state

--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js