Deploy to GitHub Pages: 8c71e093

2e94fbcf · Travis CI · 9bef9f31 · 2e94fbcf · 2e94fbcf · 2e94fbcf
4 changed files
--- a/develop/doc/api/v2/config/networks.html
+++ b/develop/doc/api/v2/config/networks.html
@@ -194,39 +194,39 @@
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">sequence_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
-None if user don&#8217;t care.</li>
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
+If false, padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if user don&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if user don&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -242,39 +242,39 @@ False if no bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">text_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
-None if user don&#8217;t care.</li>
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
+If false, padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if user don&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if user don&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -294,36 +294,37 @@ False if no bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">img_conv_bn_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Convolution, batch normalization, pooling group.</p>
+<p>Img input =&gt; Conv =&gt; BN =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerOutput</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>bn_param_attr</strong> (<em>ParameterAttribute.</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
-<li><strong>bn_bias_attr</strong> &#8211; see batch_norm_layer&#8217;s document.</li>
-<li><strong>bn_layer_attr</strong> &#8211; ParameterAttribute.</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerOutput</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>bn_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>bn_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>bn_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer groups output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -347,26 +348,26 @@ False if no bias.</li>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>conv_batchnorm_drop_rate</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
 conv_batchnorm_drop_rate[i] represents the drop rate of each batch norm.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
-<li><strong>conv_num_filter</strong> (<em>int</em>) &#8211; output channels num.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>conv_num_filter</strong> (<em>list|tuple</em>) &#8211; list of output channels num.</li>
 <li><strong>pool_size</strong> (<em>int</em>) &#8211; pooling filter size.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 <li><strong>conv_padding</strong> (<em>int</em>) &#8211; convolution padding size.</li>
 <li><strong>conv_filter_size</strong> (<em>int</em>) &#8211; convolution filter size.</li>
 <li><strong>conv_act</strong> (<em>BaseActivation</em>) &#8211; activation function after convolution.</li>
-<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; conv_with_batchnorm[i] represents
-if there is a batch normalization after each convolution.</li>
+<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
+there is a batch normalization operation after each convolution.</li>
 <li><strong>pool_stride</strong> (<em>int</em>) &#8211; pooling stride size.</li>
 <li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling type.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Convolution param attribute.
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; param attribute of convolution layer,
 None means default attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
-<tr class="field-odd field"><th class="field-name">Type:</th><td class="field-body"><p class="first last">LayerOutput</p>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -380,34 +381,34 @@ None means default attribute.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_img_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple image convolution and pooling group.</p>
-<p>Input =&gt; conv =&gt; pooling</p>
+<p>Img input =&gt; Conv =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -432,13 +433,16 @@ None means default attribute.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>num_classes</strong> &#8211; </li>
-<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; </li>
-<li><strong>num_channels</strong> (<em>int</em>) &#8211; </li>
+<li><strong>num_classes</strong> (<em>int</em>) &#8211; number of classes.</li>
+<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -456,9 +460,9 @@ None means default attribute.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a LSTM unit performs during a single time step.
-This function itself is not a recurrent layer, so it can not be
-directly used to process sequence inputs. This function is always used in
+<dd><p>lstmemory_unit defines the calculation process of an LSTM unit during a
+single time step. This function is not a recurrent layer, so it cannot be
+directly used to process sequence input. This function is always used in
 recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please refer to  <strong>Generating Sequences With Recurrent Neural Networks</strong>
@@ -479,21 +483,21 @@ for more details about LSTM. The link goes as follows:
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm layer.</li>
 </ul>
 </td>
 </tr>
@@ -516,9 +520,9 @@ False means no bias, None means default bias.</li>
 <dd><p>lstm_group is a recurrent_group version of Long Short Term Memory. It
 does exactly the same calculation as the lstmemory layer (see lstmemory in
 layers.py for the maths) does. A promising benefit is that LSTM memory
-cell states, or hidden states in every time step are accessible to the
+cell states (or hidden states) in every time step are accessible to the
 user. This is especially useful in attention model. If you do not need to
-access the internal states of the lstm, but merely use its outputs,
+access the internal states of the lstm and merely use its outputs,
 it is recommended to use the lstmemory, which is relatively faster than
 lstmemory_group.</p>
 <p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
@@ -540,18 +544,18 @@ full_matrix_projection must be included before lstmemory_unit is called.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the lstmemory group.</li>
-<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of lstmemory group.</li>
+<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
@@ -576,34 +580,34 @@ projection of the LSTM unit, such as dropout, error clipping.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple LSTM Cell.</p>
-<p>It just combine a mixed layer with fully_matrix_projection and a lstmemory
-layer. The simple lstm cell was implemented as follow equations.</p>
+<p>It just combines a mixed layer with fully_matrix_projection and a lstmemory
+layer. The simple lstm cell was implemented with the following equations.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
-<p>Please refer <strong>Generating Sequences With Recurrent Neural Networks</strong> if you
-want to know what lstm is. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
+<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong> for more
+details about lstm. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstm layer name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; mixed layer&#8217;s matrix projection parameter attribute.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
+<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of matrix projection in mixed layer.</li>
 <li><strong>bias_param_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute. False means no bias, None
 means default bias.</li>
-<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; lstm cell parameter attribute.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
-<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; mixed layer&#8217;s extra attribute.</li>
-<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of lstm cell.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; final activation type of lstm.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
+<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of mixed layer.</li>
+<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">lstm layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -620,8 +624,8 @@ means default bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_lstm is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
-outputs form a final output. However, concatenation of two outputs
+sequence both in forward and backward orders, and then concatenates the two
+outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
 <p>Please refer to  <strong>Neural Machine Translation by Jointly Learning to Align
@@ -640,15 +644,14 @@ The link goes as follows:
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional lstm layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
-processed in forward and backward directions are
-concatenated and returned.</li>
+If set True, the entire output sequences in forward
+and backward directions are concatenated and returned.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object accroding to the return_seq.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -667,9 +670,9 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a gated recurrent unit performs in a single time
-step. This function itself is not a recurrent layer, so it can not be
-directly used to process sequence inputs. This function is always used in
+<dd><p>gru_unit defines the calculation process of a gated recurrent unit during a single
+time step. This function is not a recurrent layer, so it cannot be
+directly used to process sequence input. This function is always used in
 the recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please see grumemory in layers.py for the details about the maths.</p>
@@ -678,13 +681,13 @@ mechanism.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activation</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -708,11 +711,11 @@ mechanism.</p>
 does exactly the same calculation as the grumemory layer does. A promising
 benefit is that gru hidden states are accessible to the user. This is
 especially useful in attention model. If you do not need to access
-any internal state, but merely use the outputs of a GRU, it is recommended
+any internal state and merely use the outputs of a GRU, it is recommended
 to use the grumemory, which is relatively faster.</p>
 <p>Please see grumemory in layers.py for more detail about the maths.</p>
 <p>The example usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gur_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gru_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
                <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
                <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
                <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">())</span>
@@ -723,15 +726,16 @@ to use the grumemory, which is relatively faster.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -751,11 +755,11 @@ to use the grumemory, which is relatively faster.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>You maybe see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
+<dd><p>You may see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
 simple_gru in network.py. The reason why there are so many interfaces is
 that we have two ways to implement recurrent neural network. One way is to
 use one complete layer to implement rnn (including simple rnn, gru and lstm)
-with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But,
+with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But
 the multiplication operation <span class="math">\(W x_t\)</span> is not computed in these layers.
 See details in their interfaces in layers.py.
 The other implementation is to use a recurrent group which can ensemble a
@@ -785,14 +789,15 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -812,8 +817,8 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru2</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>simple_gru2 is the same with simple_gru, but using grumemory instead
-Please see grumemory in layers.py for more detail about the maths.
+<dd><p>simple_gru2 is the same as simple_gru, but uses grumemory instead.
+Please refer to grumemory in layers.py for more details about the math.
 simple_gru2 is faster than simple_gru.</p>
 <p>The example usage is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">simple_gru2</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span> <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
@@ -824,14 +829,15 @@ simple_gru2 is faster than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -852,7 +858,7 @@ simple_gru2 is faster than simple_gru.</p>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_gru is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
+sequence both in forward and backward orders, and then concatenates the two
 outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
@@ -868,11 +874,10 @@ just add them together.</p>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional gru layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; gru layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
-processed in forward and backward directions are
-concatenated and returned.</li>
+If set True, the entire output sequences in forward
+and backward directions are concatenated and returned.</li>
 </ul>
 </td>
 </tr>
@@ -893,7 +898,7 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_attention</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Calculate and then return a context vector by attention machanism.
+<dd><p>Calculate and return a context vector with an attention mechanism.
 The size of the context vector equals the size of the encoded_sequence.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}a(s_{i-1},h_{j}) &amp; = v_{a}f(W_{a}s_{i-1} + U_{a}h_{j})\\e_{i,j} &amp; = a(s_{i-1}, h_{j})\\a_{i,j} &amp; = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T_x}{\exp(e_{i,k})}}\\c_{i} &amp; = \sum_{j=1}^{T_{x}}a_{i,j}h_{j}\end{aligned}\end{align} \]</div>
@@ -917,8 +922,8 @@ Align and Translate</strong> for more details. The link is as follows:
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the attention model.</li>
 <li><strong>softmax_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of sequence softmax
-that is used to produce attention weight</li>
-<li><strong>weight_act</strong> (<em>Activation</em>) &#8211; activation of the attention model</li>
+that is used to produce attention weight.</li>
+<li><strong>weight_act</strong> (<em>BaseActivation</em>) &#8211; activation of the attention model.</li>
 <li><strong>encoded_sequence</strong> (<em>LayerOutput</em>) &#8211; output of the encoder</li>
 <li><strong>encoded_proj</strong> (<em>LayerOutput</em>) &#8211; attention weight is computed by a feed forward neural
 network which has two inputs : decoder&#8217;s hidden state

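As a rough illustration of the attention formulas in the simple_attention hunk above, here is a plain-Python sketch of one decoding step. It is not Paddle's implementation: the function name and argument names are hypothetical, f is assumed to be tanh, and the projections W_a s_{i-1} and U_a h_j are assumed to be precomputed by the caller.

```python
import math


def attention_context_sketch(encoded_sequence, encoded_proj, decoder_state_proj, v_a):
    """Illustrative only; names are hypothetical, not Paddle API.

    encoded_sequence:   list of T_x vectors h_j
    encoded_proj:       list of T_x vectors U_a h_j (assumed precomputed)
    decoder_state_proj: vector W_a s_{i-1} (assumed precomputed)
    v_a:                vector with the same length as the projections
    """
    # e_{i,j} = v_a . f(W_a s_{i-1} + U_a h_j), with f assumed to be tanh
    scores = []
    for proj in encoded_proj:
        hidden = [math.tanh(p + s) for p, s in zip(proj, decoder_state_proj)]
        scores.append(sum(v * x for v, x in zip(v_a, hidden)))

    # a_{i,j} = softmax over j; subtracting the max keeps exp() stable
    max_score = max(scores)
    exps = [math.exp(e - max_score) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # c_i = sum_j a_{i,j} h_j, so c_i has the same size as each h_j
    dim = len(encoded_sequence[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, encoded_sequence))
        for d in range(dim)
    ]
    return weights, context
```

The returned context vector has the same length as each element of encoded_sequence, which matches the statement that the size of the context vector equals the size of the encoded_sequence.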
--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/api/v2/config/networks.html
+++ b/develop/doc_cn/api/v2/config/networks.html
@@ -201,39 +201,39 @@
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">sequence_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
-None if user don&#8217;t care.</li>
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
+If False, padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if the user doesn&#8217;t care.</li>
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if the user doesn&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if the user doesn&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -249,39 +249,39 @@ False if no bias.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">text_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Text convolution pooling layers helper.</p>
+<dd><p>Text convolution pooling group.</p>
 <p>Text input =&gt; Context Projection =&gt; FC Layer =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of output layer(pooling layer name)</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; name of input layer</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>context_len</strong> (<em>int</em>) &#8211; context projection length. See
 context_projection&#8217;s document.</li>
 <li><strong>hidden_size</strong> (<em>int</em>) &#8211; FC Layer size.</li>
-<li><strong>context_start</strong> (<em>int</em><em> or </em><em>None</em>) &#8211; context projection length. See
+<li><strong>context_start</strong> (<em>int|None</em>) &#8211; context start position. See
 context_projection&#8217;s context_start.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType.</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling layer type. See pooling_layer&#8217;s document.</li>
 <li><strong>context_proj_layer_name</strong> (<em>basestring</em>) &#8211; context projection layer name.
 None if user don&#8217;t care.</li>
-<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; context projection parameter attribute.
-None if user don&#8217;t care.</li>
+<li><strong>context_proj_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; padding parameter attribute of context projection layer.
+If False, padding is always zero.</li>
 <li><strong>fc_layer_name</strong> (<em>basestring</em>) &#8211; fc layer name. None if user don&#8217;t care.</li>
-<li><strong>fc_param_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc layer parameter attribute. None if user don&#8217;t care.</li>
-<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None</em>) &#8211; fc bias parameter attribute. False if no bias,
+<li><strong>fc_param_attr</strong> (<em>ParameterAttribute|None</em>) &#8211; fc layer parameter attribute. None if the user doesn&#8217;t care.</li>
+<li><strong>fc_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; fc bias parameter attribute. False if no bias,
+None if the user doesn&#8217;t care.</li>
+<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh.</li>
+<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; pooling layer bias attr. False if no bias.
 None if the user doesn&#8217;t care.</li>
-<li><strong>fc_act</strong> (<em>BaseActivation</em>) &#8211; fc layer activation type. None means tanh</li>
-<li><strong>pool_bias_attr</strong> (<em>ParameterAttribute</em><em> or </em><em>None.</em>) &#8211; pooling layer bias attr. None if don&#8217;t care.
-False if no bias.</li>
 <li><strong>fc_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; fc layer extra attribute.</li>
 <li><strong>context_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; context projection layer extra attribute.</li>
 <li><strong>pool_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; pooling layer extra attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">output layer name.</p>
+<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -301,36 +301,37 @@ False if no bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">img_conv_bn_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Convolution, batch normalization, pooling group.</p>
+<p>Img input =&gt; Conv =&gt; BN =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerOutput</em>) &#8211; see img_conv_layer&#8217;s document.</li>
-<li><strong>bn_param_attr</strong> (<em>ParameterAttribute.</em>) &#8211; see batch_norm_layer&#8217;s document.</li>
-<li><strong>bn_bias_attr</strong> &#8211; see batch_norm_layer&#8217;s document.</li>
-<li><strong>bn_layer_attr</strong> &#8211; ParameterAttribute.</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer&#8217;s document.</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer&#8217;s document.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>bn_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>bn_bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>bn_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see batch_norm_layer for details.</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">Layer groups output</p>
+<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -354,26 +355,26 @@ False if no bias.</li>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>conv_batchnorm_drop_rate</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
 conv_batchnorm_drop_rate[i] represents the drop rate of each batch norm.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
-<li><strong>conv_num_filter</strong> (<em>int</em>) &#8211; output channels num.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>conv_num_filter</strong> (<em>list|tuple</em>) &#8211; list of output channels num.</li>
 <li><strong>pool_size</strong> (<em>int</em>) &#8211; pooling filter size.</li>
 <li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 <li><strong>conv_padding</strong> (<em>int</em>) &#8211; convolution padding size.</li>
 <li><strong>conv_filter_size</strong> (<em>int</em>) &#8211; convolution filter size.</li>
 <li><strong>conv_act</strong> (<em>BaseActivation</em>) &#8211; activation function after convolution.</li>
-<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; conv_with_batchnorm[i] represents
-if there is a batch normalization after each convolution.</li>
+<li><strong>conv_with_batchnorm</strong> (<em>list</em>) &#8211; if conv_with_batchnorm[i] is true,
+there is a batch normalization operation after the i-th convolution.</li>
 <li><strong>pool_stride</strong> (<em>int</em>) &#8211; pooling stride size.</li>
 <li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; pooling type.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Convolution param attribute.
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; param attribute of convolution layer,
 None means default attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
-<tr class="field-odd field"><th class="field-name">Type:</th><td class="field-body"><p class="first last">LayerOutput</p>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -387,34 +388,34 @@ None means default attribute.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_img_conv_pool</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple image convolution and pooling group.</p>
-<p>Input =&gt; conv =&gt; pooling</p>
+<p>Img input =&gt; Conv =&gt; Pooling =&gt; Output.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>name</strong> (<em>basestring</em>) &#8211; group name</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details</li>
-<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details</li>
-<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; group name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>filter_size</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_filters</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>pool_size</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_type</strong> (<em>BasePoolingType</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>groups</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_stride</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_padding</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>bias_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>num_channel</strong> (<em>int</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>shared_bias</strong> (<em>bool</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>conv_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_conv_layer for details.</li>
+<li><strong>pool_stride</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_padding</strong> (<em>int</em>) &#8211; see img_pool_layer for details.</li>
+<li><strong>pool_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; see img_pool_layer for details.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">Layer&#8217;s output</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -439,13 +440,16 @@ None means default attribute.</li>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>num_classes</strong> &#8211; </li>
-<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; </li>
-<li><strong>num_channels</strong> (<em>int</em>) &#8211; </li>
+<li><strong>num_classes</strong> (<em>int</em>) &#8211; number of classes.</li>
+<li><strong>input_image</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
+<li><strong>num_channels</strong> (<em>int</em>) &#8211; input channels num.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last"></p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
@@ -463,9 +467,9 @@ None means default attribute.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">lstmemory_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a LSTM unit performs during a single time step.
-This function itself is not a recurrent layer, so it can not be
-directly used to process sequence inputs. This function is always used in
+<dd><p>lstmemory_unit defines the calculation process of an LSTM unit during a
+single time step. This function is not a recurrent layer, so it cannot be
+directly used to process sequence input. This function is always used in
 recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please refer to  <strong>Generating Sequences With Recurrent Neural Networks</strong>
@@ -486,21 +490,21 @@ for more details about LSTM. The link goes as follows:
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstmemory unit name.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory unit size.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>lstm_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm layer.</li>
 </ul>
 </td>
 </tr>
@@ -523,9 +527,9 @@ False means no bias, None means default bias.</li>
 <dd><p>lstm_group is a recurrent_group version of Long Short Term Memory. It
 does exactly the same calculation as the lstmemory layer (see lstmemory in
 layers.py for the maths) does. A promising benefit is that LSTM memory
-cell states, or hidden states in every time step are accessible to the
+cell states (or hidden states) in every time step are accessible to the
 user. This is especially useful in attention model. If you do not need to
-access the internal states of the lstm, but merely use its outputs,
+access the internal states of the lstm and merely use its outputs,
 it is recommended to use the lstmemory, which is relatively faster than
 lstmemory_group.</p>
 <p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
@@ -547,18 +551,18 @@ full_matrix_projection must be included before lstmemory_unit is called.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstmemory group size.</li>
-<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the lstmemory group.</li>
-<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; is lstm reversed</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; Parameter config, None if use default.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
-<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute of lstm layer.
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of lstmemory group.</li>
+<li><strong>out_memory</strong> (<em>LayerOutput | None</em>) &#8211; output of previous time step.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
+<li><strong>param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute, None means default attribute.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
+<li><strong>lstm_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of lstm layer.
 False means no bias, None means default bias.</li>
-<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input-to-hidden projection.
+<li><strong>input_proj_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias attribute for input to hidden projection.
 False means no bias, None means default bias.</li>
 <li><strong>input_proj_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra layer attribute for input to hidden
 projection of the LSTM unit, such as dropout, error clipping.</li>
@@ -583,34 +587,34 @@ projection of the LSTM unit, such as dropout, error clipping.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>Simple LSTM Cell.</p>
-<p>It just combine a mixed layer with fully_matrix_projection and a lstmemory
-layer. The simple lstm cell was implemented as follow equations.</p>
+<p>It just combines a mixed layer with full_matrix_projection and a lstmemory
+layer. The simple lstm cell is implemented with the following equations.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{xi}x_{t} + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{xf}x_{t} + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\c_t &amp; = f_tc_{t-1} + i_t tanh (W_{xc}x_t+W_{hc}h_{t-1} + b_c)\\o_t &amp; = \sigma(W_{xo}x_{t} + W_{ho}h_{t-1} + W_{co}c_t + b_o)\\h_t &amp; = o_t tanh(c_t)\end{aligned}\end{align} \]</div>
-<p>Please refer <strong>Generating Sequences With Recurrent Neural Networks</strong> if you
-want to know what lstm is. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
+<p>Please refer to <strong>Generating Sequences With Recurrent Neural Networks</strong> for more
+details about lstm. <a class="reference external" href="http://arxiv.org/abs/1308.0850">Link</a> is here.</p>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; lstm layer name.</li>
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; layer&#8217;s input.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; mixed layer&#8217;s matrix projection parameter attribute.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
+<li><strong>mat_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of matrix projection in mixed layer.</li>
 <li><strong>bias_param_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias parameter attribute. False means no bias, None
 means default bias.</li>
-<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; lstm cell parameter attribute.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; lstm final activiation type</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; lstm gate activiation type</li>
-<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; lstm state activiation type.</li>
-<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; mixed layer&#8217;s extra attribute.</li>
-<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; lstm layer&#8217;s extra attribute.</li>
+<li><strong>inner_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of lstm cell.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; last activation type of lstm.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of lstm.</li>
+<li><strong>state_act</strong> (<em>BaseActivation</em>) &#8211; state activation type of lstm.</li>
+<li><strong>mixed_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of mixed layer.</li>
+<li><strong>lstm_cell_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; extra attribute of lstm.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">lstm layer name.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">layer&#8217;s output.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -627,8 +631,8 @@ means default bias.</li>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_lstm</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_lstm is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
-outputs form a final output. However, concatenation of two outputs
+sequence both in forward and backward orders, and then concatenates the two
+outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
 <p>Please refer to  <strong>Neural Machine Translation by Jointly Learning to Align
@@ -647,15 +651,14 @@ The link goes as follows:
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional lstm layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; lstm layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
-processed in forward and backward directions are
-concatenated and returned.</li>
+If set to True, the entire output sequences in forward
+and backward directions are concatenated and returned.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object accroding to the return_seq.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
 </td>
 </tr>
 <tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
@@ -674,9 +677,9 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">gru_unit</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Define calculations that a gated recurrent unit performs in a single time
-step. This function itself is not a recurrent layer, so it can not be
-directly used to process sequence inputs. This function is always used in
+<dd><p>gru_unit defines the calculation process of a gated recurrent unit during a single
+time step. This function is not a recurrent layer, so it can not be
+directly used to process sequence input. This function is always used in
 the recurrent_group (see layers.py for more details) to implement attention
 mechanism.</p>
 <p>Please see grumemory in layers.py for the details about the maths.</p>
@@ -685,13 +688,13 @@ mechanism.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activation</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -715,11 +718,11 @@ mechanism.</p>
 does exactly the same calculation as the grumemory layer does. A promising
 benefit is that gru hidden states are accessible to the user. This is
 especially useful in attention model. If you do not need to access
-any internal state, but merely use the outputs of a GRU, it is recommended
+any internal state and merely use the outputs of a GRU, it is recommended
 to use the grumemory, which is relatively faster.</p>
 <p>Please see grumemory in layers.py for more detail about the maths.</p>
 <p>The example usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gur_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">gru_group</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span>
                <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">,</span>
                <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">(),</span>
                <span class="n">gate_act</span><span class="o">=</span><span class="n">SigmoidActivation</span><span class="p">())</span>
@@ -730,15 +733,16 @@ to use the grumemory, which is relatively faster.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>memory_boot</strong> (<em>LayerOutput | None</em>) &#8211; the initialization state of the GRU cell.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -758,11 +762,11 @@ to use the grumemory, which is relatively faster.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>You maybe see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
+<dd><p>You may see gru_step_layer, grumemory in layers.py, gru_unit, gru_group,
 simple_gru in networks.py. The reason why there are so many interfaces is
 that we have two ways to implement recurrent neural network. One way is to
 use one complete layer to implement rnn (including simple rnn, gru and lstm)
-with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But,
+with multiple time steps, such as recurrent_layer, lstmemory, grumemory. But
 the multiplication operation <span class="math">\(W x_t\)</span> is not computed in these layers.
 See details in their interfaces in layers.py.
 The other implementation is to use an recurrent group which can ensemble a
@@ -792,14 +796,15 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; process the input in a reverse order or not.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of gru.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of gru.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of gru layer,
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -819,8 +824,8 @@ gru_group, and gru_group is relatively better than simple_gru.</p>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_gru2</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>simple_gru2 is the same with simple_gru, but using grumemory instead
-Please see grumemory in layers.py for more detail about the maths.
+<dd><p>simple_gru2 is the same as simple_gru, but uses grumemory instead.
+Please refer to grumemory in layers.py for more details about the math.
 simple_gru2 is faster than simple_gru.</p>
 <p>The example usage is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">gru</span> <span class="o">=</span> <span class="n">simple_gru2</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">],</span> <span class="n">size</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
@@ -831,14 +836,15 @@ simple_gru2 is faster than simple_gru.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer name.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the gru group.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; hidden size of the gru.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input data in a reverse order</li>
-<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; type of the activiation</li>
-<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; type of the gate activiation</li>
-<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; bias. False means no bias, None means default bias.</li>
-<li><strong>gru_layer_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Extra parameter attribute of the gru layer.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; whether to process the input in reverse order.</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type of the gru.</li>
+<li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type of the gru.</li>
+<li><strong>gru_bias_attr</strong> (<em>ParameterAttribute|False|None</em>) &#8211; bias parameter attribute of the gru layer.
+False means no bias, None means default bias.</li>
+<li><strong>gru_layer_attr</strong> (<em>ExtraLayerAttribute</em>) &#8211; Extra attribute of the gru layer.</li>
 </ul>
 </td>
 </tr>
@@ -859,7 +865,7 @@ simple_gru2 is faster than simple_gru.</p>
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">bidirectional_gru</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>A bidirectional_gru is a recurrent unit that iterates over the input
-sequence both in forward and bardward orders, and then concatenate two
+sequence in both forward and backward order, and then concatenates the two
 outputs to form a final output. However, concatenation of two outputs
 is not the only way to form the final output, you can also, for example,
 just add them together.</p>
@@ -875,11 +881,10 @@ just add them together.</p>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; bidirectional gru layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; gru layer size.</li>
-<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set False, outputs of the last time step are
+<li><strong>return_seq</strong> (<em>bool</em>) &#8211; If set to False, the outputs of the last time step are
 concatenated and returned.
-If set True, the entire output sequences that are
-processed in forward and backward directions are
-concatenated and returned.</li>
+If set to True, the entire output sequences in the forward
+and backward directions are concatenated and returned.</li>
 </ul>
 </td>
 </tr>
@@ -900,7 +905,7 @@ concatenated and returned.</li>
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.v2.networks.</code><code class="descname">simple_attention</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>Calculate and then return a context vector by attention machanism.
+<dd><p>Calculate and return a context vector using the attention mechanism.
 The size of the context vector equals the size of encoded_sequence.</p>
 <div class="math">
 \[ \begin{align}\begin{aligned}a(s_{i-1},h_{j}) &amp; = v_{a}f(W_{a}s_{i-1} + U_{a}h_{j})\\e_{i,j} &amp; = a(s_{i-1}, h_{j})\\a_{i,j} &amp; = \frac{\exp(e_{i,j})}{\sum_{k=1}^{T_x}{\exp(e_{i,k})}}\\c_{i} &amp; = \sum_{j=1}^{T_{x}}a_{i,j}h_{j}\end{aligned}\end{align} \]</div>
@@ -924,8 +929,8 @@ Align and Translate</strong> for more details. The link is as follows:
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; name of the attention model.</li>
 <li><strong>softmax_param_attr</strong> (<em>ParameterAttribute</em>) &#8211; parameter attribute of sequence softmax
-that is used to produce attention weight</li>
-<li><strong>weight_act</strong> (<em>Activation</em>) &#8211; activation of the attention model</li>
+that is used to produce attention weight.</li>
+<li><strong>weight_act</strong> (<em>BaseActivation</em>) &#8211; activation of the attention model.</li>
 <li><strong>encoded_sequence</strong> (<em>LayerOutput</em>) &#8211; output of the encoder.</li>
 <li><strong>encoded_proj</strong> (<em>LayerOutput</em>) &#8211; attention weight is computed by a feed-forward neural
 network which has two inputs: decoder&#8217;s hidden state
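
Note on the simple_attention formulas in this hunk: the four equations can be traced with a small NumPy sketch. This is not PaddlePaddle code and does not call paddle.v2.networks. The function name, argument names, array shapes, and the choice of tanh for f are assumptions made only for illustration.

```python
import numpy as np


def attention_context(h, s_prev, W_a, U_a, v_a, f=np.tanh):
    """Illustrative NumPy version of the documented equations.

    Assumed (hypothetical) shapes:
      h      : (T_x, H)  encoder outputs h_j
      s_prev : (S,)      previous decoder state s_{i-1}
      W_a    : (D, S), U_a : (D, H), v_a : (D,)
    """
    # e_{i,j} = v_a . f(W_a s_{i-1} + U_a h_j), computed for every j at once
    scores = f(h @ U_a.T + W_a @ s_prev) @ v_a       # shape (T_x,)
    # a_{i,j} = softmax over j; subtracting the max is for numerical
    # stability only and does not change the result
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()          # shape (T_x,)
    # c_i = sum_j a_{i,j} h_j
    context = weights @ h                            # shape (H,)
    return context, weights
```

The returned context has the same size as one row of h, which matches the statement that the context vector has the size of encoded_sequence.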

--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js