Commit 7f5f4cad authored by Travis CI

Deploy to GitHub Pages: d130d181

Parent 3efa499e
One file's source diff is too large to display; you can view the blob instead.
@@ -91,7 +91,7 @@ strings.</td>
 <h2>LayerOutput<a class="headerlink" href="#layeroutput" title="Permalink to this headline"></a></h2>
 <dl class="class">
 <dt>
-<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">LayerOutput</code><span class="sig-paren">(</span><em>name</em>, <em>layer_type</em>, <em>parents=None</em>, <em>activation=None</em>, <em>num_filters=None</em>, <em>img_norm_type=None</em>, <em>size=None</em>, <em>outputs=None</em><span class="sig-paren">)</span></dt>
+<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">LayerOutput</code><span class="sig-paren">(</span><em>name</em>, <em>layer_type</em>, <em>parents=None</em>, <em>activation=None</em>, <em>num_filters=None</em>, <em>img_norm_type=None</em>, <em>size=None</em>, <em>outputs=None</em>, <em>reverse=None</em><span class="sig-paren">)</span></dt>
 <dd><p>LayerOutput is the output of a layer function. It is used internally for several
 reasons.</p>
 <ul>
@@ -115,7 +115,7 @@ reasons.</p>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; Layer output name.</li>
 <li><strong>layer_type</strong> (<em>basestring</em>) &#8211; Current Layer Type. One of LayerType enumeration.</li>
 <li><strong>activation</strong> (<em>BaseActivation</em>) &#8211; Layer Activation.</li>
-<li><strong>parents</strong> (<em>list|tuple</em>) &#8211; Layer&#8217;s parents.</li>
+<li><strong>parents</strong> (<em>list|tuple|collection.Sequence</em>) &#8211; Layer&#8217;s parents.</li>
 </ul>
 </td>
 </tr>
@@ -219,7 +219,7 @@ of this layer maybe sparse. It requires an additional input to indicate
 several selected columns for output. If the selected columns are not
 specified, selective_fc_layer acts exactly like fc_layer.</p>
 <p>The simple usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">sel_fc</span> <span class="o">=</span> <span class="n">selective_fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="nb">input</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">())</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">sel_fc</span> <span class="o">=</span> <span class="n">selective_fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="nb">input</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">128</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="n">TanhActivation</span><span class="p">())</span>
 </pre></div>
 </div>
 <table class="docutils field-list" frame="void" rules="none">
@@ -229,6 +229,8 @@ specified, selective_fc_layer acts exactly like fc_layer.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
 <li><strong>input</strong> (<em>LayerOutput|list|tuple</em>) &#8211; The input layer.</li>
+<li><strong>select</strong> (<em>LayerOutput</em>) &#8211; The select layer. The output of the select layer should be a
+sparse binary matrix, and is treated as the mask of the selective fc.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The layer dimension.</li>
 <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; Activation Type. Default is tanh.</li>
 <li><strong>param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; The Parameter Attribute.</li>
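<p>A minimal sketch of the selective path (assuming, as in the other snippets here, that the helpers come from paddle.trainer_config_helpers; <code>prev</code> and the sparse binary mask layer <code>mask</code> are hypothetical):</p>
<div class="highlight-python"><div class="highlight"><pre>sel_fc = selective_fc_layer(input=prev,
                            select=mask,   # sparse binary matrix used as the output mask
                            size=128,
                            act=TanhActivation())
</pre></div>
</div>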
@@ -257,7 +259,7 @@ default Bias.</li>
 <h2>conv_operator<a class="headerlink" href="#conv-operator" title="Permalink to this headline"></a></h2>
 <dl class="function">
 <dt>
-<code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">conv_operator</code><span class="sig-paren">(</span><em>img</em>, <em>filter</em>, <em>filter_size</em>, <em>num_filters</em>, <em>num_channel=None</em>, <em>stride=1</em>, <em>padding=0</em>, <em>groups=1</em>, <em>filter_size_y=None</em>, <em>stride_y=None</em>, <em>padding_y=None</em><span class="sig-paren">)</span></dt>
+<code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">conv_operator</code><span class="sig-paren">(</span><em>img</em>, <em>filter</em>, <em>filter_size</em>, <em>num_filters</em>, <em>num_channel=None</em>, <em>stride=1</em>, <em>padding=0</em>, <em>filter_size_y=None</em>, <em>stride_y=None</em>, <em>padding_y=None</em><span class="sig-paren">)</span></dt>
 <dd><p>Different from img_conv_layer, conv_op is an Operator, which can be used
 in mixed_layer. And conv_op takes two inputs to perform convolution.
 The first input is the image and the second is the filter kernel. It only
@@ -265,7 +267,7 @@ support GPU mode.</p>
 <p>The example usage is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">op</span> <span class="o">=</span> <span class="n">conv_operator</span><span class="p">(</span><span class="n">img</span><span class="o">=</span><span class="n">input1</span><span class="p">,</span>
 <span class="nb">filter</span><span class="o">=</span><span class="n">input2</span><span class="p">,</span>
-<span class="n">filter_size</span><span class="o">=</span><span class="mf">3.0</span><span class="p">,</span>
+<span class="n">filter_size</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
 <span class="n">num_filters</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span>
 <span class="n">num_channels</span><span class="o">=</span><span class="mi">64</span><span class="p">)</span>
 </pre></div>
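<p>A sketch of wiring the operator into a mixed_layer (hypothetical layers <code>input1</code> and <code>input2</code>; the output size assumes a 32x32 output feature map and is illustrative only):</p>
<div class="highlight-python"><div class="highlight"><pre>op = conv_operator(img=input1, filter=input2,
                   filter_size=3, num_filters=64, num_channels=64)
conv_out = mixed_layer(input=[op], size=64 * 32 * 32)  # operators are used inside mixed_layer
</pre></div>
</div>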
@@ -320,13 +322,15 @@ the filter&#8217;s shape can be (filter_size, filter_size_y).</li>
 <dl class="docutils">
 <dt>In this formula:</dt>
 <dd><ul class="first last simple">
-<li>a&#8217;s index is computed modulo M.</li>
-<li>b&#8217;s index is computed modulo N.</li>
+<li>a&#8217;s index is computed modulo M. When it is negative, the item is taken from
+the right side (which is the end of the array) to the left.</li>
+<li>b&#8217;s index is computed modulo N. When it is negative, the item is taken from
+the right side (which is the end of the array) to the left.</li>
 </ul>
 </dd>
 </dl>
 <p>The example usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">conv_shift</span> <span class="o">=</span> <span class="n">conv_shif_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">,</span> <span class="n">layer2</span><span class="p">])</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">conv_shift</span> <span class="o">=</span> <span class="n">conv_shift_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">,</span> <span class="n">layer2</span><span class="p">])</span>
 </pre></div>
 </div>
 <table class="docutils field-list" frame="void" rules="none">
@@ -335,7 +339,8 @@ the filter&#8217;s shape can be (filter_size, filter_size_y).</li>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
-<li><strong>input</strong> (<em>LayerOutput|list|tuple.</em>) &#8211; Input layer.</li>
+<li><strong>a</strong> (<em>LayerOutput</em>) &#8211; Input layer a.</li>
+<li><strong>b</strong> (<em>LayerOutput</em>) &#8211; Input layer b.</li>
 </ul>
 </td>
 </tr>
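<p>With the parameters renamed to <code>a</code> and <code>b</code>, a call would presumably use explicit keywords (a sketch; <code>layer1</code> of size M and <code>layer2</code> of size N are hypothetical):</p>
<div class="highlight-python"><div class="highlight"><pre>conv_shift = conv_shift_layer(a=layer1, b=layer2)  # circular convolution of a (size M) with kernel b (size N)
</pre></div>
</div>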
@@ -374,16 +379,19 @@ rest channels will be processed by rest group of filters.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; Layer Input.</li>
-<li><strong>filter_size</strong> (<em>int</em>) &#8211; The x dimension of a filter kernel.</li>
-<li><strong>filter_size_y</strong> (<em>int</em>) &#8211; The y dimension of a filter kernel. Since PaddlePaddle
+<li><strong>filter_size</strong> (<em>int|tuple|list</em>) &#8211; The x dimension of a filter kernel, or a tuple for
+both image dimensions.</li>
+<li><strong>filter_size_y</strong> (<em>int|None</em>) &#8211; The y dimension of a filter kernel. Since PaddlePaddle
 currently supports rectangular filters, the filter&#8217;s
 shape will be (filter_size, filter_size_y).</li>
 <li><strong>num_filters</strong> &#8211; Each filter group&#8217;s number of filters.</li>
 <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; Activation type. Default is tanh.</li>
 <li><strong>groups</strong> (<em>int</em>) &#8211; Group size of filters.</li>
-<li><strong>stride</strong> (<em>int</em>) &#8211; The x dimension of the stride.</li>
+<li><strong>stride</strong> (<em>int|tuple|list</em>) &#8211; The x dimension of the stride, or a tuple for both image
+dimensions.</li>
 <li><strong>stride_y</strong> (<em>int</em>) &#8211; The y dimension of the stride.</li>
-<li><strong>padding</strong> (<em>int</em>) &#8211; The x dimension of the padding.</li>
+<li><strong>padding</strong> (<em>int|tuple|list</em>) &#8211; The x dimension of the padding, or a tuple for both
+image dimensions.</li>
 <li><strong>padding_y</strong> (<em>int</em>) &#8211; The y dimension of the padding.</li>
 <li><strong>bias_attr</strong> (<em>ParameterAttribute|False</em>) &#8211; Convolution bias attribute. None means default bias.
 False means no bias.</li>
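<p>A minimal sketch using the tuple form of the geometry arguments (hypothetical input <code>img</code>; all values illustrative):</p>
<div class="highlight-python"><div class="highlight"><pre>conv = img_conv_layer(input=img,
                      filter_size=(3, 3),  # same as filter_size=3, filter_size_y=3
                      num_filters=64,
                      stride=(1, 1),
                      padding=(1, 1),
                      act=ReluActivation())
</pre></div>
</div>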
@@ -508,7 +516,6 @@ The details please refer to
 <li><strong>power</strong> (<em>float</em>) &#8211; The hyper-parameter.</li>
 <li><strong>num_channels</strong> &#8211; input layer&#8217;s filters number or channels. If
 num_channels is None, it will be set automatically.</li>
-<li><strong>blocked</strong> &#8211; namely normalize in number of blocked feature maps.</li>
 <li><strong>layer_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ExtraLayerAttribute" title="paddle.trainer_config_helpers.attrs.ExtraLayerAttribute"><em>ExtraLayerAttribute</em></a>) &#8211; Extra Layer Attribute.</li>
 </ul>
 </td>
@@ -549,7 +556,7 @@ y_i &amp;\gets \gamma \hat{x_i} + \beta \qquad &amp;//\ scale\ and\ shift\end{sp
 <li><strong>name</strong> (<em>basestring</em>) &#8211; layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; batch normalization input. Preferably a linear activation,
 because there is an activation inside batch_normalization.</li>
-<li><strong>batch_norm_type</strong> &#8211; We have batch_norm and cudnn_batch_norm. batch_norm
+<li><strong>batch_norm_type</strong> (<em>None|string, None or &quot;batch_norm&quot; or &quot;cudnn_batch_norm&quot;</em>) &#8211; We have batch_norm and cudnn_batch_norm. batch_norm
 supports both CPU and GPU. cudnn_batch_norm requires
 cuDNN version greater or equal to v4 (&gt;=v4). But
 cudnn_batch_norm is faster and needs less memory
@@ -637,23 +644,34 @@ and <span class="math">\(out\)</span> is a (batchSize x dataDim) output vector.<
 <dl class="function">
 <dt>
 <code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">recurrent_layer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
-<dd><p>TODO(yuyang18): Add docs</p>
+<dd><p>Simple recurrent unit layer. It is just a fully connected layer applied through both
+time and the neural network.</p>
+<p>For each sequence [start, end] it performs the following computation:</p>
+<div class="math">
+\[\begin{split}out_{i} = act(in_{i}) \ \ \text{for} \ i = start \\
+out_{i} = act(in_{i} + out_{i-1} * W) \ \ \text{for} \ start &lt; i &lt;= end\end{split}\]</div>
+<p>If reversed is true, the order is reversed:</p>
+<div class="math">
+\[\begin{split}out_{i} = act(in_{i}) \ \ \text{for} \ i = end \\
+out_{i} = act(in_{i} + out_{i+1} * W) \ \ \text{for} \ start &lt;= i &lt; end\end{split}\]</div>
 <table class="docutils field-list" frame="void" rules="none">
 <col class="field-name" />
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> &#8211; </li>
-<li><strong>size</strong> &#8211; </li>
-<li><strong>act</strong> &#8211; </li>
-<li><strong>bias_attr</strong> &#8211; </li>
-<li><strong>param_attr</strong> &#8211; </li>
-<li><strong>name</strong> &#8211; </li>
-<li><strong>layer_attr</strong> &#8211; </li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; Input Layer</li>
+<li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation.</li>
+<li><strong>bias_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; bias attribute.</li>
+<li><strong>param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; parameter attribute.</li>
+<li><strong>name</strong> (<em>basestring</em>) &#8211; name of the layer</li>
+<li><strong>layer_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ExtraLayerAttribute" title="paddle.trainer_config_helpers.attrs.ExtraLayerAttribute"><em>ExtraLayerAttribute</em></a>) &#8211; Layer Attribute.</li>
 </ul>
 </td>
 </tr>
-<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first last">LayerOutput object.</p>
+<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">LayerOutput object.</p>
+</td>
+</tr>
+<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">LayerOutput</p>
 </td>
 </tr>
 </tbody>
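<p>A minimal sketch matching the recurrence above (hypothetical embedding input <code>emb</code>):</p>
<div class="highlight-python"><div class="highlight"><pre>rnn = recurrent_layer(input=emb, act=TanhActivation(), name='simple_rnn')
</pre></div>
</div>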
@@ -803,7 +821,7 @@ Recurrent Neural Networks on Sequence Modeling.</a></p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>None|basestring</em>) &#8211; The gru layer name.</li>
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; input layer.</li>
-<li><strong>reverse</strong> (<em>bool</em>) &#8211; Wether sequence process is reversed or not.</li>
+<li><strong>reverse</strong> (<em>bool</em>) &#8211; Whether sequence process is reversed or not.</li>
 <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; activation type, TanhActivation by default. This activation
 affects the <span class="math">\({\tilde{h_t}}\)</span>.</li>
 <li><strong>gate_act</strong> (<em>BaseActivation</em>) &#8211; gate activation type, SigmoidActivation by default.
@@ -813,6 +831,8 @@ This activation affects the <span class="math">\(z_t\)</span> and <span class="m
 bias.</li>
 <li><strong>param_attr</strong> (<em>ParameterAttribute|None|False</em>) &#8211; Parameter Attribute.</li>
 <li><strong>layer_attr</strong> (<em>ExtraLayerAttribute|None</em>) &#8211; Extra Layer attribute</li>
+<li><strong>size</strong> (<em>None</em>) &#8211; Stub parameter for size; it is not actually used. Setting it
+will trigger a warning.</li>
 </ul>
 </td>
 </tr>
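<p>A minimal sketch, assuming this section documents the grumemory helper (the function name is not visible in this hunk) and a hypothetical upstream projection <code>gru_in</code>:</p>
<div class="highlight-python"><div class="highlight"><pre>gru = grumemory(input=gru_in,
                reverse=False,
                act=TanhActivation(),
                gate_act=SigmoidActivation())
</pre></div>
</div>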
@@ -936,14 +956,14 @@ to maintain tractability.</p>
 <p>The example usage is:</p>
 <div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rnn_step</span><span class="p">(</span><span class="nb">input</span><span class="p">):</span>
 <span class="n">last_time_step_output</span> <span class="o">=</span> <span class="n">memory</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;rnn&#39;</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">512</span><span class="p">)</span>
-<span class="k">with</span> <span class="n">mixed_layer</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">512</span><span class="p">)</span> <span class="k">as</span> <span class="n">simple_rnn</span><span class="p">:</span>
+<span class="k">with</span> <span class="n">mixed_layer</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">512</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;rnn&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">simple_rnn</span><span class="p">:</span>
 <span class="n">simple_rnn</span> <span class="o">+=</span> <span class="n">full_matrix_projection</span><span class="p">(</span><span class="nb">input</span><span class="p">)</span>
 <span class="n">simple_rnn</span> <span class="o">+=</span> <span class="n">last_time_step_output</span>
 <span class="k">return</span> <span class="n">simple_rnn</span>
 <span class="n">beam_gen</span> <span class="o">=</span> <span class="n">beam_search</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s2">&quot;decoder&quot;</span><span class="p">,</span>
 <span class="n">step</span><span class="o">=</span><span class="n">rnn_step</span><span class="p">,</span>
-<span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">StaticInput</span><span class="p">(</span><span class="s2">&quot;encoder_last&quot;</span><span class="p">)],</span>
+<span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">StaticInput</span><span class="p">(</span><span class="n">encoder_last</span><span class="p">)],</span>
 <span class="n">bos_id</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span>
 <span class="n">eos_id</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
 <span class="n">beam_size</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
@@ -961,22 +981,23 @@ to maintain tractability.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>base string</em>) &#8211; Name of the recurrent unit that generates sequences.</li>
 <li><strong>step</strong> (<em>callable</em>) &#8211; <p>A callable function that defines the calculation in a time
-step, and it is appled to sequences with arbitrary length by
+step, and it is applied to sequences with arbitrary length by
 sharing a same set of weights.</p>
 <p>You can refer to the first parameter of recurrent_group, or
 demo/seqToseq/seqToseq_net.py for more details.</p>
 </li>
-<li><strong>input</strong> (<em>StaticInput|GeneratedInput</em>) &#8211; Input data for the recurrent unit</li>
+<li><strong>input</strong> (<em>list</em>) &#8211; Input data for the recurrent unit</li>
 <li><strong>bos_id</strong> (<em>int</em>) &#8211; Index of the start symbol in the dictionary. The start symbol
 is a special token for NLP task, which indicates the
 beginning of a sequence. In the generation task, the start
-symbol is ensential, since it is used to initialize the RNN
+symbol is essential, since it is used to initialize the RNN
 internal state.</li>
 <li><strong>eos_id</strong> (<em>int</em>) &#8211; Index of the end symbol in the dictionary. The end symbol is
 a special token for NLP task, which indicates the end of a
 sequence. The generation process will stop once the end
 symbol is generated, or a pre-defined max iteration number
 is exceeded.</li>
+<li><strong>max_length</strong> (<em>int</em>) &#8211; Max generated sequence length.</li>
 <li><strong>beam_size</strong> (<em>int</em>) &#8211; Beam search for sequence generation is an iterative search
 algorithm. To maintain tractability, every iteration
 only stores a predetermined number, called the beam_size,
@@ -1166,7 +1187,7 @@ It performs element-wise multiplication with weight.</p>
 <h2>dotmul_operator<a class="headerlink" href="#dotmul-operator" title="Permalink to this headline"></a></h2>
 <dl class="function">
 <dt>
-<code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">dotmul_operator</code><span class="sig-paren">(</span><em>x</em>, <em>y</em>, <em>scale=1</em><span class="sig-paren">)</span></dt>
+<code class="descclassname">paddle.trainer_config_helpers.layers.</code><code class="descname">dotmul_operator</code><span class="sig-paren">(</span><em>a=None</em>, <em>b=None</em>, <em>scale=1</em>, <em>**kwargs</em><span class="sig-paren">)</span></dt>
 <dd><p>DotMulOperator takes two inputs and performs element-wise multiplication:</p>
 <div class="math">
 \[out.row[i] += scale * (a.row[i] .* b.row[i])\]</div>
@@ -1181,8 +1202,8 @@ scale is a config scalar, its default value is one.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>x</strong> (<em>LayerOutput</em>) &#8211; Input layer1</li>
-<li><strong>y</strong> (<em>LayerOutput</em>) &#8211; Input layer2</li>
+<li><strong>a</strong> (<em>LayerOutput</em>) &#8211; Input layer1</li>
+<li><strong>b</strong> (<em>LayerOutput</em>) &#8211; Input layer2</li>
 <li><strong>scale</strong> (<em>float</em>) &#8211; config scalar, default value is one.</li>
 </ul>
 </td>
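<p>A sketch of the renamed keyword form inside a mixed_layer (hypothetical layers <code>vec1</code> and <code>vec2</code>, which must have equal sizes for the element-wise product):</p>
<div class="highlight-python"><div class="highlight"><pre>op = dotmul_operator(a=vec1, b=vec2, scale=1.0)
out = mixed_layer(input=[op])
</pre></div>
</div>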
@@ -1274,7 +1295,7 @@ It select dimesions [offset, offset+layer_size) from input:</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput.</em>) &#8211; Input Layer.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; Input Layer.</li>
 <li><strong>offset</strong> (<em>int</em>) &#8211; Offset, None if use default.</li>
 </ul>
 </td>
@@ -1493,7 +1514,7 @@ Inputs can be list of LayerOutput or list of projection.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; Layer name.</li>
-<li><strong>input</strong> (<em>list|tuple</em>) &#8211; input layers or projections</li>
+<li><strong>input</strong> (<em>list|tuple|collection.Sequence</em>) &#8211; input layers or projections</li>
 <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; Activation type.</li>
 <li><strong>layer_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ExtraLayerAttribute" title="paddle.trainer_config_helpers.attrs.ExtraLayerAttribute"><em>ExtraLayerAttribute</em></a>) &#8211; Extra Layer Attribute.</li>
 </ul>
@@ -1701,7 +1722,7 @@ bias.</li>
 <p>Note that the above computation is for one sample. Multiple samples are
 processed in one batch.</p>
 <p>The simple usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">linear_comb</span> <span class="o">=</span> <span class="n">linear_comb_layer</span><span class="p">(</span><span class="n">weighs</span><span class="o">=</span><span class="n">weight</span><span class="p">,</span> <span class="n">vectors</span><span class="o">=</span><span class="n">vectors</span><span class="p">,</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">linear_comb</span> <span class="o">=</span> <span class="n">linear_comb_layer</span><span class="p">(</span><span class="n">weights</span><span class="o">=</span><span class="n">weight</span><span class="p">,</span> <span class="n">vectors</span><span class="o">=</span><span class="n">vectors</span><span class="p">,</span>
 <span class="n">size</span><span class="o">=</span><span class="n">elem_dim</span><span class="p">)</span>
 </pre></div>
 </div>
@@ -1710,7 +1731,8 @@ processed in one batch.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; The input layers.</li>
+<li><strong>weights</strong> (<em>LayerOutput</em>) &#8211; The weight layer.</li>
+<li><strong>vectors</strong> (<em>LayerOutput</em>) &#8211; The vector layer.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; the dimension of this layer.</li>
 <li><strong>name</strong> (<em>basestring</em>) &#8211; The Layer Name.</li>
 </ul>
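<p>A dimensional reading of the example (an assumption inferred from the parameter docs, not stated verbatim here): <code>weight</code> carries one mixing weight per vector, <code>vectors</code> carries the concatenated vectors, and the output has width elem_dim:</p>
<div class="highlight-python"><div class="highlight"><pre>linear_comb = linear_comb_layer(weights=weight,   # assumed size: M
                                vectors=vectors,  # assumed size: M * elem_dim
                                size=elem_dim)
</pre></div>
</div>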
@@ -1887,20 +1909,20 @@ element-wise. There is no activation and weight.</p>
 <dd><p>This layer performs tensor operation on two inputs.
 For example, each sample:</p>
 <div class="math">
-\[y_{i} = x_{1} * W_{i} * {x_{2}^\mathrm{T}}, i=0,1,...,K-1\]</div>
+\[y_{i} = a * W_{i} * {b^\mathrm{T}}, i=0,1,...,K-1\]</div>
 <dl class="docutils">
 <dt>In this formula:</dt>
 <dd><ul class="first last simple">
-<li><span class="math">\(x_{1}\)</span>: the first input contains M elements.</li>
-<li><span class="math">\(x_{2}\)</span>: the second input contains N elements.</li>
+<li><span class="math">\(a\)</span>: the first input contains M elements.</li>
+<li><span class="math">\(b\)</span>: the second input contains N elements.</li>
 <li><span class="math">\(y_{i}\)</span>: the i-th element of y.</li>
 <li><span class="math">\(W_{i}\)</span>: the i-th learned weight, whose shape is [M, N].</li>
-<li><span class="math">\({x_{2}}^\mathrm{T}\)</span>: the transpose of <span class="math">\(x_{2}\)</span>.</li>
+<li><span class="math">\(b^\mathrm{T}\)</span>: the transpose of <span class="math">\(b\)</span>.</li>
 </ul>
 </dd>
 </dl>
 <p>The simple usage is:</p>
-<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">tensor</span> <span class="o">=</span> <span class="n">tensor_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">layer1</span><span class="p">,</span> <span class="n">layer2</span><span class="p">])</span>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">tensor</span> <span class="o">=</span> <span class="n">tensor_layer</span><span class="p">(</span><span class="n">a</span><span class="o">=</span><span class="n">layer1</span><span class="p">,</span> <span class="n">b</span><span class="o">=</span><span class="n">layer2</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
 </pre></div>
 </div>
 <table class="docutils field-list" frame="void" rules="none">
@@ -1909,10 +1931,11 @@ For example, each sample:</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>name</strong> (<em>basestring</em>) &#8211; layer name</li>
-<li><strong>input</strong> (<em>LayerOutput|list|tuple.</em>) &#8211; Input layer.</li>
+<li><strong>a</strong> (<em>LayerOutput</em>) &#8211; Input layer a.</li>
+<li><strong>b</strong> (<em>LayerOutput</em>) &#8211; Input layer b.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; the layer dimension.</li>
 <li><strong>act</strong> (<em>BaseActivation</em>) &#8211; Activation Type. Default is tanh.</li>
-<li><strong>param_attr</strong> (<em>ParameterAttribute|list</em>) &#8211; The Parameter Attribute.</li>
+<li><strong>param_attr</strong> (<a class="reference internal" href="attrs.html#paddle.trainer_config_helpers.attrs.ParameterAttribute" title="paddle.trainer_config_helpers.attrs.ParameterAttribute"><em>ParameterAttribute</em></a>) &#8211; The Parameter Attribute.</li>
 <li><strong>bias_attr</strong> (<em>ParameterAttribute|None|Any</em>) &#8211; The Bias Attribute. If no bias, then pass False or
 something not type of ParameterAttribute. None will get a
 default Bias.</li>
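<p>A quick dimensional check of the formula above (illustrative numbers): with <code>a</code> of size M and <code>b</code> of size N, size=K allocates K weight matrices of shape [M, N], and the i-th output element is the scalar <span class="math">\(a W_{i} b^\mathrm{T}\)</span>:</p>
<div class="highlight-python"><div class="highlight"><pre>tensor = tensor_layer(a=layer1, b=layer2, size=1000)  # output width K = 1000
</pre></div>
</div>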
@@ -2192,7 +2215,6 @@ Sampling one id for one sample.</p>
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; The first input layer.</li>
 <li><strong>label</strong> &#8211; The input label.</li>
-<li><strong>type</strong> (<em>basestring.</em>) &#8211; The type of cost.</li>
 <li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is not necessary.</li>
 <li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient that scales the gradient in the backward pass.</li>
 </ul>
@@ -2227,9 +2249,7 @@ Sampling one id for one sample.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; The 1st input. Samples of the same query should be loaded
-as sequence. User should provided socres for each sample.
-The score should be the 2nd input of this layer.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; Samples of the same query should be loaded as sequence.</li>
 <li><strong>score</strong> &#8211; The 2nd input. Score of each sample.</li>
 <li><strong>NDCG_num</strong> (<em>int</em>) &#8211; The size of NDCG (Normalized Discounted Cumulative Gain),
 e.g., 5 for NDCG&#64;5. It must be less than or equal to the
@@ -2242,7 +2262,6 @@ equal to NDCG_num. And if max_sort_size is greater
 than the size of a list, the algorithm will sort the
 entire list to get the gradient.</li>
 <li><strong>name</strong> (<em>None|basestring</em>) &#8211; The name of this layer. It is not necessary.</li>
-<li><strong>coeff</strong> (<em>float</em>) &#8211; The coefficient affects the gradient in the backward.</li>
 </ul>
 </td>
 </tr>
@@ -2330,7 +2349,7 @@ field model.</p>
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
 <li><strong>input</strong> (<em>LayerOutput</em>) &#8211; The first input layer is the feature.</li>
-<li><strong>label</strong> &#8211; The second input layer is label.</li>
+<li><strong>label</strong> (<em>LayerOutput</em>) &#8211; The second input layer is the label.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; The category number.</li>
 <li><strong>weight</strong> (<em>LayerOutput</em>) &#8211; The third layer is the &#8220;weight&#8221; of each sample, which is an
 optional argument.</li>
@@ -2415,10 +2434,10 @@ should also be num_classes + 1.</p>
 <col class="field-body" />
 <tbody valign="top">
 <tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
-<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; The input layers.</li>
+<li><strong>input</strong> (<em>LayerOutput</em>) &#8211; The input layer.</li>
 <li><strong>label</strong> (<em>LayerOutput</em>) &#8211; The data layer of label with variable length.</li>
 <li><strong>size</strong> (<em>int</em>) &#8211; category numbers + 1.</li>
-<li><strong>name</strong> (<em>string|None</em>) &#8211; The name of this layer, which can not specify.</li>
+<li><strong>name</strong> (<em>basestring|None</em>) &#8211; The name of this layer.</li>
 <li><strong>norm_by_times</strong> (<em>bool</em>) &#8211; Whether to normalize by times. False by default.</li>
 </ul>
 </td>
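<p>A minimal sketch, assuming this is the ctc_layer helper (the name is not visible in this hunk) with hypothetical sequence layers <code>output_seq</code> and <code>label_seq</code>; size reserves one extra slot for the CTC blank:</p>
<div class="highlight-python"><div class="highlight"><pre>ctc = ctc_layer(input=output_seq,
                label=label_seq,
                size=num_classes + 1)  # category numbers + 1
</pre></div>
</div>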
......
@@ -381,7 +381,7 @@ layers.py for the maths) does. A promising benefit is that LSTM memory
 cell states, or hidden states in every time step are accessible to the
 user. This is especially useful in attention models. If you do not need to
 access the internal states of the lstm, but merely use its outputs,
-it is recommanded to use the lstmemory, which is relatively faster than
+it is recommended to use the lstmemory, which is relatively faster than
 lstmemory_group.</p>
 <p>NOTE: In PaddlePaddle&#8217;s implementation, the following input-to-hidden
 multiplications:
@@ -736,7 +736,7 @@ compute attention weight.</li>
 <h2>outputs<a class="headerlink" href="#outputs" title="Permalink to this headline"></a></h2>
 <dl class="function">
 <dt>
-<code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">outputs</code><span class="sig-paren">(</span><em>layers</em><span class="sig-paren">)</span></dt>
+<code class="descclassname">paddle.trainer_config_helpers.networks.</code><code class="descname">outputs</code><span class="sig-paren">(</span><em>layers</em>, <em>*args</em><span class="sig-paren">)</span></dt>
 <dd><p>Declare the end of the network. Currently it will only calculate the
 input/output order of the network. It will calculate the predict network or
 train network&#8217;s output automatically.</p>
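<p>A typical closing line of a trainer config (a sketch; <code>cost</code> stands in for whatever final layer the config produced):</p>
<div class="highlight-python"><div class="highlight"><pre>outputs(cost)
</pre></div>
</div>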
......
@@ -92,11 +92,20 @@ Each PoolingType contains one parameter:</p>
 <h1>MaxPooling<a class="headerlink" href="#maxpooling" title="Permalink to this headline"></a></h1>
 <dl class="class">
 <dt>
-<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.poolings.</code><code class="descname">MaxPooling</code></dt>
+<em class="property">class </em><code class="descclassname">paddle.trainer_config_helpers.poolings.</code><code class="descname">MaxPooling</code><span class="sig-paren">(</span><em>output_max_index=None</em><span class="sig-paren">)</span></dt>
 <dd><p>Max pooling.</p>
 <p>Return the largest value for each dimension in the sequence or time steps.</p>
 <div class="math">
 \[max(samples\_of\_a\_sequence)\]</div>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><strong>output_max_index</strong> (<em>bool|None</em>) &#8211; True to output the sequence max index instead of the max
+value. None means use the default value in proto.</td>
+</tr>
+</tbody>
+</table>
 </dd></dl>
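<p>A sketch of how the pooling type might be used (assuming the usual pooling_layer helper from the same package; <code>seq</code> is a hypothetical sequence input):</p>
<div class="highlight-python"><div class="highlight"><pre>seq_max = pooling_layer(input=seq, pooling_type=MaxPooling())
</pre></div>
</div>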
 </div>
......