Commit b94b06f6 authored by Travis CI

Deploy to GitHub Pages: 0311fd15

Parent d590f048
......@@ -18,6 +18,11 @@ dynamic_lstm
.. autofunction:: paddle.v2.fluid.layers.dynamic_lstm
:noindex:
dynamic_lstmp
-------------
.. autofunction:: paddle.v2.fluid.layers.dynamic_lstmp
:noindex:
dynamic_gru
-----------
.. autofunction:: paddle.v2.fluid.layers.dynamic_gru
......
......@@ -358,7 +358,7 @@ with zeros whenever lookup encounters it in <code class="xref py py-attr docutil
<h2>dynamic_lstm<a class="headerlink" href="#dynamic-lstm" title="Permalink to this headline"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">dynamic_lstm</code><span class="sig-paren">(</span><em>input</em>, <em>size</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_peepholes=True</em>, <em>is_reverse=False</em>, <em>gate_activation='sigmoid'</em>, <em>cell_activation='tanh'</em>, <em>candidate_activation='tanh'</em>, <em>dtype='float32'</em><span class="sig-paren">)</span></dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">dynamic_lstm</code><span class="sig-paren">(</span><em>input</em>, <em>size</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_peepholes=True</em>, <em>is_reverse=False</em>, <em>gate_activation='sigmoid'</em>, <em>cell_activation='tanh'</em>, <em>candidate_activation='tanh'</em>, <em>dtype='float32'</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Dynamic LSTM Layer</strong></p>
<p>The default implementation uses diagonal/peephole connections
(<a class="reference external" href="https://arxiv.org/pdf/1402.1128.pdf">https://arxiv.org/pdf/1402.1128.pdf</a>), the formula is as follows:</p>
......@@ -368,7 +368,7 @@ with zeros whenever lookup encounters it in <code class="xref py py-attr docutil
the matrix of weights from the input gate to the input), <span class="math">\(W_{ic}, W_{fc}, W_{oc}\)</span> are diagonal weight matrices for peephole connections. In
our implementation, we use vectors to represent these diagonal weight
matrices. The <span class="math">\(b\)</span> terms denote bias vectors (<span class="math">\(b_i\)</span> is the input
gate bias vector), <span class="math">\(\sigma\)</span> is the non-line activations, such as
gate bias vector), <span class="math">\(\sigma\)</span> is the non-linear activations, such as
logistic sigmoid function, and <span class="math">\(i, f, o\)</span> and <span class="math">\(c\)</span> are the input
gate, forget gate, output gate, and cell activation vectors, respectively,
all of which have the same size as the cell output activation vector <span class="math">\(h\)</span>.</p>
......@@ -394,15 +394,15 @@ tensor in this Variable is a matrix with shape
(T X 4D), where T is the total time steps in this
mini-batch, D is the hidden size.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; 4 * hidden size.</li>
<li><strong>param_attr</strong> (<em>ParamAttr</em>) &#8211; <p>The parameter attribute for the learnable
<li><strong>param_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The parameter attribute for the learnable
hidden-hidden weights.</p>
<ul>
<li>Weights = {<span class="math">\(W_{ch}, W_{ih}, W_{fh}, W_{oh}\)</span>}</li>
<li>The shape is (D x 4D), where D is the hidden
size.</li>
<li>Weights = {<span class="math">\(W_{ch}, W_{ih}, W_{fh}, W_{oh}\)</span>}</li>
</ul>
</li>
<li><strong>bias_attr</strong> (<em>ParamAttr</em>) &#8211; <p>The bias attribute for the learnable bias
<li><strong>bias_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The bias attribute for the learnable bias
weights, which contains two parts, input-hidden
bias weights and peephole connections weights if
setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
......@@ -411,8 +411,8 @@ setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
</ol>
<blockquote>
<div><ul>
<li>The shape is (1 x 4D).</li>
<li>Biases = {<span class="math">\(b_c, b_i, b_f, b_o\)</span>}.</li>
<li>The shape is (1 x 4D).</li>
</ul>
</div></blockquote>
<ol class="arabic" start="2">
......@@ -420,8 +420,8 @@ setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
</ol>
<blockquote>
<div><ul>
<li>The shape is (1 x 7D).</li>
<li>Biases = { <span class="math">\(b_c, b_i, b_f, b_o, W_{ic}, W_{fc}, W_{oc}\)</span>}.</li>
<li>The shape is (1 x 7D).</li>
</ul>
</div></blockquote>
</li>
......@@ -437,6 +437,8 @@ output gate. Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#
Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;],
default &#8220;tanh&#8221;.</li>
<li><strong>dtype</strong> (<em>str</em>) &#8211; Data type. Choices = [&#8220;float32&#8221;, &#8220;float64&#8221;], default &#8220;float32&#8221;.</li>
<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer (optional). If set to None, the layer
will be named automatically.</li>
</ul>
</td>
</tr>
......@@ -458,6 +460,131 @@ default &#8220;tanh&#8221;.</li>
</div>
</dd></dl>
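For quick reference, a minimal usage sketch of dynamic_lstm in the same style as the dynamic_lstmp example below; hidden_dim and input_seq are illustrative names, and the returned pair is assumed to be the hidden-state and cell-state sequences:

hidden_dim = 512
# Project the raw input sequence to 4 * hidden_dim, as required by `size`.
fc_out = fluid.layers.fc(input=input_seq, size=hidden_dim * 4,
                         act=None, bias_attr=None)
# `forward` is the hidden-state sequence (T x D); `cell` is the cell-state sequence (T x D).
forward, cell = fluid.layers.dynamic_lstm(input=fc_out,
                                          size=hidden_dim * 4,
                                          use_peepholes=True,
                                          is_reverse=False)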
</div>
<div class="section" id="dynamic-lstmp">
<h2>dynamic_lstmp<a class="headerlink" href="#dynamic-lstmp" title="Permalink to this headline"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">dynamic_lstmp</code><span class="sig-paren">(</span><em>input</em>, <em>size</em>, <em>proj_size</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_peepholes=True</em>, <em>is_reverse=False</em>, <em>gate_activation='sigmoid'</em>, <em>cell_activation='tanh'</em>, <em>candidate_activation='tanh'</em>, <em>proj_activation='tanh'</em>, <em>dtype='float32'</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Dynamic LSTMP Layer</strong></p>
<p>The LSTMP (LSTM with recurrent projection) layer has a separate projection
layer after the LSTM layer, projecting the original hidden state to a
lower-dimensional one. This is proposed to reduce the total number of
parameters and, furthermore, the computational complexity of the LSTM,
especially when the number of output units is relatively
large (<a class="reference external" href="https://research.google.com/pubs/archive/43905.pdf">https://research.google.com/pubs/archive/43905.pdf</a>).</p>
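<p>As a rough illustration of this saving (hypothetical sizes, counting only recurrent and projection weights): with hidden size D = 2048 and projection size P = 512, a plain LSTM keeps 4 x D x D, i.e. about 16.8M hidden-hidden weights, whereas the LSTMP keeps 4 x D x P (recurrent) + D x P (projection) = 5 x D x P, i.e. about 5.2M, roughly a 3x reduction.</p>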
<p>The formula is as follows:</p>
<div class="math">
\[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f)\\\tilde{c_t} &amp; = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c)\\o_t &amp; = \sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_t + b_o)\\c_t &amp; = f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\h_t &amp; = o_t \odot act_h(c_t)\\r_t &amp; = \overline{act_h}(W_{rh}h_t)\end{aligned}\end{align} \]</div>
<p>In the above formula:</p>
<ul class="simple">
<li><span class="math">\(W\)</span>: Denotes weight matrices (e.g. <span class="math">\(W_{xi}\)</span> is the matrix of weights from the input gate to the input).</li>
<li><span class="math">\(W_{ic}\)</span>, <span class="math">\(W_{fc}\)</span>, <span class="math">\(W_{oc}\)</span>: Diagonal weight matrices for peephole connections. In our implementation, we use vectors to reprenset these diagonal weight matrices.</li>
<li><span class="math">\(b\)</span>: Denotes bias vectors (e.g. <span class="math">\(b_i\)</span> is the input gate bias vector).</li>
<li><span class="math">\(\sigma\)</span>: The activation, such as logistic sigmoid function.</li>
<li><span class="math">\(i, f, o\)</span> and <span class="math">\(c\)</span>: The input gate, forget gate, output gate, and cell activation vectors, respectively, all of which have the same size as the cell output activation vector <span class="math">\(h\)</span>.</li>
<li><span class="math">\(h\)</span>: The hidden state.</li>
<li><span class="math">\(r\)</span>: The recurrent projection of the hidden state.</li>
<li><span class="math">\(\tilde{c_t}\)</span>: The candidate hidden state, whose computation is based on the current input and previous hidden state.</li>
<li><span class="math">\(\odot\)</span>: The element-wise product of the vectors.</li>
<li><span class="math">\(act_g\)</span> and <span class="math">\(act_h\)</span>: The cell input and cell output activation functions and <cite>tanh</cite> is usually used for them.</li>
<li><span class="math">\(\overline{act_h}\)</span>: The activation function for the projection output, usually using <cite>identity</cite> or same as <span class="math">\(act_h\)</span>.</li>
</ul>
<p>Set <cite>use_peepholes</cite> to <cite>False</cite> to disable peephole connection. The formula
is omitted here, please refer to the paper
<a class="reference external" href="http://www.bioinf.jku.at/publications/older/2604.pdf">http://www.bioinf.jku.at/publications/older/2604.pdf</a> for details.</p>
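<p>Concretely, disabling peepholes just removes the <span class="math">\(W_{ic}c_{t-1}\)</span>, <span class="math">\(W_{fc}c_{t-1}\)</span> and <span class="math">\(W_{oc}c_t\)</span> terms from the gate equations above, so that, for example:</p>
<div class="math">
\[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + b_i)\\f_t &amp; = \sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + b_f)\\o_t &amp; = \sigma(W_{ox}x_{t} + W_{or}r_{t-1} + b_o)\end{aligned}\end{align} \]</div>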
<p>Note that these <span class="math">\(W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}\)</span>
operations on the input <span class="math">\(x_{t}\)</span> are NOT included in this operator.
Users can choose to add a fully-connected layer before the LSTMP layer.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">Parameters:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>Variable</em>) &#8211; The input of dynamic_lstmp layer, which supports
variable-time length input sequence. The underlying
tensor in this Variable is a matrix with shape
(T X 4D), where T is the total time steps in this
mini-batch, D is the hidden size.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; 4 * hidden size.</li>
<li><strong>proj_size</strong> (<em>int</em>) &#8211; The size of projection output.</li>
<li><strong>param_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The parameter attribute for the learnable
hidden-hidden weight and projection weight.</p>
<ul>
<li>Hidden-hidden weight = {<span class="math">\(W_{ch}, W_{ih}, W_{fh}, W_{oh}\)</span>}.</li>
<li>The shape of hidden-hidden weight is (P x 4D),
where P is the projection size and D the hidden
size.</li>
<li>Projection weight = {<span class="math">\(W_{rh}\)</span>}.</li>
<li>The shape of projection weight is (D x P).</li>
</ul>
</li>
<li><strong>bias_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The bias attribute for the learnable bias
weights, which contains two parts, input-hidden
bias weights and peephole connections weights if
setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
<ol class="arabic">
<li><cite>use_peepholes = False</cite></li>
</ol>
<blockquote>
<div><ul>
<li>Biases = {<span class="math">\(b_c, b_i, b_f, b_o\)</span>}.</li>
<li>The shape is (1 x 4D).</li>
</ul>
</div></blockquote>
<ol class="arabic" start="2">
<li><cite>use_peepholes = True</cite></li>
</ol>
<blockquote>
<div><ul>
<li>Biases = { <span class="math">\(b_c, b_i, b_f, b_o, W_{ic}, W_{fc}, W_{oc}\)</span>}.</li>
<li>The shape is (1 x 7D).</li>
</ul>
</div></blockquote>
</li>
<li><strong>use_peepholes</strong> (<em>bool</em>) &#8211; Whether to enable diagonal/peephole connections,
default <cite>True</cite>.</li>
<li><strong>is_reverse</strong> (<em>bool</em>) &#8211; Whether to compute reversed LSTM, default <cite>False</cite>.</li>
<li><strong>gate_activation</strong> (<em>str</em>) &#8211; The activation for input gate, forget gate and
output gate. Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;,
&#8220;identity&#8221;], default &#8220;sigmoid&#8221;.</li>
<li><strong>cell_activation</strong> (<em>str</em>) &#8211; The activation for cell output. Choices = [&#8220;sigmoid&#8221;,
&#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;], default &#8220;tanh&#8221;.</li>
<li><strong>candidate_activation</strong> (<em>str</em>) &#8211; The activation for candidate hidden state.
Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;],
default &#8220;tanh&#8221;.</li>
<li><strong>proj_activation</strong> (<em>str</em>) &#8211; The activation for projection output.
Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;],
default &#8220;tanh&#8221;.</li>
<li><strong>dtype</strong> (<em>str</em>) &#8211; Data type. Choices = [&#8220;float32&#8221;, &#8220;float64&#8221;], default &#8220;float32&#8221;.</li>
<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer (optional). If set to None, the layer
will be named automatically.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">Returns:</th><td class="field-body"><p class="first">The projection of hidden state, and cell state of LSTMP. The shape of projection is (T x P), for the cell state which is (T x D), and both LoD is the same with the <cite>input</cite>.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">Return type:</th><td class="field-body"><p class="first last">tuple</p>
</td>
</tr>
</tbody>
</table>
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">hidden_dim</span><span class="p">,</span> <span class="n">proj_dim</span> <span class="o">=</span> <span class="mi">512</span><span class="p">,</span> <span class="mi">256</span>
<span class="n">fc_out</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">input_seq</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">hidden_dim</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span>
<span class="n">act</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">bias_attr</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
<span class="n">proj_out</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">dynamic_lstmp</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">fc_out</span><span class="p">,</span>
<span class="n">size</span><span class="o">=</span><span class="n">hidden_dim</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span>
<span class="n">proj_size</span><span class="o">=</span><span class="n">proj_dim</span><span class="p">,</span>
<span class="n">use_peepholes</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">is_reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">cell_activation</span><span class="o">=</span><span class="s2">&quot;tanh&quot;</span><span class="p">,</span>
<span class="n">proj_activation</span><span class="o">=</span><span class="s2">&quot;tanh&quot;</span><span class="p">)</span>
</pre></div>
</div>
</dd></dl>
</div>
<div class="section" id="dynamic-gru">
<h2>dynamic_gru<a class="headerlink" href="#dynamic-gru" title="Permalink to this headline"></a></h2>
......
......@@ -349,6 +349,105 @@
"comment" : "(string, default: tanh)The activation for candidate hidden state, `tanh` by default.",
"generated" : 0
} ]
},{
"type" : "lstmp",
"comment" : "\nLong-Short Term Memory with recurrent Projection layer (LSTMP) Operator.\n\nLSTMP has a separate projection layer after the LSTM layer, projecting the \noriginal hidden state to a lower-dimensional one, which is proposed to reduce \nthe number of total parameters and furthermore computational complexity for \nthe LSTM, espeacially for the case that the size of output units is relative \nlarge (https://research.google.com/pubs/archive/43905.pdf). \n\nThe formula is as follows:\n\n$$\ni_t = \\sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i) \\\\\n\nf_t = \\sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f) \\\\\n\n\\tilde{c_t} = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c) \\\\\n\no_t = \\sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_t + b_o) \\\\\n\nc_t = f_t \\odot c_{t-1} + i_t \\odot \\tilde{c_t} \\\\\n\nh_t = o_t \\odot act_h(c_t) \\\\\n\nr_t = \\overline{act_h}(W_{rh}h_t)\n$$\n\nwhere the W terms denote weight matrices (e.g. $W_{xi}$ is the matrix\nof weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$\nare diagonal weight matrices for peephole connections. In our implementation,\nwe use vectors to reprenset these diagonal weight matrices. The b terms\ndenote bias vectors ($b_i$ is the input gate bias vector), $\\sigma$\nis the activation, such as logistic sigmoid function, and\n$i, f, o$ and $c$ are the input gate, forget gate, output gate,\nand cell activation vectors, respectively, all of which have the same size as\nthe cell output activation vector $h$. Here $h$ is usually called the hidden \nstate and $r$ denotes its recurrent projection. And $\\tilde{c_t}$ is also \ncalled the candidate hidden state, whose computation is based on the current \ninput and previous hidden state.\n\nThe $\\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$\nare the cell input and cell output activation functions and `tanh` is usually\nused for them. $\\overline{act_h}$ is the activation function for the \nprojection output, usually using `identity` or same as $act_h$.\n\nNote that these $W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}$\noperations on the input $x_{t}$ are NOT included in this operator.\nUsers can choose to use fully-connected operator before LSTMP operator.\n\n",
"inputs" : [
{
"name" : "Input",
"comment" : "(LoDTensor) the input for sequence data, which supports variable-time length input sequence. The underlying tensor in this LoDTensor is a matrix with shape (T X 4D), where T is the total time steps in this mini-batch, D is the hidden size.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "H0",
"comment" : "(Tensor, optional) the initial hidden state is an optional input. This is a tensor with shape (N x D), where N is the batch size and D is the hidden size.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "C0",
"comment" : "(Tensor, optional) the initial cell state is an optional input. This is a tensor with shape (N x D), where N is the batch size. `C0` should not be null if `H0` provided.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Weight",
"comment" : "(Tensor) the learnable hidden-hidden weights. - The shape is (P x 4D), where P is the projection layer size and D is the hidden size. - Weight = {W_cr, W_ir, W_fr, W_or}",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "ProjWeight",
"comment" : "(Tensor) the learnable weight of the projection layer. - The shape is (D x P), where P is the recurrent projection layer size and D is the hidden size. - ProjWeight = {W_rh}",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Bias",
"comment" : "(Tensor) the learnable biases, which contains two parts: input-hidden biases and peephole connections weights if setting `use_peepholes` to `True`. 1. `use_peepholes = False` - The shape is (1 x 4D). - Bias = {b_c, b_i, b_f, b_o}.2. `use_peepholes = True` - The shape is (1 x 7D). - Bias = {b_c, b_i, b_f, b_o, W_ic, W_fc, W_oc}.",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Projection",
"comment" : "(LoDTensor) the projection of the hidden state of LSTMP operator. The shape is (T x P), and LoD is the same with the `Input`.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Cell",
"comment" : "(LoDTensor) the cell state of LSTMP operator. The shape is (T x D), and lod is the same with the `Input`.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "BatchGate",
"comment" : "(LoDTensor) This LoDTensor contains input gate, forget gate and output gate after the activations. This LoDTensor has the same shape as the reorganized input, which is also be called batch input. The LoD size is 2. The first-level LoD is the batch offsets and the second contains the indices, which denotes the position of reorganized sequence in the raw input.",
"duplicable" : 0,
"intermediate" : 1
}, {
"name" : "BatchCellPreAct",
"comment" : "(LoDTensor) the pre-activation cell state reorganized in batch. This LoDTensor is obtained in the forward and used in the backward.",
"duplicable" : 0,
"intermediate" : 1
}, {
"name" : "BatchHidden",
"comment" : "(LoDTensor) the hidden state reorganized in batch. This LoDTensor is obtained in the forward and used in the backward.",
"duplicable" : 0,
"intermediate" : 1
}, {
"name" : "OrderedP0",
"comment" : "(Tensor) the projection of the initial hidden state H0. This is a tensor with shape (N x P), where N is the batch size and P is the hidden size.",
"duplicable" : 0,
"intermediate" : 1
} ],
"attrs" : [
{
"name" : "use_peepholes",
"type" : "bool",
"comment" : "(bool, defalut: True) whether to enable diagonal/peephole connections.",
"generated" : 0
}, {
"name" : "is_reverse",
"type" : "bool",
"comment" : "(bool, defalut: False) whether to compute reversed LSTMP.",
"generated" : 0
}, {
"name" : "gate_activation",
"type" : "string",
"comment" : "(string, default: sigmoid)The activation for input gate, forget gate and output gate, `sigmoid` by default.",
"generated" : 0
}, {
"name" : "cell_activation",
"type" : "string",
"comment" : "(string, default: tanh)The activation for cell output, `tanh` by defalut.",
"generated" : 0
}, {
"name" : "candidate_activation",
"type" : "string",
"comment" : "(string, default: tanh)The activation for candidate hidden state, `tanh` by default.",
"generated" : 0
}, {
"name" : "proj_activation",
"type" : "string",
"comment" : "(string, default: tanh)The activation for projection output, `tanh` by defalut.",
"generated" : 0
} ]
},{
"type" : "warpctc",
"comment" : "\nAn operator integrating the open-source\n[warp-ctc](https://github.com/baidu-research/warp-ctc) library, which is used in\n[Deep Speech 2: End-toEnd Speech Recognition in English and Mandarin](\nhttps://arxiv.org/pdf/1512.02595v1.pdf),\nto compute Connectionist Temporal Classification (CTC) loss.\nIt can be aliased as softmax with ctc, since a native softmax activation is\ninterated to the warp-ctc library, to to normlize values for each row of the\ninput tensor.\n\nMore detail of CTC loss can be found by refering to\n[Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with\nRecurrent Neural Networks](\nhttp://machinelearning.wustl.edu/mlpapers/paper_files/icml2006_GravesFGS06.pdf).\n",
......@@ -1348,34 +1447,6 @@
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "scatter",
"comment" : "\nScatter Operator.\n\nThis operator obtains output by updating the input on selected indices on the first axis:\n\n$$\nOut = Ref \\\\\nOut[Index] = Ref[Index] + Updates\n$$\n\n",
"inputs" : [
{
"name" : "Ref",
"comment" : "The source input of scatter op",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Index",
"comment" : "The index input of scatter op where Ref will be updated",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Updates",
"comment" : "The updated value of updates op",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "The output of add op",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "max_sequence_len",
"comment" : "Calculate the max sequence length through lod_rank_table.",
......@@ -2134,6 +2205,34 @@
"comment" : "(bool, default false) Use Nesterov Momentum",
"generated" : 0
} ]
},{
"type" : "scatter",
"comment" : "\nScatter Operator.\n\nThis operator obtains output by updating the input on selected indices on the first axis:\n\n$$\nOut = Ref \\\\\nOut[Index] = Ref[Index] + Updates\n$$\n\n",
"inputs" : [
{
"name" : "Ref",
"comment" : "The source input of scatter op",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Index",
"comment" : "The index input of scatter op where Ref will be updated",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Updates",
"comment" : "The updated value of updates op",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "The output of add op",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "uniform_random",
"comment" : "\nUniform random operator.\n\nThis operator initializes a tensor with random values sampled from a \nuniform distribution.\n\n",
......@@ -2172,6 +2271,29 @@
"comment" : "(int, default 5(FP32)) Output tensor data type",
"generated" : 0
} ]
},{
"type" : "logical_xor",
"comment" : "logical_xor Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$\n",
"inputs" : [
{
"name" : "X",
"comment" : "(LoDTensor) Left hand operand of logical_xor operator",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Y",
"comment" : "(LoDTensor) Right hand operand of logical_xor operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "pad",
"comment" : "\nPad Operator.\n\nPad input into output, as specified by paddings and pad_value. \nThe input should be a k-D tensor(k > 0 and k < 7). As an example:\n\nGiven:\n\nX = [[1, 2],\n [3, 4]],\n\npaddings = [0, 1, 1, 2],\n\nand\n\npad_value = 0,\n\nwe have:\n\nOut = [[0, 1, 2, 0, 0]\n [0, 3, 4, 0, 0]\n [0, 0, 0, 0, 0]]\n\n",
......@@ -2868,40 +2990,6 @@
"comment" : "The exponential factor of Pow",
"generated" : 0
} ]
},{
"type" : "lookup_table",
"comment" : "\nLookup Table Operator.\n\nThis operator is used to perform lookups on the parameter W,\nthen concatenated into a dense tensor.\n\nThe input Ids can carry the LoD (Level of Details) information,\nor not. And the output only shares the LoD information with input Ids.\n\n",
"inputs" : [
{
"name" : "W",
"comment" : "An input represents embedding tensors, which is a learnable parameter.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Ids",
"comment" : "An input with type int32 or int64 contains the ids to be looked up in W. Ids must be a column vector with rank = 2. The 2nd dimension size must be 1.",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "The lookup results, which have the same type as W.",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [
{
"name" : "is_sparse",
"type" : "bool",
"comment" : "(boolean, default false) Sparse update",
"generated" : 0
}, {
"name" : "padding_idx",
"type" : "long",
"comment" : "(int64, default -1) If the value is -1, it makes no effect to lookup. Otherwise the given value indicates padding the output with zeros whenever lookup encounters it in Ids.",
"generated" : 0
} ]
},{
"type" : "unpool",
"comment" : "\nInput shape is: $(N, C_{in}, H_{in}, W_{in})$, Output shape is:\n$(N, C_{out}, H_{out}, W_{out})$, where\n$$\nH_{out} = (H_{in}−1) * strides[0] − 2 * paddings[0] + ksize[0] \\\\\nW_{out} = (W_{in}−1) * strides[1] − 2 * paddings[1] + ksize[1]\n$$\nPaper: http://www.matthewzeiler.com/wp-content/uploads/2017/07/iccv2011.pdf\n",
......@@ -3004,6 +3092,65 @@
"comment" : "(int, default 5 (FP32)) Output data type",
"generated" : 0
} ]
},{
"type" : "logical_and",
"comment" : "logical_and Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = X \\&\\& Y$$\n",
"inputs" : [
{
"name" : "X",
"comment" : "(LoDTensor) Left hand operand of logical_and operator",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Y",
"comment" : "(LoDTensor) Right hand operand of logical_and operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = X \\&\\& Y$$",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "logical_not",
"comment" : "logical_not Operator\n\nIt operates element-wise on X, and returns the Out. X and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = !X$$\n",
"inputs" : [
{
"name" : "X",
"comment" : "(LoDTensor) Operand of logical_not operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = !X$$",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "abs",
"comment" : "\nAbs Activation Operator.\n\n$out = |x|$\n\n",
"inputs" : [
{
"name" : "X",
"comment" : "Input of Abs operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "Output of Abs operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "softplus",
"comment" : "\nSoftplus Activation Operator.\n\n$out = \\ln(1 + e^{x})$\n\n",
......@@ -3393,105 +3540,6 @@
"comment" : "(float, default 1.0)The scaling factor of the scale operator.",
"generated" : 0
} ]
},{
"type" : "lstmp",
"comment" : "\nLong-Short Term Memory with recurrent Projection layer (LSTMP) Operator.\n\nLSTMP has a separate projection layer after the LSTM layer, projecting the \noriginal hidden state to a lower-dimensional one, which is proposed to reduce \nthe number of total parameters and furthermore computational complexity for \nthe LSTM, espeacially for the case that the size of output units is relative \nlarge (https://research.google.com/pubs/archive/43905.pdf). \n\nThe formula is as follows:\n\n$$\ni_t = \\sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i) \\\\\n\nf_t = \\sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f) \\\\\n\n\\tilde{c_t} = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c) \\\\\n\no_t = \\sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_t + b_o) \\\\\n\nc_t = f_t \\odot c_{t-1} + i_t \\odot \\tilde{c_t} \\\\\n\nh_t = o_t \\odot act_h(c_t) \\\\\n\nr_t = \\overline{act_h}(W_{rh}h_t)\n$$\n\nwhere the W terms denote weight matrices (e.g. $W_{xi}$ is the matrix\nof weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$\nare diagonal weight matrices for peephole connections. In our implementation,\nwe use vectors to reprenset these diagonal weight matrices. The b terms\ndenote bias vectors ($b_i$ is the input gate bias vector), $\\sigma$\nis the activation, such as logistic sigmoid function, and\n$i, f, o$ and $c$ are the input gate, forget gate, output gate,\nand cell activation vectors, respectively, all of which have the same size as\nthe cell output activation vector $h$. Here $h$ is usually called the hidden \nstate and $r$ denotes its recurrent projection. And $\\tilde{c_t}$ is also \ncalled the candidate hidden state, whose computation is based on the current \ninput and previous hidden state.\n\nThe $\\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$\nare the cell input and cell output activation functions and `tanh` is usually\nused for them. $\\overline{act_h}$ is the activation function for the \nprojection output, usually using `identity` or same as $act_h$.\n\nNote that these $W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}$\noperations on the input $x_{t}$ are NOT included in this operator.\nUsers can choose to use fully-connected operator before LSTMP operator.\n\n",
"inputs" : [
{
"name" : "Input",
"comment" : "(LoDTensor) the input for sequence data, which supports variable-time length input sequence. The underlying tensor in this LoDTensor is a matrix with shape (T X 4D), where T is the total time steps in this mini-batch, D is the hidden size.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "H0",
"comment" : "(Tensor, optional) the initial hidden state is an optional input. This is a tensor with shape (N x D), where N is the batch size and D is the hidden size.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "C0",
"comment" : "(Tensor, optional) the initial cell state is an optional input. This is a tensor with shape (N x D), where N is the batch size. `C0` should not be null if `H0` provided.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Weight",
"comment" : "(Tensor) the learnable hidden-hidden weights. - The shape is (P x 4D), where P is the projection layer size and D is the hidden size. - Weight = {W_cr, W_ir, W_fr, W_or}",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "ProjWeight",
"comment" : "(Tensor) the learnable weight of the projection layer. - The shape is (D x P), where P is the recurrent projection layer size and D is the hidden size. - ProjWeight = {W_rh}",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Bias",
"comment" : "(Tensor) the learnable biases, which contains two parts: input-hidden biases and peephole connections weights if setting `use_peepholes` to `True`. 1. `use_peepholes = False` - The shape is (1 x 4D). - Bias = {b_c, b_i, b_f, b_o}.2. `use_peepholes = True` - The shape is (1 x 7D). - Bias = {b_c, b_i, b_f, b_o, W_ic, W_fc, W_oc}.",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Projection",
"comment" : "(LoDTensor) the projection of the hidden state of LSTMP operator. The shape is (T x P), and LoD is the same with the `Input`.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Cell",
"comment" : "(LoDTensor) the cell state of LSTMP operator. The shape is (T x D), and lod is the same with the `Input`.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "BatchGate",
"comment" : "(LoDTensor) This LoDTensor contains input gate, forget gate and output gate after the activations. This LoDTensor has the same shape as the reorganized input, which is also be called batch input. The LoD size is 2. The first-level LoD is the batch offsets and the second contains the indices, which denotes the position of reorganized sequence in the raw input.",
"duplicable" : 0,
"intermediate" : 1
}, {
"name" : "BatchCellPreAct",
"comment" : "(LoDTensor) the pre-activation cell state reorganized in batch. This LoDTensor is obtained in the forward and used in the backward.",
"duplicable" : 0,
"intermediate" : 1
}, {
"name" : "BatchHidden",
"comment" : "(LoDTensor) the hidden state reorganized in batch. This LoDTensor is obtained in the forward and used in the backward.",
"duplicable" : 0,
"intermediate" : 1
}, {
"name" : "OrderedP0",
"comment" : "(Tensor) the projection of the initial hidden state H0. This is a tensor with shape (N x P), where N is the batch size and P is the hidden size.",
"duplicable" : 0,
"intermediate" : 1
} ],
"attrs" : [
{
"name" : "use_peepholes",
"type" : "bool",
"comment" : "(bool, defalut: True) whether to enable diagonal/peephole connections.",
"generated" : 0
}, {
"name" : "is_reverse",
"type" : "bool",
"comment" : "(bool, defalut: False) whether to compute reversed LSTMP.",
"generated" : 0
}, {
"name" : "gate_activation",
"type" : "string",
"comment" : "(string, default: sigmoid)The activation for input gate, forget gate and output gate, `sigmoid` by default.",
"generated" : 0
}, {
"name" : "cell_activation",
"type" : "string",
"comment" : "(string, default: tanh)The activation for cell output, `tanh` by defalut.",
"generated" : 0
}, {
"name" : "candidate_activation",
"type" : "string",
"comment" : "(string, default: tanh)The activation for candidate hidden state, `tanh` by default.",
"generated" : 0
}, {
"name" : "proj_activation",
"type" : "string",
"comment" : "(string, default: tanh)The activation for projection output, `tanh` by defalut.",
"generated" : 0
} ]
},{
"type" : "mean",
"comment" : "\nMean Operator.\n\nOut is a scalar which is the mean of all elements in X. \n\n",
......@@ -3511,106 +3559,58 @@
} ],
"attrs" : [ ]
},{
"type" : "lod_tensor_to_array",
"comment" : "",
"type" : "lookup_table",
"comment" : "\nLookup Table Operator.\n\nThis operator is used to perform lookups on the parameter W,\nthen concatenated into a dense tensor.\n\nThe input Ids can carry the LoD (Level of Details) information,\nor not. And the output only shares the LoD information with input Ids.\n\n",
"inputs" : [
{
"name" : "X",
"comment" : "",
"name" : "W",
"comment" : "An input represents embedding tensors, which is a learnable parameter.",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "RankTable",
"comment" : "",
"name" : "Ids",
"comment" : "An input with type int32 or int64 contains the ids to be looked up in W. Ids must be a column vector with rank = 2. The 2nd dimension size must be 1.",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "",
"comment" : "The lookup results, which have the same type as W.",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "logical_and",
"comment" : "logical_and Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = X \\&\\& Y$$\n",
"inputs" : [
"attrs" : [
{
"name" : "X",
"comment" : "(LoDTensor) Left hand operand of logical_and operator",
"duplicable" : 0,
"intermediate" : 0
"name" : "is_sparse",
"type" : "bool",
"comment" : "(boolean, default false) Sparse update",
"generated" : 0
}, {
"name" : "Y",
"comment" : "(LoDTensor) Right hand operand of logical_and operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = X \\&\\& Y$$",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "logical_not",
"comment" : "logical_not Operator\n\nIt operates element-wise on X, and returns the Out. X and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = !X$$\n",
"inputs" : [
{
"name" : "X",
"comment" : "(LoDTensor) Operand of logical_not operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = !X$$",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "abs",
"comment" : "\nAbs Activation Operator.\n\n$out = |x|$\n\n",
"inputs" : [
{
"name" : "X",
"comment" : "Input of Abs operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "Output of Abs operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
"name" : "padding_idx",
"type" : "long",
"comment" : "(int64, default -1) If the value is -1, it makes no effect to lookup. Otherwise the given value indicates padding the output with zeros whenever lookup encounters it in Ids.",
"generated" : 0
} ]
},{
"type" : "logical_xor",
"comment" : "logical_xor Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$\n",
"type" : "lod_tensor_to_array",
"comment" : "",
"inputs" : [
{
"name" : "X",
"comment" : "(LoDTensor) Left hand operand of logical_xor operator",
"comment" : "",
"duplicable" : 0,
"intermediate" : 0
}, {
"name" : "Y",
"comment" : "(LoDTensor) Right hand operand of logical_xor operator",
"name" : "RankTable",
"comment" : "",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$",
"comment" : "",
"duplicable" : 0,
"intermediate" : 0
} ],
......@@ -5162,24 +5162,6 @@
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "floor",
"comment" : "\nFloor Activation Operator.\n\n$out = floor(x)$\n\n",
"inputs" : [
{
"name" : "X",
"comment" : "Input of Floor operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "Output of Floor operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "sequence_concat",
"comment" : "\nThe sequence_concat operator concatenates multiple LoDTensors.\nIt only supports sequence (LoD Tensor with level number is 1)\nor a nested sequence (LoD tensor with level number is 2) as its input.\n- Case1:\n If the axis is other than 0(here, axis is 1 and level is 1),\n each input should have the same LoD information and the LoD\n information of the output keeps the same as the input.\n\n LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,2,4}, {0,1,2,3,4}}; Dims(x1) = (4,4,4)\n LoD(Out) = {{0,2,4}, {0,1,2,3,4}}; Dims(Out) = (4,7,4)\n\n- Case2:\n If the axis is 0(here, leve is 0), the inputs are concatenated along\n time steps, the LoD information of the output need to re-compute.\n The LoD information of level-1 should be same.\n\n LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,2,4}, {0,1,3,5,7}}; Dims(x1) = (7,3,4)\n LoD(Out) = {{0,2,4}, {0,2,5,8,11}}; Dims(Out) = (11,3,4)\n\n- Case3:\n If the axis is 0(here, level is 1).\n\n LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,3,4}, {0,1,3,5,7}}; Dims(x1) = (7,3,4)\n LoD(Out) = {{0,5,8}, {0,1,2,3,5,7,8,9,11}}; Dims(Out) = (11,3,4)\n\n- Case4:\n If the LoD number is 1, axis is 0, level is 0\n\n LoD(x0) = {{0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,1,3,5,7}}; Dims(x1) = (7,3,4)\n LoD(Out) = {{0,2,5,8,11}}; Dims(Out) = (11,3,4)\n\nNOTE: The levels of all the inputs should be the same.\n ",
......@@ -5209,6 +5191,24 @@
"comment" : "(int, default 0) The level at which the inputs will be joined. If the level is 0, the inputs will be joined at the nested sequence level. If the level is 1, the inputs will be joined at the sequence level. The level should be less than the level number of inputs.",
"generated" : 0
} ]
},{
"type" : "floor",
"comment" : "\nFloor Activation Operator.\n\n$out = floor(x)$\n\n",
"inputs" : [
{
"name" : "X",
"comment" : "Input of Floor operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"outputs" : [
{
"name" : "Out",
"comment" : "Output of Floor operator",
"duplicable" : 0,
"intermediate" : 0
} ],
"attrs" : [ ]
},{
"type" : "cast",
"comment" : "\nCast Operator.\n\nThis Operator casts the input tensor to another data type and\nreturns tha Output Tensor.\n\n",
......
The source diff is not displayed because it is too large. You can view the blob instead.
......@@ -18,6 +18,11 @@ dynamic_lstm
.. autofunction:: paddle.v2.fluid.layers.dynamic_lstm
:noindex:
dynamic_lstmp
-------------
.. autofunction:: paddle.v2.fluid.layers.dynamic_lstmp
:noindex:
dynamic_gru
-----------
.. autofunction:: paddle.v2.fluid.layers.dynamic_gru
......
......@@ -377,7 +377,7 @@ with zeros whenever lookup encounters it in <code class="xref py py-attr docutil
<h2>dynamic_lstm<a class="headerlink" href="#dynamic-lstm" title="Permalink to this headline"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">dynamic_lstm</code><span class="sig-paren">(</span><em>input</em>, <em>size</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_peepholes=True</em>, <em>is_reverse=False</em>, <em>gate_activation='sigmoid'</em>, <em>cell_activation='tanh'</em>, <em>candidate_activation='tanh'</em>, <em>dtype='float32'</em><span class="sig-paren">)</span></dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">dynamic_lstm</code><span class="sig-paren">(</span><em>input</em>, <em>size</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_peepholes=True</em>, <em>is_reverse=False</em>, <em>gate_activation='sigmoid'</em>, <em>cell_activation='tanh'</em>, <em>candidate_activation='tanh'</em>, <em>dtype='float32'</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Dynamic LSTM Layer</strong></p>
<p>The default implementation uses diagonal/peephole connections
(<a class="reference external" href="https://arxiv.org/pdf/1402.1128.pdf">https://arxiv.org/pdf/1402.1128.pdf</a>), the formula is as follows:</p>
......@@ -387,7 +387,7 @@ with zeros whenever lookup encounters it in <code class="xref py py-attr docutil
the matrix of weights from the input gate to the input), <span class="math">\(W_{ic}, W_{fc}, W_{oc}\)</span> are diagonal weight matrices for peephole connections. In
our implementation, we use vectors to represent these diagonal weight
matrices. The <span class="math">\(b\)</span> terms denote bias vectors (<span class="math">\(b_i\)</span> is the input
gate bias vector), <span class="math">\(\sigma\)</span> is the non-line activations, such as
gate bias vector), <span class="math">\(\sigma\)</span> is the non-linear activations, such as
logistic sigmoid function, and <span class="math">\(i, f, o\)</span> and <span class="math">\(c\)</span> are the input
gate, forget gate, output gate, and cell activation vectors, respectively,
all of which have the same size as the cell output activation vector <span class="math">\(h\)</span>.</p>
......@@ -413,15 +413,15 @@ tensor in this Variable is a matrix with shape
(T X 4D), where T is the total time steps in this
mini-batch, D is the hidden size.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; 4 * hidden size.</li>
<li><strong>param_attr</strong> (<em>ParamAttr</em>) &#8211; <p>The parameter attribute for the learnable
<li><strong>param_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The parameter attribute for the learnable
hidden-hidden weights.</p>
<ul>
<li>Weights = {<span class="math">\(W_{ch}, W_{ih}, W_{fh}, W_{oh}\)</span>}</li>
<li>The shape is (D x 4D), where D is the hidden
size.</li>
<li>Weights = {<span class="math">\(W_{ch}, W_{ih}, W_{fh}, W_{oh}\)</span>}</li>
</ul>
</li>
<li><strong>bias_attr</strong> (<em>ParamAttr</em>) &#8211; <p>The bias attribute for the learnable bias
<li><strong>bias_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The bias attribute for the learnable bias
weights, which contains two parts, input-hidden
bias weights and peephole connections weights if
setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
......@@ -430,8 +430,8 @@ setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
</ol>
<blockquote>
<div><ul>
<li>The shape is (1 x 4D).</li>
<li>Biases = {<span class="math">\(b_c, b_i, b_f, b_o\)</span>}.</li>
<li>The shape is (1 x 4D).</li>
</ul>
</div></blockquote>
<ol class="arabic" start="2">
......@@ -439,8 +439,8 @@ setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
</ol>
<blockquote>
<div><ul>
<li>The shape is (1 x 7D).</li>
<li>Biases = { <span class="math">\(b_c, b_i, b_f, b_o, W_{ic}, W_{fc}, W_{oc}\)</span>}.</li>
<li>The shape is (1 x 7D).</li>
</ul>
</div></blockquote>
</li>
......@@ -456,6 +456,8 @@ output gate. Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#
Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;],
default &#8220;tanh&#8221;.</li>
<li><strong>dtype</strong> (<em>str</em>) &#8211; Data type. Choices = [&#8220;float32&#8221;, &#8220;float64&#8221;], default &#8220;float32&#8221;.</li>
<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer (optional). If set to None, the layer
will be named automatically.</li>
</ul>
</td>
</tr>
......@@ -477,6 +479,131 @@ default &#8220;tanh&#8221;.</li>
</div>
</dd></dl>
</div>
<div class="section" id="dynamic-lstmp">
<h2>dynamic_lstmp<a class="headerlink" href="#dynamic-lstmp" title="Permalink to this headline"></a></h2>
<dl class="function">
<dt>
<code class="descclassname">paddle.v2.fluid.layers.</code><code class="descname">dynamic_lstmp</code><span class="sig-paren">(</span><em>input</em>, <em>size</em>, <em>proj_size</em>, <em>param_attr=None</em>, <em>bias_attr=None</em>, <em>use_peepholes=True</em>, <em>is_reverse=False</em>, <em>gate_activation='sigmoid'</em>, <em>cell_activation='tanh'</em>, <em>candidate_activation='tanh'</em>, <em>proj_activation='tanh'</em>, <em>dtype='float32'</em>, <em>name=None</em><span class="sig-paren">)</span></dt>
<dd><p><strong>Dynamic LSTMP Layer</strong></p>
<p>The LSTMP (LSTM with recurrent projection) layer has a separate projection
layer after the LSTM layer, projecting the original hidden state to a
lower-dimensional one. This is proposed to reduce the total number of
parameters and, furthermore, the computational complexity of the LSTM,
especially when the number of output units is relatively
large (<a class="reference external" href="https://research.google.com/pubs/archive/43905.pdf">https://research.google.com/pubs/archive/43905.pdf</a>).</p>
<p>The formula is as follows:</p>
<div class="math">
\[ \begin{align}\begin{aligned}i_t &amp; = \sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i)\\f_t &amp; = \sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f)\\\tilde{c_t} &amp; = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c)\\o_t &amp; = \sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_t + b_o)\\c_t &amp; = f_t \odot c_{t-1} + i_t \odot \tilde{c_t}\\h_t &amp; = o_t \odot act_h(c_t)\\r_t &amp; = \overline{act_h}(W_{rh}h_t)\end{aligned}\end{align} \]</div>
<p>In the above formula:</p>
<ul class="simple">
<li><span class="math">\(W\)</span>: Denotes weight matrices (e.g. <span class="math">\(W_{xi}\)</span> is the matrix of weights from the input gate to the input).</li>
<li><span class="math">\(W_{ic}\)</span>, <span class="math">\(W_{fc}\)</span>, <span class="math">\(W_{oc}\)</span>: Diagonal weight matrices for peephole connections. In our implementation, we use vectors to reprenset these diagonal weight matrices.</li>
<li><span class="math">\(b\)</span>: Denotes bias vectors (e.g. <span class="math">\(b_i\)</span> is the input gate bias vector).</li>
<li><span class="math">\(\sigma\)</span>: The activation, such as logistic sigmoid function.</li>
<li><span class="math">\(i, f, o\)</span> and <span class="math">\(c\)</span>: The input gate, forget gate, output gate, and cell activation vectors, respectively, all of which have the same size as the cell output activation vector <span class="math">\(h\)</span>.</li>
<li><span class="math">\(h\)</span>: The hidden state.</li>
<li><span class="math">\(r\)</span>: The recurrent projection of the hidden state.</li>
<li><span class="math">\(\tilde{c_t}\)</span>: The candidate hidden state, whose computation is based on the current input and previous hidden state.</li>
<li><span class="math">\(\odot\)</span>: The element-wise product of the vectors.</li>
<li><span class="math">\(act_g\)</span> and <span class="math">\(act_h\)</span>: The cell input and cell output activation functions and <cite>tanh</cite> is usually used for them.</li>
<li><span class="math">\(\overline{act_h}\)</span>: The activation function for the projection output, usually using <cite>identity</cite> or same as <span class="math">\(act_h\)</span>.</li>
</ul>
<p>Set <cite>use_peepholes</cite> to <cite>False</cite> to disable peephole connection. The formula
is omitted here, please refer to the paper
<a class="reference external" href="http://www.bioinf.jku.at/publications/older/2604.pdf">http://www.bioinf.jku.at/publications/older/2604.pdf</a> for details.</p>
<p>Note that these <span class="math">\(W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}\)</span>
operations on the input <span class="math">\(x_{t}\)</span> are NOT included in this operator.
Users can choose to add a fully-connected layer before the LSTMP layer.</p>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field-odd field"><th class="field-name">参数:</th><td class="field-body"><ul class="first simple">
<li><strong>input</strong> (<em>Variable</em>) &#8211; The input of dynamic_lstmp layer, which supports
variable-time length input sequence. The underlying
tensor in this Variable is a matrix with shape
(T X 4D), where T is the total time steps in this
mini-batch, D is the hidden size.</li>
<li><strong>size</strong> (<em>int</em>) &#8211; 4 * hidden size.</li>
<li><strong>proj_size</strong> (<em>int</em>) &#8211; The size of projection output.</li>
<li><strong>param_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The parameter attribute for the learnable
hidden-hidden weight and projection weight.</p>
<ul>
<li>Hidden-hidden weight = {<span class="math">\(W_{ch}, W_{ih}, W_{fh}, W_{oh}\)</span>}.</li>
<li>The shape of hidden-hidden weight is (P x 4D),
where P is the projection size and D the hidden
size.</li>
<li>Projection weight = {<span class="math">\(W_{rh}\)</span>}.</li>
<li>The shape of projection weight is (D x P).</li>
</ul>
</li>
<li><strong>bias_attr</strong> (<em>ParamAttr|None</em>) &#8211; <p>The bias attribute for the learnable bias
weights, which contains two parts, input-hidden
bias weights and peephole connections weights if
setting <cite>use_peepholes</cite> to <cite>True</cite>.</p>
<ol class="arabic">
<li><cite>use_peepholes = False</cite></li>
</ol>
<blockquote>
<div><ul>
<li>Biases = {<span class="math">\(b_c, b_i, b_f, b_o\)</span>}.</li>
<li>The shape is (1 x 4D).</li>
</ul>
</div></blockquote>
<ol class="arabic" start="2">
<li><cite>use_peepholes = True</cite></li>
</ol>
<blockquote>
<div><ul>
<li>Biases = { <span class="math">\(b_c, b_i, b_f, b_o, W_{ic}, W_{fc}, W_{oc}\)</span>}.</li>
<li>The shape is (1 x 7D).</li>
</ul>
</div></blockquote>
</li>
<li><strong>use_peepholes</strong> (<em>bool</em>) &#8211; Whether to enable diagonal/peephole connections,
default <cite>True</cite>.</li>
<li><strong>is_reverse</strong> (<em>bool</em>) &#8211; Whether to compute reversed LSTM, default <cite>False</cite>.</li>
<li><strong>gate_activation</strong> (<em>str</em>) &#8211; The activation for input gate, forget gate and
output gate. Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;,
&#8220;identity&#8221;], default &#8220;sigmoid&#8221;.</li>
<li><strong>cell_activation</strong> (<em>str</em>) &#8211; The activation for cell output. Choices = [&#8220;sigmoid&#8221;,
&#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;], default &#8220;tanh&#8221;.</li>
<li><strong>candidate_activation</strong> (<em>str</em>) &#8211; The activation for candidate hidden state.
Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;],
default &#8220;tanh&#8221;.</li>
<li><strong>proj_activation</strong> (<em>str</em>) &#8211; The activation for projection output.
Choices = [&#8220;sigmoid&#8221;, &#8220;tanh&#8221;, &#8220;relu&#8221;, &#8220;identity&#8221;],
default &#8220;tanh&#8221;.</li>
<li><strong>dtype</strong> (<em>str</em>) &#8211; Data type. Choices = [&#8220;float32&#8221;, &#8220;float64&#8221;], default &#8220;float32&#8221;.</li>
<li><strong>name</strong> (<em>str|None</em>) &#8211; A name for this layer (optional). If set to None, the layer
will be named automatically.</li>
</ul>
</td>
</tr>
<tr class="field-even field"><th class="field-name">返回:</th><td class="field-body"><p class="first">The projection of hidden state, and cell state of LSTMP. The shape of projection is (T x P), for the cell state which is (T x D), and both LoD is the same with the <cite>input</cite>.</p>
</td>
</tr>
<tr class="field-odd field"><th class="field-name">返回类型:</th><td class="field-body"><p class="first last">tuple</p>
</td>
</tr>
</tbody>
</table>
<p class="rubric">Examples</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">hidden_dim</span><span class="p">,</span> <span class="n">proj_dim</span> <span class="o">=</span> <span class="mi">512</span><span class="p">,</span> <span class="mi">256</span>
<span class="n">fc_out</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">input_seq</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">hidden_dim</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span>
<span class="n">act</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">bias_attr</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
<span class="n">proj_out</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">fluid</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">dynamic_lstmp</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">fc_out</span><span class="p">,</span>
<span class="n">size</span><span class="o">=</span><span class="n">hidden_dim</span> <span class="o">*</span> <span class="mi">4</span><span class="p">,</span>
<span class="n">proj_size</span><span class="o">=</span><span class="n">proj_dim</span><span class="p">,</span>
<span class="n">use_peepholes</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span>
<span class="n">is_reverse</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">cell_activation</span><span class="o">=</span><span class="s2">&quot;tanh&quot;</span><span class="p">,</span>
<span class="n">proj_activation</span><span class="o">=</span><span class="s2">&quot;tanh&quot;</span><span class="p">)</span>
</pre></div>
</div>
</dd></dl>
</div>
<div class="section" id="dynamic-gru">
<h2>dynamic_gru<a class="headerlink" href="#dynamic-gru" title="Permalink to this headline"></a></h2>
......
The source diff is not displayed because it is too large. You can view the blob instead.