Deploy to GitHub Pages: 53cb4df0

4e88e677 · Travis CI · 3991a9b9
8 changed files
--- a/develop/doc/_sources/design/ops/sequence_decoder.md.txt
+++ b/develop/doc/_sources/design/ops/sequence_decoder.md.txt
+# Design: Sequence Decoder Generating LoDTensors
+In tasks such as machine translation and image-to-text generation,
+a [sequence decoder](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) is needed to generate sequences.
+
+This document describes how to implement the sequence decoder as an operator.
+
+## Beam Search based Decoder
+The [beam search algorithm](https://en.wikipedia.org/wiki/Beam_search) is needed when generating sequences.
+It is a heuristic search algorithm that explores a graph by expanding only the most promising nodes within a limited set (the beam).
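To make the idea concrete, here is a minimal, framework-free sketch of beam search over a toy next-token distribution. The `toy_model` callback, its tokens and the `</s>` end marker are hypothetical and only illustrate the algorithm; none of them are PaddlePaddle APIs. Scores are accumulated as log-probabilities, and a finished hypothesis leaves the beam, which is one common simplification.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(next_token_log_probs: Callable[[Sequence], Dict[str, float]],
                start: Sequence, beam_size: int, max_length: int,
                end_token: str = "</s>") -> List[Tuple[Sequence, float]]:
    """Return up to `beam_size` (sequence, log-probability) pairs, best first."""
    beam: List[Tuple[Sequence, float]] = [(start, 0.0)]
    finished: List[Tuple[Sequence, float]] = []
    for _ in range(max_length):
        # Expand every prefix in the beam with every possible next token.
        candidates = [(prefix + (token,), score + log_p)
                      for prefix, score in beam
                      for token, log_p in next_token_log_probs(prefix).items()]
        # Prune: keep only the `beam_size` most promising expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end_token:
                finished.append((seq, score))  # completed hypothesis leaves the beam
            else:
                beam.append((seq, score))
        if not beam:
            break
    return sorted(finished + beam, key=lambda c: c[1], reverse=True)[:beam_size]


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """A hypothetical fixed next-token distribution, for illustration only."""
    if len(prefix) == 1:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    return {"</s>": math.log(0.7), "c": math.log(0.3)}


results = beam_search(toy_model, ("<s>",), beam_size=2, max_length=3)
# best: ('<s>', 'a', '</s>') with probability 0.6 * 0.7 = 0.42
```

Only two of the four two-token expansions survive the second step, which is the "limited set" the paragraph above refers to.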
+
+In the old version of PaddlePaddle, the C++ class `RecurrentGradientMachine` implements a general sequence decoder based on beam search.
+Due to its complexity, the implementation relies on many special data structures,
+which makes it hard to follow and hard for users to customize.
+
+Sequence generation tasks involve many heuristic tricks,
+so a flexible sequence decoder is very important to users.
+
+During PaddlePaddle's refactoring,
+new concepts such as [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and [TensorArray](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md) were proposed to better support sequences,
+and they help make the implementation of the beam search based sequence decoder **more transparent and modular**.
+
+For example, the RNN states, candidate IDs and probabilities of beam search can be represented as `LoDTensors`;
+the IDs of the candidates selected at each time step can be stored in a `TensorArray` and `Packed` into the translated sentences.
+
+## Changing LoD's absolute offset to relative offsets
+The current `LoDTensor` is designed to store levels of variable-length sequences.
+It stores several arrays of integers, each of which represents one level.
+
+The integers in each level represent the begin and end (exclusive) offsets of a sequence **in the underlying tensor**;
+let's call this format the **absolute-offset LoD** for clarity.
+
+The absolute-offset LoD can retrieve any sequence quickly but fails to represent empty sequences. For example, consider the following two-level LoD:
+```python
+[[0, 3, 9],
+ [0, 2, 3, 3, 3, 9]]
+```
+The first level shows that there are two sequences:
+- the first's offset is `[0, 3)`
+- the second's offset is `[3, 9)`
+
+On the second level, however, there are two empty sequences that both begin and end at `3`.
+It is impossible to tell how many of these empty second-level sequences belong to each first-level sequence.
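The ambiguity can be checked mechanically. The plain-Python sketch below takes the absolute-offset LoD from the example above and tests which ways of splitting its five second-level sequences between the two first-level sequences reproduce the first level's row ranges. The helpers `spans` and `consistent` are hypothetical, written only for this illustration.

```python
from typing import List, Tuple

# The absolute-offset LoD from the example above: both levels index tensor rows.
abs_lod = [[0, 3, 9],
           [0, 2, 3, 3, 3, 9]]


def spans(level: List[int]) -> List[Tuple[int, int]]:
    """Turn an offset list such as [0, 2, 3] into [(0, 2), (2, 3)]."""
    return list(zip(level[:-1], level[1:]))


def consistent(groups: List[int]) -> bool:
    """Return True if giving groups[i] consecutive second-level sequences to the
    i-th first-level sequence covers exactly that sequence's row range."""
    children = spans(abs_lod[1])
    if sum(groups) != len(children):
        return False
    start = 0
    for (begin, end), count in zip(spans(abs_lod[0]), groups):
        chunk = children[start:start + count]
        start += count
        covered = (chunk[0][0], chunk[-1][1]) if chunk else (begin, begin)
        if covered != (begin, end):
            return False
    return True


valid = [g for g in ([1, 4], [2, 3], [3, 2], [4, 1]) if consistent(g)]
print(valid)  # [[2, 3], [3, 2], [4, 1]]
```

Three different splits pass the check, because the two `[3, 3)` sequences fit on either side of row 3.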
+
+Many scenarios rely on representing empty sequences.
+In machine translation or image-to-text generation, for instance, an instance may have no translations, or a prefix may have an empty candidate set.
+
+So let's introduce another LoD format,
+which stores **the offsets of the lower-level sequences** and is called the **relative-offset** LoD.
+
+For example, the same data as above is represented as follows:
+
+```python
+[[0, 3, 5],
+ [0, 2, 3, 3, 3, 9]]
+```
+
+The first level shows that there are two sequences;
+their offsets in the second-level LoD are `[0, 3)` and `[3, 5)`.
+
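+The second level is the same as in the absolute-offset example because the level below it is the tensor itself.
+Now it is easy to see that each first-level sequence contains one empty second-level sequence: the first contains `[0, 2)`, `[2, 3)` and `[3, 3)`, while the second contains `[3, 3)` and `[3, 9)`.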
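A minimal sketch of how a relative-offset LoD resolves into a nested structure follows, using plain Python lists with the strings `r0` to `r8` standing in for tensor rows. The helper `split_by_relative_lod` is hypothetical, not a PaddlePaddle API. It uses the LoD `[[0, 3, 5], [0, 2, 3, 3, 3, 9]]`: only the lowest level indexes tensor rows, and each higher level indexes the sequences of the level below it, so every empty sequence keeps its owner.

```python
from typing import Any, List


def split_by_relative_lod(lod: List[List[int]], rows: List[Any]) -> List[Any]:
    """Nest `rows` according to a relative-offset LoD.

    The last level holds offsets into `rows`; every higher level holds
    offsets into the list of sequences produced by the level below it.
    """
    nested: List[Any] = rows
    # Resolve from the lowest level upwards.
    for level in reversed(lod):
        nested = [nested[begin:end] for begin, end in zip(level[:-1], level[1:])]
    return nested


rows = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
rel_lod = [[0, 3, 5],
           [0, 2, 3, 3, 3, 9]]
result = split_by_relative_lod(rel_lod, rows)
# result[0] == [['r0', 'r1'], ['r2'], []]
# result[1] == [[], ['r3', 'r4', 'r5', 'r6', 'r7', 'r8']]
```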
+
+The following examples are based on the relative-offset LoD.
+
+## Usage in a simple machine translation model
+Let's start with a simple machine translation model, simplified from the [machine translation chapter](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation), to draw a blueprint of what a sequence decoder can do and how to use it.
+
+The model has an encoder that learns a semantic vector from the source sequence,
+and a decoder that uses the sequence decoder to generate new sentences.
+
+**Encoder**
+```python
+import paddle as pd
+
+dict_size = 8000
+source_dict_size = dict_size
+target_dict_size = dict_size
+word_vector_dim = 128
+encoder_dim = 128
+decoder_dim = 128
+beam_size = 5
+max_length = 120
+
+# encoder
+src_word_id = pd.data(
+    name='source_language_word',
+    type=pd.data.integer_value_sequence(source_dict_size))
+# embedding table of shape [vocabulary size, embedding dimension]
+src_embedding = pd.embedding(size=[source_dict_size, word_vector_dim])
+
+src_word_vec = pd.lookup(src_embedding, src_word_id)
+
+encoder_out_seq = pd.gru(input=src_word_vec, size=encoder_dim)
+
+encoder_ctx = pd.last_seq(encoder_out_seq)
+# encoder_ctx_proj is the learned semantic vector
+encoder_ctx_proj = pd.fc(
+    encoder_ctx, size=decoder_dim, act=pd.activation.Tanh(), bias=None)
+```
+
+**Decoder**
+
+```python
+def generate():
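+    # embedding table for the target language, used to look up generated words
+    trg_embedding = pd.embedding(size=[target_dict_size, word_vector_dim])
+    decoder = pd.while_loop()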
+    with decoder.step():
+        decoder_mem = decoder.memory(init=encoder_ctx)  # mark the memory
+        generated_ids = decoder.memory() # TODO init to batch_size <s>s
+        generated_scores = decoder.memory() # TODO init to batch_size 1s or 0s
+
+        target_word = pd.lookup(trg_embedding, generated_ids)
+        # expand encoder_ctx's batch to fit target_word's LoD
+        # for example, suppose
+        # decoder_mem.lod is
+        # [[0, 1, 3],
+        #  [0, 1, 3, 5]]
+        # and its tensor content is [a1 a2 a3 a4 a5],
+        # which means there are 2 sentences to translate:
+        #   - the first sentence has 1 candidate set, covering rows [0, 1)
+        #   - the second sentence has 2 candidate sets, covering rows [1, 3) and [3, 5)
+        # each of these 5 rows becomes a translation prefix in this step, so
+        # target_word.lod is
+        # [[0, 1, 5],
+        #  [0, 2, 4, 7, 9, 12]]
+        # which means the 2 sentences have 1 and 4 prefixes respectively,
+        # and the prefixes have 2, 2, 3, 2 and 3 candidates;
+        # encoder_ctx_expanded's content will then be
+        # [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
+        encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
+        decoder_input = pd.fc(
+            act=pd.activation.Linear(),
+            input=[target_word, encoder_ctx_expanded],
+            size=3 * decoder_dim)
+        gru_out, cur_mem = pd.gru_step(
+            decoder_input, mem=decoder_mem, size=decoder_dim)
+        scores = pd.fc(
+            gru_out,
+            size=target_dict_size,
+            bias=None,
+            act=pd.activation.Softmax())
+        # K is a configuration value; this example assumes K = beam_size
+        topk_scores, topk_ids = pd.top_k(scores, beam_size)
+        topk_generated_scores = pd.add_scalar(topk_scores, generated_scores)
+
+        selected_ids, selected_generation_scores = decoder.beam_search(
+            topk_ids, topk_generated_scores)
+
+        # update the states
+        decoder_mem.update(cur_mem)  # tells how to update state
+        generated_ids.update(selected_ids)
+        generated_scores.update(selected_generation_scores)
+
+        decoder.output(selected_ids)
+        decoder.output(selected_generation_scores)
+
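+    translation_ids, translation_scores = decoder()
+    return translation_ids, translation_scores
+
+
+translation_ids, translation_scores = generate()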
+```
+`decoder.beam_search` is an operator that, given the candidates and the scores of the translations that include them,
+returns the result of the beam search algorithm.
+
+In this way, users can customize anything on the inputs or outputs of beam search. For example, there are three ways to prune some translation prefixes:
+
+1. Set the corresponding elements in `topk_generated_scores` to zero or some small value, so that `beam_search` discards those candidates.
+2. Remove specific candidates from `selected_ids`.
+3. Take the final `translation_ids` and remove unwanted translation sequences from it.
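The first pruning strategy can be illustrated without any framework code. In the sketch below, plain Python lists stand in for one prefix's rows of `topk_ids` and `topk_generated_scores`. The helpers `prune_by_score` and `select_top` and the banned ID are hypothetical, and `select_top` is only a simplified stand-in for the selection that `beam_search` performs.

```python
from typing import List, Set


def prune_by_score(ids: List[int], scores: List[float], banned_ids: Set[int],
                   floor: float = 0.0) -> List[float]:
    """Return a copy of `scores` in which banned candidates get `floor`.

    With probability-like scores, 0 (or any very small value) ranks these
    candidates last, so they are dropped whenever enough other candidates exist.
    """
    return [floor if i in banned_ids else s for i, s in zip(ids, scores)]


def select_top(ids: List[int], scores: List[float], beam_size: int) -> List[int]:
    """Keep the `beam_size` best candidates, as a beam search step would."""
    ranked = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in ranked[:beam_size]]


ids = [7, 3, 9, 4]
scores = [0.40, 0.30, 0.20, 0.10]
masked = prune_by_score(ids, scores, banned_ids={3})
selected = select_top(ids, masked, beam_size=2)
# without masking the beam would be [7, 3]; with it, candidate 3 is discarded
```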
+
+The sequence decoder implementation can reuse the C++ class [RNNAlgorithm](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30),
+so its Python syntax is quite similar to that of an [RNN](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop).
+
+Both of them (the candidate IDs and their scores) are two-level `LoDTensors`:
+
+- the first level represents the `batch_size` source sentences;
+- the second level represents the candidate ID set of each translation prefix.
+
+For example, there may be 3 source sentences to translate, whose prefixes have 2, 3 and 1 candidates respectively.
+
+Unlike in an RNN, the previous state and the current state of the sequence decoder have different LoDs and shapes,
+so a `lod_expand` operator is used to expand the previous state to fit the LoD of the current state.
+
+For example, the previous state has
+
+* LoD `[0, 1, 2, 3][0, 2, 5, 6]`
+* tensor content `a1 a2 b1 b2 b3 c1`
+
+and the current state, stored in `encoder_ctx_expanded`, has
+
+* LoD `[0, 2, 5, 6][0, 3, 5, 8, 9, 11, 11]`
+* tensor content
+  - a1 a1 a1 (a1 has 3 candidates, so its state is copied 3 times, once for each candidate)
+  - a2 a2
+  - b1 b1 b1
+  - b2
+  - b3 b3
+  - None (c1 has 0 candidates, so c1 is dropped)
+
+Thanks to the relative-offset LoD, an empty candidate set can be represented naturally.
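A minimal sketch of the expansion in plain Python follows. The `lod_expand` function here is a hypothetical stand-in for the operator of the same name, not its actual implementation: it repeats the i-th row of the previous state once for each candidate in the i-th lowest-level sequence of the current LoD, so a row with an empty candidate set disappears. With the lowest level `[0, 3, 5, 8, 9, 11, 11]` from the example above, it reproduces the listed tensor content.

```python
from typing import Any, List


def lod_expand(prev_rows: List[Any], cur_lod: List[List[int]]) -> List[Any]:
    """Repeat prev_rows[i] as many times as the i-th lowest-level sequence is long."""
    offsets = cur_lod[-1]  # lowest level: offsets into the candidate rows
    counts = [end - begin for begin, end in zip(offsets[:-1], offsets[1:])]
    assert len(counts) == len(prev_rows), "one candidate set per previous row"
    expanded: List[Any] = []
    for row, count in zip(prev_rows, counts):
        expanded.extend([row] * count)  # count == 0 drops the row
    return expanded


prev_rows = ["a1", "a2", "b1", "b2", "b3", "c1"]
cur_lod = [[0, 2, 5, 6], [0, 3, 5, 8, 9, 11, 11]]
expanded = lod_expand(prev_rows, cur_lod)
print(expanded)
# ['a1', 'a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b1', 'b2', 'b3', 'b3']
```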
+
+The outputs of each time step can be stored in a `TensorArray` and `Pack`ed into a final LoDTensor. The corresponding syntax is:
+
+```python
+decoder.output(selected_ids)
+decoder.output(selected_generation_scores)
+```
+
+`selected_ids` holds the candidate IDs for the prefixes.
+It will be `Packed` by `TensorArray` into a two-level `LoDTensor`:
+the first level represents the source sequences,
+and the second level represents the generated sequences.
+
+Packing `selected_scores` yields a `LoDTensor` that stores the score of each candidate in the translations.
+
+Packing `selected_generation_scores` yields a `LoDTensor` in which the last element of each sequence is the probability of the whole translation.
+
+## LoD and shape changes during decoding
+<p align="center">
+  <img src="./images/LOD-and-shape-changes-during-decoding.jpg"/>
+</p>
+
+According to the image above, beam search is the only phase that changes the LoD.
+
+## Beam search design
+The beam search algorithm will be implemented as a method of the sequence decoder. It has 3 inputs:
+
+1. `topk_ids`, the top K candidate IDs for each prefix.
+2. `topk_scores`, the corresponding scores for `topk_ids`.
+3. `generated_scores`, the scores of the prefixes.
+
+All of them are LoDTensors, so the sequence affiliation is clear.
+Beam search keeps a beam for each prefix and selects a smaller candidate set for each prefix.
+
+It returns three variables:
+
+1. `selected_ids`, the final candidates that the beam search function selects for the next step.
+2. `selected_scores`, the scores of those candidates.
+3. `generated_scores`, the updated scores of each prefix (with the new candidates appended).
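To show how these inputs and outputs fit together, here is a plain-Python sketch of one beam search step over a two-level relative-offset LoD (first level: source sentence to prefixes; second level: prefix to candidate rows). It assumes additive scores such as log-probabilities, matching the `pd.add_scalar` call in the decoder example, and it assumes the beam budget is shared by all prefixes of one source sentence. The function name and the explicit LoD return value are hypothetical; a real `LoDTensor` would carry its LoD with the data.

```python
from typing import List, Tuple

LoD = List[List[int]]


def beam_search_step(topk_ids: List[int], topk_scores: List[float],
                     prefix_scores: List[float], lod: LoD, beam_size: int
                     ) -> Tuple[List[int], List[float], List[float], LoD]:
    """Select at most `beam_size` candidates per source sentence."""
    sent_level, prefix_level = lod
    selected_ids: List[int] = []
    selected_scores: List[float] = []
    generated_scores: List[float] = []
    new_prefix_level = [0]
    for s in range(len(sent_level) - 1):
        prefixes = range(sent_level[s], sent_level[s + 1])
        # Score every (prefix, candidate row) pair of this source sentence.
        pool = [(prefix_scores[p] + topk_scores[row], p, row)
                for p in prefixes
                for row in range(prefix_level[p], prefix_level[p + 1])]
        kept = sorted(pool, key=lambda t: t[0], reverse=True)[:beam_size]
        for p in prefixes:
            # Emit survivors prefix by prefix; a prefix may end up with none.
            rows = sorted(row for _, q, row in kept if q == p)
            for row in rows:
                selected_ids.append(topk_ids[row])
                selected_scores.append(topk_scores[row])
                generated_scores.append(prefix_scores[p] + topk_scores[row])
            new_prefix_level.append(new_prefix_level[-1] + len(rows))
    # Prefixes still belong to the same sentences; only the lower level changes.
    return selected_ids, selected_scores, generated_scores, [list(sent_level), new_prefix_level]


lod = [[0, 1, 3], [0, 2, 4, 6]]  # sentence 0: 1 prefix; sentence 1: 2 prefixes
topk_ids = [11, 12, 21, 22, 31, 32]
topk_scores = [-0.1, -2.0, -0.5, -0.7, -0.2, -0.3]
prefix_scores = [0.0, -1.0, -0.4]
ids, sel_scores, gen_scores, new_lod = beam_search_step(
    topk_ids, topk_scores, prefix_scores, lod, beam_size=2)
# ids == [11, 12, 31, 32]; new_lod == [[0, 1, 3], [0, 2, 2, 4]]
```

The second prefix of sentence 1 loses both of its candidates, and the relative-offset LoD records this as the empty range `[2, 2)`.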
+
+## Introducing the LoD-based `Pack` and `Unpack` methods in `TensorArray`
+`selected_ids`, `selected_scores` and `generated_scores` are LoDTensors
+produced at every time step,
+so it is natural to store them in arrays.
+
+Currently, PaddlePaddle has a module called `TensorArray` that can store an array of tensors,
+so the results of beam search are best stored in a `TensorArray`.
+
+`Pack` and `UnPack` in `TensorArray` package the tensors in the array into a `LoDTensor`, or split a `LoDTensor` into an array of tensors.
+They need some extensions to support packing and unpacking an array of `LoDTensors`.
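One possible LoD-based `Pack` for `selected_ids` is sketched below in plain Python. It relies on an assumption consistent with the decoder example, where `selected_ids` becomes the next step's `generated_ids`: the p-th prefix at step t is the p-th selected row of step t - 1, so the LoD itself acts as a parent pointer, and a row that no prefix extends at the next step is a finished translation. The names `pack`, `owner` and `to_lod_tensor` are hypothetical, and the real extension to `TensorArray` may work differently.

```python
from typing import List, Tuple

Step = Tuple[List[int], List[List[int]]]  # (selected IDs, [sentence level, prefix level])


def owner(level: List[int], index: int) -> int:
    """Return k such that level[k] <= index < level[k + 1]."""
    for k in range(len(level) - 1):
        if level[k] <= index < level[k + 1]:
            return k
    raise IndexError(index)


def pack(steps: List[Step]) -> List[List[List[int]]]:
    """Collect, per source sentence, every generated ID sequence."""
    num_sentences = len(steps[0][1][0]) - 1
    result: List[List[List[int]]] = [[] for _ in range(num_sentences)]
    for t, (ids, _) in enumerate(steps):
        for row in range(len(ids)):
            if t + 1 < len(steps):
                next_prefix_level = steps[t + 1][1][1]
                if next_prefix_level[row + 1] > next_prefix_level[row]:
                    continue  # this row is extended later, not finished yet
            # Walk back through the parent pointers encoded by the LoD.
            seq, r = [], row
            for u in range(t, -1, -1):
                u_ids, (_, u_prefix_level) = steps[u]
                seq.append(u_ids[r])
                r = owner(u_prefix_level, r)  # prefix index == parent row at u - 1
            sentence = owner(steps[0][1][0], r)  # r is now a prefix of step 0
            result[sentence].append(seq[::-1])
    return result


def to_lod_tensor(nested: List[List[List[int]]]) -> Tuple[List[List[int]], List[int]]:
    """Flatten the nested result into a two-level relative-offset LoD plus data."""
    sent_level, seq_level, data = [0], [0], []
    for seqs in nested:
        for seq in seqs:
            data.extend(seq)
            seq_level.append(seq_level[-1] + len(seq))
        sent_level.append(sent_level[-1] + len(seqs))
    return [sent_level, seq_level], data


# Step 0: one <s> prefix per sentence; step 1: prefixes are the rows of step 0.
steps = [([5, 6, 7], [[0, 1, 2], [0, 2, 3]]),
         ([8, 9], [[0, 2, 3], [0, 1, 1, 2]])]  # the prefix ending in 6 has no candidates
nested = pack(steps)
lod, data = to_lod_tensor(nested)
# nested == [[[6], [5, 8]], [[7, 9]]]
```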
--- a/develop/doc/design/ops/sequence_decoder.html
+++ b/develop/doc/design/ops/sequence_decoder.html
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>Design: Sequence Decoder Generating LoDTensors &mdash; PaddlePaddle  documentation</title>
+  
+
+  
+  
+
+  
+
+  
+  
+    
+
+  
+
+  
+  
+    <link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
+  
+
+  
+  
+        <link rel="index" title="Index"
+              href="../../genindex.html"/>
+        <link rel="search" title="Search" href="../../search.html"/>
+    <link rel="top" title="PaddlePaddle  documentation" href="../../index.html"/> 
+
+  <link rel="stylesheet" href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" type="text/css" />
+  <link rel="stylesheet" href="../../_static/css/override.css" type="text/css" />
+
+  
+
+  
+  <script src="../../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  
+  
+  <div class="main-content-wrap">
+
+    
+    
+    <section class="doc-content-wrap">
+
+      
+
+ 
+
+
+
+
+
+
+
+      
+      <div class="wy-nav-content" id="doc-content">
+        <div class="rst-content">
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <div class="section" id="design-sequence-decoder-generating-lodtensors">
+<span id="design-sequence-decoder-generating-lodtensors"></span><h1>Design: Sequence Decoder Generating LoDTensors<a class="headerlink" href="#design-sequence-decoder-generating-lodtensors" title="Permalink to this headline">¶</a></h1>
+<p>In tasks such as machine translation and image-to-text generation,
+a <a class="reference external" href="https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md">sequence decoder</a> is needed to generate sequences.</p>
+<p>This document describes how to implement the sequence decoder as an operator.</p>
+<div class="section" id="beam-search-based-decoder">
+<span id="beam-search-based-decoder"></span><h2>Beam Search based Decoder<a class="headerlink" href="#beam-search-based-decoder" title="Permalink to this headline">¶</a></h2>
+<p>The <a class="reference external" href="https://en.wikipedia.org/wiki/Beam_search">beam search algorithm</a> is needed when generating sequences.
+It is a heuristic search algorithm that explores a graph by expanding only the most promising nodes within a limited set (the beam).</p>
+<p>In the old version of PaddlePaddle, the C++ class <code class="docutils literal"><span class="pre">RecurrentGradientMachine</span></code> implements a general sequence decoder based on beam search.
+Due to its complexity, the implementation relies on many special data structures,
+which makes it hard to follow and hard for users to customize.</p>
+<p>Sequence generation tasks involve many heuristic tricks,
+so a flexible sequence decoder is very important to users.</p>
+<p>During PaddlePaddle&#8217;s refactoring,
+new concepts such as <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a> and <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md">TensorArray</a> were proposed to better support sequences,
+and they help make the implementation of the beam search based sequence decoder <strong>more transparent and modular</strong>.</p>
+<p>For example, the RNN states, candidate IDs and probabilities of beam search can be represented as <code class="docutils literal"><span class="pre">LoDTensors</span></code>;
+the IDs of the candidates selected at each time step can be stored in a <code class="docutils literal"><span class="pre">TensorArray</span></code> and <code class="docutils literal"><span class="pre">Packed</span></code> into the translated sentences.</p>
+</div>
+<div class="section" id="changing-lod-s-absolute-offset-to-relative-offsets">
+<span id="changing-lod-s-absolute-offset-to-relative-offsets"></span><h2>Changing LoD&#8217;s absolute offset to relative offsets<a class="headerlink" href="#changing-lod-s-absolute-offset-to-relative-offsets" title="Permalink to this headline">¶</a></h2>
+<p>The current <code class="docutils literal"><span class="pre">LoDTensor</span></code> is designed to store levels of variable-length sequences.
+It stores several arrays of integers, each of which represents one level.</p>
+<p>The integers in each level represent the begin and end (exclusive) offsets of a sequence <strong>in the underlying tensor</strong>;
+let&#8217;s call this format the <strong>absolute-offset LoD</strong> for clarity.</p>
+<p>The absolute-offset LoD can retrieve any sequence quickly but fails to represent empty sequences. For example, consider the following two-level LoD:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span>
+ <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">9</span><span class="p">]]</span>
+</pre></div>
+</div>
+<p>The first level shows that there are two sequences:</p>
+<ul class="simple">
+<li>the first&#8217;s offset is <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">3)</span></code></li>
+<li>the second&#8217;s offset is <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">9)</span></code></li>
+</ul>
+<p>On the second level, however, there are two empty sequences that both begin and end at <code class="docutils literal"><span class="pre">3</span></code>.
+It is impossible to tell how many of these empty second-level sequences belong to each first-level sequence.</p>
+<p>Many scenarios rely on representing empty sequences.
+In machine translation or image-to-text generation, for instance, an instance may have no translations, or a prefix may have an empty candidate set.</p>
+<p>So let&#8217;s introduce another LoD format,
+which stores <strong>the offsets of the lower-level sequences</strong> and is called the <strong>relative-offset</strong> LoD.</p>
+<p>For example, the same data as above is represented as follows:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span>
+ <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">9</span><span class="p">]]</span>
+</pre></div>
+</div>
+<p>The first level shows that there are two sequences;
+their offsets in the second-level LoD are <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">3)</span></code> and <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">5)</span></code>.</p>
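+<p>The second level is the same as in the absolute-offset example because the level below it is the tensor itself.
+Now it is easy to see that each first-level sequence contains one empty second-level sequence: the first contains <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">2)</span></code>, <code class="docutils literal"><span class="pre">[2,</span> <span class="pre">3)</span></code> and <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">3)</span></code>, while the second contains <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">3)</span></code> and <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">9)</span></code>.</p>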
+<p>The following examples are based on the relative-offset LoD.</p>
+</div>
+<div class="section" id="usage-in-a-simple-machine-translation-model">
+<span id="usage-in-a-simple-machine-translation-model"></span><h2>Usage in a simple machine translation model<a class="headerlink" href="#usage-in-a-simple-machine-translation-model" title="Permalink to this headline">¶</a></h2>
+<p>Let&#8217;s start with a simple machine translation model, simplified from the <a class="reference external" href="https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation">machine translation chapter</a>, to draw a blueprint of what a sequence decoder can do and how to use it.</p>
+<p>The model has an encoder that learns a semantic vector from the source sequence,
+and a decoder that uses the sequence decoder to generate new sentences.</p>
+<p><strong>Encoder</strong></p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">paddle</span> <span class="kn">as</span> <span class="nn">pd</span>
+
+<span class="n">dict_size</span> <span class="o">=</span> <span class="mi">8000</span>
+<span class="n">source_dict_size</span> <span class="o">=</span> <span class="n">dict_size</span>
+<span class="n">target_dict_size</span> <span class="o">=</span> <span class="n">dict_size</span>
+<span class="n">word_vector_dim</span> <span class="o">=</span> <span class="mi">128</span>
+<span class="n">encoder_dim</span> <span class="o">=</span> <span class="mi">128</span>
+<span class="n">decoder_dim</span> <span class="o">=</span> <span class="mi">128</span>
+<span class="n">beam_size</span> <span class="o">=</span> <span class="mi">5</span>
+<span class="n">max_length</span> <span class="o">=</span> <span class="mi">120</span>
+
+<span class="c1"># encoder</span>
+<span class="n">src_word_id</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">data</span><span class="p">(</span>
+    <span class="n">name</span><span class="o">=</span><span class="s1">&#39;source_language_word&#39;</span><span class="p">,</span>
+    <span class="nb">type</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">integer_value_sequence</span><span class="p">(</span><span class="n">source_dict_size</span><span class="p">))</span>
+<span class="c1"># embedding table of shape [vocabulary size, embedding dimension]</span>
+<span class="n">src_embedding</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">embedding</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">[</span><span class="n">source_dict_size</span><span class="p">,</span> <span class="n">word_vector_dim</span><span class="p">])</span>
+
+<span class="n">src_word_vec</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">src_embedding</span><span class="p">,</span> <span class="n">src_word_id</span><span class="p">)</span>
+
+<span class="n">encoder_out_seq</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">gru</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">src_word_vec</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">encoder_dim</span><span class="p">)</span>
+
+<span class="n">encoder_ctx</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">last_seq</span><span class="p">(</span><span class="n">encoder_out_seq</span><span class="p">)</span>
+<span class="c1"># encoder_ctx_proj is the learned semantic vector</span>
+<span class="n">encoder_ctx_proj</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span>
+    <span class="n">encoder_ctx</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">decoder_dim</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">activation</span><span class="o">.</span><span class="n">Tanh</span><span class="p">(),</span> <span class="n">bias</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
+</pre></div>
+</div>
+<p><strong>Decoder</strong></p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate</span><span class="p">():</span>
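+    <span class="c1"># embedding table for the target language, used to look up generated words</span>
+    <span class="n">trg_embedding</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">embedding</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">[</span><span class="n">target_dict_size</span><span class="p">,</span> <span class="n">word_vector_dim</span><span class="p">])</span>
+    <span class="n">decoder</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">while_loop</span><span class="p">()</span>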
+    <span class="k">with</span> <span class="n">decoder</span><span class="o">.</span><span class="n">step</span><span class="p">():</span>
+        <span class="n">decoder_mem</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">memory</span><span class="p">(</span><span class="n">init</span><span class="o">=</span><span class="n">encoder_ctx</span><span class="p">)</span>  <span class="c1"># mark the memory</span>
+        <span class="n">generated_ids</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">memory</span><span class="p">()</span> <span class="c1"># TODO init to batch_size &lt;s&gt;s</span>
+        <span class="n">generated_scores</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">memory</span><span class="p">()</span> <span class="c1"># TODO init to batch_size 1s or 0s</span>
+
+        <span class="n">target_word</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">trg_embedding</span><span class="p">,</span> <span class="n">generated_ids</span><span class="p">)</span>
+        <span class="c1"># expand encoder_ctx&#39;s batch to fit target_word&#39;s lod</span>
+        <span class="c1"># for example</span>
+        <span class="c1"># decoder_mem.lod is</span>
+        <span class="c1"># [[0 1 3],</span>
+        <span class="c1">#  [0 1 3 5]]</span>
+        <span class="c1"># its tensor content is [a1 a2 a3 a4 a5]</span>
+        <span class="c1"># which means there are 2 sentences to translate</span>
+        <span class="c1">#   - the first sentence has 1 translation prefix, the offsets are [0, 1)</span>
+        <span class="c1">#   - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 5)</span>
+        <span class="c1"># the target_word.lod is </span>
+        <span class="c1"># [[0, 1, 5]</span>
+        <span class="c1">#  [0, 2, 4, 7, 9, 12]]</span>
+        <span class="c1"># which means 2 sentences to translate, with 1 and 4 prefixes respectively</span>
+        <span class="c1"># the first prefix has 2 candidates</span>
+        <span class="c1"># the following prefixes have 2, 3, 2, 3 candidates</span>
+        <span class="c1"># the encoder_ctx_expanded&#39;s content will be</span>
+        <span class="c1"># [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]</span>
+        <span class="n">encoder_ctx_expanded</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">lod_expand</span><span class="p">(</span><span class="n">encoder_ctx</span><span class="p">,</span> <span class="n">target_word</span><span class="p">)</span>
+        <span class="n">decoder_input</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span>
+            <span class="n">act</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">activation</span><span class="o">.</span><span class="n">Linear</span><span class="p">(),</span>
+            <span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">target_word</span><span class="p">,</span> <span class="n">encoder_ctx_expanded</span><span class="p">],</span>
+            <span class="n">size</span><span class="o">=</span><span class="mi">3</span> <span class="o">*</span> <span class="n">decoder_dim</span><span class="p">)</span>
+        <span class="n">gru_out</span><span class="p">,</span> <span class="n">cur_mem</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">gru_step</span><span class="p">(</span>
+            <span class="n">decoder_input</span><span class="p">,</span> <span class="n">mem</span><span class="o">=</span><span class="n">decoder_mem</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">decoder_dim</span><span class="p">)</span>
+        <span class="n">scores</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span>
+            <span class="n">gru_out</span><span class="p">,</span>
+            <span class="n">size</span><span class="o">=</span><span class="n">target_dict_size</span><span class="p">,</span>
+            <span class="n">bias</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
+            <span class="n">act</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">activation</span><span class="o">.</span><span class="n">Softmax</span><span class="p">())</span>
+        <span class="c1"># K is a configurable parameter</span>
+        <span class="n">topk_scores</span><span class="p">,</span> <span class="n">topk_ids</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">top_k</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">K</span><span class="p">)</span>
+        <span class="n">topk_generated_scores</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">add_scalar</span><span class="p">(</span><span class="n">topk_scores</span><span class="p">,</span> <span class="n">generated_scores</span><span class="p">)</span>
+
+        <span class="n">selected_ids</span><span class="p">,</span> <span class="n">selected_generation_scores</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">beam_search</span><span class="p">(</span>
+            <span class="n">topk_ids</span><span class="p">,</span> <span class="n">topk_generated_scores</span><span class="p">)</span>
+
+        <span class="c1"># update the states</span>
+        <span class="n">decoder_mem</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">cur_mem</span><span class="p">)</span>  <span class="c1"># tells how to update state</span>
+        <span class="n">generated_ids</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">selected_ids</span><span class="p">)</span>
+        <span class="n">generated_scores</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">selected_generation_scores</span><span class="p">)</span>
+
+        <span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_ids</span><span class="p">)</span>
+        <span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_generation_scores</span><span class="p">)</span>
+
+    <span class="k">return</span> <span class="n">decoder</span><span class="p">()</span>
+
+<span class="n">translation_ids</span><span class="p">,</span> <span class="n">translation_scores</span> <span class="o">=</span> <span class="n">generate</span><span class="p">()</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">decoder.beam_search</span></code> is an operator that, given the candidates and the scores of the translations that include them,
+returns the result of the beam search algorithm.</p>
+<p>In this way, users can customize the inputs or outputs of beam search. For example, there are three ways to prune translation prefixes:</p>
+<ol class="simple">
+<li>set the corresponding elements in <code class="docutils literal"><span class="pre">topk_generated_scores</span></code> to zero or some small value, so that beam_search discards those candidates;</li>
+<li>remove specific candidates from <code class="docutils literal"><span class="pre">selected_ids</span></code>;</li>
+<li>remove translation sequences from the final <code class="docutils literal"><span class="pre">translation_ids</span></code>.</li>
+</ol>
+<p>The implementation of the sequence decoder can reuse the C++ class <a class="reference external" href="https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30">RNNAlgorithm</a>,
+so its Python syntax is quite similar to that of an <a class="reference external" href="https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop">RNN</a>.</p>
+<p>Both of them are two-level <code class="docutils literal"><span class="pre">LoDTensors</span></code>:</p>
+<ul class="simple">
+<li>the first level represents <code class="docutils literal"><span class="pre">batch_size</span></code> of (source) sentences;</li>
+<li>the second level represents the candidate ID sets for each translation prefix.</li>
+</ul>
+<p>For example, there may be 3 source sentences to translate, with 2, 3, and 1 candidates respectively.</p>
+<p>Unlike in an RNN, the previous state and the current state in the sequence decoder have different LoDs and shapes,
+so a <code class="docutils literal"><span class="pre">lod_expand</span></code> operator is used to expand the LoD of the previous state to fit the current state.</p>
+<p>For example, for the previous state:</p>
+<ul class="simple">
+<li>LoD is <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">1,</span> <span class="pre">3][0,</span> <span class="pre">2,</span> <span class="pre">5,</span> <span class="pre">6]</span></code></li>
+<li>content of tensor is <code class="docutils literal"><span class="pre">a1</span> <span class="pre">a2</span> <span class="pre">b1</span> <span class="pre">b2</span> <span class="pre">b3</span> <span class="pre">c1</span></code></li>
+</ul>
+<p>for the current state, stored in <code class="docutils literal"><span class="pre">encoder_ctx_expanded</span></code>:</p>
+<ul class="simple">
+<li>LoD is <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">2,</span> <span class="pre">6][0,</span> <span class="pre">3,</span> <span class="pre">5,</span> <span class="pre">8,</span> <span class="pre">9,</span> <span class="pre">11,</span> <span class="pre">11]</span></code></li>
+<li>the content is<ul>
+<li>a1 a1 a1 (a1 has 3 candidates, so its state is copied 3 times, once for each candidate)</li>
+<li>a2 a2</li>
+<li>b1 b1 b1</li>
+<li>b2</li>
+<li>b3 b3</li>
+<li>None (c1 has 0 candidates, so c1 is dropped)</li>
+</ul>
+</li>
+</ul>
+<p>Thanks to the relative-offset LoD, an empty candidate set can be represented naturally.</p>
+<p>The states in each time step can be stored in a <code class="docutils literal"><span class="pre">TensorArray</span></code> and <code class="docutils literal"><span class="pre">Pack</span></code>ed into a final LoDTensor; the corresponding syntax is</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_ids</span><span class="p">)</span>
+<span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_generation_scores</span><span class="p">)</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">selected_ids</span></code> holds the candidate IDs for the prefixes.
+It will be <code class="docutils literal"><span class="pre">Packed</span></code> by <code class="docutils literal"><span class="pre">TensorArray</span></code> into a two-level <code class="docutils literal"><span class="pre">LoDTensor</span></code>,
+where the first level represents the source sequences
+and the second level represents the generated sequences.</p>
+<p>Packing <code class="docutils literal"><span class="pre">selected_scores</span></code> yields a <code class="docutils literal"><span class="pre">LoDTensor</span></code> that stores the score of each candidate in the translations.</p>
+<p>Packing <code class="docutils literal"><span class="pre">selected_generation_scores</span></code> yields a <code class="docutils literal"><span class="pre">LoDTensor</span></code> in which the tail of each sequence is the probability of the whole translation.</p>
+</div>
+<div class="section" id="lod-and-shape-changes-during-decoding">
+<span id="lod-and-shape-changes-during-decoding"></span><h2>LoD and shape changes during decoding<a class="headerlink" href="#lod-and-shape-changes-during-decoding" title="Permalink to this headline">¶</a></h2>
+<p align="center">
+  <img src="./images/LOD-and-shape-changes-during-decoding.jpg"/>
+</p><p>According to the image above, beam search is the only phase that changes the LoD.</p>
+</div>
+<div class="section" id="beam-search-design">
+<span id="beam-search-design"></span><h2>Beam search design<a class="headerlink" href="#beam-search-design" title="Permalink to this headline">¶</a></h2>
+<p>The beam search algorithm will be implemented as a method of the sequence decoder. It has 3 inputs:</p>
+<ol class="simple">
+<li><code class="docutils literal"><span class="pre">topk_ids</span></code>, top K candidate ids for each prefix.</li>
+<li><code class="docutils literal"><span class="pre">topk_scores</span></code>, the corresponding scores for <code class="docutils literal"><span class="pre">topk_ids</span></code>.</li>
+<li><code class="docutils literal"><span class="pre">generated_scores</span></code>, the score of the prefixes.</li>
+</ol>
+<p>All of them are LoDTensors, so the sequence affiliation is clear.
+Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix.</p>
+<p>It will return three variables:</p>
+<ol class="simple">
+<li><code class="docutils literal"><span class="pre">selected_ids</span></code>, the final candidates that the beam search function selected for the next step.</li>
+<li><code class="docutils literal"><span class="pre">selected_scores</span></code>, the scores for the candidates.</li>
+<li><code class="docutils literal"><span class="pre">generated_scores</span></code>, the updated scores for each prefix (with the new candidates appended).</li>
+</ol>
+</div>
+<div class="section" id="introducing-the-lod-based-pack-and-unpack-methods-in-tensorarray">
+<span id="introducing-the-lod-based-pack-and-unpack-methods-in-tensorarray"></span><h2>Introducing the LoD-based <code class="docutils literal"><span class="pre">Pack</span></code> and <code class="docutils literal"><span class="pre">Unpack</span></code> methods in <code class="docutils literal"><span class="pre">TensorArray</span></code><a class="headerlink" href="#introducing-the-lod-based-pack-and-unpack-methods-in-tensorarray" title="Permalink to this headline">¶</a></h2>
+<p><code class="docutils literal"><span class="pre">selected_ids</span></code>, <code class="docutils literal"><span class="pre">selected_scores</span></code>, and <code class="docutils literal"><span class="pre">generated_scores</span></code> are LoDTensors
+that exist at each time step,
+so it is natural to store them in arrays.</p>
+<p>PaddlePaddle currently has a module called <code class="docutils literal"><span class="pre">TensorArray</span></code> that can store an array of tensors,
+so the results of beam search are best stored in a <code class="docutils literal"><span class="pre">TensorArray</span></code>.</p>
+<p>The <code class="docutils literal"><span class="pre">Pack</span></code> and <code class="docutils literal"><span class="pre">UnPack</span></code> methods in <code class="docutils literal"><span class="pre">TensorArray</span></code> are used to pack the tensors in the array into a <code class="docutils literal"><span class="pre">LoDTensor</span></code> or to split a <code class="docutils literal"><span class="pre">LoDTensor</span></code> into an array of tensors.
+They need some extensions to support packing or unpacking an array of <code class="docutils literal"><span class="pre">LoDTensors</span></code>.</p>
+</div>
+</div>
+
+
+           </div>
+          </div>
+          <footer>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2016, PaddlePaddle developers.
+
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+
+</footer>
+
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../../',
+            VERSION:'',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true,
+            SOURCELINK_SUFFIX: ".txt",
+        };
+    </script>
+      <script type="text/javascript" src="../../_static/jquery.js"></script>
+      <script type="text/javascript" src="../../_static/underscore.js"></script>
+      <script type="text/javascript" src="../../_static/doctools.js"></script>
+      <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
+       
+  
+
+  
+  
+    <script type="text/javascript" src="../../_static/js/theme.js"></script>
+  
+  
+  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
+  <script src="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/js/perfect-scrollbar.jquery.min.js"></script>
+  <script src="../../_static/js/paddle_doc_init.js"></script> 
+
+</body>
+</html>
\ No newline at end of file
--- a/develop/doc/objects.inv
+++ b/develop/doc/objects.inv
--- a/develop/doc/searchindex.js
+++ b/develop/doc/searchindex.js
--- a/develop/doc_cn/_sources/design/ops/sequence_decoder.md.txt
+++ b/develop/doc_cn/_sources/design/ops/sequence_decoder.md.txt
+# Design: Sequence Decoder Generating LoDTensors
+In tasks such as machine translation and image to text, 
+a [sequence decoder](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) is necessary to generate sequences.
+
+This document describes how to implement the sequence decoder as an operator.
+
+## Beam Search based Decoder
+The [beam search algorithm](https://en.wikipedia.org/wiki/Beam_search) is necessary when generating sequences.
+It is a heuristic search algorithm that explores paths by expanding the most promising nodes in a limited set.
+
+In the old version of PaddlePaddle, the C++ class `RecurrentGradientMachine` implements a general sequence decoder based on beam search.
+Due to its complexity, the implementation relies on many special data structures,
+which are tedious to work with and hard for users to customize.
+
+Sequence generation tasks involve many heuristic tricks,
+so the flexibility of the sequence decoder is very important to users.
+
+During PaddlePaddle's refactoring,
+some new concepts such as [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and [TensorArray](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md) were proposed to better support sequences,
+and they help make the implementation of a beam-search-based sequence decoder **more transparent and modular**.
+
+For example, the RNN states, candidate IDs, and probabilities of beam search can be represented as `LoDTensors`;
+the selected candidates' IDs in each time step can be stored in a `TensorArray` and `Packed` into the translated sentences.
+
+## Changing LoD's absolute offset to relative offsets
+The current `LoDTensor` is designed to store levels of variable-length sequences.
+It stores several arrays of integers, each of which represents a level.
+
+The integers in each level represent the begin and end (exclusive) offsets of a sequence **in the underlying tensor**;
+let's call this format the **absolute-offset LoD** for clarity.
+
+The absolute-offset LoD can quickly retrieve any sequence but fails to represent empty sequences. For example, consider the following two-level LoD:
+```python
+[[0, 3, 9]
+ [0, 2, 3, 3, 3, 9]]
+```
+The first level tells that there are two sequences:
+- the first's offset is `[0, 3)`
+- the second's offset is `[3, 9)`
+
+while on the second level, there are two empty sequences that both begin and end at `3`.
+It is impossible to tell how many of these empty second-level sequences belong to each first-level sequence.
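
A small pure-Python sketch makes the ambiguity concrete. It treats each level as a plain list of offsets into the underlying tensor. For every second-level sequence, it lists the first-level sequences whose tensor range could contain it. The helper names are illustrative and not part of PaddlePaddle.

```python
def level_ranges(offsets):
    """Turn an offset list [o0, o1, ..., on] into half-open ranges [(o0, o1), ...]."""
    return [(offsets[i], offsets[i + 1]) for i in range(len(offsets) - 1)]


def candidate_parents(abs_lod):
    """For each second-level sequence of a two-level absolute-offset LoD, list
    the first-level sequences whose tensor range could contain it."""
    top = level_ranges(abs_lod[0])
    low = level_ranges(abs_lod[1])
    parents = []
    for begin, end in low:
        # An empty range [b, b) fits into every parent with top_begin <= b <= top_end,
        # so it can match more than one first-level sequence.
        owners = [i for i, (top_begin, top_end) in enumerate(top)
                  if top_begin <= begin and end <= top_end]
        parents.append(owners)
    return parents


abs_lod = [[0, 3, 9],
           [0, 2, 3, 3, 3, 9]]
for seq, owners in zip(level_ranges(abs_lod[1]), candidate_parents(abs_lod)):
    print(seq, "->", owners)
```

Both empty sequences `(3, 3)` report two possible owners, `[0, 1]`. The absolute-offset format cannot recover which first-level sequence they belong to.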
+
+Many scenarios rely on representing empty sequences.
+In machine translation or image-to-text, for example, an instance may have no translations, or a prefix may have an empty candidate set.
+
+So let's introduce another LoD format that stores **the offsets of the lower-level sequences**,
+called the **relative-offset** LoD.
+
+For example, the same data can be represented as
+
+```python
+[[0, 2, 5]
+ [0, 2, 3, 3, 3, 9]]
+```
+
+where the first level indicates that there are two sequences,
+whose offsets in the second-level LoD are `[0, 2)` and `[2, 5)`.
+
+The second level is the same as in the absolute-offset example because the level below it is the tensor itself.
+It is now easy to see that the second first-level sequence contains two empty sequences.
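
The following plain-Python sketch reads a two-level relative-offset LoD. First-level entries index into the second level, and only the second level indexes the tensor. It uses `[[0, 2, 5], [0, 2, 3, 3, 3, 9]]`, which keeps the first-level tensor ranges `[0, 3)` and `[3, 9)` of the absolute-offset example while assigning both empty sequences to the second first-level sequence. The helper name is hypothetical.

```python
def relative_to_ranges(rel_lod):
    """For each first-level sequence of a two-level relative-offset LoD,
    return the tensor ranges [begin, end) of its second-level sequences."""
    top, low = rel_lod
    result = []
    for i in range(len(top) - 1):
        # top[i]..top[i + 1] are indices into the second level, not tensor rows
        result.append([(low[j], low[j + 1]) for j in range(top[i], top[i + 1])])
    return result


rel_lod = [[0, 2, 5],
           [0, 2, 3, 3, 3, 9]]
ranges = relative_to_ranges(rel_lod)
empty_counts = [sum(1 for begin, end in seqs if begin == end) for seqs in ranges]
print(ranges)        # [[(0, 2), (2, 3)], [(3, 3), (3, 3), (3, 9)]]
print(empty_counts)  # [0, 2]
```

The ownership of every empty sequence is explicit here. The cost is one extra indirection when locating a sequence's tensor rows.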
+
+The following demos are based on relative-offset LoD.
+
+## Usage in a simple machine translation model
+Let's start with a simple machine translation model, simplified from the [machine translation chapter](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation), to sketch what a sequence decoder can do and how to use it.
+
+The model has an encoder that learns the semantic vector from a sequence,
+and a decoder which uses the sequence decoder to generate new sentences.
+
+**Encoder**
+```python
+import paddle as pd
+
+dict_size = 8000
+source_dict_size = dict_size
+target_dict_size = dict_size
+word_vector_dim = 128
+encoder_dim = 128
+decoder_dim = 128
+beam_size = 5
+max_length = 120
+
+# encoder
+src_word_id = pd.data(
+    name='source_language_word',
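+    type=pd.data.integer_value_sequence(source_dict_size))
+src_embedding = pd.embedding(size=[source_dict_size, word_vector_dim])
+# target-side embedding table, looked up by the decoder below
+trg_embedding = pd.embedding(size=[target_dict_size, word_vector_dim])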
+
+src_word_vec = pd.lookup(src_embedding, src_word_id)
+
+encoder_out_seq = pd.gru(input=src_word_vec, size=encoder_dim)
+
+encoder_ctx = pd.last_seq(encoder_out_seq)
+# encoder_ctx_proj is the learned semantic vector
+encoder_ctx_proj = pd.fc(
+    encoder_ctx, size=decoder_dim, act=pd.activation.Tanh(), bias=None)
+```
+
+**Decoder**
+
+```python
+def generate():
+    decoder = pd.while_loop()
+    with decoder.step():
+        decoder_mem = decoder.memory(init=encoder_ctx)  # mark the memory
+        generated_ids = decoder.memory() # TODO init to batch_size <s>s
+        generated_scores = decoder.memory() # TODO init to batch_size 1s or 0s
+
+        target_word = pd.lookup(trg_embedding, generated_ids)
+        # expand encoder_ctx's batch to fit target_word's lod
+        # for example
+        # decoder_mem.lod is
+        # [[0 1 3],
+        #  [0 1 3 5]]
+        # its tensor content is [a1 a2 a3 a4 a5]
+        # which means there are 2 sentences to translate
+        #   - the first sentence has 1 translation prefix, the offsets are [0, 1)
+        #   - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 5)
+        # the target_word.lod is 
+        # [[0, 1, 5]
+        #  [0, 2, 4, 7, 9, 12]]
+        # which means 2 sentences to translate, with 1 and 4 prefixes respectively
+        # the first prefix has 2 candidates
+        # the following prefixes have 2, 3, 2, 3 candidates
+        # the encoder_ctx_expanded's content will be
+        # [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
+        encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
+        decoder_input = pd.fc(
+            act=pd.activation.Linear(),
+            input=[target_word, encoder_ctx_expanded],
+            size=3 * decoder_dim)
+        gru_out, cur_mem = pd.gru_step(
+            decoder_input, mem=decoder_mem, size=decoder_dim)
+        scores = pd.fc(
+            gru_out,
+            size=target_dict_size,
+            bias=None,
+            act=pd.activation.Softmax())
+        # K is a configurable parameter
+        topk_scores, topk_ids = pd.top_k(scores, K)
+        topk_generated_scores = pd.add_scalar(topk_scores, generated_scores)
+
+        selected_ids, selected_generation_scores = decoder.beam_search(
+            topk_ids, topk_generated_scores)
+
+        # update the states
+        decoder_mem.update(cur_mem)  # tells how to update state
+        generated_ids.update(selected_ids)
+        generated_scores.update(selected_generation_scores)
+
+        decoder.output(selected_ids)
+        decoder.output(selected_generation_scores)
+
+    return decoder()
+
+translation_ids, translation_scores = generate()
+```
+`decoder.beam_search` is an operator that, given the candidates and the scores of the translations that include them,
+returns the result of the beam search algorithm.
+
+In this way, users can customize the inputs or outputs of beam search. For example, there are three ways to prune translation prefixes:
+
+1. set the corresponding elements in `topk_generated_scores` to zero or some small value, so that `beam_search` discards those candidates;
+2. remove specific candidates from `selected_ids`;
+3. remove translation sequences from the final `translation_ids`.
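
Strategies 1 and 3 can be sketched in plain Python on flattened lists. The sketch assumes log-probability scores, where `-inf` plays the role of "some small value"; for raw probabilities, `0.0` could be passed as `floor` instead. `prune_candidates` and `drop_translations` are hypothetical helpers, not part of the proposed `pd` API.

```python
import math


def prune_candidates(topk_ids, topk_scores, banned_ids, floor=-math.inf):
    """Strategy 1: give every banned candidate a score so low that a
    later top-k selection never picks it."""
    return [floor if cid in banned_ids else score
            for cid, score in zip(topk_ids, topk_scores)]


def drop_translations(translations, should_drop):
    """Strategy 3: filter finished translations after decoding."""
    return [t for t in translations if not should_drop(t)]


ids = [4, 7, 9]
scores = [-0.1, -0.5, -0.2]
pruned = prune_candidates(ids, scores, banned_ids={7})
# a beam of size 2 now picks IDs 4 and 9, never the banned 7
beam = sorted(zip(pruned, ids), reverse=True)[:2]
print([cid for _, cid in beam])  # [4, 9]

# example filter: drop translations that repeat a token
print(drop_translations([[4, 9], [9, 9, 9]], lambda t: len(set(t)) < len(t)))  # [[4, 9]]
```

Strategy 1 leaves the beam-search operator untouched because it only edits its input. Strategy 3 works purely on the packed output.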
+
+The implementation of the sequence decoder can reuse the C++ class [RNNAlgorithm](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30),
+so its Python syntax is quite similar to that of an [RNN](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop).
+
+Both of them are two-level `LoDTensors`:
+
+- the first level represents `batch_size` of (source) sentences;
+- the second level represents the candidate ID sets for each translation prefix.
+
+For example, there may be 3 source sentences to translate, with 2, 3, and 1 candidates respectively.
+
+Unlike in an RNN, the previous state and the current state in the sequence decoder have different LoDs and shapes,
+so a `lod_expand` operator is used to expand the LoD of the previous state to fit the current state.
+
+For example, for the previous state:
+
+* LoD is `[0, 1, 3][0, 2, 5, 6]`
+* content of tensor is `a1 a2 b1 b2 b3 c1`
+
+for the current state, stored in `encoder_ctx_expanded`:
+
+* LoD is `[0, 2, 6][0, 3, 5, 8, 9, 11, 11]`
+* the content is 
+  - a1 a1 a1 (a1 has 3 candidates, so its state is copied 3 times, once for each candidate)
+  - a2 a2
+  - b1 b1 b1
+  - b2
+  - b3 b3
+  - None (c1 has 0 candidates, so c1 is dropped)
+
+Thanks to the relative-offset LoD, an empty candidate set can be represented naturally.
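
The content side of `lod_expand` can be sketched as follows. The sketch assumes the previous state is a list with one row per prefix. It also assumes the current step's candidate sets are described by the second level of a relative-offset LoD. It illustrates the semantics described above and is not the operator's implementation.

```python
def lod_expand(rows, candidate_offsets):
    """Repeat rows[i] once for each candidate of prefix i.

    candidate_offsets[i]..candidate_offsets[i + 1] is the candidate range of
    prefix i, so a prefix with an empty range contributes no rows at all.
    """
    if len(candidate_offsets) != len(rows) + 1:
        raise ValueError("need exactly one candidate range per row")
    expanded = []
    for i, row in enumerate(rows):
        count = candidate_offsets[i + 1] - candidate_offsets[i]
        expanded.extend([row] * count)
    return expanded


previous = ["a1", "a2", "b1", "b2", "b3", "c1"]
current_offsets = [0, 3, 5, 8, 9, 11, 11]
print(lod_expand(previous, current_offsets))
# ['a1', 'a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b1', 'b2', 'b3', 'b3']
```

`c1` disappears because its candidate range `[11, 11)` is empty. This matches the "None" entry in the example above.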
+
+The states in each time step can be stored in a `TensorArray` and `Pack`ed into a final LoDTensor; the corresponding syntax is 
+
+```python
+decoder.output(selected_ids)
+decoder.output(selected_generation_scores)
+```
+
+`selected_ids` holds the candidate IDs for the prefixes.
+It will be `Packed` by `TensorArray` into a two-level `LoDTensor`,
+where the first level represents the source sequences
+and the second level represents the generated sequences.
+
+Packing `selected_scores` yields a `LoDTensor` that stores the score of each candidate in the translations.
+
+Packing `selected_generation_scores` yields a `LoDTensor` in which the tail of each sequence is the probability of the whole translation.
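
Here is a plain-Python sketch of what packing the per-step outputs produces. It assumes a hypothetical per-step layout:

- step `t` stores the selected IDs as a flat list plus offsets;
- `offsets[i]..offsets[i + 1]` are the candidates selected for prefix `i` of the previous step;
- the prefixes of the first step are the source sentences.

The sketch grows each prefix forward and closes a prefix when its candidate set is empty. It then emits the two-level relative-offset LoD described above. The real `TensorArray` packing may differ.

```python
def collect_translations(num_sources, steps):
    """Grow every prefix step by step; return finished token lists per source."""
    prefixes = [(src, []) for src in range(num_sources)]  # (source index, tokens)
    finished = [[] for _ in range(num_sources)]
    for offsets, ids in steps:
        assert len(offsets) == len(prefixes) + 1, "one range per live prefix"
        grown = []
        for i, (src, tokens) in enumerate(prefixes):
            begin, end = offsets[i], offsets[i + 1]
            if begin == end and tokens:
                finished[src].append(tokens)  # empty candidate set: prefix ends
            for row in range(begin, end):
                grown.append((src, tokens + [ids[row]]))
        prefixes = grown
    for src, tokens in prefixes:  # prefixes still alive after the last step
        finished[src].append(tokens)
    return finished


def pack_two_level(finished):
    """Flatten per-source translations into (relative-offset LoD, data)."""
    top, low, data = [0], [0], []
    for translations in finished:
        for tokens in translations:
            data.extend(tokens)
            low.append(len(data))
        top.append(len(low) - 1)
    return [top, low], data


steps = [
    ([0, 2, 3], [5, 6, 7]),     # source 0 -> prefixes [5], [6]; source 1 -> [7]
    ([0, 1, 1, 3], [8, 9, 4]),  # [5] -> [5, 8]; [6] ends; [7] -> [7, 9], [7, 4]
]
finished = collect_translations(2, steps)
lod, data = pack_two_level(finished)
print(finished)   # [[[6], [5, 8]], [[7, 9], [7, 4]]]
print(lod, data)  # [[0, 2, 4], [0, 1, 3, 5, 7]] [6, 5, 8, 7, 9, 7, 4]
```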
+
+## LoD and shape changes during decoding
+<p align="center">
+  <img src="./images/LOD-and-shape-changes-during-decoding.jpg"/>
+</p>
+
+According to the image above, beam search is the only phase that changes the LoD.
+
+## Beam search design
+The beam search algorithm will be implemented as a method of the sequence decoder. It has 3 inputs:
+
+1. `topk_ids`, top K candidate ids for each prefix.
+2. `topk_scores`, the corresponding scores for `topk_ids`.
+3. `generated_scores`, the score of the prefixes.
+
+All of them are LoDTensors, so the sequence affiliation is clear.
+Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix.
+
+It will return three variables:
+
+1. `selected_ids`, the final candidates that the beam search function selected for the next step.
+2. `selected_scores`, the scores for the candidates.
+3. `generated_scores`, the updated scores for each prefix (with the new candidates appended).
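
Below is a minimal sketch of one beam-search step over these inputs, with each LoDTensor flattened into plain lists and offsets. It makes two assumptions:

- log-probability scores are summed along a prefix, matching `pd.add_scalar(topk_scores, generated_scores)` in the decoder above;
- the beam is chosen per source sentence across all of its prefixes.

`beam_search_step` and its argument layout are hypothetical.

```python
import heapq


def beam_search_step(sentence_offsets, candidate_offsets, topk_ids, topk_scores,
                     generated_scores, beam_size):
    """One beam-search step on LoD-style inputs (a sketch, not Paddle's op).

    sentence_offsets[s]..sentence_offsets[s + 1]: prefixes of source sentence s.
    candidate_offsets[p]..candidate_offsets[p + 1]: top-k candidates of prefix p.
    topk_ids / topk_scores: flat per-candidate IDs and log-probability scores.
    generated_scores[p]: accumulated score of prefix p.
    """
    keep = set()  # flat indices of the candidates that survive
    for s in range(len(sentence_offsets) - 1):
        scored = []
        for p in range(sentence_offsets[s], sentence_offsets[s + 1]):
            for c in range(candidate_offsets[p], candidate_offsets[p + 1]):
                scored.append((generated_scores[p] + topk_scores[c], c))
        # the beam_size best candidates of this sentence, across all its prefixes
        keep.update(c for _, c in heapq.nlargest(beam_size, scored))

    selected_offsets, selected_ids, selected_scores, new_generated = [0], [], [], []
    for p in range(len(candidate_offsets) - 1):
        for c in range(candidate_offsets[p], candidate_offsets[p + 1]):
            if c in keep:
                selected_ids.append(topk_ids[c])
                selected_scores.append(topk_scores[c])
                new_generated.append(generated_scores[p] + topk_scores[c])
        # a prefix whose candidates were all dropped gets an empty range
        selected_offsets.append(len(selected_ids))
    return selected_offsets, selected_ids, selected_scores, new_generated


result = beam_search_step(
    sentence_offsets=[0, 2], candidate_offsets=[0, 2, 4],
    topk_ids=[11, 12, 21, 22], topk_scores=[-0.25, -1.0, -0.125, -0.5],
    generated_scores=[-1.0, -0.5], beam_size=2)
print(result)  # ([0, 0, 2], [21, 22], [-0.125, -0.5], [-0.625, -1.0])
```

Prefix 0 keeps no candidate, so its range `[0, 0)` is empty. This is how beam search changes the LoD.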
+
+## Introducing the LoD-based `Pack` and `Unpack` methods in `TensorArray`
+`selected_ids`, `selected_scores`, and `generated_scores` are LoDTensors
+that exist at each time step,
+so it is natural to store them in arrays.
+
+PaddlePaddle currently has a module called `TensorArray` that can store an array of tensors,
+so the results of beam search are best stored in a `TensorArray`.
+
+The `Pack` and `UnPack` methods in `TensorArray` are used to pack the tensors in the array into a `LoDTensor` or to split a `LoDTensor` into an array of tensors.
+They need some extensions to support packing or unpacking an array of `LoDTensors`.
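
As a baseline for that extension, here is a plain-Python sketch of time-step `Unpack` and `Pack` for a single-level LoD (flat rows plus absolute offsets). Step `t` holds row `t` of every sequence longer than `t`, and packing reverses the split. The function names and the list-based layout are assumptions for illustration. A multi-level version would also have to carry the upper LoD levels through the round trip.

```python
def unpack(data, offsets):
    """Split flat rows + offsets into time steps; steps[t] holds row t of
    every sequence longer than t. Also returns the sequence lengths."""
    lengths = [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]
    steps = []
    for t in range(max(lengths, default=0)):
        steps.append([data[offsets[i] + t] for i, n in enumerate(lengths) if n > t])
    return steps, lengths


def pack(steps, lengths):
    """Inverse of unpack: rebuild flat rows and offsets from time steps."""
    seqs = [[] for _ in lengths]
    for t, step in enumerate(steps):
        alive = [i for i, n in enumerate(lengths) if n > t]  # sequences still running
        for i, value in zip(alive, step):
            seqs[i].append(value)
    offsets = [0]
    for seq in seqs:
        offsets.append(offsets[-1] + len(seq))
    return [v for seq in seqs for v in seq], offsets


data = ["a0", "a1", "a2", "c0", "d0", "d1"]
offsets = [0, 3, 3, 4, 6]  # the second sequence is empty
steps, lengths = unpack(data, offsets)
print(steps)                # [['a0', 'c0', 'd0'], ['a1', 'd1'], ['a2']]
print(pack(steps, lengths)) # (['a0', 'a1', 'a2', 'c0', 'd0', 'd1'], [0, 3, 3, 4, 6])
```

Keeping `lengths` next to the steps is what lets the empty sequence survive the round trip.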
--- a/develop/doc_cn/design/ops/sequence_decoder.html
+++ b/develop/doc_cn/design/ops/sequence_decoder.html
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>Design: Sequence Decoder Generating LoDTensors &mdash; PaddlePaddle  documentation</title>
+  
+
+  
+  
+
+  
+
+  
+  
+    
+
+  
+
+  
+  
+    <link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
+  
+
+  
+  
+        <link rel="index" title="Index"
+              href="../../genindex.html"/>
+        <link rel="search" title="Search" href="../../search.html"/>
+    <link rel="top" title="PaddlePaddle  documentation" href="../../index.html"/> 
+
+  <link rel="stylesheet" href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" type="text/css" />
+  <link rel="stylesheet" href="../../_static/css/override.css" type="text/css" />
+  <script>
+  var _hmt = _hmt || [];
+  (function() {
+    var hm = document.createElement("script");
+    hm.src = "//hm.baidu.com/hm.js?b9a314ab40d04d805655aab1deee08ba";
+    var s = document.getElementsByTagName("script")[0]; 
+    s.parentNode.insertBefore(hm, s);
+  })();
+  </script>
+
+  
+
+  
+  <script src="../../_static/js/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  
+  <header class="site-header">
+    <div class="site-logo">
+      <a href="/"><img src="../../_static/images/PP_w.png"></a>
+    </div>
+    <div class="site-nav-links">
+      <div class="site-menu">
+        <a class="fork-on-github" href="https://github.com/PaddlePaddle/Paddle" target="_blank"><i class="fa fa-github"></i>Fork me on Github</a>
+        <div class="language-switcher dropdown">
+          <a type="button" data-toggle="dropdown">
+            <span>English</span>
+            <i class="fa fa-angle-up"></i>
+            <i class="fa fa-angle-down"></i>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/doc_cn">Chinese</a></li>
+            <li><a href="/doc">English</a></li>
+          </ul>
+        </div>
+        <ul class="site-page-links">
+          <li><a href="/">Home</a></li>
+        </ul>
+      </div>
+      <div class="doc-module">
+        
+        <ul>
+<li class="toctree-l1"><a class="reference internal" href="../../getstarted/index_cn.html">Getting Started</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../howto/index_cn.html">Advanced Guide</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../api/index_cn.html">API</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../faq/index_cn.html">FAQ</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../mobile/index_cn.html">MOBILE</a></li>
+</ul>
+
+        
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>        
+      </div>
+    </div>
+  </header>
+  
+  <div class="main-content-wrap">
+
+    
+    <nav class="doc-menu-vertical" role="navigation">
+        
+          
+          <ul>
+<li class="toctree-l1"><a class="reference internal" href="../../getstarted/index_cn.html">Getting Started</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../../getstarted/build_and_install/index_cn.html">Installation and Build</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../../getstarted/build_and_install/docker_install_cn.html">Running PaddlePaddle in Docker Containers</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../getstarted/build_and_install/cmake/build_from_source_cn.html">PaddlePaddle Build Options</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../../getstarted/concepts/use_concepts_cn.html">Basic Usage Concepts</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../../howto/index_cn.html">Advanced Guide</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/usage/cmd_parameter/index_cn.html">Setting Command-Line Arguments</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cmd_parameter/use_case_cn.html">Use Cases</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cmd_parameter/arguments_cn.html">Argument Overview</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cmd_parameter/detail_introduction_cn.html">Detailed Description</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/usage/cluster/cluster_train_cn.html">PaddlePaddle分布式训练</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/usage/k8s/k8s_basis_cn.html">Kubernetes 简介</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/usage/k8s/k8s_cn.html">Kubernetes单机训练</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/usage/k8s/k8s_distributed_cn.html">Kubernetes分布式训练</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/dev/build_cn.html">编译PaddlePaddle和运行单元测试</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/dev/write_docs_cn.html">如何贡献/修改文档</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/deep_model/rnn/index_cn.html">RNN相关模型</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/rnn_config_cn.html">RNN配置</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/recurrent_group_cn.html">Recurrent Group教程</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/hierarchical_layer_cn.html">支持双层序列作为输入的Layer</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/hrnn_rnn_api_compare_cn.html">单双层RNN API对比介绍</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../../howto/optimization/gpu_profiling_cn.html">GPU性能分析与调优</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../../api/index_cn.html">API</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../../api/v2/model_configs.html">Model Configuration</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/activation.html">Activation</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/layer.html">Layers</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/evaluators.html">Evaluators</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/optimizer.html">Optimizer</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/pooling.html">Pooling</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/networks.html">Networks</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/attr.html">Parameter Attribute</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../../api/v2/data.html">Data Access</a><ul>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/data/data_reader.html">Data Reader Interface</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/data/image.html">Image Interface</a></li>
+<li class="toctree-l3"><a class="reference internal" href="../../api/v2/data/dataset.html">Dataset</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../../api/v2/run_logic.html">Training and Inference</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../../faq/index_cn.html">FAQ</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../../faq/build_and_install/index_cn.html">Build, Installation and Unit Tests</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../faq/model/index_cn.html">Model Configuration</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../faq/parameter/index_cn.html">Parameter Settings</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../faq/local/index_cn.html">Local Training and Prediction</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../faq/cluster/index_cn.html">Cluster Training and Prediction</a></li>
+</ul>
+</li>
+<li class="toctree-l1"><a class="reference internal" href="../../mobile/index_cn.html">MOBILE</a><ul>
+<li class="toctree-l2"><a class="reference internal" href="../../mobile/cross_compiling_for_android_cn.html">Building the PaddlePaddle Library for Android</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../mobile/cross_compiling_for_ios_cn.html">Building the PaddlePaddle Library for iOS</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../mobile/cross_compiling_for_raspberry_cn.html">Building the PaddlePaddle Library for Raspberry Pi</a></li>
+</ul>
+</li>
+</ul>
+
+        
+    </nav>
+    
+    <section class="doc-content-wrap">
+
+      
+
+ 
+
+
+
+
+
+
+
+      
+      <div class="wy-nav-content" id="doc-content">
+        <div class="rst-content">
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <div class="section" id="design-sequence-decoder-generating-lodtensors">
+<span id="design-sequence-decoder-generating-lodtensors"></span><h1>Design: Sequence Decoder Generating LoDTensors<a class="headerlink" href="#design-sequence-decoder-generating-lodtensors" title="Permalink to this headline">¶</a></h1>
+<p>In tasks such as machine translation and image-to-text generation,
+a <a class="reference external" href="https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md">sequence decoder</a> is needed to generate the output sequences.</p>
+<p>This document describes how to implement the sequence decoder as an operator.</p>
+<div class="section" id="beam-search-based-decoder">
+<span id="beam-search-based-decoder"></span><h2>Beam Search based Decoder<a class="headerlink" href="#beam-search-based-decoder" title="Permalink to this headline">¶</a></h2>
+<p>The <a class="reference external" href="https://en.wikipedia.org/wiki/Beam_search">beam search algorithm</a> is commonly used when generating sequences.
+It is a heuristic search algorithm that explores paths by expanding only the most promising nodes, keeping a limited set of candidates at each step.</p>
+<p>In the old version of PaddlePaddle, the C++ class <code class="docutils literal"><span class="pre">RecurrentGradientMachine</span></code> implements a general sequence decoder based on beam search.
+Because of its complexity, the implementation relies on many special-purpose data structures,
+which makes it cumbersome and hard for users to customize.</p>
+<p>Sequence generation tasks involve many heuristic tricks,
+so the flexibility of the sequence decoder is very important to users.</p>
+<p>During PaddlePaddle&#8217;s refactoring,
+new concepts such as <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a> and <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md">TensorArray</a> were proposed to better support sequences,
+and they help make the implementation of the beam-search-based sequence decoder <strong>more transparent and modular</strong>.</p>
+<p>For example, the RNN states, candidate IDs and probabilities of beam search can be represented as <code class="docutils literal"><span class="pre">LoDTensors</span></code>;
+the IDs of the selected candidates at each time step can be stored in a <code class="docutils literal"><span class="pre">TensorArray</span></code> and then <code class="docutils literal"><span class="pre">Packed</span></code> into the translated sentences.</p>
+</div>
+<div class="section" id="changing-lod-s-absolute-offset-to-relative-offsets">
+<span id="changing-lod-s-absolute-offset-to-relative-offsets"></span><h2>Changing LoD&#8217;s absolute offset to relative offsets<a class="headerlink" href="#changing-lod-s-absolute-offset-to-relative-offsets" title="Permalink to this headline">¶</a></h2>
+<p>The current <code class="docutils literal"><span class="pre">LoDTensor</span></code> is designed to store multiple levels of variable-length sequences.
+It stores several arrays of integers, each of which represents one level.</p>
+<p>The integers in each level represent the begin and end (exclusive) offsets of a sequence <strong>in the underlying tensor</strong>;
+for clarity, let&#8217;s call this format the <strong>absolute-offset LoD</strong>.</p>
+<p>The absolute-offset LoD can retrieve any sequence quickly but fails to represent empty sequences. For example, consider the following two-level LoD:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">9</span><span class="p">]</span>
+ <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">9</span><span class="p">]]</span>
+</pre></div>
+</div>
+<p>The first level indicates that there are two sequences:</p>
+<ul class="simple">
+<li>the first&#8217;s offset is <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">3)</span></code></li>
+<li>the second&#8217;s offset is <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">9)</span></code></li>
+</ul>
+<p>On the second level, however, there are two empty sequences that both begin and end at <code class="docutils literal"><span class="pre">3</span></code>.
+It is impossible to tell how many of these empty second-level sequences belong to each first-level sequence.</p>
+<p>Many scenarios rely on representing empty sequences.
+In machine translation or image-to-text, for example, an instance may have no translations, or a prefix may have an empty candidate set.</p>
+<p>So let&#8217;s introduce another LoD format
+that stores <strong>offsets into the lower-level sequences</strong>, called the <strong>relative-offset</strong> LoD.</p>
+<p>For example, the same data as above is represented as:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
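+<p>To make the ambiguity concrete, here is a small plain-Python sketch (not PaddlePaddle code; the helpers <code class="docutils literal"><span class="pre">spans</span></code> and <code class="docutils literal"><span class="pre">could_contain</span></code> are hypothetical). It lists the <code class="docutils literal"><span class="pre">[begin,</span> <span class="pre">end)</span></code> span of every sequence in the absolute-offset LoD above and asks which first-level sequences could own each second-level sequence.</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span># Absolute-offset LoD from the example above: every level stores offsets
+# into the underlying tensor.
+abs_lod = [[0, 3, 9],
+           [0, 2, 3, 3, 3, 9]]
+
+
+def spans(level):
+    """Return the [begin, end) span of every sequence in one LoD level."""
+    return list(zip(level[:-1], level[1:]))
+
+
+def could_contain(outer, inner):
+    """True if span `inner` lies within span `outer` (integer offsets)."""
+    allowed = range(outer[0], outer[1] + 1)
+    return inner[0] in allowed and inner[1] in allowed
+
+
+top = spans(abs_lod[0])    # [(0, 3), (3, 9)]
+lower = spans(abs_lod[1])  # [(0, 2), (2, 3), (3, 3), (3, 3), (3, 9)]
+# For each second-level sequence, the first-level sequences that could own it.
+owners = [[i for i, t in enumerate(top) if could_contain(t, s)] for s in lower]
+print(owners)  # [[0], [0], [0, 1], [0, 1], [1]]
+</pre></div>
+</div>
+<p>Both empty <code class="docutils literal"><span class="pre">(3,</span> <span class="pre">3)</span></code> spans fit inside either first-level sequence, which is exactly the information the absolute-offset format loses.</p>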
+ <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">9</span><span class="p">]]</span>
+</pre></div>
+</div>
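+<p>The first level indicates that there are two sequences,
+whose offsets in the second-level LoD are <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">3)</span></code> and <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">5)</span></code>.</p>
+<p>The second level is the same as in the absolute-offset example, because the level below it is the tensor itself.
+It is now easy to tell which first-level sequence each empty sequence belongs to: the first contains <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">2)</span></code>, <code class="docutils literal"><span class="pre">[2,</span> <span class="pre">3)</span></code> and <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">3)</span></code>, while the second contains <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">3)</span></code> and <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">9)</span></code>.</p>
+<p>The following examples use the relative-offset LoD.</p>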
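+<p>The relationship between the two formats can be checked with a short plain-Python sketch. The helper names <code class="docutils literal"><span class="pre">to_absolute</span></code> and <code class="docutils literal"><span class="pre">children</span></code> are hypothetical, and the sketch uses the relative-offset LoD <code class="docutils literal"><span class="pre">[[0,</span> <span class="pre">3,</span> <span class="pre">5],</span> <span class="pre">[0,</span> <span class="pre">2,</span> <span class="pre">3,</span> <span class="pre">3,</span> <span class="pre">3,</span> <span class="pre">9]]</span></code>, whose first level gives the ranges <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">3)</span></code> and <code class="docutils literal"><span class="pre">[3,</span> <span class="pre">5)</span></code> described above.</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span># Relative-offset LoD: each level except the last stores offsets into the
+# level below it; the last level stores offsets into the underlying tensor.
+rel_lod = [[0, 3, 5],
+           [0, 2, 3, 3, 3, 9]]
+
+
+def to_absolute(rel):
+    """Convert a relative-offset LoD to the absolute-offset format."""
+    result = [list(rel[-1])]
+    # Walk upward: each upper offset indexes into the already converted
+    # (absolute) level directly below it.
+    for level in reversed(rel[:-1]):
+        below = result[0]
+        result.insert(0, [below[i] for i in level])
+    return result
+
+
+def children(rel, level, i):
+    """Return the spans of the sub-sequences of sequence `i` at `level`,
+    expressed in the offsets stored at level + 1 (tensor offsets when
+    level + 1 is the last level)."""
+    below = rel[level + 1]
+    first, last = rel[level][i], rel[level][i + 1]
+    return [(below[j], below[j + 1]) for j in range(first, last)]
+
+
+print(to_absolute(rel_lod))     # [[0, 3, 9], [0, 2, 3, 3, 3, 9]]
+print(children(rel_lod, 0, 0))  # [(0, 2), (2, 3), (3, 3)]
+print(children(rel_lod, 0, 1))  # [(3, 3), (3, 9)]
+</pre></div>
+</div>
+<p>Converting back reproduces the absolute-offset example, while the relative offsets keep track of which first-level sequence owns each empty sequence.</p>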
+</div>
+<div class="section" id="usage-in-a-simple-machine-translation-model">
+<span id="usage-in-a-simple-machine-translation-model"></span><h2>Usage in a simple machine translation model<a class="headerlink" href="#usage-in-a-simple-machine-translation-model" title="Permalink to this headline">¶</a></h2>
+<p>Let&#8217;s start with a simple machine translation model, simplified from the <a class="reference external" href="https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation">machine translation chapter</a>, to sketch what a sequence decoder can do and how to use it.</p>
+<p>The model has an encoder that learns a semantic vector from the source sequence,
+and a decoder that uses the sequence decoder to generate new sentences.</p>
+<p><strong>Encoder</strong></p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">paddle</span> <span class="kn">as</span> <span class="nn">pd</span>
+
+<span class="n">dict_size</span> <span class="o">=</span> <span class="mi">8000</span>
+<span class="n">source_dict_size</span> <span class="o">=</span> <span class="n">dict_size</span>
+<span class="n">target_dict_size</span> <span class="o">=</span> <span class="n">dict_size</span>
+<span class="n">word_vector_dim</span> <span class="o">=</span> <span class="mi">128</span>
+<span class="n">encoder_dim</span> <span class="o">=</span> <span class="mi">128</span>
+<span class="n">decoder_dim</span> <span class="o">=</span> <span class="mi">128</span>
+<span class="n">beam_size</span> <span class="o">=</span> <span class="mi">5</span>
+<span class="n">max_length</span> <span class="o">=</span> <span class="mi">120</span>
+
+<span class="c1"># encoder</span>
+<span class="n">src_word_id</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">data</span><span class="p">(</span>
+    <span class="n">name</span><span class="o">=</span><span class="s1">&#39;source_language_word&#39;</span><span class="p">,</span>
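+    <span class="nb">type</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">integer_value_sequence</span><span class="p">(</span><span class="n">source_dict_size</span><span class="p">))</span>
+<span class="c1"># embedding tables of shape [dict_size, word_vector_dim]</span>
+<span class="n">src_embedding</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">embedding</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">[</span><span class="n">source_dict_size</span><span class="p">,</span> <span class="n">word_vector_dim</span><span class="p">])</span>
+<span class="n">trg_embedding</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">embedding</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">[</span><span class="n">target_dict_size</span><span class="p">,</span> <span class="n">word_vector_dim</span><span class="p">])</span>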
+
+<span class="n">src_word_vec</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">src_embedding</span><span class="p">,</span> <span class="n">src_word_id</span><span class="p">)</span>
+
+<span class="n">encoder_out_seq</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">gru</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">src_word_vec</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">encoder_dim</span><span class="p">)</span>
+
+<span class="n">encoder_ctx</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">last_seq</span><span class="p">(</span><span class="n">encoder_out_seq</span><span class="p">)</span>
+<span class="c1"># encoder_ctx_proj is the learned semantic vector</span>
+<span class="n">encoder_ctx_proj</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span>
+    <span class="n">encoder_ctx</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">decoder_dim</span><span class="p">,</span> <span class="n">act</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">activation</span><span class="o">.</span><span class="n">Tanh</span><span class="p">(),</span> <span class="n">bias</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
+</pre></div>
+</div>
+<p><strong>Decoder</strong></p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate</span><span class="p">():</span>
+    <span class="n">decoder</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">while_loop</span><span class="p">()</span>
+    <span class="k">with</span> <span class="n">decoder</span><span class="o">.</span><span class="n">step</span><span class="p">():</span>
+        <span class="n">decoder_mem</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">memory</span><span class="p">(</span><span class="n">init</span><span class="o">=</span><span class="n">encoder_ctx</span><span class="p">)</span>  <span class="c1"># mark the memory</span>
+        <span class="n">generated_ids</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">memory</span><span class="p">()</span> <span class="c1"># TODO init to batch_size &lt;s&gt;s</span>
+        <span class="n">generated_scores</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">memory</span><span class="p">()</span> <span class="c1"># TODO init to batch_size 1s or 0s</span>
+
+        <span class="n">target_word</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="n">trg_embedding</span><span class="p">,</span> <span class="n">generated_ids</span><span class="p">)</span>
+        <span class="c1"># expand encoder_ctx&#39;s batch to fit target_word&#39;s lod</span>
+        <span class="c1"># for example, suppose decoder_mem.lod is</span>
+        <span class="c1"># [[0, 1, 3],</span>
+        <span class="c1">#  [0, 1, 3, 5]]</span>
+        <span class="c1"># and its tensor content is [a1 a2 a3 a4 a5],</span>
+        <span class="c1"># which means there are 2 sentences to translate</span>
+        <span class="c1">#   - the first sentence has 1 group of prefixes, rows [0, 1)</span>
+        <span class="c1">#   - the second sentence has 2 groups of prefixes, rows [1, 3) and [3, 5)</span>
+        <span class="c1"># and target_word.lod is</span>
+        <span class="c1"># [[0, 1, 5],</span>
+        <span class="c1">#  [0, 2, 4, 7, 9, 12]]</span>
+        <span class="c1"># which means the 2 sentences own 1 and 4 prefix rows respectively,</span>
+        <span class="c1"># and the rows a1, a2, a3, a4, a5 have 2, 2, 3, 2, 3 candidates</span>
+        <span class="c1"># then encoder_ctx_expanded&#39;s content will be</span>
+        <span class="c1"># [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]</span>
+        <span class="n">encoder_ctx_expanded</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">lod_expand</span><span class="p">(</span><span class="n">encoder_ctx</span><span class="p">,</span> <span class="n">target_word</span><span class="p">)</span>
+        <span class="n">decoder_input</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span>
+            <span class="n">act</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">activation</span><span class="o">.</span><span class="n">Linear</span><span class="p">(),</span>
+            <span class="nb">input</span><span class="o">=</span><span class="p">[</span><span class="n">target_word</span><span class="p">,</span> <span class="n">encoder_ctx_expanded</span><span class="p">],</span>
+            <span class="n">size</span><span class="o">=</span><span class="mi">3</span> <span class="o">*</span> <span class="n">decoder_dim</span><span class="p">)</span>
+        <span class="n">gru_out</span><span class="p">,</span> <span class="n">cur_mem</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">gru_step</span><span class="p">(</span>
+            <span class="n">decoder_input</span><span class="p">,</span> <span class="n">mem</span><span class="o">=</span><span class="n">decoder_mem</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="n">decoder_dim</span><span class="p">)</span>
+        <span class="n">scores</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">fc</span><span class="p">(</span>
+            <span class="n">gru_out</span><span class="p">,</span>
+            <span class="n">size</span><span class="o">=</span><span class="n">target_dict_size</span><span class="p">,</span>
+            <span class="n">bias</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span>
+            <span class="n">act</span><span class="o">=</span><span class="n">pd</span><span class="o">.</span><span class="n">activation</span><span class="o">.</span><span class="n">Softmax</span><span class="p">())</span>
+        <span class="c1"># K, the number of top candidates kept for each prefix, is a config value</span>
+        <span class="n">topk_scores</span><span class="p">,</span> <span class="n">topk_ids</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">top_k</span><span class="p">(</span><span class="n">scores</span><span class="p">,</span> <span class="n">K</span><span class="p">)</span>
+        <span class="n">topk_generated_scores</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">add_scalar</span><span class="p">(</span><span class="n">topk_scores</span><span class="p">,</span> <span class="n">generated_scores</span><span class="p">)</span>
+
+        <span class="n">selected_ids</span><span class="p">,</span> <span class="n">selected_generation_scores</span> <span class="o">=</span> <span class="n">decoder</span><span class="o">.</span><span class="n">beam_search</span><span class="p">(</span>
+            <span class="n">topk_ids</span><span class="p">,</span> <span class="n">topk_generated_scores</span><span class="p">)</span>
+
+        <span class="c1"># update the states</span>
+        <span class="n">decoder_mem</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">cur_mem</span><span class="p">)</span>  <span class="c1"># tells how to update state</span>
+        <span class="n">generated_ids</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">selected_ids</span><span class="p">)</span>
+        <span class="n">generated_scores</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">selected_generation_scores</span><span class="p">)</span>
+
+        <span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_ids</span><span class="p">)</span>
+        <span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_generation_scores</span><span class="p">)</span>
+
+    <span class="k">return</span> <span class="n">decoder</span><span class="p">()</span>
+
+<span class="n">translation_ids</span><span class="p">,</span> <span class="n">translation_scores</span> <span class="o">=</span> <span class="n">generate</span><span class="p">()</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">decoder.beam_search</span></code> is an operator that, given the candidates and the scores of the translations that include them,
+returns the result of the beam search algorithm.</p>
+<p>In this way, users can customize the inputs or outputs of beam search. For example, there are three ways to prune some translation prefixes:</p>
+<ol class="simple">
+<li>set the corresponding elements in <code class="docutils literal"><span class="pre">topk_generated_scores</span></code> to zero or some small value, so that beam_search discards these candidates;</li>
+<li>remove specific candidates from <code class="docutils literal"><span class="pre">selected_ids</span></code>;</li>
+<li>after getting the final <code class="docutils literal"><span class="pre">translation_ids</span></code>, remove unwanted translation sequences from it.</li>
+</ol>
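+<p>The first pruning approach can be sketched in plain Python. This is only an illustration under our own assumptions: the candidate data and the helpers <code class="docutils literal"><span class="pre">prune_candidates</span></code> and <code class="docutils literal"><span class="pre">keep_best</span></code> are hypothetical, the scores are treated as log-probabilities (so a &#8220;small value&#8221; is a large negative number), and <code class="docutils literal"><span class="pre">keep_best</span></code> merely stands in for the selection that beam_search performs.</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span>def prune_candidates(ids, scores, banned, small=-1e9):
+    """Return a copy of `scores` where every candidate whose ID is in
+    `banned` gets a very small score, so a top-k style selection skips it."""
+    return [small if i in banned else s for i, s in zip(ids, scores)]
+
+
+def keep_best(ids, scores, k):
+    """Keep the k highest-scoring candidates (a stand-in for beam_search)."""
+    order = sorted(range(len(ids)), key=lambda j: scores[j], reverse=True)
+    return [ids[j] for j in order[:k]]
+
+
+# Hypothetical top-K candidates of one prefix and their accumulated scores.
+topk_ids = [7, 3, 12, 5]
+topk_generated_scores = [-0.1, -0.5, -0.7, -2.0]
+
+print(keep_best(topk_ids, topk_generated_scores, 2))  # [7, 3]
+masked = prune_candidates(topk_ids, topk_generated_scores, banned={3})
+print(keep_best(topk_ids, masked, 2))                 # [7, 12]
+</pre></div>
+</div>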
+<p>The implementation of the sequence decoder can reuse the C++ class <a class="reference external" href="https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30">RNNAlgorithm</a>,
+so the Python syntax is quite similar to that of an <a class="reference external" href="https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop">RNN</a>.</p>
+<p>The candidate IDs and their scores are both two-level <code class="docutils literal"><span class="pre">LoDTensors</span></code>:</p>
+<ul class="simple">
+<li>the first level represents the batch of <code class="docutils literal"><span class="pre">batch_size</span></code> (source) sentences;</li>
+<li>the second level represents the candidate ID set of each translation prefix.</li>
+</ul>
+<p>For example, there may be 3 source sentences to translate, with 2, 3 and 1 candidates respectively.</p>
+<p>Unlike in an RNN, in the sequence decoder the previous state and the current state have different LoDs and shapes,
+so a <code class="docutils literal"><span class="pre">lod_expand</span></code> operator is used to expand the previous state to fit the LoD of the current state.</p>
+<p>For example, for the previous state:</p>
+<ul class="simple">
+<li>LoD is <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">1,</span> <span class="pre">3][0,</span> <span class="pre">2,</span> <span class="pre">5,</span> <span class="pre">6]</span></code></li>
+<li>content of tensor is <code class="docutils literal"><span class="pre">a1</span> <span class="pre">a2</span> <span class="pre">b1</span> <span class="pre">b2</span> <span class="pre">b3</span> <span class="pre">c1</span></code></li>
+</ul>
+<p>and for the current state stored in <code class="docutils literal"><span class="pre">encoder_ctx_expanded</span></code>:</p>
+<ul class="simple">
+<li>LoD is <code class="docutils literal"><span class="pre">[0,</span> <span class="pre">2,</span> <span class="pre">6][0,</span> <span class="pre">3,</span> <span class="pre">5,</span> <span class="pre">8,</span> <span class="pre">9,</span> <span class="pre">11,</span> <span class="pre">11]</span></code></li>
+<li>the content is<ul>
+<li>a1 a1 a1 (a1 has 3 candidates, so its state is copied 3 times, once for each candidate)</li>
+<li>a2 a2</li>
+<li>b1 b1 b1</li>
+<li>b2</li>
+<li>b3 b3</li>
+<li>None (c1 has 0 candidates, so c1 is dropped)</li>
+</ul>
+</li>
+</ul>
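+<p>The row expansion in this example can be reproduced with a plain-Python sketch of the <code class="docutils literal"><span class="pre">lod_expand</span></code> semantics described above. The function name <code class="docutils literal"><span class="pre">lod_expand_rows</span></code> is hypothetical, and the sketch only handles the tensor rows; it assumes the upper LoD level of the result is copied from the current state.</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span>def lod_expand_rows(rows, candidate_offsets):
+    """Repeat rows[i] once for every candidate of prefix i.
+
+    candidate_offsets is the lowest level of the current state's
+    relative-offset LoD: prefix i owns the candidates in
+    [candidate_offsets[i], candidate_offsets[i + 1]).
+    """
+    if len(candidate_offsets) != len(rows) + 1:
+        raise ValueError("expected one candidate range per row")
+    expanded = []
+    for i, row in enumerate(rows):
+        count = candidate_offsets[i + 1] - candidate_offsets[i]
+        expanded.extend([row] * count)  # count == 0 drops the row
+    return expanded
+
+
+# Rows of the previous state and the lowest LoD level of the current state,
+# taken from the example above.
+prev_rows = ["a1", "a2", "b1", "b2", "b3", "c1"]
+candidate_offsets = [0, 3, 5, 8, 9, 11, 11]
+print(lod_expand_rows(prev_rows, candidate_offsets))
+# ['a1', 'a1', 'a1', 'a2', 'a2', 'b1', 'b1', 'b1', 'b2', 'b3', 'b3']
+</pre></div>
+</div>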
+<p>Thanks to the relative-offset LoD, an empty candidate set can be represented naturally.</p>
+<p>The states at each time step can be stored in a <code class="docutils literal"><span class="pre">TensorArray</span></code> and <code class="docutils literal"><span class="pre">Pack</span></code>ed into a final LoDTensor; the corresponding syntax is:</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_ids</span><span class="p">)</span>
+<span class="n">decoder</span><span class="o">.</span><span class="n">output</span><span class="p">(</span><span class="n">selected_generation_scores</span><span class="p">)</span>
+</pre></div>
+</div>
+<p><code class="docutils literal"><span class="pre">selected_ids</span></code> holds the candidate IDs for the prefixes.
+It will be <code class="docutils literal"><span class="pre">Packed</span></code> by <code class="docutils literal"><span class="pre">TensorArray</span></code> into a two-level <code class="docutils literal"><span class="pre">LoDTensor</span></code>,
+in which the first level represents the source sequences
+and the second level represents the generated sequences.</p>
+<p>Packing <code class="docutils literal"><span class="pre">selected_scores</span></code> yields a <code class="docutils literal"><span class="pre">LoDTensor</span></code> that stores the score of each candidate in the translations.</p>
+<p>Packing <code class="docutils literal"><span class="pre">selected_generation_scores</span></code> yields a <code class="docutils literal"><span class="pre">LoDTensor</span></code> in which the tail (last element) of each sequence is the probability of the whole translation.</p>
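+<p>Reading the final score of each translation from such a packed tensor amounts to taking the tail of every sequence. Here is a plain-Python sketch; the helper <code class="docutils literal"><span class="pre">sequence_tails</span></code> and the data are hypothetical, and the values are illustrative accumulated log-probabilities.</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span>def sequence_tails(values, offsets):
+    """Return the last element of every sequence, or None if it is empty.
+
+    offsets is the lowest level of a relative-offset LoD; sequence k spans
+    values[offsets[k]:offsets[k + 1]].
+    """
+    tails = []
+    for begin, end in zip(offsets[:-1], offsets[1:]):
+        tails.append(values[end - 1] if end != begin else None)
+    return tails
+
+
+# Hypothetical packed selected_generation_scores: three generated sequences
+# of lengths 3, 2 and 0, holding the accumulated score after every step.
+packed_scores = [-0.2, -0.9, -1.4, -0.3, -0.8]
+offsets = [0, 3, 5, 5]
+print(sequence_tails(packed_scores, offsets))  # [-1.4, -0.8, None]
+</pre></div>
+</div>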
+</div>
+<div class="section" id="lod-and-shape-changes-during-decoding">
+<span id="lod-and-shape-changes-during-decoding"></span><h2>LoD and shape changes during decoding<a class="headerlink" href="#lod-and-shape-changes-during-decoding" title="Permalink to this headline">¶</a></h2>
+<p align="center">
+  <img src="./images/LOD-and-shape-changes-during-decoding.jpg"/>
+</p><p>According to the image above, beam search is the only phase that changes the LoD.</p>
+</div>
+<div class="section" id="beam-search-design">
+<span id="beam-search-design"></span><h2>Beam search design<a class="headerlink" href="#beam-search-design" title="Permalink to this headline">¶</a></h2>
+<p>The beam search algorithm will be implemented as a method of the sequence decoder. It has 3 inputs:</p>
+<ol class="simple">
+<li><code class="docutils literal"><span class="pre">topk_ids</span></code>, the top K candidate IDs for each prefix;</li>
+<li><code class="docutils literal"><span class="pre">topk_scores</span></code>, the corresponding scores of <code class="docutils literal"><span class="pre">topk_ids</span></code>;</li>
+<li><code class="docutils literal"><span class="pre">generated_scores</span></code>, the scores of the prefixes.</li>
+</ol>
+<p>All of them are LoDTensors, so the sequence that each entry belongs to is explicit.
+Beam search keeps a beam for each prefix and selects a smaller candidate set for each prefix.</p>
+<p>It returns three variables:</p>
+<ol class="simple">
+<li><code class="docutils literal"><span class="pre">selected_ids</span></code>, the final candidates selected by beam search for the next step;</li>
+<li><code class="docutils literal"><span class="pre">selected_scores</span></code>, the scores of these candidates;</li>
+<li><code class="docutils literal"><span class="pre">generated_scores</span></code>, the updated scores of each prefix (with the new candidates appended).</li>
+</ol>
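+<p>One possible reading of this interface is sketched below in plain Python, under our own assumptions: the function <code class="docutils literal"><span class="pre">beam_search_step</span></code>, its argument layout and <code class="docutils literal"><span class="pre">prefix_lod</span></code> are hypothetical; scores are assumed additive (e.g. log-probabilities); and, following the usual beam search formulation, <code class="docutils literal"><span class="pre">beam_size</span></code> candidates are kept per source sentence and then grouped by prefix, so a prefix can end up with an empty candidate set. The real operator works on LoDTensors rather than Python lists.</p>
+<div class="highlight-python"><div class="highlight"><pre><span></span>from collections import defaultdict
+
+
+def beam_search_step(topk_ids, topk_scores, generated_scores, prefix_lod,
+                     beam_size):
+    """One beam-search step over a batch of source sentences.
+
+    topk_ids[p], topk_scores[p]: the top K candidate IDs of prefix p and
+        their scores (assumed additive, e.g. log-probabilities).
+    generated_scores[p]: the accumulated score of prefix p.
+    prefix_lod: offsets grouping prefixes by source sentence; [0, 2, 3]
+        means sentence 0 owns prefixes 0 and 1, sentence 1 owns prefix 2.
+
+    Returns selected_ids, selected_scores and the updated generated scores,
+    flattened over the surviving candidates, plus a two-level
+    relative-offset LoD (sentences over prefixes, prefixes over candidates).
+    """
+    selected_ids, selected_scores, new_generated = [], [], []
+    candidate_offsets = [0]
+    for sent in range(len(prefix_lod) - 1):
+        prefixes = range(prefix_lod[sent], prefix_lod[sent + 1])
+        # Score every (prefix, candidate) pair belonging to this sentence.
+        pairs = [(generated_scores[p] + s, p, cid, s)
+                 for p in prefixes
+                 for cid, s in zip(topk_ids[p], topk_scores[p])]
+        # Keep the beam_size best pairs of the whole sentence.
+        best = sorted(pairs, key=lambda t: t[0], reverse=True)[:beam_size]
+        # Group the winners by prefix; a prefix may keep zero candidates.
+        by_prefix = defaultdict(list)
+        for total, p, cid, s in best:
+            by_prefix[p].append((cid, s, total))
+        for p in prefixes:
+            for cid, s, total in by_prefix[p]:
+                selected_ids.append(cid)
+                selected_scores.append(s)
+                new_generated.append(total)
+            candidate_offsets.append(len(selected_ids))
+    lod = [list(prefix_lod), candidate_offsets]
+    return selected_ids, selected_scores, new_generated, lod
+
+
+# Hypothetical step: sentence 0 has prefixes 0 and 1, sentence 1 has prefix 2.
+topk_ids = [[4, 9], [4, 6], [1, 2]]
+topk_scores = [[-0.5, -2.0], [-3.0, -4.0], [-1.0, -4.0]]
+generated_scores = [-1.0, -2.0, 0.0]
+ids, scores, totals, lod = beam_search_step(
+    topk_ids, topk_scores, generated_scores, [0, 2, 3], beam_size=2)
+print(ids)     # [4, 9, 1, 2]
+print(totals)  # [-1.5, -3.0, -1.0, -4.0]
+print(lod)     # [[0, 2, 3], [0, 2, 2, 4]]
+</pre></div>
+</div>
+<p>In this run prefix 1 keeps no candidates, which shows up as the empty range <code class="docutils literal"><span class="pre">[2,</span> <span class="pre">2)</span></code> in the second LoD level, the same kind of empty candidate set that the relative-offset LoD was introduced to represent.</p>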
+</div>
+<div class="section" id="introducing-the-lod-based-pack-and-unpack-methods-in-tensorarray">
+<span id="introducing-the-lod-based-pack-and-unpack-methods-in-tensorarray"></span><h2>Introducing the LoD-based <code class="docutils literal"><span class="pre">Pack</span></code> and <code class="docutils literal"><span class="pre">Unpack</span></code> methods in <code class="docutils literal"><span class="pre">TensorArray</span></code><a class="headerlink" href="#introducing-the-lod-based-pack-and-unpack-methods-in-tensorarray" title="永久链接至标题">¶</a></h2>
+<p>The <code class="docutils literal"><span class="pre">selected_ids</span></code>, <code class="docutils literal"><span class="pre">selected_scores</span></code> and <code class="docutils literal"><span class="pre">generated_scores</span></code> are LoDTensors,
+and they exist in each time step,
+so it is natural to store them in arrays.</p>
+<p>Currently, PaddlePaddle has a module called <code class="docutils literal"><span class="pre">TensorArray</span></code> which can store an array of tensors,
+the results of beam search are better to store in a <code class="docutils literal"><span class="pre">TensorArray</span></code>.</p>
+<p>The <code class="docutils literal"><span class="pre">Pack</span></code> and <code class="docutils literal"><span class="pre">Unpack</span></code> methods of <code class="docutils literal"><span class="pre">TensorArray</span></code> pack the tensors in the array into a <code class="docutils literal"><span class="pre">LoDTensor</span></code> and split a <code class="docutils literal"><span class="pre">LoDTensor</span></code> into an array of tensors, respectively.
+They need some extensions to support packing and unpacking an array of <code class="docutils literal"><span class="pre">LoDTensors</span></code>.</p>
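The basic, non-nested idea behind LoD-based `Unpack` and `Pack` can be sketched in plain Python. This is a hypothetical illustration, not the `TensorArray` API: a flat list stands in for a `LoDTensor`, `lod` holds level-0 absolute offsets, and the extension to arrays whose elements are themselves `LoDTensors` is not shown. `unpack` produces one slice per time step holding element *t* of every sequence longer than *t*; `pack` reverses this given the sequence lengths.

```python
from typing import List, Tuple

# Hypothetical stand-in for a LoDTensor: a flat list of values plus level-0
# absolute offsets, e.g. data=[a0, a1, a2, b0, b1], lod=[0, 3, 5].


def unpack(data: List[float], lod: List[int]) -> List[List[float]]:
    """Split a LoD batch into per-time-step slices.

    Slice t holds element t of every sequence whose length exceeds t,
    in the original sequence order.
    """
    lengths = [lod[i + 1] - lod[i] for i in range(len(lod) - 1)]
    max_len = max(lengths, default=0)
    steps = []
    for t in range(max_len):
        steps.append([data[lod[i] + t] for i, n in enumerate(lengths) if n > t])
    return steps


def pack(steps: List[List[float]], lengths: List[int]) -> Tuple[List[float], List[int]]:
    """Inverse of unpack: rebuild the flat data and its LoD offsets."""
    seqs: List[List[float]] = [[] for _ in lengths]
    for t, step in enumerate(steps):
        # Sequences still active at step t, in original order.
        active = [i for i, n in enumerate(lengths) if n > t]
        if len(step) != len(active):
            raise ValueError(f"step {t} has {len(step)} values, expected {len(active)}")
        for i, value in zip(active, step):
            seqs[i].append(value)
    data = [v for s in seqs for v in s]
    lod = [0]
    for s in seqs:
        lod.append(lod[-1] + len(s))
    return data, lod
```

Unpacking `data=[1, 2, 3, 4, 5]` with `lod=[0, 3, 5]` yields the steps `[[1, 4], [2, 5], [3]]`, and packing those steps with lengths `[3, 2]` restores the original data and offsets. Real implementations may sort sequences by length first; this sketch keeps the original order for simplicity.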
+</div>
+</div>
+
+
+           </div>
+          </div>
+          <footer>
+  
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2016, PaddlePaddle developers.
+
+    </p>
+  </div>
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>. 
+
+</footer>
+
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+
+  
+
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'../../',
+            VERSION:'',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true,
+            SOURCELINK_SUFFIX: ".txt",
+        };
+    </script>
+      <script type="text/javascript" src="../../_static/jquery.js"></script>
+      <script type="text/javascript" src="../../_static/underscore.js"></script>
+      <script type="text/javascript" src="../../_static/doctools.js"></script>
+      <script type="text/javascript" src="../../_static/translations.js"></script>
+      <script type="text/javascript" src="https://cdn.bootcss.com/mathjax/2.7.0/MathJax.js"></script>
+       
+  
+
+  
+  
+    <script type="text/javascript" src="../../_static/js/theme.js"></script>
+  
+  
+  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
+  <script src="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/js/perfect-scrollbar.jquery.min.js"></script>
+  <script src="../../_static/js/paddle_doc_init.js"></script> 
+
+</body>
+</html>
\ No newline at end of file
--- a/develop/doc_cn/objects.inv
+++ b/develop/doc_cn/objects.inv
--- a/develop/doc_cn/searchindex.js
+++ b/develop/doc_cn/searchindex.js